1) Development:
- Develop PySpark scripts for complex data transformation, cleansing, and aggregation on large datasets (a representative pipeline is sketched after this list).
- Design and implement scalable big data solutions using PySpark and the Hadoop ecosystem (HDFS, Hive, Spark SQL).
- Create and manage database schemas, tables, and views for optimized data processing.
- Optimize PySpark jobs for performance and scalability, ensuring efficient resource usage (see the tuning sketch after this list).
- Integrate PySpark applications with other data processing tools and platforms.
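A minimal sketch of such a pipeline, assuming hypothetical Hive tables raw.orders and analytics.daily_revenue and columns order_id, customer_id, amount, and created_at: it deduplicates and cleanses raw events, aggregates revenue per customer per day with Spark SQL functions, and writes a partitioned Hive table.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-aggregation").getOrCreate()

# Hypothetical input: a Hive table of raw order events.
orders = spark.table("raw.orders")

cleaned = (
    orders
    .dropDuplicates(["order_id"])                       # drop duplicate events
    .filter(F.col("amount").isNotNull())                # drop rows missing the amount
    .withColumn("order_date", F.to_date("created_at"))  # normalize the timestamp to a date
)

# Aggregate revenue and order counts per customer per day.
daily_revenue = (
    cleaned
    .groupBy("customer_id", "order_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("order_id").alias("order_count"),
    )
)

# Persist as a partitioned Hive table for downstream consumers.
(daily_revenue
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.daily_revenue"))
```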
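And a brief tuning sketch, again with hypothetical table names: broadcasting a small dimension table avoids a shuffle-heavy sort-merge join on the large fact table, while repartitioning on the grouping key and caching help when several downstream aggregations reuse one intermediate result.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("join-tuning").getOrCreate()

facts = spark.table("analytics.daily_revenue")  # large fact table
customers = spark.table("dim.customers")        # small dimension table

# Broadcasting the small side avoids a full shuffle of the large table.
enriched = facts.join(F.broadcast(customers), on="customer_id", how="left")

# Repartition by the grouping key to limit shuffle skew, and cache the
# result because several downstream aggregations reuse it.
enriched = enriched.repartition("customer_id").cache()
enriched.count()    # materialize the cache once

enriched.explain()  # verify a BroadcastHashJoin appears in the physical plan
```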
2) Testing:
- Develop and execute unit tests to validate PySpark transformation logic (a sample test follows this list).
- Conduct integration and performance testing to identify and resolve bottlenecks.
- Debug and troubleshoot PySpark applications using tools such as the Spark UI and driver/executor logs.
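A sample pytest-style unit test, assuming the cleansing rule from the development sketch (dropping rows with a null amount); the local[2] master runs Spark entirely in-process, which is sufficient for validating transformation logic.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # A small local session is enough for unit-testing transformations.
    return (SparkSession.builder
            .master("local[2]")
            .appName("unit-tests")
            .getOrCreate())


def test_null_amounts_are_dropped(spark):
    rows = [("o1", 10.0), ("o2", None)]
    df = spark.createDataFrame(rows, ["order_id", "amount"])

    cleaned = df.filter(F.col("amount").isNotNull())

    assert cleaned.count() == 1
    assert cleaned.first()["order_id"] == "o1"
```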
3) Code Management:
- Participate in code reviews to ensure adherence to best practices.
- Manage GitLab branches and merges, resolving merge conflicts effectively.
4) Collaboration & Communication:
- Work closely with data engineers, senior developers, and product owners to understand data requirements.
- Maintain comprehensive documentation for PySpark applications.
- Communicate regularly with stakeholders on project progress and issue resolution.
- Mentor junior developers by sharing knowledge and best practices.
5) Continuous Improvement:
- Stay current with the latest advancements in PySpark, big data technologies, and industry best practices.
- Identify opportunities for process improvements and implement enhancements.
- Explore new technologies and innovative solutions to improve data processing capabilities.
6) Security & Compliance:
- Ensure compliance with data security and privacy regulations.
- Implement measures such as column-level masking to protect sensitive data and prevent unauthorized access (sketched below).
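One possible masking sketch, with hypothetical raw.users and secure.users_masked tables and string-typed email and phone columns: a one-way hash keeps the email usable as a join key, while the phone number is reduced to its last four digits before the raw columns are dropped.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pii-masking").getOrCreate()

users = spark.table("raw.users")  # hypothetical table containing PII

protected = (
    users
    # One-way hash the email so it can still serve as a join key.
    .withColumn("email_hash", F.sha2(F.col("email"), 256))
    # Keep only the last four digits of the phone number (stored as a string).
    .withColumn("phone_masked",
                F.concat(F.lit("***-***-"), F.col("phone").substr(-4, 4)))
    # Drop the raw PII columns before publishing.
    .drop("email", "phone")
)

protected.write.mode("overwrite").saveAsTable("secure.users_masked")
```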
7) Innovation & Research:
- Research emerging technologies and methodologies to enhance data processing.
- Develop proof-of-concept solutions to explore new approaches for optimizing workflows.