Anton Puthenpurackal Jackson

Arlington Heights, IL

Summary

PySpark Developer with two years of experience at Cognizant Technology Solutions, specializing in big data solutions and ETL processes. Recognized for innovations, including automated data profiling of source files, that saved a client an estimated $120K-$200K. Proficient in data transformation and agile collaboration, driving project success through clear communication and technical excellence.

Overview

2 years of professional experience
2 Certifications

Work History

PySpark Developer

Cognizant
04.2023 - 03.2025
  • Client: A Leading U.S.-Based Financial Services Company
  • Develop PySpark scripts for complex data transformations, cleansing, and aggregation on large datasets
  • Design and implement scalable big data solutions using PySpark and the Hadoop ecosystem (HDFS, Hive, Spark SQL)
  • Create and manage database schemas, tables, and views for optimized data processing
  • Optimize PySpark jobs for performance and scalability, ensuring efficient resource usage
  • Integrate PySpark applications with other data processing tools and platforms
  • Develop and execute unit tests to validate PySpark scripts
  • Conduct integration and performance testing to identify and resolve bottlenecks
  • Debug and troubleshoot PySpark applications using appropriate tools
  • Participate in code reviews to ensure adherence to best practices
  • Manage GitLab branches and merges, resolving code conflicts effectively
  • Work closely with data engineers, senior developers, and product owners to understand data requirements
  • Maintain comprehensive documentation for PySpark applications
  • Communicate regularly with stakeholders on project progress and issue resolution
  • Mentor junior developers by sharing knowledge and best practices
  • Stay current with the latest advancements in PySpark, big data technologies, and industry best practices
  • Identify opportunities for process improvements and implement enhancements
  • Explore new technologies and innovative solutions to improve data processing capabilities
  • Ensure compliance with data security and privacy regulations
  • Implement measures to protect sensitive data and prevent unauthorized access
  • Research emerging technologies and methodologies to enhance data processing
  • Develop proof-of-concept solutions to explore new approaches for optimizing workflows

Intern

Cognizant
03.2022 - 08.2022
  • Assisted in designing, developing, and maintaining cloud-based data solutions using Azure services.
  • Worked with PySpark to process large datasets efficiently in a distributed environment.
  • Developed ETL (Extract, Transform, Load) pipelines to ingest and transform data from various sources.
  • Created and optimized SQL queries in Teradata and Snowflake for data retrieval and reporting.
  • Automated repetitive database tasks using Python scripting to improve efficiency.
  • Assisted in migrating on-premises databases to cloud platforms, such as Azure SQL Database and Snowflake.
  • Performed data validation and quality checks to ensure accuracy and consistency.
  • Collaborated with senior data engineers and analysts to troubleshoot performance issues and optimize queries.
  • Gained hands-on experience in CI/CD pipelines for automating data workflows in the cloud.
  • Documented processes, best practices, and troubleshooting steps to streamline future development.

Education

Bachelor of Engineering - Electrical and Electronics Engineering

Anna University
Chennai, India
03.2022

Skills

  • Big Data
  • ETL
  • PySpark
  • Hadoop
  • Cloudera
  • Hive
  • ETL tools
  • Python
  • SQL
  • MySQL
  • RDBMS
  • AWS
  • GitLab
  • Data transformation
  • Data cleansing
  • Data aggregation
  • Unix Scripting
  • Spark SQL

Certifications

  • Google Data Analytics Professional Certificate
  • Google IT Support Professional Certificate

Achievements & Awards

  • Certificate of Excellence - Q1 2024: Awarded at the Cognizant Q1 2024 Townhall for active participation in client exchanges and team-building activities.
  • Certificate of Excellence - Q2 2024: Recognized for the HCode Scan Utility innovation and for winning the most points in the Client Hackathon.
  • Certificate of Excellence - Q3 2024: Awarded for introducing Automated Data Profiling of Source Files, leading to cost savings of $120K-$200K.
  • Star of Sprint Award: Recognized for significant contributions to feature development and timely deployments.

Client Engagements

Client: A Leading U.S.-Based Financial Services Company
Project: Data Engineering and Analytics
Role: PySpark Developer
Duration: 04/01/23 - 03/14/25

Languages

English: Full Professional
Malayalam: Native/Bilingual

Work Authorization

Authorized to work in the U.S. (Green Card); no sponsorship required now or in the future.

References

References available upon request.
