
Chetan Kumar

Alpharetta, GA

Summary

AWS Data Engineering specialist with strong expertise in cloud architecture, data pipeline design, and ETL processes. Adept at leveraging AWS services to build scalable and secure data solutions. Known for team collaboration and adaptability, consistently achieving project goals and driving impactful results. Skilled in Python, SQL, and big data technologies, ensuring reliable and efficient data management.

Overview

11 years of professional experience
1 Certification

Work History

Sr. AWS Data Engineer

Wells Fargo
10.2023 - Current
  • Data migration project in which OLTP data residing on a big data platform and in traditional databases was migrated to an AWS data warehouse, including implementation and scheduling activities
  • Create, manage, and optimize Extract, Transform, Load (ETL) processes using complex SQL queries and the Databricks Spark-based platform
  • Designed and implemented complex SQL queries for data extraction, transformation, and validation, ensuring data accuracy and optimizing performance for large-scale datasets
  • Created and maintained interactive notebooks in Databricks for collaborative data analysis and visualization
  • Extract banking data from on-premises databases and AWS services using PySpark
  • Clean and preprocess raw data using PySpark, SQL, and Hadoop ecosystem tools before migrating it to Redshift (a sketch of this pattern follows this list)
  • Develop and maintain data models for banking applications using MySQL database
  • Implement machine learning models with PySpark for tasks like credit risk assessment, fraud detection, and customer churn prediction
  • Utilize Spark, integrated within Snowflake, to enhance the ETL pipeline, applying parallel processing and in-memory computations to handle transactional data effectively
  • Implemented Snowflake's features such as data sharing, cloning, and time travel to manage data with high flexibility and minimal storage overhead
  • Developed and managed ETL workflows using Apache Airflow, ensuring efficient scheduling, and monitoring of complex data pipelines
  • Implemented custom operators in Airflow to integrate with various data sources and systems, enhancing pipeline flexibility and functionality
  • Implemented and managed Kafka clusters to ensure seamless data streaming and real-time analytics
  • Integrated Kafka with various data sources and sinks, including databases, data lakes, and cloud storage
  • Developed custom Kafka Connectors to facilitate data ingestion and extraction from diverse systems
  • Implemented and maintained DynamoDB tables to ensure high availability and low-latency performance for large-scale data applications
  • Leveraged DynamoDB's fully managed service to streamline database administration tasks, including replication and backups
  • Orchestrated complex data pipelines and workflows using AWS Glue's job scheduling and dependency management features, ensuring timely and efficient data processing and orchestration across multiple ETL tasks
  • Continuously monitored and fine-tuned the performance of data workflows in Databricks
  • Implemented data integration solutions within the Genesys Engage and Genesys Cloud environments to enhance customer experience analytics
  • Create visualizations using tools like Matplotlib, Seaborn, and Plotly to communicate insights from banking data
  • Generate reports and dashboards on key banking metrics using BI tools like Tableau and Power BI
  • Implemented data governance policies and procedures to ensure data quality, security, and compliance with banking regulations such as GDPR, PCI-DSS, and Basel III
  • Utilize AWS services like AWS Glue for metadata management and AWS KMS for encryption
  • Hands-on experience with Snowflake utilities (SnowSQL, Snowpipe) and big data modeling techniques using Python and Java
  • Built ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake (see the Snowflake sketch after this list)
  • Perform database administration tasks such as backup and recovery, performance monitoring, and capacity planning for MySQL database supporting banking applications
  • Use version control systems like Git to track changes to code and configuration for data analytics processes and SQL queries while collaborating with cross-functional teams
  • Utilized Erwin to design, maintain, and optimize logical and physical data models for enterprise applications
  • Develop Jenkins CI/CD pipelines to streamline deployment processes for data analytics solutions
  • Implement automated testing and validation for PySpark jobs and SQL scripts
  • Orchestrate complex data workflows using tools like Apache Airflow, scheduling ETL jobs and data processing tasks involving PySpark, AWS services, and Hadoop ecosystem components
  • Deploy trained machine learning models into production environments using AWS SageMaker or AWS Lambda, integrating with MySQL databases for real-time predictions
  • Utilize AWS services such as Amazon EMR for big data processing, Amazon RDS for managed MySQL databases, and Amazon S3 for data storage
  • Plan and implement scalable data analytics solutions using Amazon EC2, Amazon ECS, or AWS Lambda, ensuring optimal resource utilization and cost efficiency
  • Participate in security training and awareness programs to stay updated on the latest threats and mitigation techniques
  • Monitor data engineering pipelines and infrastructure for performance issues using AWS CloudWatch, proactively identifying bottlenecks and implementing optimizations
  • Environment: Python, SQL, PySpark, Hadoop ecosystem (Apache Hive, Apache Pig, Apache Kafka), AWS (Amazon EMR, Amazon RDS, Amazon S3, Amazon Redshift, Kinesis, AWS Glue, AWS KMS, AWS Lambda, Amazon EC2, Amazon ECS), Matplotlib, Seaborn, Plotly, Tableau, Power BI, Git, Jenkins, Django, Apache Airflow, AWS SageMaker, MySQL, GDPR, PCI-DSS, Basel III, Confluence, Jira, Data analytics, Grafana, Lucidchart, DataStage, SSIS
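
A minimal PySpark sketch of the extract, clean, and load pattern described above (on-premises source to Amazon Redshift). The hosts, table names, columns, and credentials are hypothetical placeholders, and it assumes the MySQL and Redshift JDBC drivers are on the Spark classpath.

# Minimal sketch: pull a table from an on-premises database over JDBC,
# apply a light cleanup, and append it to a Redshift staging table.
# All hosts, table names, and credentials are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("oltp_to_redshift").getOrCreate()

# Extract: read the source table from an on-premises MySQL instance.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://onprem-db.example.com:3306/banking")
    .option("dbtable", "transactions")
    .option("user", "etl_user")
    .option("password", "etl_password")
    .load()
)

# Transform: drop incomplete rows and derive a date column for partitioning.
clean_df = (
    source_df.dropna(subset=["account_id", "amount"])
    .withColumn("txn_date", F.to_date("txn_timestamp"))
)

# Load: append the cleaned data to a Redshift staging table over JDBC.
(
    clean_df.write.format("jdbc")
    .option("url", "jdbc:redshift://example-cluster.us-east-1.redshift.amazonaws.com:5439/dev")
    .option("dbtable", "staging.transactions")
    .option("user", "redshift_user")
    .option("password", "redshift_password")
    .mode("append")
    .save()
)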
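
A minimal sketch of running SQL against Snowflake from Python, as in the SnowSQL/Python ETL work above; the account identifier, warehouse, stage, and table names are hypothetical placeholders.

# Minimal sketch: connect to Snowflake from Python, load staged files into
# a table, and verify the row count. All names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="etl_password",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # Load staged files into the target table, then check what arrived.
    cur.execute("COPY INTO staging.transactions FROM @etl_stage/transactions/")
    cur.execute("SELECT COUNT(*) FROM staging.transactions")
    print("rows loaded:", cur.fetchone()[0])
finally:
    conn.close()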

Azure Data Engineer

Aetna
02.2022 - 09.2023
  • Implemented data integration solutions to consolidate healthcare data from various sources such as EMR (Electronic Medical Records), claims data, and IoT devices
  • Developed a C# codebase for ASP.NET Core APIs and design patterns to ensure efficient integration and seamless data exchange across diverse systems and platforms
  • Collaborated with healthcare domain experts to understand data requirements and translate them into technical specifications
  • Designed and implemented RESTful APIs with Django Rest Framework, enabling seamless communication between front-end and back-end systems
  • Developed and maintained ETL pipelines to seamlessly transfer data between various systems and the Genesys platforms
  • Built and managed data warehouses on Azure SQL Data Warehouse for storing and querying large volumes of healthcare data
  • Enhanced performance tuning for ETLs and SQL queries, resulting in significant improvements in data processing efficiency and system performance
  • Developed and maintained relational and hierarchical databases, efficiently handling structured and semi-structured data for analytical processing
  • Developed complex SQL queries and stored procedures in Snowflake to support business intelligence and data analytics initiatives
  • Utilized Airflow's built-in logging and monitoring tools to troubleshoot issues and maintain pipeline health, ensuring minimal downtime (a minimal DAG sketch follows this list)
  • Addressed and resolved security complaints by investigating issues and implementing corrective measures
  • Collaborated with security teams to ensure all complaints were logged, tracked, and resolved in a timely manner
  • Implemented role-based access control (RBAC) and other security measures in Airflow to protect sensitive data and maintain compliance with organizational policies
  • Implemented data security measures such as encryption, access controls, and data masking to protect sensitive healthcare information
  • Developed and implemented risk mitigation strategies to reduce identified risks to an acceptable level
  • Designed and developed Kafka-based data pipelines to support scalable, fault-tolerant data processing
  • Created reusable frameworks and tools to streamline the ingestion process, ensuring consistency, reliability, and performance across different geospatial data sets
  • Developed and maintained healthcare ontologies to model medical concepts, relationships, and entities accurately
  • Developed SPARQL endpoints to facilitate efficient querying and retrieval of healthcare data
  • Optimized queries to ensure they perform well with large datasets and complex ontologies
  • Developed custom ETL (Extract, Transform, Load) processes using Azure Databricks and Azure Synapse Analytics
  • Automated data workflows and processes using Azure Data Factory and Azure Logic Apps to improve efficiency and reduce manual intervention
  • Employed Kafka Streams API to build real-time data processing applications and complex event processing solutions
  • Gained experience with data ingestion into cloud data warehouses using Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW)
  • Built and documented end-to-end data pipelines, emphasizing integration architecture to enable seamless data ingestion, transformation, and consumption across systems
  • Provided technical guidance and support to junior data engineers and analysts within the healthcare organization
  • Performed data analysis to identify patterns, trends, and anomalies in healthcare data
  • Ensured compliance with regulatory requirements such as HIPAA (Health Insurance Portability and Accountability Act) and PHI (Protected Health Information) regulations
  • Created Spark jobs to extract data from Hive tables and process them using Dataproc
  • Documented data architecture, data flows, and processes for knowledge sharing and future reference
  • Collaborated with cross-functional teams including data scientists, clinicians, and business stakeholders to understand data requirements and deliver actionable insights
  • Monitored and maintained the data ingestion workflows, proactively identifying and resolving any issues to ensure seamless and continuous data flow
  • Implemented data governance policies and procedures to ensure data privacy, security, and compliance
  • Participated in design reviews and architectural discussions to provide input on data engineering best practices and standards
  • Implemented data visualization solutions using tools like Power BI and Tableau to create dashboards and reports for healthcare analytics
  • Documented the data architecture, ontologies, data models, and integration processes clearly and comprehensively
  • Conducted training sessions and workshops to educate healthcare staff on data engineering concepts and best practices
  • Continuously monitored industry trends and emerging technologies in healthcare informatics and data engineering to drive innovation and process improvements within the organization
  • Environment: Python, Ab Initio, Azure Monitor, Azure Arc, Apache Spark, SPARQL, Azure Data Factory triggers, Azure Data Lake, Snowflake, Azure Databricks, Azure DW, SQL Server, Control-M, Veracode, AquaSec, PySpark, Tableau, Power BI, Hive, CI/CD pipelines, Agile methodology, Dataproc, ADF pipelines, Docker, Kubernetes.
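
A minimal Apache Airflow DAG sketch illustrating the scheduling and monitoring work above: a daily run with retries and a simple data-quality task. The DAG id, schedule, and check logic are hypothetical placeholders; in practice the check would query the warehouse.

# Minimal Airflow 2.x DAG: daily schedule, retries, and a placeholder
# data-quality check. Names and thresholds are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def check_row_count(**context):
    # Placeholder check; a real task would query the claims staging table.
    row_count = 42
    if row_count == 0:
        raise ValueError("claims staging table is empty")


with DAG(
    dag_id="claims_quality_check",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(
        task_id="check_row_count",
        python_callable=check_row_count,
    )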

Data Engineer

Arcadis
08.2019 - 11.2021
  • Designed, built, and maintained data pipelines using Python and SQL for extracting, transforming, and loading data from various sources into data warehouses and data lakes
  • Architected and maintained data pipelines in Snowflake, optimizing data loading and processing for efficient analytics
  • Deployed and managed data pipelines using Docker containers and Kubernetes, ensuring scalability and reliability of the ETL process
  • Optimized data processing workflows and performance by fine-tuning Ab Initio graphs, employing best practices, and implementing efficient data processing techniques to handle big data challenges
  • Optimized data workflows and processes to improve performance and reduce latency in data retrieval for real-time analytics
  • Utilized DynamoDB's security features to enforce encryption, fine-grained access control, and compliance with industry standards
  • Implemented Change Data Capture (CDC) solutions using Apache Hudi to ensure real-time data synchronization and consistency across various data sources
  • Developed and optimized Hudi-based pipelines for incremental data processing, significantly improving data ingestion and update performance (see the Hudi sketch after this list)
  • Enhanced Databricks job performance by using techniques like Spark caching, partition pruning, broadcasting, and parallel workflow execution
  • Developed and implemented data quality controls to ensure data accuracy, completeness, and consistency
  • Configured and implemented Azure Data Factory triggers and scheduled the pipelines; monitored the scheduled Azure Data Factory pipelines and configured alerts to provide notification of failed pipelines
  • Worked on Azure Databricks to run Spark Python notebooks through ADF pipelines
  • Collaborated with data scientists, data analysts, and business stakeholders to understand their data requirements and designed data pipelines that meet those needs using Python and SQL
  • Monitored the performance of data pipelines and implemented optimizations using Python and SQL to improve performance
  • Prepared for and participated in security audits to ensure compliance with internal and external security standards
  • Worked with data security teams to implement data security measures, such as encryption and access controls, to protect sensitive financial information
  • Documented the design and implementation of data pipelines, including data dictionaries, flow diagrams, and technical specifications, to ensure knowledge transferability
  • Scheduled pipelines and monitored data movement from sources to destinations
  • Performed operations such as fetching data from different files using the Query Editor in Power BI
  • Hands-on with multiple software paradigms, supporting the team's technical infrastructure alongside the project leader
  • Environment: Python, Ab Initio, Snowflake, Apache Spark, Azure Databricks, Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, C#, ASP.NET, Docker, Kubernetes, SQL Server, Veracode, AquaSec, PySpark, Tableau, Power BI, Agile Methodology, SQL stored procedures, GitHub.
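
A minimal sketch of the Apache Hudi incremental upsert pattern described above, run from a Spark session (for example on Databricks). The table name, record key, precombine field, and storage paths are hypothetical placeholders.

# Minimal sketch: upsert a batch of CDC records into a Hudi table so that
# only the latest version of each key is kept. All names and paths are
# hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi_cdc_upsert").getOrCreate()

# Incoming change batch (in practice read from a landing zone or Kafka topic).
changes_df = spark.read.parquet("/mnt/landing/orders_changes/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Append mode with the upsert operation merges changes into the existing table.
(
    changes_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("/mnt/datalake/hudi/orders")
)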

Data analyst & Cloud Engineer

EPIQ Systems
11.2017 - 06.2019
  • Migrated the existing SQL Code to Data Lake and sent the extracted reports to the consumers
  • Created and maintained efficient data pipelines using Databricks, leveraging Apache Spark to process and transform large datasets
  • Created PySpark engines that processed huge environmental data loads within minutes, implementing various business logic rules
  • Worked extensively on Data mapping to understand the source to target mapping rules
  • Developed data pipelines using Python, PySpark, Hive, Pig, and HBase
  • Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2)
  • Implemented best security practices for Kafka, including SSL encryption, Kerberos authentication, and ACLs
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting
  • Monitored and optimized Airflow performance, ensuring high availability and scalability of data workflows in production environments
  • Performed data analysis and data profiling using SQL on various extracts
  • Created reports of analyzed and validated data using Apache Hue and Hive, generated graphs for data analytics
  • Utilized Informatica PowerCenter for extracting, transforming, and loading (ETL) large volumes of data from diverse sources into data warehouses, ensuring data accuracy and consistency
  • Utilized Informatica's version control features to manage ETL codebase, ensuring code consistency, traceability, and easy collaboration within the development team
  • Worked on data migration into HDFS and Hive using Sqoop
  • Wrote multiple batch processes in Python and PySpark to process huge amounts of time-series data, generating reports and scheduling their delivery to industry clients (see the sketch after this list)
  • Created analytical reports on this real time environmental data using Tableau
  • Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector
  • Conducted internal audits to identify and address potential security gaps before external audits
  • Used HBase for storing the metadata of files and maintaining file patterns
  • Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) using Agile Methodologies (Jira)
  • Environment: Python, Tableau, PySpark, SQL, Hadoop, DBs, Hive, MapReduce, Sqoop, Data Analytics, Oracle, DB2, MongoDB, ADLS, ADF, Jira, ETL.
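
A minimal PySpark sketch of the batch time-series reporting described above: aggregate raw sensor readings by day and write the result to a Hive table for downstream reporting. The paths, column names, and table name are hypothetical placeholders.

# Minimal sketch: daily aggregation of time-series sensor data into a Hive
# reporting table. All paths, columns, and table names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("daily_env_report")
    .enableHiveSupport()
    .getOrCreate()
)

readings = spark.read.parquet("/data/raw/sensor_readings/")

daily = (
    readings.withColumn("reading_date", F.to_date("reading_ts"))
    .groupBy("site_id", "reading_date")
    .agg(
        F.avg("temperature").alias("avg_temperature"),
        F.max("pm25").alias("max_pm25"),
    )
)

# Overwrite the reporting table consumed via the Hive ODBC connector (e.g. by Tableau).
daily.write.mode("overwrite").saveAsTable("reports.daily_environmental_summary")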

IT Analyst & Operational Engineer

ATG Pvt. Ltd.
06.2014 - 11.2017
  • Supported teams in monitoring infrastructure, servers, storage, and applications proactively avoiding critical issues that impact client services
  • Monitored network and storage infrastructure essential to driving applications using the industry-standard Uptrends web application monitoring tool
  • Used Synthetic Monitoring method to proactively find issues in Relativity, TMX, ITOMS applications with troubleshooting for quick resolution
  • Implemented infrastructure as code using tools like Terraform and AWS CloudFormation for automation and orchestration
  • Leveraged Docker Swarm for container orchestration, effectively managing clusters of Docker hosts and ensuring high availability of applications
  • Ran SSL scan scripts using PowerShell
  • Acted as the primary point of contact for ServiceNow-related inquiries, providing technical support and troubleshooting assistance to end-users and stakeholders across the organization
  • Collaborated closely with the infrastructure team to plan and execute testing activities following patches, upgrades, and migrations of critical systems and applications
  • Participated in post-implementation reviews and retrospectives to assess the effectiveness of the testing process and identify opportunities for improvement in future change management activities
  • Set up AWS CloudWatch to track the performance and health of cloud services (a CloudWatch alarm sketch follows this list)
  • Implemented robust monitoring and alerting via AWS CloudWatch to maintain pipeline health, ensuring adherence to security, access control, encryption, and compliance standards monitored through AWS CloudTrail
  • Set up CI/CD pipelines using Jenkins and AWS CodePipeline
  • Customized existing dashboard tiles and performed CRUD operations for monitors and action items in transaction monitors
  • Created alert definitions and integrations, requested uptime reports for URLs/instances, and moved instances between staging and production
  • Sent monthly reports to dedicated clients
  • Worked with Alto Cloud team to move the HSF/AWS applications into Cloud
  • Played a key role in the deployment and configuration of SolarWinds environments, including the installation of SolarWinds Network Performance Monitor (NPM) and other relevant modules
  • Managed Relativity servers and dedicated client servers/nodes across the globe
  • Fixed server disk space and storage issues using the VMware Horizon client
  • Ran Regression test scripts to identify the defects
  • Worked on Server Patching & Upgrades in Relativity Applications
  • Performed smoke testing after patching and upgrades to validate applications
  • Environment: ITIL-ServiceNow, Relativity, Uptrends monitoring tool, VMware, SolarWinds, AWS (EC2, S3, AWS CloudWatch), Manual testing (integration testing, smoke testing), Automation testing (Java & Selenium, PowerShell), SQL Server, Docker Swarm, Jenkins, Terraform, Regression testing.
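
A minimal boto3 sketch of the CloudWatch monitoring setup referenced above: an alarm on EC2 CPU utilization that notifies an SNS topic. The instance id, topic ARN, and thresholds are hypothetical placeholders.

# Minimal sketch: create a CloudWatch alarm on EC2 CPU utilization and send
# notifications to an SNS topic. All identifiers are hypothetical placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="app-server-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,               # evaluate 5-minute averages
    EvaluationPeriods=3,      # alarm after 15 consecutive minutes above threshold
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)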

Education

Master of Science - Computers and Information Science

University of South Alabama
Mobile, AL
12.2022

Bachelor of Science - Information Technology

Sree Vidyanikethan Engineering College
Tirupati
05.2014

Skills

  • Lambda functions
  • Data lake management
  • Python programming
  • Data modeling techniques
  • Tableau visualization
  • Big data processing
  • Hadoop ecosystem
  • Data migration strategies
  • Data pipeline development
  • SQL querying
  • ETL design and implementation
  • Data warehousing solutions
  • Power BI reporting
  • Apache Spark mastery
  • Continuous integration and deployment
  • Infrastructure as Code
  • API development
  • Machine learning integration
  • NoSQL databases
  • DynamoDB experience
  • Amazon S3 proficiency
  • AWS Redshift expertise
  • AWS Glue ETL management

Certification

  • Amazon Web Services (AWS) Certified Solutions Architect


Timeline

Sr. AWS Data Engineer

Wells Fargo
10.2023 - Current

Azure Data Engineer

Aetna
02.2022 - 09.2023

Data Engineer

Arcadis
08.2019 - 11.2021

Data analyst & Cloud Engineer

EPIQ Systems
11.2017 - 06.2019

IT Analyst & Operational Engineer

ATG Pvt. Ltd.
06.2014 - 11.2017

Master of Science - Computers and Information Science

University of South Alabama

Bachelor of Science - Information Technology

Sree Vidyanikethan Engineering College