
Chetan Kumar

Alpharetta, GA

Summary

AWS Data Engineering specialist with strong expertise in cloud architecture, data pipeline design, and ETL processes. Adept at leveraging AWS services to build scalable and secure data solutions. Known for team collaboration and adaptability, consistently achieving project goals and driving impactful results. Skilled in Python, SQL, and big data technologies, ensuring reliable and efficient data management.

Overview

11 years of professional experience
1 Certification

Work History

Sr. AWS Data Engineer

Wells Fargo
10.2023 - Current
  • Data migration project in which OLTP data residing on a big data platform and in traditional databases was migrated to an AWS data warehouse, including implementation and scheduling activities
  • Create, manage, and optimize Extract, Transform, Load (ETL) processes using complex SQL queries and the Databricks Spark-based platform
  • Designed and implemented complex SQL queries for data extraction, transformation, and validation, ensuring data accuracy and optimizing performance for large-scale datasets
  • Created and maintained interactive notebooks in Databricks for collaborative data analysis and visualization
  • Extract banking data from on-premises databases and AWS services using PySpark
  • Clean and preprocess raw data using PySpark, SQL, and Hadoop ecosystem tools before migrating it to Redshift (a sketch of this pattern follows this list)
  • Develop and maintain data models for banking applications using MySQL database
  • Implement machine learning models with PySpark for tasks like credit risk assessment, fraud detection, and customer churn prediction
  • Utilize Spark, integrated within Snowflake, to enhance the ETL pipeline, applying parallel processing and in-memory computations to handle transactional data effectively
  • Implemented Snowflake's features such as data sharing, cloning, and time travel to manage data with high flexibility and minimal storage overhead
  • Developed and managed ETL workflows using Apache Airflow, ensuring efficient scheduling, and monitoring of complex data pipelines
  • Implemented custom operators in Airflow to integrate with various data sources and systems, enhancing pipeline flexibility and functionality
  • Implemented and managed Kafka clusters to ensure seamless data streaming and real-time analytics
  • Integrated Kafka with various data sources and sinks, including databases, data lakes, and cloud storage
  • Developed custom Kafka Connectors to facilitate data ingestion and extraction from diverse systems
  • Implemented and maintained DynamoDB tables to ensure high availability and low-latency performance for large-scale data applications
  • Leveraged DynamoDB's fully managed service to streamline database administration tasks, including replication and backups
  • Orchestrated complex data pipelines and workflows using AWS Glue's job scheduling and dependency management features, ensuring timely and efficient data processing and orchestration across multiple ETL tasks
  • Continuously monitored and fine-tuned the performance of data workflows in Databricks
  • Implemented data integration solutions within the Genesys Engage and Genesys Cloud environments to enhance customer experience analytics
  • Create visualizations using tools like Matplotlib, Seaborn, and Plotly to communicate insights from banking data
  • Generate reports and dashboards on key banking metrics using BI tools like Tableau and Power BI
  • Implemented data governance policies and procedures to ensure data quality, security, and compliance with banking regulations such as GDPR, PCI-DSS, and Basel III
  • Utilize AWS services like AWS Glue for metadata management and AWS KMS for encryption
  • Hands-on experience with Snowflake utilities (SnowSQL, Snowpipe) and big data modeling techniques using Python and Java
  • Built ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake (see the Snowflake sketch after this list)
  • Perform database administration tasks such as backup and recovery, performance monitoring, and capacity planning for MySQL database supporting banking applications
  • Use version control systems like Git to track changes to code and configuration for data analytics processes and SQL queries while collaborating with cross-functional teams
  • Utilized Erwin to design, maintain, and optimize logical and physical data models for enterprise applications
  • Develop Jenkins CI/CD pipelines to streamline deployment processes for data analytics solutions
  • Implement automated testing and validation for PySpark jobs and SQL scripts
  • Orchestrate complex data workflows using tools like Apache Airflow, scheduling ETL jobs and data processing tasks involving PySpark, AWS services, and Hadoop ecosystem components
  • Deploy trained machine learning models into production environments using AWS SageMaker or AWS Lambda, integrating with MySQL databases for real-time predictions
  • Utilize AWS services such as Amazon EMR for big data processing, Amazon RDS for managed MySQL databases, and Amazon S3 for data storage
  • Plan and implement scalable data analytics solutions using Amazon EC2, Amazon ECS, or AWS Lambda, ensuring optimal resource utilization and cost efficiency
  • Participate in security training and awareness programs to stay updated on the latest threats and mitigation techniques
  • Monitor data engineering pipelines and infrastructure for performance issues using AWS CloudWatch, proactively identifying bottlenecks and implementing optimizations
  • Environment: Python, SQL, PySpark, Hadoop ecosystem (Apache Hive, Apache Pig, Apache Kafka), AWS (Amazon EMR, Amazon RDS, Amazon S3, Amazon Redshift, Kinesis, AWS Glue, AWS KMS, AWS Lambda, Amazon EC2, Amazon ECS), Matplotlib, Seaborn, Plotly, Tableau, Power BI, Git, Jenkins, Django, Apache Airflow, AWS SageMaker, MySQL, GDPR, PCI-DSS, Basel III, Confluence, Jira, Data analytics, Grafana, Lucidchart, DataStage, SSIS
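
A minimal PySpark sketch of the extract, clean, and load pattern described above (on-premises source to Amazon Redshift). The hosts, table names, columns, and credentials are hypothetical placeholders, and it assumes the MySQL and Redshift JDBC drivers are on the Spark classpath.

# Minimal sketch: pull a table from an on-premises database over JDBC,
# apply a light cleanup, and append it to a Redshift staging table.
# All hosts, table names, and credentials are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("oltp_to_redshift").getOrCreate()

# Extract: read the source table from an on-premises MySQL instance.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://onprem-db.example.com:3306/banking")
    .option("dbtable", "transactions")
    .option("user", "etl_user")
    .option("password", "etl_password")
    .load()
)

# Transform: drop incomplete rows and derive a date column for partitioning.
clean_df = (
    source_df.dropna(subset=["account_id", "amount"])
    .withColumn("txn_date", F.to_date("txn_timestamp"))
)

# Load: append the cleaned data to a Redshift staging table over JDBC.
(
    clean_df.write.format("jdbc")
    .option("url", "jdbc:redshift://example-cluster.us-east-1.redshift.amazonaws.com:5439/dev")
    .option("dbtable", "staging.transactions")
    .option("user", "redshift_user")
    .option("password", "redshift_password")
    .mode("append")
    .save()
)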
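
A minimal sketch of running SQL against Snowflake from Python, as in the SnowSQL/Python ETL work above; the account identifier, warehouse, stage, and table names are hypothetical placeholders.

# Minimal sketch: connect to Snowflake from Python, load staged files into
# a table, and verify the row count. All names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="etl_password",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # Load staged files into the target table, then check what arrived.
    cur.execute("COPY INTO staging.transactions FROM @etl_stage/transactions/")
    cur.execute("SELECT COUNT(*) FROM staging.transactions")
    print("rows loaded:", cur.fetchone()[0])
finally:
    conn.close()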

Azure Data Engineer

Aetna
02.2022 - 09.2023
  • Implemented data integration solutions to consolidate healthcare data from various sources such as EMR (Electronic Medical Records), claims data, and IoT devices
  • Developed a C# codebase for ASP.NET Core APIs and design patterns to ensure efficient integration and seamless data exchange across diverse systems and platforms
  • Collaborated with healthcare domain experts to understand data requirements and translate them into technical specifications
  • Designed and implemented RESTful APIs with Django Rest Framework, enabling seamless communication between front-end and back-end systems
  • Developed and maintained ETL pipelines to seamlessly transfer data between various systems and the Genesys platforms
  • Built and managed data warehouses on Azure SQL Data Warehouse for storing and querying large volumes of healthcare data
  • Enhanced performance tuning for ETLs and SQL queries, resulting in significant improvements in data processing efficiency and system performance
  • Developed and maintained relational and hierarchical databases, efficiently handling structured and semi-structured data for analytical processing
  • Developed complex SQL queries and stored procedures in Snowflake to support business intelligence and data analytics initiatives
  • Utilized Airflow's built-in logging and monitoring tools to troubleshoot issues and maintain pipeline health, ensuring minimal downtime (a minimal DAG sketch follows this list)
  • Addressed and resolved security complaints by investigating issues and implementing corrective measures
  • Collaborated with security teams to ensure all complaints were logged, tracked, and resolved in a timely manner
  • Implemented role-based access control (RBAC) and other security measures in Airflow to protect sensitive data and maintain compliance with organizational policies
  • Implemented data security measures such as encryption, access controls, and data masking to protect sensitive healthcare information
  • Developed and implemented risk mitigation strategies to reduce identified risks to an acceptable level
  • Designed and developed Kafka-based data pipelines to support scalable, fault-tolerant data processing
  • Created reusable frameworks and tools to streamline the ingestion process, ensuring consistency, reliability, and performance across different geospatial data sets
  • Developed and maintained healthcare ontologies to model medical concepts, relationships, and entities accurately
  • Developed SPARQL endpoints to facilitate efficient querying and retrieval of healthcare data
  • Optimized queries to ensure they perform well with large datasets and complex ontologies
  • Developed custom ETL (Extract, Transform, Load) processes using Azure Databricks and Azure Synapse Analytics
  • Automated data workflows and processes using Azure Data Factory and Azure Logic Apps to improve efficiency and reduce manual intervention
  • Employed Kafka Streams API to build real-time data processing applications and complex event processing solutions
  • Gained experience with data ingestion into cloud data warehouses using Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW)
  • Built and documented end-to-end data pipelines, emphasizing integration architecture to enable seamless data ingestion, transformation, and consumption across systems
  • Provided technical guidance and support to junior data engineers and analysts within the healthcare organization
  • Performed data analysis to identify patterns, trends, and anomalies in healthcare data
  • Ensured compliance with regulatory requirements such as HIPAA (Health Insurance Portability and Accountability Act) and PHI (Protected Health Information) regulations
  • Created Spark jobs to extract data from Hive tables and process them using Dataproc
  • Documented data architecture, data flows, and processes for knowledge sharing and future reference
  • Collaborated with cross-functional teams including data scientists, clinicians, and business stakeholders to understand data requirements and deliver actionable insights
  • Monitored and maintained the data ingestion workflows, proactively identifying and resolving any issues to ensure seamless and continuous data flow
  • Implemented data governance policies and procedures to ensure data privacy, security, and compliance
  • Participated in design reviews and architectural discussions to provide input on data engineering best practices and standards
  • Implemented data visualization solutions using tools like Power BI and Tableau to create dashboards and reports for healthcare analytics
  • Documented the data architecture, ontologies, data models, and integration processes clearly and comprehensively
  • Conducted training sessions and workshops to educate healthcare staff on data engineering concepts and best practices
  • Continuously monitored industry trends and emerging technologies in healthcare informatics and data engineering to drive innovation and process improvements within the organization
  • Environment: Python, Ab Initio, Azure Monitor, Azure Arc, Apache Spark, SPARQL, Azure Data Factory triggers, Azure Data Lake, Snowflake, Azure Databricks, Azure DW, SQL Server, Control-M, Veracode, AquaSec, PySpark, Tableau, Power BI, Hive, CI/CD pipelines, Agile methodology, Dataproc, ADF pipelines, Docker, Kubernetes.
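
A minimal Apache Airflow DAG sketch illustrating the scheduling and monitoring work above: a daily run with retries and a simple data-quality task. The DAG id, schedule, and check logic are hypothetical placeholders; in practice the check would query the warehouse.

# Minimal Airflow 2.x DAG: daily schedule, retries, and a placeholder
# data-quality check. Names and thresholds are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def check_row_count(**context):
    # Placeholder check; a real task would query the claims staging table.
    row_count = 42
    if row_count == 0:
        raise ValueError("claims staging table is empty")


with DAG(
    dag_id="claims_quality_check",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(
        task_id="check_row_count",
        python_callable=check_row_count,
    )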

Data Engineer

Arcadis
08.2019 - 11.2021
  • Designed, built, and maintained data pipelines using Python and SQL for extracting, transforming, and loading data from various sources into data warehouses and data lakes
  • Architected and maintained data pipelines in Snowflake, optimizing data loading and processing for efficient analytics
  • Deployed and managed data pipelines using Docker containers and Kubernetes, ensuring scalability and reliability of the ETL process
  • Optimized data processing workflows and performance by fine-tuning Ab Initio graphs, employing best practices, and implementing efficient data processing techniques to handle big data challenges
  • Optimized data workflows and processes to improve performance and reduce latency in data retrieval for real-time analytics
  • Utilized DynamoDB's security features to enforce encryption, fine-grained access control, and compliance with industry standards
  • Implemented Change Data Capture (CDC) solutions using Apache Hudi to ensure real-time data synchronization and consistency across various data sources
  • Developed and optimized Hudi-based pipelines for incremental data processing, significantly improving data ingestion and update performance (see the Hudi sketch after this list)
  • Enhanced Databricks job performance by using techniques like Spark caching, partition pruning, broadcasting, and parallel workflow execution
  • Developed and implemented data quality controls to ensure data accuracy, completeness, and consistency
  • Configured and implemented Azure Data Factory triggers and scheduled the pipelines; monitored the scheduled Azure Data Factory pipelines and configured alerts to provide notification of failed pipelines
  • Worked on Azure Databricks to run Spark Python notebooks through ADF pipelines
  • Collaborated with data scientists, data analysts, and business stakeholders to understand their data requirements and designed data pipelines that meet those needs using Python and SQL
  • Monitored the performance of data pipelines and implemented optimizations using Python and SQL to improve performance
  • Prepared for and participated in security audits to ensure compliance with internal and external security standards
  • Worked with data security teams to implement data security measures, such as encryption and access controls, to protect sensitive financial information
  • Documented the design and implementation of data pipelines, including data dictionaries, flow diagrams, and technical specifications, to ensure knowledge transferability
  • Scheduled pipelines and monitored data movement from sources to destinations
  • Performed operations such as fetching data from different files using the Query Editor in Power BI
  • Hands-on with multiple software paradigms, supporting the team's technical infrastructure alongside the project leader
  • Environment: Python, Ab Initio, Snowflake, Apache Spark, Azure Databricks, Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, C#, ASP.NET, Docker, Kubernetes, SQL Server, Veracode, AquaSec, PySpark, Tableau, Power BI, Agile Methodology, SQL stored procedures, GitHub.
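
A minimal sketch of the Apache Hudi incremental upsert pattern described above, run from a Spark session (for example on Databricks). The table name, record key, precombine field, and storage paths are hypothetical placeholders.

# Minimal sketch: upsert a batch of CDC records into a Hudi table so that
# only the latest version of each key is kept. All names and paths are
# hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi_cdc_upsert").getOrCreate()

# Incoming change batch (in practice read from a landing zone or Kafka topic).
changes_df = spark.read.parquet("/mnt/landing/orders_changes/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Append mode with the upsert operation merges changes into the existing table.
(
    changes_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("/mnt/datalake/hudi/orders")
)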

Data analyst & Cloud Engineer

EPIQ Systems
11.2017 - 06.2019
  • Migrated the existing SQL Code to Data Lake and sent the extracted reports to the consumers
  • Created and maintained efficient data pipelines using Databricks, leveraging Apache Spark to process and transform large datasets
  • Created PySpark engines that processed huge environmental data loads within minutes, implementing various business logic rules
  • Worked extensively on Data mapping to understand the source to target mapping rules
  • Developed data pipelines using Python, PySpark, Hive, Pig, and HBase
  • Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2)
  • Implemented best security practices for Kafka, including SSL encryption, Kerberos authentication, and ACLs
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting
  • Monitored and optimized Airflow performance, ensuring high availability and scalability of data workflows in production environments
  • Performed data analysis and data profiling using SQL on various extracts
  • Created reports of analyzed and validated data using Apache Hue and Hive, generated graphs for data analytics
  • Utilized Informatica PowerCenter for extracting, transforming, and loading (ETL) large volumes of data from diverse sources into data warehouses, ensuring data accuracy and consistency
  • Utilized Informatica's version control features to manage ETL codebase, ensuring code consistency, traceability, and easy collaboration within the development team
  • Worked on data migration into HDFS and Hive using Sqoop
  • Wrote multiple batch processes in Python and PySpark to process huge amounts of time-series data, generating reports and scheduling their delivery to industry clients (see the sketch after this list)
  • Created analytical reports on this real time environmental data using Tableau
  • Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector
  • Conducted internal audits to identify and address potential security gaps before external audits
  • Used HBase for storing the metadata of files and maintaining file patterns
  • Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) using Agile Methodologies (Jira)
  • Environment: Python, Tableau, PySpark, SQL, Hadoop, DBs, Hive, MapReduce, Sqoop, Data Analytics, Oracle, DB2, MongoDB, ADLS, ADF, Jira, ETL.
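
A minimal PySpark sketch of the batch time-series reporting described above: aggregate raw sensor readings by day and write the result to a Hive table for downstream reporting. The paths, column names, and table name are hypothetical placeholders.

# Minimal sketch: daily aggregation of time-series sensor data into a Hive
# reporting table. All paths, columns, and table names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("daily_env_report")
    .enableHiveSupport()
    .getOrCreate()
)

readings = spark.read.parquet("/data/raw/sensor_readings/")

daily = (
    readings.withColumn("reading_date", F.to_date("reading_ts"))
    .groupBy("site_id", "reading_date")
    .agg(
        F.avg("temperature").alias("avg_temperature"),
        F.max("pm25").alias("max_pm25"),
    )
)

# Overwrite the reporting table consumed via the Hive ODBC connector (e.g. by Tableau).
daily.write.mode("overwrite").saveAsTable("reports.daily_environmental_summary")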

IT Analyst & Operational Engineer

ATG Pvt. Ltd.
06.2014 - 11.2017
  • Supported teams in monitoring infrastructure, servers, storage, and applications proactively avoiding critical issues that impact client services
  • Monitored network and storage infrastructure essential to driving applications using the industry-standard Uptrends web application monitoring tool
  • Used Synthetic Monitoring method to proactively find issues in Relativity, TMX, ITOMS applications with troubleshooting for quick resolution
  • Implemented infrastructure as code using tools like Terraform and AWS CloudFormation for automation and orchestration
  • Leveraged Docker Swarm for container orchestration, effectively managing clusters of Docker hosts and ensuring high availability of applications
  • Ran SSL scan scripts using PowerShell
  • Acted as the primary point of contact for ServiceNow-related inquiries, providing technical support and troubleshooting assistance to end-users and stakeholders across the organization
  • Collaborated closely with the infrastructure team to plan and execute testing activities following patches, upgrades, and migrations of critical systems and applications
  • Participated in post-implementation reviews and retrospectives to assess the effectiveness of the testing process and identify opportunities for improvement in future change management activities
  • Set up AWS CloudWatch to track the performance and health of cloud services (a CloudWatch alarm sketch follows this list)
  • Implemented robust monitoring and alerting via AWS CloudWatch to maintain pipeline health, ensuring adherence to security, access control, encryption, and compliance standards monitored through AWS CloudTrail
  • Set up CI/CD pipelines using Jenkins and AWS CodePipeline
  • Customized existing dashboard tiles and performed CRUD operations for monitors and action items in transaction monitors
  • Created alert definitions and integrations, requested uptime reports for URLs/instances, and moved instances between staging and production
  • Sent monthly reports to dedicated clients
  • Worked with Alto Cloud team to move the HSF/AWS applications into Cloud
  • Played a key role in the deployment and configuration of SolarWinds environments, including the installation of SolarWinds Network Performance Monitor (NPM) and other relevant modules
  • Managed Relativity servers and dedicated client servers/nodes across the globe
  • Fixed server disk space and storage issues using the VMware Horizon client
  • Ran Regression test scripts to identify the defects
  • Worked on Server Patching & Upgrades in Relativity Applications
  • Performed smoke testing after patching and upgrades to validate applications
  • Environment: ITIL-ServiceNow, Relativity, Uptrends monitoring tool, VMware, SolarWinds, AWS (EC2, S3, AWS CloudWatch), Manual testing (integration testing, smoke testing), Automation testing (Java & Selenium, PowerShell), SQL Server, Docker Swarm, Jenkins, Terraform, Regression testing.
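
A minimal boto3 sketch of the CloudWatch monitoring setup referenced above: an alarm on EC2 CPU utilization that notifies an SNS topic. The instance id, topic ARN, and thresholds are hypothetical placeholders.

# Minimal sketch: create a CloudWatch alarm on EC2 CPU utilization and send
# notifications to an SNS topic. All identifiers are hypothetical placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="app-server-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,               # evaluate 5-minute averages
    EvaluationPeriods=3,      # alarm after 15 consecutive minutes above threshold
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)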

Education

Master of Science - Computers and Information Science

University of South Alabama
Mobile, AL
12.2022

Bachelor of Science - Information Technology

Sree Vidyanikethan Engineering College
Tirupati
05.2014

Skills

  • Lambda functions
  • Data lake management
  • Python programming
  • Data modeling techniques
  • Tableau visualization
  • Big data processing
  • Hadoop ecosystem
  • Data migration strategies
  • Data pipeline development
  • SQL querying
  • ETL design and implementation
  • Data warehousing solutions
  • Power BI reporting
  • Apache Spark mastery
  • Continuous integration and deployment
  • Infrastructure as Code
  • API development
  • Machine learning integration
  • NoSQL databases
  • DynamoDB experience
  • Amazon S3 proficiency
  • AWS Redshift expertise
  • AWS Glue ETL management

Certification

  • Amazon Web Services (AWS) Certified Solutions Architect


Timeline

Sr. AWS Data Engineer

Wells Fargo
10.2023 - Current

Azure Data Engineer

Aetna
02.2022 - 09.2023

Data Engineer

Arcadis
08.2019 - 11.2021

Data analyst & Cloud Engineer

EPIQ Systems
11.2017 - 06.2019

IT Analyst & Operational Engineer

ATG Pvt. Ltd.
06.2014 - 11.2017

Master of Science - Computers and Information Science

University of South Alabama

Bachelor of Science - Information Technology

Sree Vidyanikethan Engineering College