NAVEEN CHAPPIDI

Frisco, TX

Summary

Experienced Data Engineer with 7 years of expertise building and managing data solutions on AWS and Azure. Proficient with AWS services including EMR, S3, EC2, IAM, Lambda, RDS, Route 53, Glue, CloudWatch, and Redshift, and with Azure services such as Data Factory and ADLS, along with Application Insights and GraphQL and REST APIs. Skilled in programming with Python, Spark, Scala, PySpark, and Databricks. Proficient in setting up and optimizing CI/CD pipelines using Jenkins and Blue Ocean for efficient, automated development workflows. Well versed in data tools such as Dremio, Snowflake, Looker, and Infogix for data processing and analysis. This combination of AWS and Azure expertise with a strong foundation in programming and data tooling supports complex data projects and analytics solutions.

Overview

7 years of professional experience
1 certification

Work History

Sr Data Engineer

Avant
03.2020 - Current
  • Build scalable data pipelines for ETL processing using Python and PySpark (a representative sketch follows this role's environment list)
  • Reduce Spark job latency by tuning Spark configurations and applying other optimization techniques for faster data processing
  • Schedule recurring and ad hoc jobs using Apache Airflow
  • Work with project managers to design workflow architectures to requirements, and with data scientists to assist on feature engineering
  • Build scalable distributed data solutions in Amazon EMR cluster environments
  • Perform ETL operations on EMR clusters, debugging and fixing issues such as memory-exceeded errors
  • Developed Python scripts for the data ingestion process in Apache Spark using PySpark
  • Use Docker containerization for the system runtime environment to build, test, and deploy
  • Use AWS services including EC2, EMR, DynamoDB, Aurora, S3, IAM, CloudWatch, and Glue
  • Worked on various POCs to adopt new technologies such as Apache Airflow, Snowflake, and Terraform for infrastructure management
  • Implemented continuous integration and automated deployments using Git, Jenkins, and Docker, developing supporting scripts in Python and Bash
  • Used AWS EC2 and S3 for small-dataset processing and storage; maintained the Hadoop cluster on AWS EMR
  • Hands-on experience across Big Data application phases: data ingestion, data analytics, and data visualization
  • Analyzed existing SQL scripts and designed PySpark implementations for faster data processing
  • Wrote Python code to schedule jobs dynamically in Apache Airflow
  • Worked extensively with data analysis and transformation BI tools such as Dremio and Looker
  • Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the STMs developed
  • Created Source to Target Mappings (STMs) for the required tables based on the business requirements for the reports
  • Encoded and decoded JSON objects using the PySpark framework to create and modify DataFrames in Apache Spark
  • Work in an Agile methodology, actively participating in grooming and planning sessions each sprint
  • Environment: AWS EMR, EC2, S3, QuickSight, Apache Spark, Airflow, Docker, PySpark, Spark SQL, Python, SQL, UNIX, Shell scripting, Dremio, Looker, Infogix, Alation.
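
Illustrative sketch: a minimal PySpark ETL job of the kind described above, including two of the tuning knobs mentioned (shuffle partitions, executor memory overhead). All bucket paths, column names, and configuration values are hypothetical placeholders, not details of any production system.

# etl_events.py - hypothetical PySpark ETL sketch for Amazon EMR
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("events-etl")                          # placeholder job name
    .config("spark.sql.shuffle.partitions", "200")  # tuned to cluster size
    .config("spark.executor.memoryOverhead", "2g")  # mitigates memory-exceeded kills on EMR
    .getOrCreate()
)

# Decode raw JSON events into a DataFrame; an explicit schema is preferable
# in production to avoid Spark's schema-inference pass over the data.
raw = spark.read.json("s3://example-bucket/raw/events/")

transformed = (
    raw.filter(F.col("event_type").isNotNull())      # drop incomplete records
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Write partitioned Parquet for downstream tools (e.g., Dremio, Looker).
(
    transformed.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/events/")
)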

Data Engineer

Cox Automotive
08.2018 - 03.2020
  • Translated business propositions into quantitative queries and collected and cleaned the necessary data
  • Created Hive queries to extract data and deliver it to clients
  • Developed Scala programs to produce reports for business users
  • Performed transformation and analysis in Hive/Pig, parsing raw data using MapReduce and Spark
  • Proactively involved in ongoing maintenance, support, and improvement of the Hadoop cluster
  • Captured transactional changes in the data using MapReduce and HBase
  • Analyzed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Spark, Sqoop, and Pig Latin
  • Used Sqoop import and export to transfer large datasets between the DB2 database and HDFS
  • Wrote and ran complex SQL queries
  • Supported multiple application extracts coming out of the Big Data platform
  • Followed Agile methodology during project delivery
  • Managed and distributed tasks and responsibilities between offshore and onsite team members
  • Designed and developed ETL workflows in Oozie, automating the extraction of data from different databases into HDFS using Sqoop scripts, with transformation and analysis in Hive/Pig
  • Parsed raw data using MapReduce and ingested data from different sources (a Hadoop Streaming sketch follows this role's environment list)
  • Developed MapReduce programs and Hive UDFs for data extraction and manipulation
  • Worked with different file formats and compression techniques in Hadoop
  • Performed data analytics in Hive and exported the resulting metrics back to an Oracle database using Sqoop
  • Performance-tuned Hive queries and MapReduce programs for different applications
  • Worked with big data developers, designers, and data scientists to troubleshoot and tune MapReduce and Hive jobs for high performance
  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements
  • Environment: Hadoop, CDH, MapReduce, Hive, Pig, Sqoop, Kafka, Java, Spark, Oozie, Python, UNIX, Shell scripting, JCL.
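
Illustrative sketch: a Hadoop Streaming job in Python, the pattern behind the raw-data parsing and MapReduce work above. The pipe-delimited field layout and the dealer_id key are hypothetical examples, not an actual record format.

#!/usr/bin/env python
# mapper.py - Hadoop Streaming mapper: parse raw pipe-delimited records
# and emit (dealer_id, 1) pairs. Field layout is a made-up example.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")
    if len(fields) < 2:
        continue                    # skip malformed records
    print("%s\t1" % fields[1])      # fields[1] assumed to hold dealer_id

#!/usr/bin/env python
# reducer.py - Hadoop Streaming reducer: sum counts per key. Hadoop sorts
# mapper output by key, so a running total per key group is sufficient.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key and current_key is not None:
        print("%s\t%d" % (current_key, count))
        count = 0
    current_key = key
    count += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))

Such a job would be submitted with the standard streaming jar, along the lines of: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <hdfs input> -output <hdfs output> (jar location and HDFS paths depend on the cluster).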

Data Engineer

T-Mobile
05.2017 - 08.2018
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization
  • Assessed the current production state of the application and determined the impact of new implementations on existing business processes
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks
  • Created and maintained functions, procedures, tasks, and views in Snowflake
  • Built Data Factory pipelines that run Azure Databricks notebooks and load the data into Snowflake tables (a sketch follows this role's environment list)
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats
  • Estimated cluster size and monitored and troubleshot the Spark Databricks cluster
  • Applied the Spark DataFrame API to complete data manipulation within a Spark session
  • Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors and tasks, deployment modes, the execution hierarchy, fault tolerance, and collection
  • Analyzed data from Azure data storage using Databricks, deriving insights with Spark cluster capabilities
  • Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory configuration
  • Built GitHub workflows for building and deploying changes
  • Developed JSON scripts for deploying Azure Data Factory (ADF) pipelines that process data using the SQL activity, and published data to Event Hubs for downstream teams to consume
  • Created builds and releases for multiple projects (modules) in the production environment using Visual Studio Team Services
  • Performed logical and physical data structure design and DDL generation to implement database tables and columns in DB2, Oracle, SQL Server, and Snowflake schema environments using Erwin Data Modeler Model Mart Repository version 9.6
  • Environment: Python 3.6, Scala 2.12, PySpark, Azure services including Blob Storage, Cosmos DB, Data Lake, HDInsight, Databricks, and SQL Data Warehouse, Hive, SQL, Power BI, PyCharm, Visual Studio Code, Linux, Shell scripting
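
Illustrative sketch: a Databricks notebook cell of the kind an ADF pipeline above would invoke, reading curated data from ADLS Gen2 and loading an aggregate into Snowflake through the Spark-Snowflake connector. Account, container, secret-scope, column, and table names are hypothetical; spark and dbutils are the handles Databricks predefines in notebooks.

# Hypothetical Databricks notebook cell, invoked from an ADF pipeline.
from pyspark.sql import functions as F

# Spark-Snowflake connector options; values are placeholders, and
# credentials come from a (hypothetical) Databricks secret scope.
sf_options = {
    "sfUrl": "example_account.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("example-scope", "sf-user"),
    "sfPassword": dbutils.secrets.get("example-scope", "sf-password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "USAGE",
    "sfWarehouse": "LOAD_WH",
}

# Read curated usage data from ADLS Gen2 (abfss path is a placeholder).
usage = spark.read.parquet(
    "abfss://curated@exampleadls.dfs.core.windows.net/usage/"
)

# Aggregate daily usage per customer with the DataFrame API.
daily = (
    usage.groupBy("customer_id", F.to_date("event_ts").alias("usage_date"))
         .agg(F.sum("bytes_used").alias("total_bytes"))
)

# Overwrite the target Snowflake table with the fresh aggregate.
(
    daily.write
    .format("snowflake")
    .options(**sf_options)
    .option("dbtable", "DAILY_USAGE")
    .mode("overwrite")
    .save()
)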

Education

Master of Science

Fairleigh Dickinson University
Teaneck, NJ

Electrical and Electronics Engineering

Skills

  • Data Curating
  • Data Programming
  • Data Operations
  • Data Repositories
  • RDBMS
  • SQL and Databases
  • Advanced Analytics
  • Technology Leadership
  • Work Streams

Certifications

  • Azure Data Engineer Associate
  • Databricks Developer Foundations
  • SnowPro Core Certification

Skills Summary

EMR, S3, IAM, Lambda, CloudWatch, ECR, ELB, EC2, Glue, SNS, Redshift, Azure Databricks, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Event Hubs, Azure App, Python, Scala, JavaScript, React, SQL, Shell, Bash, COBOL, MySQL, PostgreSQL, Oracle, HBase, Netezza, Cassandra, Apache Spark, Spark SQL, PySpark, Dremio, Looker, Snowflake, Tableau, MS Excel, Jenkins, Airflow, Kubernetes, Docker, Ansible, AWS Glue, Informatica, Hue, Jupyter, Infogix, Alation, GitHub, Bitbucket, SVN, VSTS, Rally, Jira, Django, Flask, Spring Boot, Amazon Web Services (AWS), Terraform
