NAVEEN CHAPPIDI

Frisco, TX

Summary

Experienced Data Engineer with 7 years of expertise building and managing data solutions on AWS and Azure. Proficient with AWS services including EMR, S3, EC2, IAM, Lambda, RDS, Route 53, Glue, CloudWatch, and Redshift, and with Azure services such as Data Factory and ADLS, along with Application Insights and GraphQL and REST APIs. Skilled in programming with Python, Spark, Scala, PySpark, and Databricks. Proficient in setting up and optimizing CI/CD pipelines using Jenkins and Blue Ocean for efficient, automated development workflows. Well versed in data tools such as Dremio, Snowflake, Looker, and Infogix for data processing and analysis. This combination of AWS and Azure expertise with a strong foundation in programming and data tooling supports complex data projects and analytics solutions.

Overview

7 years of professional experience
1 certification

Work History

Sr Data Engineer

Avant
03.2020 - Current
  • Build scalable data pipelines for ETL processing using Python and PySpark (a representative sketch follows this role's environment list)
  • Reduce Spark job latency by tuning Spark configurations and applying other optimization techniques for faster data processing
  • Schedule recurring and ad hoc jobs using Apache Airflow
  • Work with project managers to design workflow architectures to requirements, and with data scientists to assist on feature engineering
  • Build scalable distributed data solutions in Amazon EMR cluster environments
  • Perform ETL operations on EMR clusters, debugging and fixing issues such as memory-exceeded errors
  • Developed Python scripts for the data ingestion process in Apache Spark using PySpark
  • Use Docker containerization for the system runtime environment to build, test, and deploy
  • Use AWS services including EC2, EMR, DynamoDB, Aurora, S3, IAM, CloudWatch, and Glue
  • Worked on various POCs to adopt new technologies such as Apache Airflow, Snowflake, and Terraform for infrastructure management
  • Implemented continuous integration and automated deployments using Git, Jenkins, and Docker, developing supporting scripts in Python and Bash
  • Used AWS EC2 and S3 for small-dataset processing and storage; maintained the Hadoop cluster on AWS EMR
  • Hands-on experience across Big Data application phases: data ingestion, data analytics, and data visualization
  • Analyzed existing SQL scripts and designed PySpark implementations for faster data processing
  • Wrote Python code to schedule jobs dynamically in Apache Airflow
  • Worked extensively with data analysis and transformation BI tools such as Dremio and Looker
  • Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR, performing the necessary transformations based on the STMs developed
  • Created Source to Target Mappings (STMs) for the required tables based on the business requirements for the reports
  • Encoded and decoded JSON objects using the PySpark framework to create and modify DataFrames in Apache Spark
  • Work in an Agile methodology, actively participating in grooming and planning sessions each sprint
  • Environment: AWS EMR, EC2, S3, QuickSight, Apache Spark, Airflow, Docker, PySpark, Spark SQL, Python, SQL, UNIX, Shell scripting, Dremio, Looker, Infogix, Alation.
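
Illustrative sketch: a minimal PySpark ETL job of the kind described above, including two of the tuning knobs mentioned (shuffle partitions, executor memory overhead). All bucket paths, column names, and configuration values are hypothetical placeholders, not details of any production system.

# etl_events.py - hypothetical PySpark ETL sketch for Amazon EMR
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("events-etl")                          # placeholder job name
    .config("spark.sql.shuffle.partitions", "200")  # tuned to cluster size
    .config("spark.executor.memoryOverhead", "2g")  # mitigates memory-exceeded kills on EMR
    .getOrCreate()
)

# Decode raw JSON events into a DataFrame; an explicit schema is preferable
# in production to avoid Spark's schema-inference pass over the data.
raw = spark.read.json("s3://example-bucket/raw/events/")

transformed = (
    raw.filter(F.col("event_type").isNotNull())      # drop incomplete records
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Write partitioned Parquet for downstream tools (e.g., Dremio, Looker).
(
    transformed.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/events/")
)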

Data Engineer

Cox Automotive
08.2018 - 03.2020
  • Translated business propositions into quantitative queries and collected and cleaned the necessary data
  • Created Hive queries to extract data and deliver it to clients
  • Developed Scala programs to produce reports for business users
  • Performed transformation and analysis in Hive/Pig, parsing raw data using MapReduce and Spark
  • Proactively involved in ongoing maintenance, support, and improvement of the Hadoop cluster
  • Captured transactional changes in the data using MapReduce and HBase
  • Analyzed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Spark, Sqoop, and Pig Latin
  • Used Sqoop import and export to transfer large datasets between the DB2 database and HDFS
  • Wrote and ran complex SQL queries
  • Supported multiple application extracts coming out of the Big Data platform
  • Followed Agile methodology during project delivery
  • Managed and distributed tasks and responsibilities between offshore and onsite team members
  • Designed and developed ETL workflows in Oozie, automating the extraction of data from different databases into HDFS using Sqoop scripts, with transformation and analysis in Hive/Pig
  • Parsed raw data using MapReduce and ingested data from different sources (a Hadoop Streaming sketch follows this role's environment list)
  • Developed MapReduce programs and Hive UDFs for data extraction and manipulation
  • Worked with different file formats and compression techniques in Hadoop
  • Performed data analytics in Hive and exported the resulting metrics back to an Oracle database using Sqoop
  • Performance-tuned Hive queries and MapReduce programs for different applications
  • Worked with big data developers, designers, and data scientists to troubleshoot and tune MapReduce and Hive jobs for high performance
  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements
  • Environment: Hadoop, CDH, MapReduce, Hive, Pig, Sqoop, Kafka, Java, Spark, Oozie, Python, UNIX, Shell scripting, JCL.
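
Illustrative sketch: a Hadoop Streaming job in Python, the pattern behind the raw-data parsing and MapReduce work above. The pipe-delimited field layout and the dealer_id key are hypothetical examples, not an actual record format.

#!/usr/bin/env python
# mapper.py - Hadoop Streaming mapper: parse raw pipe-delimited records
# and emit (dealer_id, 1) pairs. Field layout is a made-up example.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")
    if len(fields) < 2:
        continue                    # skip malformed records
    print("%s\t1" % fields[1])      # fields[1] assumed to hold dealer_id

#!/usr/bin/env python
# reducer.py - Hadoop Streaming reducer: sum counts per key. Hadoop sorts
# mapper output by key, so a running total per key group is sufficient.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key and current_key is not None:
        print("%s\t%d" % (current_key, count))
        count = 0
    current_key = key
    count += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))

Such a job would be submitted with the standard streaming jar, along the lines of: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <hdfs input> -output <hdfs output> (jar location and HDFS paths depend on the cluster).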

Data Engineer

T-Mobile
05.2017 - 08.2018
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization
  • Assessed the current production state of the application and determined the impact of new implementations on existing business processes
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks
  • Created and maintained functions, procedures, tasks, and views in Snowflake
  • Built Data Factory pipelines that run Azure Databricks notebooks and load the data into Snowflake tables (a sketch follows this role's environment list)
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats
  • Estimated cluster size and monitored and troubleshot the Spark Databricks cluster
  • Applied the Spark DataFrame API to complete data manipulation within a Spark session
  • Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors and tasks, deployment modes, the execution hierarchy, fault tolerance, and collection
  • Analyzed data from Azure data storage using Databricks, deriving insights with Spark cluster capabilities
  • Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory configuration
  • Built GitHub workflows for building and deploying changes
  • Developed JSON scripts for deploying Azure Data Factory (ADF) pipelines that process data using the SQL activity, and published data to Event Hubs for downstream teams to consume
  • Created builds and releases for multiple projects (modules) in the production environment using Visual Studio Team Services
  • Performed logical and physical data structure design and DDL generation to implement database tables and columns in DB2, Oracle, SQL Server, and Snowflake schema environments using Erwin Data Modeler Model Mart Repository version 9.6
  • Environment: Python 3.6, Scala 2.12, PySpark, Azure services including Blob Storage, Cosmos DB, Data Lake, HDInsight, Databricks, and SQL Data Warehouse, Hive, SQL, Power BI, PyCharm, Visual Studio Code, Linux, Shell scripting
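
Illustrative sketch: a Databricks notebook cell of the kind an ADF pipeline above would invoke, reading curated data from ADLS Gen2 and loading an aggregate into Snowflake through the Spark-Snowflake connector. Account, container, secret-scope, column, and table names are hypothetical; spark and dbutils are the handles Databricks predefines in notebooks.

# Hypothetical Databricks notebook cell, invoked from an ADF pipeline.
from pyspark.sql import functions as F

# Spark-Snowflake connector options; values are placeholders, and
# credentials come from a (hypothetical) Databricks secret scope.
sf_options = {
    "sfUrl": "example_account.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("example-scope", "sf-user"),
    "sfPassword": dbutils.secrets.get("example-scope", "sf-password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "USAGE",
    "sfWarehouse": "LOAD_WH",
}

# Read curated usage data from ADLS Gen2 (abfss path is a placeholder).
usage = spark.read.parquet(
    "abfss://curated@exampleadls.dfs.core.windows.net/usage/"
)

# Aggregate daily usage per customer with the DataFrame API.
daily = (
    usage.groupBy("customer_id", F.to_date("event_ts").alias("usage_date"))
         .agg(F.sum("bytes_used").alias("total_bytes"))
)

# Overwrite the target Snowflake table with the fresh aggregate.
(
    daily.write
    .format("snowflake")
    .options(**sf_options)
    .option("dbtable", "DAILY_USAGE")
    .mode("overwrite")
    .save()
)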

Education

Master of Science

Fairleigh Dickinson University
Teaneck, NJ

Electrical and Electronics Engineering

Skills

  • Data Curating
  • Data Programming
  • Data Operations
  • Data Repositories
  • RDBMS
  • SQL and Databases
  • Advanced Analytics
  • Technology Leadership
  • Work Streams

Certifications

  • Azure Data Engineer Associate
  • Databricks Developer Foundations
  • SnowPro Core Certification

Skills Summary

EMR, S3, IAM, Lambda, CloudWatch, ECR, ELB, EC2, Glue, SNS, Redshift, Azure Databricks, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Event Hubs, Azure App, Python, Scala, JavaScript, React, SQL, Shell, Bash, COBOL, MySQL, PostgreSQL, Oracle, HBase, Netezza, Cassandra, Apache Spark, Spark SQL, PySpark, Dremio, Looker, Snowflake, Tableau, MS Excel, Jenkins, Airflow, Kubernetes, Docker, Ansible, AWS Glue, Informatica, Hue, Jupyter, Infogix, Alation, GitHub, Bitbucket, SVN, VSTS, Rally, Jira, Django, Flask, Spring Boot, Amazon Web Services (AWS), Terraform
