AJAY KRISHNA DOPPALAPUDI

Cincinnati, OH

Summary

  • Data engineering professional with over 7 years of experience developing data-intensive applications across the Hadoop ecosystem, Big Data, cloud data engineering, data warehousing, and data visualization
  • Delivered end-to-end solutions on Big Data platforms including Cloudera and Hortonworks
  • Proficient with Hadoop and its ecosystem tools, including MapReduce, Pig, Spark, Hive, Sqoop, Flume, HBase, Cassandra, MongoDB, Kafka, ZooKeeper, and Oozie, for ETL operations and Big Data analysis
  • Strong in Scala and Apache Spark, with data analysis using SQL, Hive, and Spark SQL; extensive use of Python libraries such as NumPy, Pandas, PySpark, Matplotlib, and scikit-learn
  • Hands-on experience developing data pipelines with AWS services and IICS Data Integration, and building real-time streaming pipelines with Kafka and Spark; working knowledge of Google Cloud Platform
  • Expertise in SQL, database design, and migration to Azure Data services; experience with Impala views, Erwin data modeling, and Delta Lake
  • Proficient in shell scripting, MapReduce jobs, Hive analytics, and Tableau for data visualization, as well as Python scripting for data manipulation and statistical analysis
  • Familiar with Kubernetes and Docker for CI/CD, ETL tools such as Informatica, DataStage, and Snowflake, and database normalization and denormalization techniques for optimal performance

Overview

7 years of professional experience

Work History

Data Engineer

JP Morgan Chase
Plano, TX
09.2023 - Current
  • Participated in all phases of software development, including requirements gathering and business analysis
  • Devised PL/SQL Stored Procedures, Functions, Triggers, Views, and packages
  • Designed data models for AWS Lambda applications and analytical reports
  • Built a full-service catalog system using Elasticsearch, Logstash, Kibana, Kinesis, and CloudWatch with the effective use of MapReduce
  • Utilized Indexing, Aggregation, and Materialized views to optimize query performance
  • Implemented Python and Scala code for data processing and analytics leveraging inbuilt libraries
  • Utilized various Spark transformations, including mapToPair, filter, flatMap, groupByKey, sortByKey, join, cogroup, union, repartition, coalesce, distinct, intersection, and mapPartitionsWithIndex, along with actions, for cleansing input data
  • Developed PySpark code used to compare data between HDFS and S3
  • Developed data transition programs from DynamoDB to AWS Redshift (ETL process), utilizing AWS Lambda to create Python functions triggered by specific events based on use cases (an illustrative sketch follows this list)
  • Created scripts to read CSV, JSON, and Parquet files from S3 buckets using Python, executed SQL operations, and loaded data into AWS S3, DynamoDB, and Snowflake, utilizing AWS Glue with its crawler
  • Designed the Staging and Operational Data Storage (ODS) environment for the enterprise data warehouse (Snowflake), including Dimension and fact table design following Kimball's Star Schema approach
  • Unit tested data between Redshift and Snowflake
  • Implemented scalable data storage solutions optimized for handling security-related datasets in a SaaS product context
  • Implemented DBT workflows to optimize data modeling, enhance pipeline performance, and seamlessly integrate DBT into the data engineering workflow through cross-functional collaboration
  • Utilized Unity Catalog in Azure Databricks to optimize and manage Delta Lake for streamlined data operations
  • Employed bash shell scripts and UNIX utilities for data processing and automation tasks
  • Utilized data science algorithms and MLOps techniques to optimize data processing and analysis
  • Developed predictive models and converted SAS programs into Python to enhance efficiency and scalability
  • Developed and implemented advanced algorithms in MATLAB to analyze and visualize complex data sets, ensuring efficient data processing
  • Reviewed system specifications related to DataStage ETL and developed functions in AWS Lambda for event-driven processing
  • Developed and optimized T-SQL scripts for ETL processes, collaborated with teams to maintain databases, resolved issues promptly, and stayed current with the latest T-SQL advancements
  • Drafted reports in Tableau Desktop, extracting data for analysis with filters based on the business use case
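Illustrative sketch (not taken from the actual project): a minimal event-driven AWS Lambda handler in Python of the kind referenced above, loading CSV records landed in S3 into DynamoDB. The table name, bucket trigger, and CSV layout are assumptions for illustration only.

    # Minimal sketch, assuming an S3 object-created trigger and a hypothetical DynamoDB table
    import csv
    import io
    import boto3

    s3 = boto3.client("s3")
    dynamodb = boto3.resource("dynamodb")
    TABLE_NAME = "catalog_records"  # hypothetical table name

    def lambda_handler(event, context):
        # Triggered by an S3 PUT event; loads CSV rows into DynamoDB
        table = dynamodb.Table(TABLE_NAME)
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            with table.batch_writer() as writer:
                for row in csv.DictReader(io.StringIO(body)):
                    writer.put_item(Item=row)  # assumes CSV headers match table attributes
        return {"status": "ok"}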

Data Engineer

Walmart Global Tech India
01.2020 - 07.2022
  • Utilized AWS services to architect, analyze, and develop enterprise data warehouse and business intelligence solutions, ensuring optimal architecture, scalability, flexibility, availability, and performance for better decision-making
  • Developed Scala scripts and User Defined Functions (UDFs) using data frames/SQL and Resilient Distributed Datasets (RDD) in Spark for data aggregation, querying, and writing back into the S3 bucket
  • Executed data cleansing and data mining operations
  • Programmed, compiled, and executed programs using Apache Spark in Scala for ETL jobs with ingested data
  • Crafted Spark application programs for data validation, cleansing, transformation, and custom aggregation, employing Spark engine and Spark SQL for data analysis, provided to data scientists for further analysis
  • Automated ingestion processes using Python and Scala, pulling data from various sources such as API, AWS S3, Teradata, and Snowflake
  • Designed and developed Spark workflows using Scala for data extraction from AWS S3 bucket and Snowflake, applying transformations
  • Designed and implemented ETL pipelines between various Relational Databases and Data Warehouse using Apache Airflow
  • Implemented continuous integration and continuous deployment (CI/CD) pipelines, integrating data engineering workflows seamlessly into DevOps practices for efficient software delivery
  • Developed custom ETL solutions and real-time data ingestion pipelines to move data in and out of Hadoop using Python and shell scripts
  • Utilized GCP Dataproc, GCS, Cloud Functions, and BigQuery for data processing
  • Worked on Data Extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of transformations
  • Installed and configured Apache Airflow, automating resulting scripts to ensure daily execution in production
  • Created Directed Acyclic Graphs (DAGs) utilizing the Email Operator, Bash Operator, and Spark Livy Operator for execution on EC2 (an illustrative sketch follows this list)
  • Developed scripts to read CSV, JSON, and parquet files from S3 buckets in Python and load them into AWS S3, DynamoDB, and Snowflake
  • Ingested real-time data streams into the Spark Streaming platform, saving data in HDFS and Hive through GCP
  • Implemented AWS Lambda functions to execute scripts in response to events in Amazon DynamoDB table or S3 bucket or HTTP requests using Amazon API Gateway
  • Worked on Snowflake schemas and data warehousing, processing batch and streaming data load pipelines using Snowpipe and Matillion from the data lake (AWS S3 bucket)
  • Profiled structured, unstructured, and semi-structured data across various sources to identify patterns and implement data quality metrics using necessary queries or Python scripts based on the source
  • Demonstrated proficiency in the Microsoft Suite (PowerPoint, Excel, etc.) to efficiently create presentations and streamline data analysis processes
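Illustrative sketch (assumed names and paths, not the actual production DAG): a daily Airflow 2.x DAG wired with the Bash and Email operators mentioned above; the DAG id, script path, and email address are placeholders.

    # Minimal sketch of a daily Airflow DAG using Bash and Email operators
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.email import EmailOperator

    with DAG(
        dag_id="s3_to_snowflake_daily",      # hypothetical DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract_from_s3",
            bash_command="python /opt/jobs/extract_s3.py",  # hypothetical script path
        )
        notify = EmailOperator(
            task_id="notify_team",
            to="data-team@example.com",                     # placeholder address
            subject="Daily S3-to-Snowflake load finished",
            html_content="The daily load completed successfully.",
        )
        extract >> notify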

Big Data Developer

Abbott Laboratories India
01.2018 - 12.2019
  • Experience working on multiple projects and teams involved in analytics and cloud platforms
  • Experience building scalable data pipelines on the Azure cloud platform using different tools
  • Developed multiple optimized PySpark applications on Azure Databricks (an illustrative sketch follows this list)
  • Developed data pipelines using Azure Data Factory that process Cosmos activity data
  • Implemented reporting stats on top of real-time data using Power BI
  • Developed ETL solutions using SSIS, Azure Data Factory, and Azure Databricks
  • Experienced in continuous integration and deployment using Jenkins
  • Developed real-time data ingestion pipelines from Azure Event Hubs into different tools
  • Experience in building ETL solutions using Hive and Spark with Python and Scala
  • Skilled in optimizing applications built using tools like Spark and Hive
  • Developed job automation across different clusters using the Airflow scheduler
  • Worked on Talend integration with the on-prem cluster and Azure SQL for data migrations
  • Developed code coverage and test case integrations using SonarQube and Mockito
  • Developed Lambda functions on AWS to automate daily manual ad hoc processes
  • Deployed a StreamSets application using a Docker container on the application platform
  • Worked on streaming tools like StreamSets to stream live data into HDFS
  • Worked on automation tools like Oozie to automate Hive and Spark workflows
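Illustrative sketch (mount paths and column names are assumptions, for illustration only): a small PySpark aggregation job of the kind run on Azure Databricks in this role, rolling up raw activity events into daily counts.

    # Minimal PySpark sketch; input path and columns are hypothetical
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("activity_daily_rollup").getOrCreate()

    activity = spark.read.parquet("/mnt/raw/activity/")    # hypothetical mount path
    daily = (
        activity
        .withColumn("event_date", F.to_date("event_ts"))   # assumed timestamp column
        .groupBy("event_date", "event_type")
        .agg(F.count("*").alias("events"),
             F.countDistinct("user_id").alias("users"))
    )
    daily.write.mode("overwrite").partitionBy("event_date").parquet("/mnt/curated/activity_daily/")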

Education

Master of Science - Information Technology

University of Cincinnati
Cincinnati, OH

Skills

  • Programming languages: Python, R, C, C#, SQL, Scala, SAS, Java, JavaScript, MATLAB, and HTML5
  • Database Management Systems (DBMS): MySQL, PostgreSQL, Oracle, SQL Server, T-SQL, MongoDB, RDS, Cassandra
  • Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, Apache Hive, Teradata
  • ETL Tools: Apache Spark, Apache Airflow, Talend, Informatica, Apache NiFi, Data Build Tool (DBT)
  • Big Data Technologies: Apache Hadoop, Apache Kafka, Apache HBase, Apache Flink, Apache Storm, EMR, Kinesis
  • Cloud Platforms: AWS, Azure, GCP, including services like Amazon S3 and Azure Data Lake Storage
  • Version Control Systems: Git, GitHub
  • Data Visualization: Tableau, Power BI, Qlik, Alteryx, and Cognos
  • Machine Learning/AI: TensorFlow, PyTorch
  • Operating Systems: Linux/Unix, Windows, macOS
  • Containerization and Orchestration: Docker, Kubernetes
  • Tools and IDEs: Git, IntelliJ, Visual Studio Code, Jupyter Notebook, and PyCharm
  • Hadoop Ecosystem: HDFS, YARN, MapReduce
  • Monitoring and Logging: Prometheus, Grafana, ELK Stack

Timeline

Data Engineer

JP Morgan Chase
09.2023 - Current

Data Engineer

Walmart Global Tech India
01.2020 - 07.2022

Big Data Developer

Abbott Laboratories India
01.2018 - 12.2019

Master of Science - Information Technology

University of Cincinnati