AJAY KRISHNA DOPPALAPUDI

Cincinnati, OH

Summary

  • Data engineering professional with over 7 years of experience developing data-intensive applications across the Hadoop ecosystem, Big Data, cloud data engineering, data warehousing, and data visualization
  • Delivered end-to-end solutions on Big Data platforms including Cloudera and Hortonworks
  • Proficient with Hadoop and its ecosystem tools, including MapReduce, Pig, Spark, Hive, Sqoop, Flume, HBase, Cassandra, MongoDB, Kafka, ZooKeeper, and Oozie, for ETL operations and Big Data analysis
  • Strong in Scala and Apache Spark, with data analysis using SQL, Hive, and Spark SQL; extensive use of Python libraries such as NumPy, Pandas, PySpark, Matplotlib, and scikit-learn
  • Hands-on experience developing data pipelines with AWS services and IICS Data Integration, and building real-time streaming pipelines with Kafka and Spark; working knowledge of Google Cloud Platform
  • Expertise in SQL, database design, and migration to Azure Data services; experience with Impala views, Erwin data modeling, and Delta Lake
  • Proficient in shell scripting, MapReduce jobs, Hive analytics, and Tableau for data visualization, as well as Python scripting for data manipulation and statistical analysis
  • Familiar with Kubernetes and Docker for CI/CD, ETL tools such as Informatica, DataStage, and Snowflake, and database normalization and denormalization techniques for optimal performance

Overview

7 years of professional experience

Work History

Data Engineer

JP Morgan Chase
Plano, TX
09.2023 - Current
  • Participated in all phases of software development, including requirements gathering and business analysis
  • Devised PL/SQL Stored Procedures, Functions, Triggers, Views, and packages
  • Designed data models for AWS Lambda applications and analytical reports
  • Built a full-service catalog system using Elasticsearch, Logstash, Kibana, Kinesis, and CloudWatch with the effective use of MapReduce
  • Utilized Indexing, Aggregation, and Materialized views to optimize query performance
  • Implemented Python and Scala code for data processing and analytics leveraging inbuilt libraries
  • Utilized various Spark transformations, including mapToPair, filter, flatMap, groupByKey, sortByKey, join, cogroup, union, repartition, coalesce, distinct, intersection, and mapPartitionsWithIndex, along with actions, for cleansing input data
  • Developed PySpark code used to compare data between HDFS and S3
  • Developed data transition programs from DynamoDB to AWS Redshift (ETL process), utilizing AWS Lambda to create Python functions triggered by specific events based on use cases (an illustrative sketch follows this list)
  • Created scripts to read CSV, JSON, and Parquet files from S3 buckets using Python, executed SQL operations, and loaded data into AWS S3, DynamoDB, and Snowflake, utilizing AWS Glue with its crawler
  • Designed the Staging and Operational Data Storage (ODS) environment for the enterprise data warehouse (Snowflake), including Dimension and fact table design following Kimball's Star Schema approach
  • Unit tested data between Redshift and Snowflake
  • Implemented scalable data storage solutions optimized for handling security-related datasets in a SaaS product context
  • Implemented DBT workflows to optimize data modeling, enhance pipeline performance, and seamlessly integrate DBT into the data engineering workflow through cross-functional collaboration
  • Utilized Unity Catalog in Azure Databricks to optimize and manage Delta Lake for streamlined data operations
  • Employed bash shell scripts and UNIX utilities for data processing and automation tasks
  • Utilized data science algorithms and MLOps techniques to optimize data processing and analysis
  • Developed predictive models and converted SAS programs into Python to enhance efficiency and scalability
  • Developed and implemented advanced algorithms in MATLAB to analyze and visualize complex data sets, ensuring efficient data processing
  • Reviewed system specifications related to DataStage ETL and developed functions in AWS Lambda for event-driven processing
  • Developed and optimized T-SQL scripts for ETL processes, collaborated with teams to maintain databases, resolved issues promptly, and stayed current with the latest T-SQL advancements
  • Drafted reports in Tableau Desktop, extracting data for analysis with filters based on the business use case
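Illustrative sketch (not taken from the actual project): a minimal event-driven AWS Lambda handler in Python of the kind referenced above, loading CSV records landed in S3 into DynamoDB. The table name, bucket trigger, and CSV layout are assumptions for illustration only.

    # Minimal sketch, assuming an S3 object-created trigger and a hypothetical DynamoDB table
    import csv
    import io
    import boto3

    s3 = boto3.client("s3")
    dynamodb = boto3.resource("dynamodb")
    TABLE_NAME = "catalog_records"  # hypothetical table name

    def lambda_handler(event, context):
        # Triggered by an S3 PUT event; loads CSV rows into DynamoDB
        table = dynamodb.Table(TABLE_NAME)
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            with table.batch_writer() as writer:
                for row in csv.DictReader(io.StringIO(body)):
                    writer.put_item(Item=row)  # assumes CSV headers match table attributes
        return {"status": "ok"}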

Data Engineer

Walmart Global Tech India
01.2020 - 07.2022
  • Utilized AWS services to architect, analyze, and develop enterprise data warehouse and business intelligence solutions, ensuring optimal architecture, scalability, flexibility, availability, and performance for better decision-making
  • Developed Scala scripts and User Defined Functions (UDFs) using data frames/SQL and Resilient Distributed Datasets (RDD) in Spark for data aggregation, querying, and writing back into the S3 bucket
  • Executed data cleansing and data mining operations
  • Programmed, compiled, and executed programs using Apache Spark in Scala for ETL jobs with ingested data
  • Crafted Spark application programs for data validation, cleansing, transformation, and custom aggregation, employing Spark engine and Spark SQL for data analysis, provided to data scientists for further analysis
  • Automated ingestion processes using Python and Scala, pulling data from various sources such as API, AWS S3, Teradata, and Snowflake
  • Designed and developed Spark workflows using Scala for data extraction from AWS S3 bucket and Snowflake, applying transformations
  • Designed and implemented ETL pipelines between various Relational Databases and Data Warehouse using Apache Airflow
  • Implemented continuous integration and continuous deployment (CI/CD) pipelines, integrating data engineering workflows seamlessly into DevOps practices for efficient software delivery
  • Developed custom ETL solutions and real-time data ingestion pipelines to move data in and out of Hadoop using Python and shell scripts
  • Utilized GCP Dataproc, GCS, Cloud Functions, and BigQuery for data processing
  • Worked on Data Extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of transformations
  • Installed and configured Apache Airflow, automating resulting scripts to ensure daily execution in production
  • Created Directed Acyclic Graphs (DAGs) utilizing the Email Operator, Bash Operator, and Spark Livy Operator for execution on EC2 (an illustrative sketch follows this list)
  • Developed scripts to read CSV, JSON, and parquet files from S3 buckets in Python and load them into AWS S3, DynamoDB, and Snowflake
  • Ingested real-time data streams into the Spark Streaming platform, saving data in HDFS and Hive through GCP
  • Implemented AWS Lambda functions to execute scripts in response to events in Amazon DynamoDB table or S3 bucket or HTTP requests using Amazon API Gateway
  • Worked on Snowflake schemas and data warehousing, processing batch and streaming data load pipelines using Snowpipe and Matillion from the data lake (AWS S3 bucket)
  • Profiled structured, unstructured, and semi-structured data across various sources to identify patterns and implement data quality metrics using necessary queries or Python scripts based on the source
  • Demonstrated proficiency in the Microsoft Suite (PowerPoint, Excel, etc.) to efficiently create presentations and streamline data analysis processes
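Illustrative sketch (assumed names and paths, not the actual production DAG): a daily Airflow 2.x DAG wired with the Bash and Email operators mentioned above; the DAG id, script path, and email address are placeholders.

    # Minimal sketch of a daily Airflow DAG using Bash and Email operators
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.email import EmailOperator

    with DAG(
        dag_id="s3_to_snowflake_daily",      # hypothetical DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract_from_s3",
            bash_command="python /opt/jobs/extract_s3.py",  # hypothetical script path
        )
        notify = EmailOperator(
            task_id="notify_team",
            to="data-team@example.com",                     # placeholder address
            subject="Daily S3-to-Snowflake load finished",
            html_content="The daily load completed successfully.",
        )
        extract >> notify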

Big Data Developer

Abbott Laboratories India
01.2018 - 12.2019
  • Experience working on multiple projects and teams involved in analytics and cloud platforms
  • Experience building scalable data pipelines on the Azure cloud platform using different tools
  • Developed multiple optimized PySpark applications on Azure Databricks (an illustrative sketch follows this list)
  • Developed data pipelines using Azure Data Factory that process Cosmos activity data
  • Implemented reporting stats on top of real-time data using Power BI
  • Developed ETL solutions using SSIS, Azure Data Factory, and Azure Databricks
  • Experienced in continuous integration and deployment using Jenkins
  • Developed real-time data ingestion pipelines from Azure Event Hubs into different tools
  • Experience in building ETL solutions using Hive and Spark with Python and Scala
  • Skilled in optimizing applications built using tools like Spark and Hive
  • Developed job automation across different clusters using the Airflow scheduler
  • Worked on Talend integration with the on-prem cluster and Azure SQL for data migrations
  • Developed code coverage and test case integrations using SonarQube and Mockito
  • Developed Lambda functions on AWS to automate daily manual ad hoc processes
  • Deployed a StreamSets application using a Docker container on the application platform
  • Worked on streaming tools like StreamSets to stream live data into HDFS
  • Worked on automation tools like Oozie to automate Hive and Spark workflows
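Illustrative sketch (mount paths and column names are assumptions, for illustration only): a small PySpark aggregation job of the kind run on Azure Databricks in this role, rolling up raw activity events into daily counts.

    # Minimal PySpark sketch; input path and columns are hypothetical
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("activity_daily_rollup").getOrCreate()

    activity = spark.read.parquet("/mnt/raw/activity/")    # hypothetical mount path
    daily = (
        activity
        .withColumn("event_date", F.to_date("event_ts"))   # assumed timestamp column
        .groupBy("event_date", "event_type")
        .agg(F.count("*").alias("events"),
             F.countDistinct("user_id").alias("users"))
    )
    daily.write.mode("overwrite").partitionBy("event_date").parquet("/mnt/curated/activity_daily/")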

Education

Master of Science - Information Technology

University of Cincinnati
Cincinnati, OH

Skills

  • Programming languages: Python, R, C, C#, SQL, Scala, SAS, Java, JavaScript, MATLAB, and HTML5
  • Database Management Systems (DBMS): MySQL, PostgreSQL, Oracle, SQL Server, T-SQL, MongoDB, RDS, Cassandra
  • Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, Apache Hive, Teradata
  • ETL Tools: Apache Spark, Apache Airflow, Talend, Informatica, Apache NiFi, Data Build Tool (DBT)
  • Big Data Technologies: Apache Hadoop, Apache Kafka, Apache HBase, Apache Flink, Apache Storm, EMR, Kinesis
  • Cloud Platforms: AWS, Azure, GCP, including services like Amazon S3 and Azure Data Lake Storage
  • Version Control Systems: Git, GitHub
  • Data Visualization: Tableau, Power BI, Qlik, Alteryx, and Cognos
  • Machine Learning/AI: TensorFlow, PyTorch
  • Operating Systems: Linux/Unix, Windows, macOS
  • Containerization and Orchestration: Docker, Kubernetes
  • Tools and IDEs: Git, IntelliJ, Visual Studio Code, Jupyter Notebook, and PyCharm
  • Hadoop Ecosystem: HDFS, YARN, MapReduce
  • Monitoring and Logging: Prometheus, Grafana, ELK Stack

Timeline

Data Engineer

JP Morgan Chase
09.2023 - Current

Data Engineer

Walmart Global Tech India
01.2020 - 07.2022

Big Data Developer

Abbott Laboratories India
01.2018 - 12.2019

Master of Science - Information Technology

University of Cincinnati