Expertise in developing scalable data pipelines and leveraging AWS services such as Redshift, S3, Glue, and Lambda. Proficient in implementing real-time data processing solutions with Apache Spark and Kubernetes, ensuring optimal performance and reliability. Strong skills in data quality enhancement and dashboard design, utilizing tools like Informatica and Looker to drive actionable insights.
Overview
12 years of professional experience
Work History
Senior AWS Data Engineer
Equifax
11.2022 - Current
Developed automated shell scripts for data pipeline orchestration on UNIX, streamlining ETL processes for faster data transformation from DB2 and S3 sources to Redshift
Built aggregate and de-normalized tables and populated them through ETL to improve Looker analytical dashboard performance and to help data scientists and analysts speed up ML model training and analysis
Utilized Kubernetes to efficiently deploy and manage containerized applications, enhancing resource allocation and operational scalability for ETL pipelines
Developed and maintained a data warehouse using Snowflake to enable ad-hoc querying and reporting for business users
Used Terraform to automate deployment of data pipelines involving EMR clusters, S3, and Redshift, ensuring high availability and fault tolerance
Optimized Apache Iceberg tables using AWS Glue optimizers such as compaction, snapshot retention, and orphan file deletion
Designed and built Spark RDD transformations in Scala to process structured and semi-structured data, improving pipeline performance for machine learning models
Troubleshot issues such as crawler errors when the crawler uses Lake Formation permissions, Ray errors, and AWS Glue machine learning exceptions
Automated data analysis and reporting tasks using Django to aggregate and visualize data from diverse sources, providing insights to stakeholders
Performed performance tuning on Tableau workbooks and dashboards by optimizing query execution, managing extract refresh schedules, and reducing dashboard load times
Set up real-time data processing with Spark Streaming, which ingested clickstream data from Kinesis and stored it in S3 and Redshift
Developed Splunk dashboards, reports, and alerts to monitor health and performance of data pipelines, enabling real-time tracking of ETL workflows and immediate detection of issues across Redshift, S3, and Lambda
Provided technical leadership and delivered innovative products and services to address customer-specific requirements
AWS Data Engineer
LSEG
09.2019 - 11.2022
Promoted customer success in building and migrating applications, software, and services onto the AWS Cloud platform using IAM, S3, AWS EMR, Lambda, SNS, Data Lakes, Glue, Athena, and Redshift
Built Amazon Redshift clusters to give reporting workloads rapid access to data
Converted Ab Initio jobs into Spark applications using Scala and ran them on AWS EC2 machines in the cloud
Leveraged AWS Lambda to trigger Java functions in response to cloud events, automating data workflows and reducing manual intervention in data processing pipelines
Designed and managed Kubernetes-based environments for deploying containerized ETL workflows and real-time data processing applications, ensuring scalability and reliability
Utilized Hudi's copy-on-write and merge-on-read storage models to optimize for read-heavy and write-heavy data processing scenarios
Developed and optimized ETL pipelines using PySpark within AWS Glue for data transformation, aggregation, and loading into Redshift, resulting in efficient large-scale data migration
Collaborated on data governance and security measures, including access control and encryption across AWS resources using IAM and CloudWatch
Integrated Tableau with data sources including SQL databases, Excel files, Salesforce, SAP, and cloud-based platforms such as Amazon Redshift and Snowflake
Worked on Lambda functions that return data from incoming events and store the result in Amazon DynamoDB
Performed event enrichment, data aggregation, de-normalization, and data preparation for downstream model training and reporting
Designed and developed transactions and persistence layers to save/retrieve/modify data for application functionalities using Django
AWS Data Engineer
Ascena
06.2018 - 09.2019
Designed and implemented end-to-end data migration strategies from legacy systems to S3 and Redshift, ensuring minimal downtime and seamless transition during the data warehouse modernization
Implemented Glue Crawlers to automate metadata management, ensuring accurate data discovery and cataloging through Glue Data Catalog for easier access
Automated ETL workflows using Lambda and Step Functions, improving the processing of sales, inventory, and customer data, significantly reducing manual intervention
Developed Athena queries for customer segmentation, product performance, and campaign analysis, reducing reliance on complex ETL processes for business teams
Integrated Lambda with CloudWatch Events for automated monitoring, improving operational response times and ensuring real-time notifications for retail systems
Designed sales and inventory trend analysis dashboards in Tableau, empowering retail teams to monitor and optimize performance
Streamlined data pipelines by implementing auto-scaling in EMR clusters, optimizing compute costs during off-peak hours while maintaining performance during heavy loads
Hadoop Developer/Spark
Cloud, Elegant Embedded Solution PVT LTD
02.2015 - 08.2016
Loaded data from sources such as HDFS and HBase into Spark RDDs and implemented in-memory data computation to generate the output response
Collected metrics about data pipelines, stored indexes in Elasticsearch, and created Kibana dashboards that helped quickly identify data loss and anomalies
Automated data transfer from MySQL to HDFS using Sqoop, and optimized Hive queries for efficient data extraction and reporting
Managed and processed large-scale data sets using Hadoop tools including HDFS, YARN, and MapReduce, ensuring optimal performance and scalability
Designed and deployed AWS Glue-based ETL pipelines to streamline data migration and processing, optimizing resource utilization and reducing operational costs
Implemented data anonymization techniques for PII data within the Data Lake, ensuring compliance with privacy regulations like GDPR
Streamlined log management using AWS CloudWatch Logs and Lambda, enabling automated log analysis and real-time performance monitoring
Developed and optimized NiFi workflows for efficient data integration and processing across Kafka, HDFS, and AWS S3
Hadoop Developer
RamInfo
09.2012 - 02.2015
Built a scalable distributed data solution on a 30-node Hadoop cluster in AWS Cloud, processing 25+ terabytes of data for deep analytics on customer behavior
Developed MapReduce and Spark programs to analyze and transform data, uncovering insights into customer usage patterns
Implemented custom scripts in Python and Scala to automate the data ingestion process from various sources into HDFS, significantly reducing the manual overhead and improving data pipeline reliability
Contributed to setting up and managing Hadoop ecosystem components, including HDFS, YARN, Hive, Pig, and HBase, ensuring smooth operations across the cluster
Configured periodic incremental imports from DB2 to HDFS using Sqoop to maintain up-to-date data in the Hadoop ecosystem
Monitored AWS services using AWS CloudWatch to ensure operational efficiency and troubleshoot issues across the cloud infrastructure
Education
Master of Science - Information Systems Technologies
Wilmington University
New Castle, DE
05-2018
Bachelor of Science - Electronics And Communications Engineering
JNTUH
Hyderabad
06-2010
Skills
Data security practices
Performance optimization
Python programming
NoSQL databases
API development