
Preetham Kumar Madupuri

Chennai, TN

Summary

Seasoned Data Engineer with over 10 years of experience, starting as a Java developer and progressively expanding expertise into advanced areas such as machine learning, natural language processing (NLP), and state-of-the-art data engineering. Proven track record in designing and implementing large-scale data pipelines, integrating complex machine learning models, and building distributed systems. Adept at working with a wide range of technologies, including Apache Spark, Kafka, Python, and TensorFlow, to deliver end-to-end data solutions. Skilled at deploying and managing machine learning models in production environments, optimizing for performance, and ensuring data integrity. Strong foundation in Java-based applications with deep expertise in cloud platforms like AWS. A results-driven professional who combines a solid software engineering background with cutting-edge data science capabilities.

Overview

9 years of professional experience
1 Certification

Work History

Technology Lead

Encora Technologies
04.2022 - 05.2023
  • Built machine learning models using state-of-the-art algorithms such as deep neural networks, SVMs, and logistic regression.
  • Architected and implemented distributed data pipelines using Apache Spark for large-scale data processing, achieving improved scalability and reduced processing times for complex transformations.
  • Developed real-time data streaming solutions leveraging Apache Kafka, enabling low-latency ingestion and processing of high-throughput event streams.
  • Designed and deployed cloud-native data workflows on AWS using services like EMR, S3, Lambda, and RDS, enabling seamless integration and automated ETL processes.
  • Implemented fault-tolerant and highly available data pipelines leveraging distributed processing frameworks like Spark and Kafka Streams, ensuring data consistency and durability across the system.
  • Built and supported large-scale machine learning environments, including planning, designing, installing, configuring, performance tuning, and monitoring.

Staff Software Engineer

Cloudera
06.2020 - 08.2021

• Built data pipelines using Python to enable exploratory data analysis for NLP around speech-to-text from Cloudera support calls.

• Improved the performance of EMR jobs by 50% by applying the latest developments around Spark SQL and EMR clusters.

• Implemented a hot-warm data architecture using AWS S3 and Apache Spark, which improved the performance of call stats by ~70% and reduced storage costs.

• Ingested data from disparate sources using a combination of Spark SQL, the Google Analytics API, and Python to create data views for BI tools like Tableau/Looker.

• Designed and implemented batch and streaming pipelines with robust high availability to ensure product uptime (delivered product uptime of ~99.999%).

Senior Software Engineer

Walmart Labs
04.2017 - 06.2020
  • Developed and maintained complex ETL workflows in Java and Scala, processing billions of records daily to support business intelligence and machine learning initiatives.
  • Integrated data pipelines with AWS services such as Redshift, Athena, and Glue, automating data processing and transformation for analytics and reporting needs.
  • Managed Kafka clusters for stream processing and real-time data ingestion, ensuring optimal performance, partitioning strategies, and consumer group management.
  • Improved system efficiency by tuning Spark jobs and partitioning strategies and by optimizing resource allocation, reducing compute costs by 20% while maintaining processing speed.
  • Collaborated with cross-functional teams including data scientists, analysts, and product managers to deliver scalable and reliable data solutions tailored to business requirements.
  • Enhanced query performance by introducing partitioning and bucketing in Hive.

Software Engineer

Praedicat
09.2014 - 04.2017

• Automated ETL processes, making it easier to wrangle data and reducing data-wrangling time by as much as 40%.

• Increased the efficiency of data fetching through query optimization and indexing.

• Constructed a data pipeline that detects entities in scientific articles, and modified the entity extraction process to prepare training data for building classifiers with state-of-the-art Natural Language Processing techniques such as NER.

• Built a Django-based web application tool.

• Used Spark to implement scalable Machine Learning topic algorithms on large datasets.

• Improved the company nomination system, which nominates links between entities of interest: companies, chemicals, and harms.

• Detected harms from boxed warnings using a combination of semantic and natural language processing techniques.

Education

Master of Science - Computer Science

The University of Texas At Dallas
01-2014

Bachelor of Engineering - Computer Science

SSN College of Engineering
Chennai, TN
05-2012

Skills

  • Machine Learning
  • Big Data Analytics
  • API Development
  • Technical leadership
  • Java
  • Python
  • Scala
  • SQL
  • Amazon Web Services
  • Kubernetes
  • Docker

Certification

  • Apache Spark Databricks Certified Developer
  • Apache Hadoop Hortonworks Certified Developer

Walmart Global Award for Consistent performance

This award was presented in recognition of my consistent high performance within the team over several quarters.

Languages

English
Full Professional
