
Sri

Carrollton, TX

Summary

Results-driven data engineer with 6+ years of experience in Gen AI, Python, Java, Scala, PySpark, and AWS Cloud technologies. Proven expertise in building scalable data pipelines, optimizing Spark jobs (achieving up to 25% performance improvement), and developing distributed systems using Scala and Akka. Skilled in implementing Retrieval-Augmented Generation (RAG), fine-tuning large language models, and building intelligent chatbots with LangChain. Hands-on experience with Terraform-based infrastructure automation, CI/CD pipelines (Jenkins), and cloud-native solutions leveraging AWS services (Glue, Lambda, Step Functions, S3, EC2). Adept at cost optimization strategies, advanced data modeling, and workflow automation to drive efficiency and business value. Strong collaborator with cross-functional teams, delivering high-impact technical solutions aligned with organizational goals.

Overview

4 years of professional experience

Work History

AI/ML Data Engineer

T-Mobile
10.2024 - Current
  • Designed and implemented an advanced Retrieval-Augmented Generation (RAG) framework to predict and explain customer churn in the telecommunications sector.
  • Integrated diverse data sources — competitor pricing from market analysis, customer tenure, subscription plans, and demographic churn ratios — into the retrieval pipeline for churn reasoning.
  • Built hybrid semantic + keyword search pipelines using LangChain + FAISS, enabling retrieval from both structured datasets (SQL, customer DB) and unstructured sources (market reports, competitor pricing PDFs).
  • Developed scalable ETL workflows with PySpark + Airflow, orchestrated on AWS (S3, Glue, Step Functions, Lambda) to ingest, clean, and transform telecom datasets.
  • Applied advanced prompt engineering, reasoning chains, and context-aware query expansion to generate explainable churn insights with reduced hallucinations.
  • Implemented CI/CD pipelines using GitHub Actions + Jenkins for continuous integration and automated deployment of retrieval modules, ensuring fast iteration on model updates.
  • Containerized retrieval services with Docker + Kubernetes (EKS) for scalability and high availability, handling concurrent churn prediction queries in real time.
  • Monitored system performance with Prometheus + Grafana, tracking retrieval accuracy, response latency, and churn model drift.
  • Delivered 25–30% improvement in interpretability and accuracy compared to baseline ML churn models.
  • Enabled data-driven retention campaigns by providing churn explanations directly consumable by marketing teams, boosting customer lifetime value (CLV).
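The hybrid semantic + keyword retrieval described above can be approximated with a weighted blend of a keyword-overlap score and a cosine similarity over bag-of-words vectors. This is a simplified, stdlib-only toy (the production pipeline used LangChain and FAISS with real embeddings; all names and weights here are illustrative):

```python
import math
from collections import Counter

def keyword_score(query, doc):
    """Fraction of query terms that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine_score(query, doc):
    """Cosine similarity over bag-of-words counts (a stand-in for
    embedding similarity in this toy example)."""
    qv, dv = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(qv[t] * dv[t] for t in qv)
    norm = (math.sqrt(sum(v * v for v in qv.values()))
            * math.sqrt(sum(v * v for v in dv.values())))
    return dot / norm if norm else 0.0

def hybrid_rank(query, docs, alpha=0.5):
    """Rank documents by alpha * keyword score + (1 - alpha) * semantic score."""
    scored = [(alpha * keyword_score(query, d)
               + (1 - alpha) * cosine_score(query, d), d) for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]
```

In the real system, `cosine_score` would be computed against FAISS-indexed dense embeddings and `alpha` tuned on retrieval benchmarks; the blending idea is the same.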

Data Engineer

Bank of America
08.2022 - 09.2024
  • Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
  • Spearheaded Terraform-based infrastructure automation and maintenance, orchestrating efficient deployments using infrastructure-as-code (IaC) principles; converted manual processes into automated workflows and integrated AWS services (Glue, Lambda, Step Functions, DataSync, Batch) with Terraform to align with project deliverables.
  • Created a PySpark framework to move data from DB2 to Amazon S3.
  • Provided guidance to the development team using PySpark as the ETL platform.
  • Loaded and analyzed large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Developed Spark/Scala and Python code for a regular-expression (regex) project in the Hadoop/Hive environment on Linux/Windows big data resources.
  • Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods.
  • Gained proficiency in Scala for developing Glue jobs and executed effective cost-reduction strategies.
  • Wrote highly flexible, scalable, distributed applications in Scala.
  • Kept project costs within budget and achieved timely cost reductions by implementing optimized scripts and retention rules.
  • Created and consumed RESTful web services in Scala with the Play Framework and Akka.
  • Optimized Hive queries using best practices and appropriate parameters, leveraging Hadoop, YARN, Python, and PySpark.
  • Developed Spark applications in Python (PySpark) in a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables.
  • Orchestrated cloud deployments of the Spark project with the Jenkins CI/CD tool, proactively identifying and resolving issues to ensure seamless client onboarding.
  • Environment: Scala, PySpark, Spark SQL, Apache Spark (RDDs, Streaming), Hadoop (HDFS, Hive), Elasticsearch, Play Framework, Akka, Airflow, Jenkins CI/CD, SQL, JSON/Parquet/ORC, Linux, Core Framework
  • Monitored database performance and resolved issues to ensure reliability and stability of data systems.
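The schema-alignment step behind loading many differently-shaped CSV files into a common table (mentioned above) can be sketched in plain Python: project every file's rows onto one target schema, filling missing columns and dropping extras. This is a simplified, hypothetical illustration (the production jobs did this at scale in PySpark, and the schema below is invented):

```python
import csv
import io

# Hypothetical unified target schema for the ORC table.
TARGET_SCHEMA = ["customer_id", "balance", "branch"]

def align_rows(csv_text, schema=TARGET_SCHEMA):
    """Read a CSV with an arbitrary header and project each row onto the
    target schema: missing columns become None, extra columns are dropped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: row.get(col) for col in schema} for row in reader]
```

Each heterogeneous source then yields rows of identical shape, ready to append into one table regardless of the original column order or extra fields.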

TTS

10.2021 - 02.2022
  • Worked on reading and writing multiple data formats like JSON, ORC, Parquet on HDFS using PySpark.
  • Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Implemented a search microservice (Scala, REST, Play Framework, Elasticsearch).
  • Designed a distributed system using Scala and the Akka Actor Model that runs on multi-core machines.
  • Leveraged data engineering skills working with the Core framework, Airflow, and Spark jobs; optimized Spark job performance through advanced Scala techniques and performance tuning, delivering significant improvements in processing time and resource usage.
  • Collaborated cross-functionally with product and business teams to deliver high-impact technical solutions aligned with business goals.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, PySpark, and Scala.
  • Designed solutions to implement the analyzed SQL scripts in PySpark; handled Hive queries using Spark SQL integrated with a Scala-based Spark environment.
  • Created various parser programs to extract data from Business Objects, XML, Informatica, Java, and database views using Scala.
  • Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Environment: PySpark, Scala, Python, Hadoop (MapReduce, HDFS, Hive, Pig, Sqoop, Flume), Spark Streaming, Kafka, Cassandra, MongoDB, HBase, Impala, MLlib, AWS (Glue, Lambda, Step Functions, DataSync, Batch, S3), Kubernetes, Terraform, RESTful APIs, Play Framework, Akka, Linux/Windows
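The Akka actor pattern referenced above — isolated state plus a mailbox drained by a single worker — can be approximated in a few lines of Python with a thread and a queue. This is a toy sketch only (real Akka actors add supervision, routing, location transparency, and clustering; the class and names here are illustrative):

```python
import queue
import threading

class Actor:
    """Minimal actor: a mailbox drained by one worker thread, so the
    message handler never runs concurrently with itself."""
    def __init__(self, handler):
        self.handler = handler
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):
        """Enqueue a message; returns immediately (fire-and-forget)."""
        self.mailbox.put(msg)

    def stop(self):
        """Enqueue a poison pill and wait for the worker to drain and exit."""
        self.mailbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:          # poison pill: shut down
                return
            self.handler(msg)
```

Because all messages funnel through one queue, the handler sees them in send order without locks — the core property that makes the actor model attractive on multi-core machines.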

Skills

  • Programming Languages: Java, Python, Scala, SQL
  • Cloud Platforms & Services: Amazon Web Services (Glue, Lambda, IAM, DataSync, Batch, Step Functions, S3, EC2)
  • Big Data & Distributed Systems: Apache Spark, Hadoop (MapReduce, HDFS, Hive, Pig, Sqoop, Flume), Kafka, Spark Streaming, Impala, Cassandra, MongoDB, HBase
  • Infrastructure & DevOps: Terraform, Jenkins, Kubernetes, Maven, Git, Apache Airflow
  • Frameworks & Tools: Play Framework, Akka, LangChain, RESTful APIs, Elasticsearch
  • Data Engineering & Machine Learning: PySpark, Spark MLlib, Data Quality & ETL Frameworks, Retrieval-Augmented Generation (RAG), LLM Fine-tuning
  • Coursework / Academic Foundations: Data Structures & Algorithms, Database Management Systems (DBMS), Machine Learning

Timeline

AI/ML Data Engineer

T-Mobile
10.2024 - Current

Data Engineer

Bank of America
08.2022 - 09.2024

TTS

10.2021 - 02.2022