Srividya Mandalapu

Richardson, Texas

Summary

Software engineer with over 6 years of experience crafting elegant software at all layers of the technology stack. Diverse background in languages, frameworks, and design patterns, and always ready to learn more. Hands-on, end-to-end implementation and multi-platform integration experience across Azure cloud, Flink, Python, Spark, Scala, Spark Streaming, and the Big Data/Hadoop platform. A team player, quick learner, and self-starter with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business process improvement.

Overview

6 years of professional experience

Work History

Software Development Engineer II

Warner Bros. Discovery
Dallas, TX
06.2023 - Current
  • Designing, developing, and maintaining scalable data pipelines using SQL, Python, Apache Spark, and Databricks.
  • Implementing data quality checks to ensure data accuracy and integrity throughout the pipeline (a minimal sketch follows this list).
  • Optimizing and tuning data pipelines for performance and efficiency.
  • Deploying and managing data pipelines using tools such as Jenkins and Airflow.
  • Developing and maintaining data ingestion processes to seamlessly transfer data from various sources into cloud storage.
  • Monitoring pipeline performance and troubleshooting issues to ensure timely resolution and minimize disruptions.
  • Continuously improving data engineering processes and workflows through iteration and optimization.
  • Spearheaded optimization of ad campaign analytics, resulting in a 15% improvement in ROI.
  • Utilized statistical learning algorithms to analyze customer behavior and demographics for personalized ad targeting.
  • Integrated data from various channels to create comprehensive customer profiles for effective cross-channel advertising.
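
For illustration, a minimal sketch of the kind of data quality gate described above, assuming a PySpark job on Databricks; the source path, column names, and rules are hypothetical:

```python
# Hypothetical data quality gate for a pipeline stage; the source path,
# column names, and the rules themselves are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("/mnt/raw/events")  # hypothetical source

total = df.count()

# Rule 1: the primary key must be non-null and unique.
null_keys = df.filter(F.col("event_id").isNull()).count()
dup_keys = total - df.dropDuplicates(["event_id"]).count()

# Rule 2: timestamps must be present and not lie in the future.
bad_ts = df.filter(
    F.col("event_ts").isNull() | (F.col("event_ts") > F.current_timestamp())
).count()

failures = {"null_keys": null_keys, "dup_keys": dup_keys, "bad_ts": bad_ts}
if any(v > 0 for v in failures.values()):
    # Failing fast keeps bad data out of downstream tables; an orchestrator
    # such as Airflow then surfaces this as a failed task.
    raise ValueError(f"Data quality checks failed: {failures}")
```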

Data Engineer

AT&T
Plano, TX
04.2021 - 03.2023
  • Maintained and deployed the end-to-end pipelines that feed the ML models with features and retrieve scores.
  • Worked with Kafka consumers to load data into Hive tables.
  • Responsible for testing and deployment of the ML models.
  • Created various automated processes that run as daily batch jobs used in model-scoring pipelines.
  • As part of the Azure migration, migrated on-prem processes to Azure Databricks and scheduled them.
  • Created, configured, and maintained a dedicated cluster in Azure.
  • Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory and Spark SQL; ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL) and processed it in Azure Databricks.
  • Developed PySpark scripts performing multiple transformations and aggregations on datasets (see the first sketch after this list).
  • Implemented data ingestion using Spark, loading data from CSV, Parquet, and XML files.
  • Handled data cleansing and transformation tasks in Spark using Scala and Hive.
  • Used Spark DataFrame operations to perform required validations.
  • Handled large datasets during the ingestion process itself using partitioning, Spark in-memory capabilities, broadcasts, and effective, efficient joins and transformations.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
  • Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Implemented Flink aggregations on data streams and checkpointing for streaming services (see the second sketch after this list).
  • Created Docker containers to build, ship, and run images to deploy applications; worked with several Docker components, including Docker Engine, Docker Hub, and Docker Compose.
  • Used Kubernetes to deploy, scale, load-balance, and manage Docker containers across multiple namespaces; good understanding of the OpenShift platform for managing Docker containers and Kubernetes clusters.
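
A minimal sketch of the kind of PySpark transformation and aggregation script described above; the input files, column names, and output path are hypothetical:

```python
# Hypothetical daily-aggregation job; file paths, column names, and the
# output layout are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-aggregates").getOrCreate()

raw = spark.read.option("header", "true").csv("/data/usage/*.csv")

daily = (
    raw.withColumn("usage_mb", F.col("usage_mb").cast("double"))
       .filter(F.col("usage_mb").isNotNull())                # basic validation
       .groupBy("customer_id", F.to_date("event_ts").alias("day"))
       .agg(
           F.sum("usage_mb").alias("total_mb"),
           F.count("*").alias("events"),
       )
)

# Partitioning the output by day keeps downstream reads selective, in the
# spirit of the partition-based handling of large datasets noted above.
daily.write.mode("overwrite").partitionBy("day").parquet("/curated/daily_usage")
```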
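
And a minimal PyFlink sketch of a keyed streaming aggregation with checkpointing enabled, in the spirit of the Flink work described above; the in-memory collection and tuple schema are hypothetical stand-ins for the real streaming source:

```python
# Hypothetical keyed streaming aggregation with checkpointing; the
# in-memory collection stands in for a real streaming source.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.enable_checkpointing(60_000)  # checkpoint every 60 s for recovery

# Stream of (key, amount) pairs.
ds = env.from_collection(
    [("a", 1), ("b", 2), ("a", 3)],
    type_info=Types.TUPLE([Types.STRING(), Types.INT()]),
)

# Running per-key sum, analogous to the streaming aggregations above.
(ds.key_by(lambda e: e[0])
   .reduce(lambda acc, cur: (acc[0], acc[1] + cur[1]))
   .print())

env.execute("keyed-aggregation")
```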

Data Engineer

AT&T
Plano, TX
02.2020 - 04.2021
  • Responsible for creating reports in Palantir.
  • Developed PySpark programs, created DataFrames, and worked on transformations.
  • Used the Contour data analysis tool to access datasets and conduct common analytical and logical operations in sequence to explore data, debug data quality, cleanse and transform data, and create reports.
  • Scheduled jobs using Monocle in Palantir for the smooth running of the pipeline.
  • Used the Data Lineage web application to see a graphical representation of the transformations performed on a given dataset, including which datasets were used to create it and where the data goes once it has been transformed.
  • Created Python UDFs to extend the functionality of the database.
  • Resolved missing fields in DataFrame rows using filtering and imputation (a sketch follows this list).
  • Integrated visualizations into Excel using Databricks and popular visualization libraries (matplotlib, openpyxl, XlsxWriter).
  • Performed pre-processing on datasets prior to training, including standardization and normalization.
  • Built processing pipelines covering transformation, estimation, and evaluation of analytical models.
  • Worked on Sqoop incremental imports to ingest data produced on a daily basis, scheduling and monitoring the jobs using Autosys and cron.
  • Performed Spark performance tuning for different source-system domains and inserted the results into the harmonized layer.
  • Tuned application performance to optimize resource and time utilization.
  • Designed application flow and implemented it end to end: gathering requirements, building code, performing testing, and deploying to production.
  • Applied Spark transformations on source files to load the data into HDFS.
  • Used Jira to keep track of the stories worked on as part of the Agile methodology.
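
A minimal sketch of resolving missing fields via filtering and imputation, as described above, using Spark ML's Imputer; the source path and column names are hypothetical:

```python
# Hypothetical missing-field resolution; the source path and column
# names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.ml.feature import Imputer

spark = SparkSession.builder.appName("impute-missing").getOrCreate()
df = spark.read.parquet("/data/accounts")  # hypothetical source

# Rows missing the key field are filtered out entirely...
df = df.filter(df["account_id"].isNotNull())

# ...while remaining numeric gaps are imputed with the column median.
imputer = Imputer(
    inputCols=["monthly_spend", "tenure_months"],
    outputCols=["monthly_spend_imp", "tenure_months_imp"],
    strategy="median",
)
df_clean = imputer.fit(df).transform(df)
```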

Big Data Developer

BBVA BANK
Birmingham, AL
09.2018 - 02.2020
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark Streaming APIs to perform necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a sketch follows this list).
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Identified areas of improvement in the existing business by unearthing insights from vast amounts of data using machine learning techniques.
  • Developed solutions to process data into HDFS, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
  • Performed various optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in creating Hive tables, loading, and analyzing data using Hive queries.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Tested raw data and executed performance scripts.
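
For illustration, a minimal sketch of the Kafka-to-Cassandra flow described above, written here with Structured Streaming and the spark-cassandra-connector rather than the older DStream API; broker, topic, keyspace, and schema names are hypothetical:

```python
# Hypothetical Kafka -> Spark -> Cassandra flow; broker, topic, keyspace,
# table, and event schema are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("learner-model-stream").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event", StringType()),
    StructField("ts", LongType()),
])

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "learner-events")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
         .select("e.*")
)

def write_batch(batch_df, batch_id):
    # Each micro-batch is appended to Cassandra via the connector.
    (batch_df.write.format("org.apache.spark.sql.cassandra")
             .options(keyspace="learning", table="learner_events")
             .mode("append")
             .save())

events.writeStream.foreachBatch(write_batch).start().awaitTermination()
```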

Education

Master of Science - Computer Science

Governors State University
Park Forest, IL
05.2018

Bachelor of Science - Computer Science

JNT University
INDIA
05.2014

Skills

  • Python
  • SQL/Hive
  • Spark
  • Microsoft Azure Cloud
  • Flink
  • Kubernetes
  • Azure Databricks
  • Java
  • Scala
  • Docker
  • Palantir
  • Airflow
  • Jenkins

Timeline

Software Development Engineer II

Warner Bros. Discovery
06.2023 - Current

Data Engineer

AT&T
04.2021 - 03.2023

Data Engineer

AT&T
02.2020 - 04.2021

Big Data Developer

BBVA BANK
09.2018 - 02.2020

Master of Science - Computer Science

Governors State University

Bachelor of Science - Computer Science

JNT University