APURVA ASHOK SARODE

Akron, USA

Summary

Over 5 years of IT experience in Data Engineering and Software Development across various business domains. Strong expertise in building ETL pipelines for batch and streaming data using Scala-Spark, PySpark, and SparkSQL. Extensive experience with AWS and Azure cloud technologies, including EC2, EMR, S3, Lambda, Athena, Redshift, ADLS, Azure Databricks, ADF, and Blob Storage. Skilled in Data Engineering technologies like Hadoop 2, Spark, and Elastic MapReduce. Innovative data scientist with a robust background in machine learning, statistical analysis, and predictive modeling. Skilled in translating complex datasets into actionable insights that drive decision-making and business strategy improvements. Demonstrates strong problem-solving abilities and mastery of Python, R, SQL, and data visualization tools. Previous work has led to significant enhancements in operational efficiency and revenue growth through data-driven strategies.

Overview

7 years of professional experience
1 Certification

Work History

Data Engineer

NeueHealth
Minneapolis, USA
12.2024 - Current
  • Design, develop, and maintain scalable and robust data pipelines using Scala and Apache Spark.
  • Implement ETL (Extract, Transform, Load) processes for ingesting, transforming, and storing large datasets.
  • Integrate data from multiple sources into a unified, accessible format for business and analytical use.
  • Use Spark for distributed data processing, including batch and real-time streaming applications.
  • Monitor and troubleshoot data pipelines and Spark jobs to ensure high performance and reliability.
  • Develop and deploy solutions on Azure Cloud Services, leveraging tools like Azure Data Lake Storage, Azure Databricks, Azure Synapse Analytics, and Blob Storage.
  • Work closely with data scientists, analysts, and product teams to deliver data solutions tailored to business needs.

Student Data Scientist

Cleveland State University
Cleveland, USA
01.2023 - 05.2024
  • Designed, developed, and maintained scalable data pipelines using modern data engineering tools and frameworks (e.g., Apache Spark, Apache Kafka, AWS Glue) to support business intelligence, analytics, and data science initiatives.
  • Trained AI models with labeled datasets using frameworks like TensorFlow, PyTorch, and Keras.
  • Developed and deployed machine learning models using Python frameworks like TensorFlow and PyTorch.
  • Integrated pre-trained ML and AI models into web applications using frameworks such as TensorFlow or PyTorch.

Data Science Associate

TheMathCompany
Bangalore, India
11.2021 - 07.2022
  • Client: The Home Depot.
  • Handled both structured and unstructured data, performing cleaning, descriptive analysis, and dataset preparation.
  • Set up storage and data analysis tools within the Amazon Web Services (AWS) cloud infrastructure.
  • Developed and maintained large-scale data pipelines using Apache Spark, Hadoop, and Kafka to process and analyze terabytes of structured and unstructured data efficiently.
  • Implemented supervised and unsupervised learning algorithms, such as random forest, KNN, SVM, and logistic regression, in Python and R for predictive analytics projects.
  • Tested, validated, and reformulated models to improve the accuracy of predicted outcomes.
  • Performed exploratory analysis on large datasets using SQL and Python libraries such as Pandas, NumPy, Scikit-Learn, and Matplotlib.
  • Applied feature selection algorithms to predict potential outcomes.
  • Developed predictive models using machine learning, natural language processing, and statistical analysis methods.
  • Developed and maintained predictive models to identify customer segments for targeted marketing campaigns.

Data Analytics Developer

Clairvoyant.ai
Hyderabad, India
08.2020 - 11.2021
  • Client: PayPal, GSG.
  • Constructed a data pipeline to process semi-structured data using Apache Spark, integrating 100 million raw records from 14 different data sources.
  • Developed and deployed data ingestion frameworks using Apache Kafka Connect and Apache Flume, automating data integration from various sources into the data lake.
  • Used statistical software to analyze and process large data sets.
  • Assessed current business and technological resources to improve project plans.
  • Developed and tested deep learning models for natural language processing applications.
  • Built custom tools for managing large datasets used in training models.
  • Researched state-of-the-art architectures and algorithms used in neural networks and convolutional networks.

Associate Software Engineer (Big Data)

Circana LLC
Pune, India
08.2018 - 01.2020
  • Collaborated with a team of four to establish a cloud-first data ingestion system, utilizing Azure, Apache Spark, and Kafka to ingest data from diverse sources, enhancing data processing speed by 74%.
  • Integrated data from internal and external sources to ensure seamless data flow and enable comprehensive data analysis.
  • Designed SQL tables with referential integrity and developed advanced queries using stored procedures and functions in SQL Server Management Studio.
  • Environment: Hadoop, Apache Spark, Kafka, Hive, NoSQL, Oracle, SQL Server, Azure.

Education

Master of Science - Computer Science

Cleveland State University
Cleveland, OH
05.2024

Bachelor of Engineering - Computer Science

Pune University
India
05.2017

Skills

  • Python
  • Java
  • Scala
  • SQL
  • TensorFlow
  • AWS Glue
  • Apache Airflow
  • EC2
  • EMR
  • S3
  • Lambda
  • Athena
  • Redshift
  • Oracle
  • MS SQL Server
  • Keras
  • PyTorch
  • Scikit-learn
  • MySQL
  • AWS
  • Azure
  • GCP
  • REST API
  • Apache Spark
  • Apache Kafka
  • Hadoop
  • Data pipeline design
  • ETL development
  • Machine learning

Certification

  • AWS Certified Data Engineer – Associate (DEA-C01)
  • Azure for Data Engineers Certified
