Sonali Misra

Austin, TX

Summary

Results-driven Data Engineer specializing in performance optimization and ETL pipeline integration. Proficient in Python and Apache Spark, with strong problem-solving skills and a focus on improving data quality. Achieved significant cost savings through efficient job migrations.

Overview

7
years of professional experience

Work History

Data Engineer

Expedia Group
01.2023 - Current
  • Optimized a batch pipeline, reducing its runtime by 40% without any additional compute cost.
  • Played a key role in integrating a new dataset into the existing ETL pipeline.
  • Analyzed and fixed several data quality issues in the ETL pipeline.
  • Improved alerting by implementing real-time data analysis using a metric collector and Datadog dashboards.
  • Migrated 10+ critical jobs from Qubole to Dataproc, achieving $20K annual cost savings within one week.

Data Engineer Intern

Expedia Group
05.2022 - 07.2022
  • Designed and developed Python and Shell scripts to extract data from the EG Data Lake.
  • Delivered high-quality documentation covering technical specifications, user guides, troubleshooting steps, and best practices for using the developed solution efficiently.

Software Engineer

Accenture
05.2018 - 12.2020
  • Created a DevOps pipeline for a Java application with stages for Maven build, Maven test, Docker image creation, Docker image vulnerability scanning, and deployment to SAP Cloud Platform.
  • Developed Terraform scripts for creating Kubernetes clusters on AWS (EKS) and Microsoft Azure.
  • Implemented Python scripts to extract required data from Jenkins console output, sent this data to the ELK stack, and built dashboards in Kibana and Grafana to analyze data trends.
  • Designed and implemented an automated solution to transfer data dynamically from the Jenkins console to the Azure cloud platform, reducing data transfer time by 60%.
  • Helped the team build an ML model that predicts the average build duration, average success rate, average failure rate, and 50 other important metrics of Jenkins jobs.
  • Built and deployed the overall service infrastructure using several AWS services (including EC2, ECS, Route53, CloudFront, RDS, IAM), focusing on high availability, fault tolerance, and auto-scaling.

Education

Master's in Computer Science

SUNY At Buffalo
Buffalo, NY
12-2022

Bachelor of Technology in Electronics

Sreenidhi Institute of Science And Technology
Hyderabad
05-2018

Skills

  • Programming Skills: Scala, Java, Python, Bash, HTML, CSS
  • Databases: Hive DB, ScyllaDB, MongoDB, MySQL
  • Tools & Frameworks: Apache Spark (Spark SQL), Airflow, Datadog, Grafana, Jenkins
  • Version Control Systems: GitHub, Git, SVN
  • Containerization: Docker, Kubernetes
  • Cloud: AWS, Azure

Academic Projects

Analysis and Prediction of Graduation Rate and Dropout Rate in New York State (NumPy, Pandas, Matplotlib, sklearn)

• Collected CSV data on graduation and dropout rates in several counties of New York State from 2016 to 2019, pre-processed the data, and analyzed the key metrics that would help predict future graduation and dropout rates.

• Built an ML model that predicts the counties with high graduation rates and high dropout rates.

IMDB Database Management System (SQL)

• Created an IMDB management system to store, transform, and search the ratings and crew information of movies and TV shows released in or before 2017.

• Built an Entity-Relationship model and translated it into a relational schema for the IMDB management system using SQL.

Information Retrieval System over Twitter Data (Python, AWS, Solr, Flask)

• Built a complete search and analytics solution for a corpus of Twitter data crawled over a period of time, along with a user interface that lets users search and analyze the corpus and filter tweets extensively by language, country, and topic.
