Nischala Madupuri

San Jose, CA

Summary

Accomplished Data Engineer with a passion for delivering valuable insights through advanced data processing and retrieval methods. Committed to driving company growth by developing strategic data pipelines and ETL processes based on robust data architectures. Proven track record in managing complex data sets, ensuring data quality, and serving as a reliable advisor for data-driven decision-making.

Overview

2 years of professional experience

Work History

Data Engineer

Centric Software
San Jose, CA
04.2023 - Current
  • Collaborated with stakeholders from multiple departments to develop, implement, and launch various initiatives on the data platform.
  • Designed and developed databases for small applications, creating data models tailored to user requirements.
  • Communicated key findings to stakeholders through reports, analyses, and presentations.
  • Streamlined data ingestion and pipeline workflows by applying best-practice design methodologies.
  • Enhanced the efficiency of data pipelines that process over 100 TB of data daily.
  • Achieved a 20% increase in data extraction speed by implementing an efficient ETL process.

Data Engineer Intern

Slesha Inc
Irving, TX
05.2022 - 08.2022
  • Engineered and executed an ETL process using Python and Talend that improved data ingestion speed by 40% and reduced data loss by 20% across 1,000+ datasets.
  • Cleaned unstructured data for data analysis and Confidential tasks and pipelines.
  • Validated data fields across multiple sources to ensure uniformity and accuracy, reducing errors by 25%.
  • Collaborated with cross-functional teams to identify and troubleshoot L3 issues, resulting in a 50% decrease in downtime.

Education

Master of Science - Data Science And Applications

University at Buffalo, SUNY
Buffalo, NY
02.2023

Bachelor of Science - Computer Science

Gandhi Institute of Technology and Management
Hyderabad, IN
06.2021

Skills

  • Programming Languages: Python, R, PySpark, Scala
  • ML Frameworks and Libraries: NumPy, Pandas, Matplotlib, Seaborn, PyTorch
  • Query Languages: SQL, PostgreSQL
  • Databases: Amazon Redshift, Google BigQuery, MySQL, Oracle
  • ETL Tools: Tableau Prep, Informatica PowerCenter
  • Visualization Tools: Tableau Desktop, Power BI, Data Studio

Academic Projects

Machine Learning on Amazon Customer Reviews:
  • Collaborated with 3 team members to apply sentiment analysis and build a classifier that determines a review's sentiment.
  • Analyzed a Kaggle dataset and split it into training and test sets.
  • Utilized Matplotlib and Scikit-Learn for graphical representation of the data.
Key Achievements:
  • Classified Amazon reviews into positive, negative, and neutral categories to aid product selection.

News Classification based on Headline:
  • Built an NLP model using Python and TensorFlow that classifies news articles with 98.99% accuracy, surpassing industry benchmarks by 20%.
  • Developed a training set of over 200,000 labeled news articles using active learning techniques, enabling the model to continuously improve and adapt to evolving topics.
  • Applied natural language processing (NLP), a branch of AI, to develop and train a model that interprets and processes natural language to classify news.
  • Utilized key techniques including data preprocessing, model training, learning-rate and epoch tuning, layer freezing, and transfer learning.
Key Achievements:
  • Analyzed readers' interests and recommended news to the community newspaper board.

COVID-19 Analysis:
May 2022 - August 2022
  • Developed a data pipeline using Python to extract and load COVID-19 data into Google BigQuery, resulting in a 50% increase in data accuracy.
  • Contributed to data preparation, exploratory analysis, and feature engineering for supervised and unsupervised models.
  • Performed advanced statistical analysis of COVID-19 infection trends using Google Data Studio, identifying key patterns and insights to better inform policy decisions.
  • Designed and developed a real-time dashboard integrated with hospital inventory systems, enabling efficient allocation of medical supplies based on predicted demand and reducing wasted resources by 40%.
Key Achievements:
  • Provided the analyzed data to counties and local hospitals to better assess the need for medical supplies.

Exploratory Data Analysis of US Traffic Accidents:
  • Performed exploratory data analysis on the US Traffic Accidents dataset using Python (Pandas, NumPy) and Jupyter, identifying previously unknown trends and patterns.
  • Processed and parsed the raw dataset and loaded it into an SQLite (sqlite3) database.
  • Queried the parsed data in the SQLite database using SQL.
Key Achievements:
  • Explained data trends through graphical plots built with Python data visualization libraries.
