PRATHYUSHA DORA

Summary

Results-driven Data Engineer with a proven ability to develop scalable data pipelines and improve data quality at Magna. Utilizes Python and AWS technologies to deliver cost-effective big data solutions while demonstrating expertise in Apache Spark. Excellent collaboration skills enhance team productivity, supported by strong analytical capabilities. Aiming to leverage advanced technical skills to drive innovative data solutions in future projects.

Overview

6 years of professional experience

Work History

Data Engineer

Magna
05.2023 - 08.2023
  • Designed and implemented data ingestion pipelines using AWS Timestream for time-series data storage and analytics
  • Integrated PySpark with AWS services like S3 and Redshift for seamless data integration and analysis, enabling scalable and cost-effective big data solutions
  • Designed and implemented data pipelines to collect, preprocess, and store large datasets for machine learning projects, ensuring data quality and reliability
  • Developed automated data pipelines using PySpark and Python scripting, enabling scheduled and event-triggered data processing and transformations
  • Implemented CI/CD pipelines using tools like Jenkins and GitLab CI/CD to automate deployment and testing of data engineering workflows and applications
  • Implemented cluster autoscaling policies on AWS EMR to dynamically adjust cluster capacity based on workload demand, optimizing resource utilization and reducing costs during off-peak periods
  • Designed and implemented data warehouse solutions using Amazon Redshift to support high-performance analytics and reporting requirements
  • Leveraged AWS Glue Crawlers to automatically discover and catalog data stored in various sources, enabling seamless integration with AWS data lakes and analytics services
  • Conducted performance benchmarking tests for Hive, Kafka, and Spark jobs to compare different configurations and identify the best setup for specific use cases
  • Designed and maintained Hive databases and tables for structured and semi-structured data, enabling efficient data querying and analysis using familiar SQL-like syntax
  • Designed, developed, and maintained complex data ingestion and transformation workflows using Apache NiFi to efficiently process and route data from various sources to AWS data stores
  • Built data ingestion processes to capture real-time streaming data from various sources and transform it for storage in AWS Timestream
  • Applied Python programming skills and PySpark DataFrame operations with Data Wrangler to clean, transform, and manipulate raw data for further analysis and modeling
  • Worked collaboratively with teams using Databricks collaborative features and Git for version control
  • Designed and maintained data warehousing solutions within Databricks to store structured data efficiently

Data Engineer Intern

Jay Consulting Services
Durham, NC
06.2022 - 08.2022
  • Assisted in the development and implementation of data pipelines to ingest, transform, and load data from various sources into a centralized data warehouse
  • Supported the design and optimization of database schemas and data models to ensure efficient storage and retrieval of information
  • Collaborated with senior data engineers to troubleshoot and resolve issues related to data quality, reliability, and performance
  • Contributed to the documentation of data engineering processes, including workflow diagrams, technical specifications, and best practices
  • Gained hands-on experience with data engineering tools and technologies such as Apache Spark, Hadoop, SQL, and Python
  • Assisted in the development of data visualization dashboards and reports to communicate insights and findings to stakeholders
  • Participated in team meetings and brainstorming sessions to discuss project requirements, priorities, and strategies
  • Actively engaged in learning opportunities and self-directed projects to expand knowledge and skills in data engineering concepts and practices
  • Demonstrated a proactive and eager attitude towards learning and contributing to the team's success in a dynamic internship environment

Data Analyst

Nexava Info Private Limited
Hyderabad, India
09.2020 - 06.2021
  • Monitored and analyzed real-time data streams using tools like Apache Kafka and Spark Streaming to provide timely insights and alerts
  • Developed and maintained real-time dashboards to visualize live data trends and performance metrics for immediate decision-making
  • Implemented anomaly detection algorithms to identify and respond to critical events in real-time, minimizing downtime and optimizing operations
  • Conducted continuous monitoring of data quality and integrity in real-time, proactively identifying and resolving issues to ensure accurate and reliable analysis
  • Collaborated with software engineers to integrate real-time data sources into analytics platforms, enabling seamless data flow and analysis
  • Utilized real-time data analytics to optimize marketing campaigns, pricing strategies, and customer engagement initiatives, driving immediate business impact
  • Responded to ad-hoc data requests and queries in real-time, providing on-demand insights and analysis to support time-sensitive decision-making
  • Leveraged real-time sentiment analysis of social media data to gauge customer perceptions, informing rapid response strategies
  • Conducted A/B testing and real-time experimentation to evaluate the effectiveness of product features and marketing campaigns, iterating quickly based on real-time feedback

Data Analyst

Capgen Soft Private Limited
Hyderabad, India
06.2017 - 01.2019
  • Conducted in-depth analysis of large datasets using statistical methods and data visualization tools to uncover insights and trends
  • Developed and maintained automated reports and dashboards to track key performance indicators and support data-driven decision-making
  • Collaborated with cross-functional teams to define data requirements, extract relevant information, and communicate findings effectively
  • Identified data quality issues and implemented solutions to ensure accuracy and reliability of analytical results
  • Utilized SQL queries and data manipulation techniques to extract, clean, and transform raw data from various sources
  • Applied machine learning algorithms and predictive modeling techniques to identify patterns and make actionable recommendations
  • Presented findings and recommendations to stakeholders in clear and concise reports, presentations, and visualizations
  • Conducted ad-hoc analysis to support business initiatives and address specific questions or challenges
  • Stayed updated on industry trends and best practices in data analysis, continuously enhancing skills and knowledge
  • Contributed to the development and optimization of data infrastructure and processes to improve efficiency and scalability

Education

Master’s - Computer Science

University of Missouri Saint Louis
Saint Louis, Missouri
05.2023

Bachelor’s - Computer Science Engineering

Jawaharlal Nehru Technological University Hyderabad
Hyderabad, India
05.2017

Skills

  • Tableau
  • Amazon QuickSight
  • Python
  • Scala
  • SQL
  • PL/SQL
  • Shell scripting
  • UNIX
  • Windows
  • Linux
  • AWS EC2
  • Glue
  • EMR
  • S3
  • Oracle
  • SQL Server
  • MySQL
  • MongoDB
  • Hadoop
  • Apache Spark
