
Prathyusha Duddyala

Summary

Results-driven Data Engineer known for high productivity and efficient task completion. Skilled in big data processing frameworks like Hadoop and Apache Spark, database management using SQL, and data visualization with tools such as Tableau. Strong in problem-solving, collaboration, and adaptability, applying technical skills to develop data solutions across diverse environments.

Overview

6 years of professional experience

Work History

Data Engineer

WatchGuard
Seattle
02.2023 - 01.2024
  • Improved data pipeline efficiency using Azure Data Factory, Synapse Analytics, and Self-Hosted Integration Runtime (SHIR), resulting in faster data processing and enhanced accessibility.
  • Led ETL development and automation using Databricks, PySpark, Docker, and Jenkins, streamlining data processing, improving data accuracy, and reducing errors in production.
  • Built and managed data solutions with Azure Data Lake, Snowflake, and Spark SQL, enhancing data integration, accessibility, and reporting capabilities.

Data Engineer

Cognizant
Hyderabad, India
07.2021 - 05.2022
  • Worked with AWS services such as Glue, S3, Redshift, and EC2, together with Databricks, to enhance data processing, storage, and integration. Developed Spark applications and used the Spark DataFrame and Spark SQL APIs to streamline data tasks from various RDBMS sources.
  • Improved Spark application performance in Databricks using techniques like multithreading and multiprocessing. Orchestrated jobs in Databricks, developed UDFs, optimized data processing for various file formats (JSON, CSV, Parquet), and improved data availability in AWS Redshift for business intelligence.
  • Centralized data into an enterprise cloud warehouse using Redshift, and integrated data with Cassandra and Kafka for real-time processing.
  • Configured Kafka managers, developed data models, and enhanced performance of Cassandra clusters for efficient data querying.

Junior Data Engineer

Wipro
Hyderabad, India
05.2019 - 06.2021
  • Developed applications using Apache Spark, Scala, Python, and AWS EMR for efficient data processing and distribution. Ingested data from various RDBMS sources into HDFS using Sqoop and loaded datasets into Hive and Cassandra for querying and analysis, improving data accessibility and processing efficiency.
  • Streamed real-time data using Kafka and performed transformations with Kafka Streams. Migrated HQL code to PySpark, optimized Spark applications with Scala, and improved cluster performance with Kubernetes, enhancing system reliability and data processing speed.
  • Automated reporting processes using shell scripts, integrated applications with Maven and Jenkins for job management, and collaborated with Data Science teams to develop BI dashboards in Tableau, improving data accessibility, reporting efficiency, and decision-making capabilities.

Junior Data Engineer

247[ai]
Hyderabad, India
06.2018 - 05.2019
  • Managed Hadoop clusters and developed Spark scripts using Java, Python, and Scala to process and analyze data from relational databases. Optimized Hive queries with Spark SQL and DataFrames, improving query execution and data processing efficiency.
  • Ingested data from databases into HDFS using Sqoop, processed large datasets with Hive and Pig scripts, and automated tasks using shell scripts for big data tools like Sqoop, Impala, and MapReduce, streamlining data flow and analysis.
  • Managed MongoDB and Cassandra clusters, executed data transformations with MapReduce and CQL, and implemented big data solutions on AWS using EMR and S3. Enhanced system scalability and performance, ensuring efficient data retrieval and storage.

Education

Master of Science - Business Analytics, Cybersecurity

University of New Mexico
12.2023

Bachelor of Technology - Electronics and Communication Engineering

Guru Nanak Institutions
04.2018

Skills

  • Python, Java
  • SQL, T-SQL, PL/SQL
  • Scala, Hadoop
  • Apache Spark
  • Hive, AWS, Azure
  • MySQL, PostgreSQL
  • MS SQL Server
  • MongoDB, Apache Airflow
  • Tableau, Power BI, Excel
  • Docker, Kubernetes
  • Git, Jira

Projects

Automated Data Pipeline for E-commerce Analytics (Sep 2022 - Jan 2023)

• Developed an automated data pipeline using Apache Airflow to extract, transform, and load e-commerce data into Redshift.

• Implemented data quality checks and validation scripts using Python and Pandas, ensuring data accuracy.

• Created Tableau dashboards to visualize sales performance, customer behavior, and product trends.

Real-Time Data Processing with Spark Streaming for IoT Devices (Feb 2023 - May 2023)

• Designed and implemented a real-time data processing system using Apache Spark Streaming and Kafka.

• Processed and analyzed streaming data from IoT devices, generating real-time insights for predictive maintenance.

• Deployed the solution on AWS EMR, leveraging S3 for storage and Redshift for data warehousing.

Real-Time Financial Data Analysis Platform (Aug 2023 - Dec 2023)

• Built a real-time data processing platform for financial transactions using Apache Flink and Kafka.

• Enabled real-time fraud detection and risk assessment by processing streaming data from financial systems.

• Integrated the platform with AWS services (Kinesis, S3, Redshift) for scalable and reliable data processing.
