Pravallika Daka

Waxhaw, NC

Summary

  • Extensive experience orchestrating AWS and Azure environments, architecting Snowflake data warehouses, and designing efficient data pipelines using PySpark and SQL.
  • Proficient in the Hadoop ecosystem, including MapReduce, YARN, Apache Hive, Apache Spark, and Apache Kafka, for big data processing tasks.
  • Adept at infrastructure automation with CloudFormation and experienced in Agile methodology.
  • Skilled in data visualization with Tableau and proficient in optimizing queries and data analysis in Snowflake.

Overview

3
years of professional experience

Work History

Research Assistant

NIU
DeKalb, IL
09.2022 - 12.2023
  • Utilized Python pandas for initial exploration and cleaning of millions of data points on historical MSW generation rates across various Illinois municipalities.
  • Leveraged PySpark for scalable processing and feature engineering to prepare the data for further analysis.
  • Designed and optimized complex SQL queries to extract relevant information from departmental data stored in Snowflake.
  • Collaborated with the NIU Department of Environmental Sciences to understand their data schema and query MSW generation data efficiently.
  • Used query optimization techniques such as index tuning, query rewriting, and query plan analysis, which reduced query execution time.
  • Employed Azure Synapse Analytics for additional data analysis tasks, leveraging its scalability and performance for handling large datasets.
  • Utilized Tableau to visualize MSW data and forecasted 2021 generation rates by analyzing percentage changes in generation capacity using census metrics.
  • Developed a regression analysis model within Tableau to predict future MSW generation per capita.

Associate/Data Engineer

Cognizant Technology Solutions
Hyderabad, India
01.2021 - 07.2022
  • Orchestrated a comprehensive AWS environment (EMR, EC2, EKS, S3, Step Functions, Lambda, Glue, Athena and CloudWatch) for provisioning, storing, and monitoring data processing workflows.
  • Utilized Snowflake to architect and build data warehouses, focusing on well-defined schemas, optimized data models, and efficient querying capabilities.
  • Architected data pipelines using PySpark and SQL to ingest, transform, and analyze massive datasets stored in Amazon S3, ensuring efficient and scalable data processing.
  • Possess expertise in the Hadoop ecosystem, including MapReduce, YARN, Apache Hive, Apache Spark, and Apache Kafka, for big data processing tasks.
  • Developed and maintained CloudFormation templates for automating the deployment of AWS resources, enabling streamlined infrastructure provisioning.
  • Implemented robust monitoring and logging solutions using AWS CloudWatch and Datadog, ensuring real-time system performance visibility and facilitating efficient troubleshooting.
  • Utilized Tableau to create clear and insightful dashboards and reports to support data exploration and communication with stakeholders.
  • Actively participated in troubleshooting and resolving production-related issues while on call.
  • Actively engaged in Agile Scrum methodology by participating in sprint planning, retrospective, and grooming meetings.

Education

Masters - Computer Science

Northern Illinois University
05.2024

Bachelors - Computer Science and Engineering

Amrita School Of Engineering, India
06.2021

Skills

  • PySpark
  • SQL
  • AWS
  • Azure
  • Hadoop Ecosystem
  • Snowflake
  • Git
  • Tableau

Projects

Food Inspection Data Analysis

  • Utilized diverse graph theory methods, such as Depth First Search (DFS), Betweenness Centrality, Closeness Centrality, and Degree Distribution, to construct and analyze a network of over 3,000 food establishments, pinpointing high-risk locations and optimizing inspection strategies.

Failure Prediction using Ensembling Techniques

  • Led the creation of predictive models for software failure anticipation, crafting resilient prediction frameworks that incorporated classifiers such as the XGB Classifier and Gradient Boosting, as well as ensemble methods like Soft Voting and Hard Voting. This pioneering work culminated in a research paper presented at ICMLAS in February 2022, showcasing our innovative approach to software failure prediction.

Accomplishments

  • Published the research paper "Failure Prediction using Ensembling Techniques" at ICMLAS in February 2022.

Timeline

Research Assistant

NIU
09.2022 - 12.2023

Associate/Data Engineer

Cognizant Technology Solutions
01.2021 - 07.2022

Masters - Computer Science

Northern Illinois University

Bachelors - Computer Science and Engineering

Amrita School Of Engineering, India