Extensive experience orchestrating AWS and Azure environments, architecting Snowflake data warehouses, and designing efficient data pipelines using PySpark and SQL.
Proficient in the Hadoop ecosystem, including MapReduce, YARN, Apache Hive, Apache Spark, and Apache Kafka, for big data processing tasks.
Adept at infrastructure automation with CloudFormation and experienced in Agile methodology.
Skilled in data visualization with Tableau and proficient in optimizing queries and data analysis in Snowflake.
Overview
3 years of professional experience
Work History
Research Assistant
NIU
DeKalb, IL
09.2022 - 12.2023
Utilized Python pandas for initial exploration and cleaning of millions of data points on historical municipal solid waste (MSW) generation rates across various Illinois municipalities
Leveraged PySpark for scalable processing and feature engineering to prepare the data for further analysis
Designed and optimized complex SQL queries to extract relevant information from departmental data stored in Snowflake
Collaborated with the NIU Department of Environmental Sciences to understand their data schema and query MSW generation data efficiently
Used query optimization techniques such as index tuning, query rewriting, and query plan analysis, which reduced query execution time
Employed Azure Synapse Analytics for additional data analysis tasks, leveraging its scalability and performance for handling large datasets
Utilized Tableau to visualize MSW data and forecasted 2021 generation rates by analyzing percentage changes in generation capacity using census metrics
Developed a regression analysis model within Tableau to predict future MSW generation per capita.
Associate/Data Engineer
Cognizant Technology Solutions
Hyderabad, India
01.2021 - 07.2022
Orchestrated a comprehensive AWS environment (EMR, EC2, EKS, S3, Step Functions, Lambda, Glue, Athena and CloudWatch) for provisioning, storing, and monitoring data processing workflows.
Utilized Snowflake to architect and build data warehouses, focusing on well-defined schemas, optimized data models, and efficient querying capabilities
Architected data pipelines using PySpark and SQL to ingest, transform, and analyze massive datasets stored in Amazon S3, ensuring efficient and scalable data processing.
Possess expertise in the Hadoop ecosystem, including MapReduce, YARN, Apache Hive, Apache Spark, and Apache Kafka, for big data processing tasks.
Developed and maintained CloudFormation templates for automating the deployment of AWS resources, enabling streamlined infrastructure provisioning.
Implemented robust monitoring and logging solutions using AWS CloudWatch and Datadog, ensuring real-time system performance visibility and facilitating efficient troubleshooting
Utilized Tableau to create clear and insightful dashboards and reports that improved data exploration and communication for stakeholders
Actively participated in troubleshooting and resolving production-related issues during on-call.
Actively engaged in Agile Scrum methodology by participating in sprint planning, retrospective, and grooming meetings.
Education
Masters - Computer Science
Northern Illinois University
05.2024
Bachelors - Computer Science and Engineering
Amrita School Of Engineering, India
06.2021
Skills
PySpark
SQL
AWS
Azure
Hadoop Ecosystem
Snowflake
Git
Tableau
Projects
Food Inspection Data Analysis
Utilized diverse graph theory methods, such as Depth First Search (DFS), Betweenness Centrality, Closeness Centrality, and Degree Distribution, to construct and analyze a network of over 3,000 food establishments, pinpointing high-risk locations and optimizing inspection strategies.
Failure Prediction using Ensembling Techniques
Led the creation of predictive models for software failure prediction, building frameworks that combined classifiers such as the XGB Classifier and Gradient Boosting with ensemble methods such as Soft Voting and Hard Voting. The work resulted in a research paper presented at ICMLAS in February 2022.
Accomplishments
Research paper "Failure Prediction using Ensembling Techniques" published at ICMLAS in February 2022