JAHNAVI D

Tampa, USA

Summary

Results-driven Data Engineer known for high productivity and efficient task completion. Skilled in big data processing frameworks like Hadoop and Apache Spark, database management using SQL, and data visualization with tools such as Tableau. Excel in problem-solving, collaboration, and adaptability to leverage technical skills in developing innovative data solutions across diverse environments.

Overview

6 years of professional experience
1 Certification

Work History

Data Engineer

VERTEX PHARMACEUTICALS
Boston, USA
01.2023 - Current
  • Developed and maintained Extract, Transform, and Load (ETL) pipelines to ingest and process large volumes of claims, patient data, and medical records from MS SQL Server, MongoDB, and Oracle into Azure Data Lake Storage Gen2 for centralized storage and into Azure Synapse Analytics for warehousing and analytics
  • Leveraged Azure Data Factory triggers to schedule and orchestrate data workflows, ensuring timely and reliable ingestion and transformation of patient data and improving pipeline efficiency
  • Performed data transformations using PySpark on Azure Databricks, processing 1 TB of healthcare-related transactional data and patient records, and used Kafka for real-time data streaming, enhancing data quality
  • Designed and implemented data quality checks using PySpark, incorporating automated validation rules and anomaly detection algorithms, which reduced data discrepancies
  • Utilized Apache Airflow's DAGs and custom operators to automate ETL processes, integrating data ingestion, transformation, and loading into Synapse, improving data processing efficiency
  • Led the optimization of Azure Synapse by implementing distribution and partition strategies, configuring workload management for balanced query execution, and employing advanced compression techniques
  • Automated deployment using Azure DevOps, setting up CI/CD pipelines for seamless integration and continuous deployment, reducing deployment time, minimizing manual errors, and improving scalability
  • Developed advanced Power BI dashboards and reports, using DAX for complex calculations and integrating Azure Synapse Analytics data to provide real-time analytics on trends and outcomes, enhancing decision-making across multiple units

Student Assistant

UNIVERSITY OF SOUTH FLORIDA
Tampa, USA
05.2022 - 11.2022
  • Guided students on using Python libraries such as Pandas and NumPy to automate and streamline data cleaning and preprocessing tasks
  • Designed interactive Tableau dashboards to analyze and visualize student enrollment and performance data

Data Engineer

COGNIZANT
Chennai, India
03.2021 - 12.2021
  • Led the migration of data from Azure Blob Storage to Snowflake, employing Snowpipe for continuous data ingestion and leveraging tasks and streams to automate data processing and loading
  • Utilized Azure Data Catalog to automatically catalog and manage metadata for extensive datasets stored in Blob Storage, and implemented Azure Data Factory pipelines to execute complex ETL transformations using PySpark on large-scale data
  • Automated and optimized ETL workflows using PySpark, Azure Databricks, and Azure Data Factory, reducing manual work, cutting processing time, and validating large files in Snowflake after job completion
  • Implemented Snowflake advanced features, including Clustering for optimized query performance on dimension and fact tables, enhancing data retrieval speeds
  • Designed and implemented a scalable real-time data pipeline using Azure Event Hubs to ingest and process high-volume data, and set up monitoring and alerting for Azure Data Factory pipelines with Azure Monitor, reducing downtime
  • Led the deployment of a containerized application infrastructure using Azure Resource Manager (ARM) templates and Azure Kubernetes Service (AKS)
  • Automated the provisioning, configuration, and scaling of microservices, achieving a reduction in deployment time
  • Identified key performance indicators (KPIs) to determine areas of improvement in the supply chain, and managed ETL codebases using Git within Azure DevOps, implementing branching strategies

Data Analyst

SOLUTIONSIQ
Bengaluru, India
01.2019 - 03.2021
  • Developed dashboards with Tableau to monitor key performance indicators
  • Worked closely with management to prioritize business and information needs
  • Synthesized current business intelligence or trend data to support recommendations for action
  • Implemented new data analysis methodologies and data visualization techniques
  • Analyzed and tracked data to prepare forecasts and identify trends
  • Gathered and organized data to analyze current industry trends
  • Developed custom data models and algorithms to apply to data sets, enhancing business operations

Education

MASTER OF SCIENCE - COMPUTER SCIENCE

University of South Florida
Tampa
12.2023

Skills

Programming Languages: Python (Pandas, NumPy, Scikit-learn), Shell scripting, R, SQL, Oracle PL/SQL

Tools: Azure ML Studio, Visio, Visual Studio, SAP, Postman, Tableau, Power BI, MS Excel, Azure Databricks

Databases: MySQL, MS SQL, PostgreSQL, Oracle, AWS Redshift, SQLite

Big Data Technologies: PySpark, Kafka, Databricks, Airflow

Management Tools: Jira, ADO board, CRM, Salesforce

Cloud Technologies: AWS (EC2, S3), AWS Lambda, AWS Glue, AWS EMR, Textract, AWS SageMaker, AWS Redshift, Databricks, Apache Airflow, Snowflake, Azure Data Lake, Azure Data Factory, Azure SQL, Synapse

Certification

  • Microsoft Certified Azure Data Engineer
  • AWS Certified Solutions Architect Associate
  • MuleSoft Developer

Projects

  • Student Data ETL Pipeline: Built an ETL pipeline to automate the ingestion, transformation, and loading of student enrollment and performance data into a central database. Utilized Python and SQL for data processing and automation. This project improved data consistency and reduced manual intervention by 50%.
  • Real-Time Weather Data Processing: Developed a real-time data pipeline to collect weather data from an API and store it in a cloud data warehouse. Used Apache Kafka for streaming and PySpark for processing large datasets. The project enabled near real-time data analysis and faster decision-making based on updated weather information.
