JAHNAVI D

Tampa, USA

Summary

Results-driven Data Engineer known for high productivity and efficient task completion. Skilled in big data processing frameworks such as Hadoop and Apache Spark, database management with SQL, and data visualization with tools such as Tableau. Excels at problem-solving, collaboration, and adapting to new environments, applying technical skills to build innovative data solutions across diverse environments.

Work History

Data Engineer

VERTEX PHARMACEUTICALS

Boston, USA

01.2023 - Current

Developed and maintained Extract, Transform, and Load (ETL) pipelines that ingest and process large volumes of claims, patient data, and medical records from MS SQL Server, MongoDB, and Oracle into Azure Data Lake Storage Gen2 for centralized storage and into Azure Synapse Analytics for warehousing and analytics
Used Azure Data Factory triggers to schedule and orchestrate data workflows, ensuring timely and reliable ingestion and transformation of patient data and improving pipeline efficiency
Performed data transformations with PySpark on Azure Databricks, processing 1 TB of healthcare transactional data and patient records, and used Kafka for real-time data streaming, enhancing data quality
Designed and implemented data quality checks in PySpark, incorporating automated validation rules and anomaly detection algorithms that reduced data discrepancies
Used Apache Airflow DAGs and custom operators to automate ETL processes, integrating data ingestion, transformation, and loading into Synapse and improving data processing efficiency
Led the optimization of Azure Synapse by implementing distribution and partition strategies, configuring workload management for balanced query execution, and employing advanced compression techniques
Automated deployments with Azure DevOps by setting up CI/CD pipelines for continuous integration and deployment, reducing deployment time, minimizing manual errors, and improving scalability
Developed advanced Power BI dashboards and reports on Azure Synapse Analytics data, using DAX for complex calculations, to provide real-time analytics on trends and outcomes and enhance decision-making across multiple units

Student Assistant

UNIVERSITY OF SOUTH FLORIDA

Tampa, USA

05.2022 - 11.2022

Guided students in using Python libraries such as Pandas and NumPy to automate and streamline data cleaning and preprocessing tasks
Designed interactive, visually engaging Tableau dashboards to analyze and visualize student enrollment and performance data

Data Engineer

COGNIZANT

Chennai, India

03.2021 - 12.2021

Led the migration of data from Azure Blob Storage to Snowflake, using Snowpipe for continuous data ingestion and Snowflake streams and tasks to automate data processing and loading
Used Azure Data Catalog to automatically catalog and manage metadata for large datasets stored in Blob Storage, and implemented Azure Data Factory pipelines that execute complex PySpark ETL transformations on large-scale data
Automated and optimized ETL workflows using PySpark, Azure Databricks, and Azure Data Factory, reducing manual work, cutting processing time, and validating large files in Snowflake after job completion
Implemented advanced Snowflake features, including clustering on dimension and fact tables to optimize query performance and speed up data retrieval
Designed and implemented a scalable real-time data pipeline using Azure Event Hubs to ingest and process high-volume data
Set up advanced monitoring and alerting for Azure Data Factory pipelines with Azure Monitor, reducing downtime
Led the deployment of a containerized application infrastructure using Azure Resource Manager (ARM) templates and Azure Kubernetes Service (AKS)
Automated the provisioning, configuration, and scaling of microservices, reducing deployment time
Identified key performance indicators (KPIs) to determine areas of improvement in the supply chain
Managed ETL codebases with Git in Azure DevOps, implementing branching strategies

Data Analyst

SOLUTIONSIQ

Bengaluru, India

01.2019 - 03.2021

Developed dashboards with Tableau to monitor key performance indicators
Worked closely with management to prioritize business and information needs
Synthesized current business intelligence or trend data to support recommendations for action
Implemented new data analysis methodologies and data visualization techniques
Analyzed and tracked data to prepare forecasts and identify trends
Gathered and organized data to analyze current industry trends
Developed custom data models and algorithms to apply to data sets, enhancing business operations

Education

MASTER OF SCIENCE - COMPUTER SCIENCE

University of South Florida

Tampa

12.2023

Skills

Programming Languages: Python (Pandas, NumPy, scikit-learn), Shell scripting, R, SQL, Oracle PL/SQL

Tools: Azure ML Studio, Visio, Visual Studio, SAP, Postman, Tableau, Power BI, MS Excel, Azure Databricks

Databases: MySQL, MS SQL, PostgreSQL, Oracle, AWS Redshift, SQLite

Big Data Technologies: PySpark, Kafka, Databricks, Airflow

Management Tools: Jira, Azure DevOps (ADO) Boards, CRM, Salesforce

Cloud Technologies: AWS (EC2, S3), AWS Lambda, AWS Glue, AWS EMR, Textract, AWS SageMaker, AWS Redshift, Databricks, Apache Airflow, Snowflake, Azure Data Lake, Azure Data Factory, Azure SQL, Synapse

Certification

Microsoft Certified Azure Data Engineer
AWS Certified Solutions Architect Associate
MuleSoft Developer

Projects

Student Data ETL Pipeline

Built an ETL pipeline to automate the ingestion, transformation, and loading of student enrollment and performance data into a central database, using Python and SQL for data processing and automation. The project improved data consistency and reduced manual intervention by 50%.

Real-Time Weather Data Processing

Developed a real-time data pipeline that collects weather data from an API and stores it in a cloud data warehouse, using Apache Kafka for streaming and PySpark to process large datasets. The project enabled near-real-time data analysis and faster decisions based on up-to-date weather information.
