Pranav Pandirla

McKinney

Summary

Data Engineer with proven expertise in building efficient data pipelines with Azure Data Factory. Skilled in transforming complex datasets and improving pipeline performance. Applies strong collaboration and problem-solving skills to maintain high-quality data governance and integrity.

Overview

years of professional experience

Work History

Data Engineer

PepsiCo

Plano

06.2024 - Current

Implemented dynamic data pipelines in Azure Data Factory (ADF) to manage end-to-end data workflows.
Configured Linked Services, Datasets, and Triggers in ADF for effective data integration between on-premises SQL servers, Azure Blob Storage, and Azure Synapse.
Developed parameterized pipelines using expressions and variables to handle multiple data sources.
Configured Azure Data Factory triggers to schedule pipeline runs.
Performed complex data transformations in Databricks using PySpark, including joins, filters, aggregations, and partitioning, to ensure efficient processing and optimized performance.
Implemented secure data access in Azure by configuring service principal authentication between Azure Databricks and Azure Data Lake.
Created external tables and materialized views in Azure Synapse Analytics, pushing transformed datasets from Databricks for analytics and reporting.
Developed and published interactive dashboards using Power BI.

Data Engineer

Bank of America

Plano

04.2023 - 06.2024

Developed and maintained Shell scripts for file detection, pre-processing, and automated transfer to HDFS.
Performed schema-based validation using a combination of Shell scripting and PySpark to ensure data quality before processing.
Automated data ingestion workflows from landing to HDFS and staging layers, including error handling and logging.
Loaded validated data into staging tables and compared it against target tables for consistency checks.
Implemented incremental data processing pipelines that handle daily deltas efficiently.
Designed and maintained Change Data Capture (CDC) logic to identify and apply inserts, updates, and deletes between raw and target tables, implementing both SCD Type 1 (overwrite existing records) and SCD Type 2 (track historical changes with versioning and effective dates).
Used PySpark for efficient data transformation, validation, and merging operations on large-scale datasets.
Scheduled and orchestrated ETL workflows using Autosys, with clear job dependencies and error notifications.
Troubleshot job failures, data mismatches, and file issues to keep the end-to-end data flow reliable.
Maintained detailed documentation of ETL logic, schema validation rules, and job execution steps.
Collaborated with upstream providers and downstream data consumers to align on data quality and availability SLAs.
Optimized Spark jobs and HDFS storage for performance, scalability, and cost-efficiency.
Ensured compliance with data governance and security standards throughout the data pipeline.

Intern

Social Prachar

Hyderabad

12.2019 - 03.2020

Collected, cleaned, and preprocessed data using Pandas, NumPy, and scikit-learn to prepare it for machine learning modeling.
Conducted exploratory data analysis (EDA) to uncover patterns, correlations, and outliers using Matplotlib and Seaborn.
Built and evaluated multiple ML models, including Linear Regression, Logistic Regression, Decision Trees, Random Forests, and K-Nearest Neighbors (KNN).
Split the dataset into training and test sets using train_test_split, and applied cross-validation to assess model robustness.
Evaluated models using metrics such as accuracy, precision, recall, and confusion matrix.

Education

Master of Science - Computer Science

University of North Texas

Denton, Texas

12-2022

Bachelor of Science - Computer Science

IARE

Hyderabad, India

06-2020

Skills

Big data technologies
SQL and databases
PL/SQL Programming
Java programming
Python programming language
PySpark programming
UNIX shell scripting

Git version control
AWS cloud services
Azure cloud services
Power BI
Tableau
Azure Data Factory (ADF)
Azure Databricks

Accomplishments

Qualified for the Infosys HackWithInfy in 2019.
Qualified for the TCS CodeVita in 2019.
Solved around 150 programming challenges on the LeetCode platform.
