Pranav Pandirla

McKinney

Summary

Data Engineer with proven expertise in building efficient data pipelines with Azure Data Factory. Skilled in transforming complex datasets and optimizing pipeline performance. Strong collaborator and problem solver committed to high-quality data governance and integrity.

Overview

5 years of professional experience

Work History

Data Engineer

PepsiCo
Plano
06.2024 - Current
  • Implemented dynamic data pipelines using Azure Data Factory (ADF) to manage end-to-end data workflows.
  • Configured Linked Services, Datasets, and Triggers in ADF for effective data integration between on-premises SQL servers, Azure Blob Storage, and Azure Synapse.
  • Developed parameterized pipelines using expressions and variables to handle multiple data sources.
  • Configured ADF triggers to schedule and automate pipeline runs.
  • Performed complex data transformations in Databricks using PySpark, including joins, filters, aggregations, and partitioning, to ensure efficient processing and optimized performance.
  • Implemented secure data access in Azure by configuring service principal authentication between Azure Databricks and Azure Data Lake.
  • Created external tables and materialized views in Azure Synapse Analytics, pushing transformed datasets from Databricks for analytics and reporting.
  • Developed and published interactive dashboards using Power BI.

Data Engineer

Bank of America
Plano
04.2023 - 06.2024
  • Developed and maintained shell scripts for file detection, pre-processing, and automated transfer to HDFS.
  • Performed schema-based validation using a combination of shell scripting and PySpark to ensure data quality before processing.
  • Automated data ingestion workflows from the landing zone to HDFS and staging layers, including error handling and logging.
  • Loaded validated data into staging tables and compared it against target tables for consistency checks.
  • Implemented incremental data processing pipelines that handle daily deltas efficiently.
  • Designed and maintained Change Data Capture (CDC) logic to identify and apply inserts, updates, and deletes between raw and target tables, implementing both SCD Type 1 (overwrite existing records) and SCD Type 2 (track historical changes with versioning and effective dates).
  • Used PySpark for efficient data transformation, validation, and merge operations on large-scale datasets.
  • Scheduled and orchestrated ETL workflows using Autosys, with defined job dependencies and error notifications.
  • Troubleshot job failures, data mismatches, and file issues to ensure reliable end-to-end data flow.
  • Maintained detailed documentation of ETL logic, schema validation rules, and job execution steps.
  • Collaborated with upstream providers and downstream data consumers to align on data quality and availability SLAs.
  • Optimized Spark jobs and HDFS storage for performance, scalability, and cost-efficiency.
  • Ensured compliance with data governance and security standards throughout the data pipeline.

Intern

Social Prachar
Hyderabad
12.2019 - 03.2020
  • Collected, cleaned, and preprocessed data using Pandas, NumPy, and scikit-learn to prepare it for machine learning modeling.
  • Conducted exploratory data analysis (EDA) to uncover patterns, correlations, and outliers using Matplotlib and Seaborn.
  • Built and evaluated multiple ML models, including Linear Regression, Logistic Regression, Decision Trees, Random Forests, and K-Nearest Neighbors (KNN).
  • Split datasets into training and test sets using train_test_split, and applied cross-validation to assess model robustness.
  • Evaluated models using accuracy, precision, recall, and confusion matrices.

Education

Master of Science - Computer Science

University of North Texas
Denton, Texas
12.2022

Bachelor of Science - Computer Science

IARE
Hyderabad, India
06.2020

Skills

  • Big data technologies
  • SQL and databases
  • PL/SQL programming
  • Java programming
  • Python programming
  • PySpark programming
  • UNIX shell scripting
  • Git version control
  • AWS cloud services
  • Azure cloud services
  • Power BI
  • Tableau
  • Azure Data Factory (ADF)
  • Azure Databricks

Accomplishments

  • Qualified for the Infosys HackWithInfy in 2019.
  • Qualified for the TCS CodeVita in 2019.
  • Solved approximately 150 programming challenges on LeetCode.
