PONNAM SAI VINEETH

Irving

Summary

  • 7+ years of experience in Data Engineering and implementing Data Warehousing solutions, with hands-on experience on the Azure cloud platform, specializing in building ETL ingestion flows with Azure Data Factory and leveraging Azure Databricks and Azure Key Vault.
  • Extensive work with PySpark and Spark SQL for data processing; proficient in SnowSQL and Snowpipe for continuous data ingestion and in writing complex stored procedures within the Snowflake environment.
  • Proficient in Google Cloud Platform (GCP) services, including Google BigQuery for data analytics, Cloud Composer for scheduling DAGs, and Google Cloud Storage (GCS) for data storage and management.
  • Experience working within cross-functional Agile Scrum teams, with excellent communication and interpersonal skills and a strong understanding of reporting objects, data modeling, and SQL across multiple dialects. Skilled in developing reports and dashboards in Power BI for data visualization and business insights.

Overview

8 years of professional experience

Work History

DATA ENGINEER

Albertsons
08.2021 - Current
  • Data Integration and Transformation: Integrated and transformed data from various structured and unstructured sources into meaningful business insights.
  • Offshore Team Management: Led and managed an offshore team, ensuring project goals were achieved and high-quality results were delivered.
  • Historical Data Migration: Executed comprehensive historical data loads, ensuring accurate and efficient migration.
  • Airflow and Snowflake: Used Airflow and Cloud Composer to orchestrate and schedule complex workflows, including SQL queries, data loading, and extraction tasks in Snowflake and GCP environments; a minimal DAG sketch follows this list.
  • Technical Skills: Proficient in Snowflake components (Snowpipe, stages, Streams, Tasks, SnowSQL, views, data modeling, CI/CD, clones, Time Travel, stored procedures), and experienced with JSON, XML, and CSV formats and tools such as Offset Explorer, PuTTY, WinSCP, Stonebranch, and Control Center for data validation and monitoring.
  • GCP Expertise: Worked on data migration from Snowflake to GCP BigQuery using GCS buckets; familiar with GCP components including Dataflow, Pub/Sub, BigQuery, Cloud Composer, and Dataproc.
  • Reporting Tools: Developed Power BI dashboards and reports, leveraging data from Snowflake to provide actionable insights.
  • ETL and Automation: Collaborated with leaders to develop tools, reports, and predictive models. Implemented efficient ETL processes and played a key role in setting up autoscaling and archival frameworks to enhance performance and automate processes.
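The following is a minimal, illustrative sketch of the Airflow-to-Snowflake orchestration pattern described above, assuming the apache-airflow-providers-snowflake package is installed; the DAG name, connection ID, table, and stage are hypothetical.

  from datetime import datetime
  from airflow import DAG
  from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

  with DAG(
      dag_id="daily_sales_load",                  # hypothetical DAG name
      start_date=datetime(2023, 1, 1),
      schedule_interval="0 6 * * *",              # run daily at 06:00
      catchup=False,
  ) as dag:
      # Copy a staged daily extract into a Snowflake table.
      copy_into_sales = SnowflakeOperator(
          task_id="copy_into_sales",
          snowflake_conn_id="snowflake_default",  # assumed Airflow connection
          sql="""
              COPY INTO analytics.daily_sales     -- hypothetical target table
              FROM @analytics.sales_stage         -- hypothetical external stage
              FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
          """,
      )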

DATA ENGINEER

GAP
08.2019 - 06.2021
  • Designed and built scalable ingestion pipelines using AWS Glue, AWS Lambda, and Spark (PySpark and Scala) to process daily extracts, enabling both real-time and batch data processing for downstream reporting and analytics.
  • Developed AWS Glue ETL pipelines using PySpark and Scala for data transformation and loading from multiple sources (AWS S3, Redshift, RDS, etc.), ensuring seamless data integration across cloud-based platforms.
  • Leveraged Spark (PySpark and Scala) within AWS EMR to process large-scale datasets, optimizing both performance and scalability of data transformation jobs.
  • Utilized Amazon Redshift and Snowflake for large-scale data storage, query optimization, and improving data retrieval efficiency for analytics.
  • Implemented event-driven architecture using AWS Lambda in conjunction with S3 and AWS Glue, automating data ingestion and processing workflows; a minimal sketch of this pattern follows this list.
  • Integrated AWS Lake Formation with Glue Data Catalog to establish data governance, centralize access control, and improve data security across multiple data sources.
  • Created interactive dashboards using AWS QuickSight for business intelligence, visualizing key business metrics and enabling ad-hoc analysis to provide actionable insights to stakeholders.
  • Developed complex data transformation logic and business rules in Spark using both PySpark and Scala, ensuring high performance and scalability in data pipelines.
  • Optimized Glue job performance by fine-tuning execution configurations (memory, parallelism, etc.), reducing job runtime by 30% and improving resource utilization.
  • Automated the testing of Glue jobs using Python unit tests and integrated continuous integration/continuous deployment (CI/CD) practices with tools like Jenkins and GitHub to ensure the accuracy and reliability of data transformations before production deployment.
  • Provided end-to-end production support for AWS Glue jobs, Lambda functions, and ingestion pipelines, troubleshooting failures, and improving system stability by reducing downtime.
  • Collaborated with cross-functional teams in an Agile Scrum environment to design and implement ETL and data processing solutions, ensuring alignment with business objectives and timely delivery.
  • Worked with Docker and EKS to containerize data processing components, ensuring scalability and portability in the cloud environment.
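Below is a minimal sketch, under hypothetical names, of the event-driven pattern above: a Lambda handler, assumed to be subscribed to the bucket's object-created notifications, that starts a Glue job for each newly arrived file.

  import boto3

  glue = boto3.client("glue")

  def handler(event, context):
      # A single invocation may carry several S3 event records.
      for record in event["Records"]:
          bucket = record["s3"]["bucket"]["name"]
          key = record["s3"]["object"]["key"]
          # Start the Glue ETL job for the newly arrived object.
          glue.start_job_run(
              JobName="ingest_daily_extract",     # hypothetical Glue job name
              Arguments={"--source_path": f"s3://{bucket}/{key}"},
          )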

ASSOCIATE DATA ENGINEER

Katalyst
Hyderabad
12.2017 - 03.2019
  • Linked Services and Data Sources: Created linked services for various databases, including Oracle, Teradata, SQL Server, SAP HANA, on-premises file share, blob storage, and ADLS Gen2, enabling seamless data integration from diverse sources.
  • Azure Data Factory (ADF): Utilized ADF activities such as Lookups, Stored Procedures, conditionals, loops, variable manipulation, metadata retrieval, and filtering to design and implement data pipelines.
  • ADF Triggers and Monitoring: Configured and scheduled ADF pipelines using triggers and monitored pipeline execution; implemented alert notifications for pipeline failures.
  • Self-Hosted Integration Runtime: Configured and managed the self-hosted integration runtime, facilitating data integration between on-premises and cloud environments.
  • Databricks and Spark: Implemented logging frameworks, ETL logic, validation frameworks, user-defined frameworks, and various data handling techniques (e.g., SCD Type 1 and 2, UPSERT); leveraged Azure Databricks (ADB) and ADF for running Spark-Python and Spark-Scala notebooks, respectively. A minimal UPSERT sketch follows this list.
  • Research and Documentation: Conducted research and documented best practices and standards for utilizing our BI tools, ensuring optimal usage and adherence to industry trends and guidelines.
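Below is a minimal Databricks sketch of the UPSERT handling mentioned above, expressed as a Delta Lake MERGE; the table, path, and column names are hypothetical, and spark is the session Databricks provides. An SCD Type 2 variant would expire the matched row and insert a new version rather than updating in place.

  from delta.tables import DeltaTable

  target = DeltaTable.forName(spark, "dim_customer")   # hypothetical dimension table
  updates = spark.read.parquet("/mnt/raw/customer")    # hypothetical incoming extract

  # UPSERT (SCD Type 1): update matching rows in place, insert new ones.
  (target.alias("t")
      .merge(updates.alias("s"), "t.customer_id = s.customer_id")
      .whenMatchedUpdate(set={
          "address": "s.address",
          "updated_at": "current_timestamp()",
      })
      .whenNotMatchedInsertAll()
      .execute())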

Education

Master of Science - Computer Science

Wichita State University
Wichita, KS
05.2021

Bachelor of Technology - Computer Science

Gitam University
Hyderabad, India
04.2018

Skills

Operating Systems: Unix and Windows

Databases: SQL, PL/SQL, NoSQL, MongoDB, Teradata, Oracle, SQL Server, Azure DW, SnowSQL

Reporting Tools: Tableau, Power BI

Programming Languages: Python, R, SQL

Version Control: GitHub

SDLC Methodologies: Agile, Scrum

Azure Services: Azure Data Factory, Azure Databricks, ADLS Gen1 & Gen2, SSIS, Azure Key Vault, Blob Storage, Event Hub, Log Analytics, Cosmos DB, ADLA

AWS Services: Glue, Redshift, S3, RDS, Athena, EMR, Lambda, QuickSight, DynamoDB

GCP Services: BigQuery, Cloud Composer, Dataflow, GCS buckets

Certifications: Microsoft Azure Fundamentals (AZ-900), Certificate Number: H802-3508

Google Associate Cloud Engineer, Series ID: 130283
