Pallavi Rayadurgam

Dallas, TX

Summary

Results-driven Data Engineer with 8+ years of experience designing and implementing scalable, cloud-based data pipelines and architectures. Proven expertise in Databricks, Delta Lake, Azure, AWS, GCP, and Snowflake. Adept at building Medallion Architectures, automating ETL workflows, and integrating LLM tooling such as LangChain and Gemini for intelligent data access. Strong communicator with a track record of delivering high-impact data solutions aligned with business objectives.

Overview

10 years of professional experience

Work History

Senior Data Engineer

Publicis Health Media
06.2021 - Current
  • Designed and deployed an LLM-powered chatbot using LangChain and Gemini that interprets natural-language questions, reads database schemas, and auto-generates optimized SQL queries for structured data exploration (a minimal sketch follows this list).
  • Built and optimized data pipelines in Databricks to automate inbound and outbound file processing workflows using Azure Blob Storage, SFTP integration, and Delta Lake.
  • Migrated legacy storage pipelines to Databricks Volumes and adopted Unity Catalog for centralized governance and fine-grained access control.
  • Collaborated with cross-functional business teams to gather requirements and design interactive Tableau dashboards that provided actionable insights for marketing and operations.
  • Developed and maintained performance-optimized Tableau dashboards using curated data from Snowflake, Redshift, and Delta tables, improving stakeholder visibility into key KPIs and trends.
  • Implemented scalable ingestion workflows to pick up files from Blob Storage and SFTP sources, transform data using PySpark, and load it into Azure SQL and Unity Catalog-managed tables.
  • Designed and implemented Medallion Architecture (Bronze, Silver, and Gold layers) in Databricks to organize raw, refined, and curated data for downstream analytics and reporting (see the second sketch after this list).
  • Accessed AWS Redshift data shared by external partners to perform data analysis and integrate third-party datasets into internal workflows.
  • Used Snowflake Data Share & Databricks Unity Catalog Delta sharing with external data partners, enabling seamless cross-organizational collaboration.
  • Accessed and analyzed Redshift data using Jupyter Notebooks for quick exploration and model prototyping in Python.
  • Orchestrated end-to-end data pipelines using Apache Airflow to trigger Databricks jobs and manage data ingestion from Azure Blob Storage, ensuring scalable and reliable execution across batch processing workflows.
  • Implemented Azure Functions to trigger real-time alerts and notifications based on pipeline status and data validation rules, enhancing operational monitoring and response time.
  • Created Azure Data Factory (ADF) pipelines to bulk-copy multiple tables at once from relational databases to Azure Data Lake Storage Gen2.
  • Developed Spark applications using PySpark and Spark SQL in Databricks to extract, transform, and aggregate data from multiple file formats, uncovering insights into customer usage patterns.
  • Used the SendGrid API to automate email notifications about file transfers.
  • Integrated Python with the Box API to upload and download files to and from Blob Storage.
  • Developed and orchestrated containerized services using Kubernetes and CI/CD tools to automate pipeline deployment and scaling in AWS and Azure environments.
  • Environment: PySpark, Databricks, Azure (Blob, ADF, Functions, Key Vault), AWS (Redshift, S3, EC2), Snowflake, Unity Catalog, Delta Lake, Tableau, LangChain, Gemini, Airflow, Kubernetes.
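
A minimal, illustrative sketch of the text-to-SQL chatbot pattern described in the first bullet above. It assumes recent langchain, langchain-community, and langchain-google-genai packages; the connection string, model name, and example question are placeholders rather than the production configuration, and generated SQL should be reviewed before execution.

```python
# Hypothetical text-to-SQL sketch: LangChain reads the database schema and asks
# Gemini to draft a SQL query for a natural-language question.
from langchain_community.utilities import SQLDatabase
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import create_sql_query_chain

# Placeholder connection string; the production pipeline targeted Azure SQL and
# Unity Catalog-managed tables instead.
db = SQLDatabase.from_uri("sqlite:///marketing_demo.db")

# Gemini as the LLM backend; the model name is an assumption.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0)

# The chain injects the schema into the prompt and returns a candidate SQL string.
sql_chain = create_sql_query_chain(llm, db)

question = "Which campaign drove the most impressions last month?"
generated_sql = sql_chain.invoke({"question": question})
print(generated_sql)          # review/validate the query before running it
print(db.run(generated_sql))  # execute against the structured data
```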

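A second sketch, of the Bronze/Silver/Gold (Medallion) layering referenced above, assuming a Databricks notebook where `spark` is predefined and Unity Catalog three-level table names are available; the volume path, column names, and cleansing rules are illustrative only.

```python
# Illustrative Medallion layering in Databricks; paths, tables, and columns are placeholders.
from pyspark.sql import functions as F

# Bronze: land raw files as-is.
raw = spark.read.format("json").load("/Volumes/examples/landing/events/")
raw.write.mode("append").saveAsTable("examples.bronze.events")

# Silver: deduplicate and conform the raw records.
silver = (spark.table("examples.bronze.events")
          .dropDuplicates(["event_id"])
          .withColumn("event_date", F.to_date("event_ts")))
silver.write.mode("overwrite").saveAsTable("examples.silver.events")

# Gold: curated aggregates for Tableau dashboards and reporting.
gold = (silver.groupBy("event_date", "campaign")
        .agg(F.count("*").alias("event_count")))
gold.write.mode("overwrite").saveAsTable("examples.gold.daily_campaign_events")
```
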
AWS Cloud Engineer

VERIZON
05.2019 - 06.2021
  • Developed PySpark jobs to build a variety of models and algorithms for analytical purposes.
  • Developed health-check monitoring for ETL pipelines using Python and wrote complex SQL queries to report data-load status for recent days in a dashboard.
  • Automated, configured, and deployed instances in AWS environments; familiar with EC2, EBS volumes, CloudFormation, and managing security groups.
  • Maintained Linux-based and Windows-based EC2 instances across various environments.
  • Worked in Linux/Windows environments (CentOS, Windows Server 2012 R2).
  • Worked extensively with CLOB data types in Oracle SQL and PL/SQL procedures, using joins and subqueries to simplify complex queries involving multiple tables.
  • Developed a pipeline to run PySpark jobs on EMR as job steps (see the sketch after this list).
  • Developed Python scripts to process logs and consolidate jobs run on different servers.
  • Used Python modules such as pandas, NumPy, and datetime to perform extensive data analysis.
  • Wrote automation scripts in Perl, Python, and Bash.
  • Converted Perl scripts to Python.
  • Prototyped AWS CloudFormation templates to launch EC2 instances and create security groups and EBS volumes.
  • Used Python to connect to Hive and Oracle databases and perform data quality checks (row-count validation) as part of data migration.
  • Worked on real-time data streaming using AWS Kinesis, EMR, and AWS Glue.
  • Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins along with Shell scripts to automate routine jobs.
  • Utilized Ansible playbook for code pipeline deployment.
  • Working knowledge of AWS products and services such as EC2, EBS volumes, and security groups.
  • Wrote shell automation scripts to manage AWS resources.
  • Environment: AWS EC2, EBS, AWS EMR, AWS CLI, Security Groups, Oracle SQL, Hive SQL, Perl, Python, CloudFormation, XML, YAML, Jenkins, Ansible, Git, CentOS, Windows Server 2012 R2, Windows Server 2016, Liquibase, PySpark.
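
A minimal sketch of submitting a PySpark job to an existing EMR cluster as a step through Boto3, one plausible reading of the EMR step pipeline mentioned above; the cluster ID, S3 paths, and arguments are hypothetical placeholders.

```python
# Hypothetical EMR step submission via Boto3; cluster ID and S3 paths are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # ID of an already-running EMR cluster
    Steps=[
        {
            "Name": "nightly-log-consolidation",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "s3://example-bucket/jobs/consolidate_logs.py",
                    "--run-date", "2021-01-15",
                ],
            },
        }
    ],
)
print("Submitted step:", response["StepIds"][0])
```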

Data Engineer

CAPITAL ONE
08.2017 - 05.2019
  • Developed PySpark jobs for data transformation as part of data extraction.
  • Developed data pipelines for inbound and outbound data from vendor buckets using Python and the AWS SDK (Boto3), and deployed code to GitHub (a minimal sketch follows this list).
  • Identified patterns of behavior in customer migration to products and services.
  • Performed data cleansing, data imputation, and data preparation using pandas and NumPy.
  • Built end-to-end data pipelines for data transfers using Python and AWS, including Boto3.
  • Prototyped pipelines using Databricks notebooks, Snowflake and PySpark.
  • Coordinated with vendor data teams to push and validate marketing data into a Snowflake data warehouse.
  • Created ETL pipelines using Python, Amazon S3, EC2, EMR, and Snowflake.
  • Spun up EMR and EC2 resources using AWS CloudFormation templates and through the console for lines of business.
  • Worked with job schedulers like Control-M to schedule jobs.
  • Managed schema objects such as tables, views, indexes, and referential integrity based on user requirements, converting them into technical specifications.
  • Migrated datasets from Teradata to Snowflake using S3 as intermediate storage.
  • Worked with the Symphony data pipeline.
  • Working knowledge of schema design and API documentation via platforms such as Swagger.
  • Wrote reusable Unix shell scripts.
  • Used shell scripting to read variables based on the environment.
  • Built data pipelines based on Spark, AWS EMR, and AWS S3.
  • Developed ETL logic using the Spark DataFrame and Dataset APIs in Python.
  • Designed and implemented risk assessments as microservices with RESTful APIs documented via Swagger, using JSON.
  • Imported data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Performed data quality checks for datasets using PySpark.
  • Created data quality queries to ensure key data is accounted for.
  • Responsible for delivering datasets from Snowflake to One Lake Data Warehouse.
  • Environment: AWS EMR, AWS EC2, AWS S3, AWS CLI, AWS Boto3, Snowflake, UNIX Scripting, Bash, Control-M, Java, SQL, Python, Spark, Jira, GitHub, Airflow.
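
A minimal sketch of the inbound/outbound vendor-bucket transfer pattern using Boto3, as referenced in the bullets above; bucket names, prefixes, and the local landing directory are hypothetical.

```python
# Hypothetical S3 inbound/outbound transfer; bucket names and prefixes are placeholders.
import boto3
from pathlib import Path

s3 = boto3.client("s3")

INBOUND_BUCKET = "vendor-inbound-example"
OUTBOUND_BUCKET = "vendor-outbound-example"
LANDING_DIR = Path("/tmp/landing")
LANDING_DIR.mkdir(parents=True, exist_ok=True)

# Pull every object under the inbound prefix into a local landing directory.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=INBOUND_BUCKET, Prefix="daily/"):
    for obj in page.get("Contents", []):
        local_path = LANDING_DIR / Path(obj["Key"]).name
        s3.download_file(INBOUND_BUCKET, obj["Key"], str(local_path))

# After transformation, push results to the outbound bucket for the vendor.
for result in LANDING_DIR.glob("*.csv"):
    s3.upload_file(str(result), OUTBOUND_BUCKET, f"processed/{result.name}")
```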

Python Developer

INVENTLA SOLUTIONS
05.2015 - 11.2015
  • Developed the application as per the functional requirements from the analysts.
  • Collected, tracked, and integrated multiple sources of big data.
  • Used Python modules such as pandas, NumPy, and datetime to perform extensive data analysis.
  • Blended data from multiple databases into one report by selecting primary keys from each database for data validation.
  • Analyzed and created dimensional data modeling to meet OLAP needs.
  • Resolved complex problems under tight timelines.
  • Used several Python libraries, including NumPy and Matplotlib.
  • Developed Python programs to read data from various Teradata tables and consolidate them into a single CSV file (a minimal sketch follows this list).
  • Imported data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Worked with various Python IDEs, including PyCharm and Atom.
  • Used joins and subqueries to simplify complex queries involving multiple tables.
  • Worked with business teams to create Hive queries for ad hoc analysis.
  • Tested the whole application for errors, screen by screen.
  • Environment: Python, Pandas, Oracle, Hive, Hadoop, Linux.
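
A minimal sketch of consolidating multiple extracts into a single CSV with pandas, as referenced above; in the original work the sources were Teradata tables, so the per-table CSV extracts and file paths here are stand-ins.

```python
# Hypothetical consolidation of per-table extracts into one CSV; paths are placeholders.
from pathlib import Path
import pandas as pd

extract_dir = Path("extracts")            # directory of per-table extract files
frames = []
for path in sorted(extract_dir.glob("*.csv")):
    df = pd.read_csv(path)
    df["source_file"] = path.name         # keep lineage for validation
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
combined.to_csv("combined_extract.csv", index=False)
print(f"Wrote {len(combined)} rows from {len(frames)} extracts")
```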

Education

Doctor of Computer Science - Big Data Analytics

Colorado Technical University
04.2028 (expected)

Master's - Computer Science and Information Systems

Southern Arkansas University

Bachelor's - Computer Science Engineering

Sri Venkateshwara College of Engineering

Skills

    Languages: Python, SQL, Scala, Bash, Java, C

    Big Data & Cloud: Azure (Blob Storage, Functions, Data Factory, Synapse), Databricks, Delta Lake, AWS (S3, EC2, EMR, Lambda, Redshift), GCP, Snowflake

    Data Tools: PySpark, Spark SQL, Unity Catalog, Tableau, Jupyter, Git, Jenkins

    APIs & LLMs: LangChain, Gemini API, Box API, SendGrid

    Other: Azure Key Vault, Unix/Linux, Shell Scripting, Terraform, Ansible, Control-M, Docker, Kubernetes

Blog

https://rayadurgampallavi.blogspot.com/
