
Priyanka Tella

Dallas

Summary

Accomplished Data Engineer with experience building scalable data pipelines, automating business processes, and ensuring data quality across cloud and big data platforms. Proficient in Python scripting, SQL, and PySpark for developing reusable automation frameworks and complex data transformations. Hands-on expertise in AWS, Azure, and workflow orchestration tools such as Airflow and Autosys. Skilled in supporting production-grade data operations and quality workflows in manufacturing-like environments. Experienced in collaborating on AI/ML pipeline automation, with a strong focus on data integrity, performance optimization, and business-aligned data architecture.

Overview

7 years of professional experience

Work History

Data Engineer

Bank of America
Dallas
02.2024 - Current
  • Full SDLC Involvement: Hands-on experience in requirements gathering, analysis, design, development, and testing using Agile methodologies, ensuring seamless end-to-end business solutions while handling sensitive data.
  • CI/CD & Deployment: Deployed code across multiple environments using CI/CD pipelines, resolved defects during ISIT, CIT, and UAT testing, and provided support for data loads in testing phases. Developed reusable components to minimize manual intervention.
  • Python Automation & Troubleshooting: Automated AWS S3 data uploads and downloads using Python scripts (a hedged sketch follows this list), resolved critical Python application bugs, and optimized existing processes with advanced methodologies.
  • Big Data & PySpark Development: Built PySpark-based data pipelines leveraging Spark DataFrames, executing jobs on AWS EMR and storing data in S3 for large-scale ETL processing.
  • SQL Expertise: Wrote high-performance data transformations and aggregations using Spark SQL on top of large datasets stored in S3, Hive, and HDFS, improving query response times by 40% through partitioning and caching strategies.
  • Cloud & AWS Engineering: Implemented AWS Step Functions to automate SageMaker tasks, including data publishing, ML model training, and deployment. Managed AWS Lambda, EC2 provisioning, VPCs, and security groups to support cloud infrastructure.
  • Database & Query Optimization: Implemented distributed SQL querying using Presto (Starburst variant) to run federated queries across heterogeneous sources, including AWS S3 and Snowflake, significantly reducing data latency for downstream analytics.
  • ETL & Data Integration: Led end-to-end data processing using Apache Spark with both Scala and PySpark, enabling flexible and scalable ETL workflows across cloud environments.
  • REST API Development: Designed and developed RESTful APIs using Python with PostgreSQL, ensuring efficient data communication across services.
  • Software Testing & Quality Assurance: Created unit test cases, test plans, and test specifications, improving test coverage and ensuring robust functionality across all Python-based applications.
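
Below is a minimal, illustrative sketch of the kind of S3 upload automation described in the Python Automation bullet above. It assumes boto3 with credentials resolved from the environment; the bucket, prefix, and local directory names are hypothetical placeholders, not values from this role.

```python
"""Hedged sketch: bulk upload of local files to S3 with boto3 (upload direction only)."""
import logging
from pathlib import Path

import boto3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("s3_upload")

s3 = boto3.client("s3")  # credentials come from the environment / instance role


def upload_dir(local_dir: str, bucket: str, prefix: str) -> None:
    """Upload every file under local_dir to s3://bucket/prefix/, preserving structure."""
    root = Path(local_dir)
    for path in root.rglob("*"):
        if path.is_file():
            key = f"{prefix}/{path.relative_to(root).as_posix()}"
            s3.upload_file(str(path), bucket, key)
            log.info("uploaded %s -> s3://%s/%s", path, bucket, key)


if __name__ == "__main__":
    # Placeholder names for illustration only.
    upload_dir("./exports", "example-landing-bucket", "daily/2024-01-01")
```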

Data Engineer

Wells Fargo
Dallas
07.2023 - 01.2024
  • Worked on the veterans project, handling highly sensitive data in end-to-end business solutions.
  • Led the migration of data processing workflows from SAS to Python and PySpark, optimizing legacy scripts into scalable data pipelines for enhanced performance and maintainability.
  • Developed PySpark-based ETL workflows to integrate, enrich, and transform large datasets from diverse sources, ensuring data quality and consistency across critical business reporting processes.
  • Collaborated with business teams to analyze data discrepancies, building SQL- and Python-driven validation scripts to proactively detect and resolve data quality issues in production environments (see the sketch after this list).
  • Created automated data processing workflows using Python scripts and scheduling tools (Autosys, Jenkins), reducing manual intervention in routine data validation and reporting tasks by 35%.
  • Utilized Tableau dashboards to visualize survey data trends and quality metrics, enabling business stakeholders to monitor data integrity and make informed decisions based on real-time analytics.
  • Created a modular, generic Autosys workflow solution to automate data processing.
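
The following is a minimal PySpark sketch of the kind of validation script mentioned above, assuming a Parquet source on S3; the path, key columns, and column names are hypothetical placeholders, not details from the actual project.

```python
"""Hedged sketch: basic data-quality checks (nulls and duplicate keys) in PySpark."""
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

# Placeholder source path and columns.
df = spark.read.parquet("s3a://example-bucket/survey_responses/")

# Null counts for required columns.
required = ["respondent_id", "survey_date", "score"]
null_counts = df.select(
    [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in required]
).first().asDict()

# Business keys that appear more than once.
dup_count = (
    df.groupBy("respondent_id", "survey_date")
      .count()
      .filter(F.col("count") > 1)
      .count()
)

issues = {c: n for c, n in null_counts.items() if n}
if issues or dup_count:
    raise ValueError(f"Data quality failure: nulls={issues}, duplicate_keys={dup_count}")
print("All data quality checks passed.")
```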

Data Engineer

Groundspeed Analytics
Dallas
09.2022 - 06.2023
  • Responsible for the execution of big data analytics, predictive analytics, and machine learning initiatives.
  • Implemented a proof of concept deploying the product on AWS S3 and Snowflake.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehousing, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful, valuable information for better decision-making.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation and queries, writing results back to the S3 bucket.
  • Experienced in data cleansing and data mining.
  • Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation, using the Spark engine and Spark SQL for data analysis and providing results to data scientists for further analysis.
  • Prepared scripts in Python and Scala to automate the ingestion process from various sources such as APIs, AWS S3, Teradata, and Snowflake.
  • Designed and developed Spark workflows in Scala to pull data from the AWS S3 bucket and Snowflake and apply transformations.
  • Implemented Spark RDD transformations to map business logic and applied actions on top of those transformations.
  • Worked extensively with columnar storage formats including Parquet, Avro, and ORC for efficient serialization and querying; created ingestion scripts to convert raw JSON/CSV into Avro/ORC and registered them in Hive Metastore.
  • Designed Spark jobs using Scala to read Avro and ORC files from AWS S3, apply schema evolution and transformations, and write back to S3 in optimized columnar formats for downstream consumption in BI tools.
  • Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB table or S3 bucket or to HTTP requests using Amazon API gateway.
  • Migrated data from the AWS S3 bucket to Snowflake by writing custom read/write Snowflake utility functions in Scala.
  • Profiled structured, unstructured, and semi-structured data across various sources to identify patterns, and implemented data quality metrics using queries or Python scripts appropriate to each source.
  • Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs to run in Airflow.
  • Created DAGs using the Email Operator, Bash Operator, and Spark Livy Operator to execute jobs on an EC2 instance (a hedged sketch follows this list).
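
Below is a minimal Airflow 2.x-style sketch of a DAG like the one described above, chaining Bash, Livy, and Email operators. The DAG id, connection id, jar path, class name, and email address are hypothetical placeholders, and the Livy operator assumes the apache-airflow-providers-apache-livy package is installed.

```python
"""Hedged sketch: Airflow DAG chaining Bash, Livy (Spark), and Email operators."""
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.email import EmailOperator
from airflow.providers.apache.livy.operators.livy import LivyOperator

with DAG(
    dag_id="s3_to_snowflake_ingest",   # placeholder DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Placeholder shell step that would stage raw files.
    stage = BashOperator(
        task_id="stage_raw_files",
        bash_command="echo 'staging raw files from S3'",
    )

    # Submit the Spark job through the Livy endpoint running on EC2.
    spark_job = LivyOperator(
        task_id="run_spark_transform",
        livy_conn_id="livy_default",
        file="s3://example-bucket/jars/ingest-assembly.jar",  # placeholder artifact
        class_name="com.example.IngestJob",                   # placeholder class
        polling_interval=30,
    )

    # Notify the team once the run completes (requires SMTP configured in Airflow).
    notify = EmailOperator(
        task_id="notify_team",
        to=["data-team@example.com"],  # placeholder address
        subject="Daily ingest DAG finished",
        html_content="The S3-to-Snowflake ingest completed.",
    )

    stage >> spark_job >> notify
```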

Data Engineer

Optum
Hyderabad
03.2020 - 04.2022
  • Created linked services for multiple source systems (Azure SQL Server, ADLS, Blob Storage, REST API).
  • Created pipelines to extract data from on-premises source systems to Azure cloud Data Lake Storage; worked extensively with Copy activities, implemented copy behaviors such as flatten hierarchy, preserve hierarchy, and merge hierarchy, and implemented error handling through the Copy activity.
  • Hands-on exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait.
  • Configured Logic Apps to handle email notifications to end users and key stakeholders with the help of the Web activity.
  • Created dynamic pipelines to handle multiple sources extracting to multiple targets, and extensively used Azure Key Vault to configure connections in linked services.
  • Configured and implemented Azure Data Factory triggers, scheduled the pipelines, monitored the scheduled pipelines, and configured alerts to receive notification of pipeline failures.
  • Worked extensively on Azure Data Lake Analytics with the help of Azure Databricks to implement SCD-1 and SCD-2 approaches (see the sketch after this list).
  • Created Azure Stream Analytics jobs to replicate real-time data and load it into Azure SQL Data Warehouse.
  • Implemented delta-logic extractions for various sources with the help of a control table; implemented data frameworks to handle deadlocks, recovery, and pipeline logging.
  • Kept up with the latest features introduced by Microsoft Azure (Azure DevOps, OMS, NSG rules, etc.) and utilized them for existing business applications.
  • Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
  • Developed Spark (Scala) notebooks to transform and partition data and organize files in ADLS.
  • Worked on Azure Databricks to run Spark/Python notebooks through ADF pipelines.
  • Used Databricks utilities (widgets) to pass parameters at runtime from ADF to Databricks.
  • Created triggers, PowerShell scripts, and parameter JSON files for deployments.
  • Worked with VSTS for the CI/CD Implementation.
  • Reviewed individual work on ingesting data into Azure Data Lake and provided feedback based on the reference architecture, naming conventions, guidelines, and best practices.
  • Implemented end-to-end logging frameworks for Data Factory pipelines.
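
As a rough illustration of the SCD-2 approach mentioned above, here is a simplified Delta Lake merge sketch for Azure Databricks. The ADLS paths, key column, and change-detection hash column are hypothetical placeholders, and a full implementation would also restrict the appended rows to records that are genuinely new or changed.

```python
"""Hedged sketch: simplified SCD Type 2 upsert with Delta Lake on Databricks."""
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided by the Databricks runtime

# Placeholder ADLS paths.
updates = spark.read.parquet("abfss://raw@exampleadls.dfs.core.windows.net/customers/")
dim_path = "abfss://curated@exampleadls.dfs.core.windows.net/dim_customer/"

target = DeltaTable.forPath(spark, dim_path)

# Step 1: close out current rows whose attributes changed (customer_hash is an
# assumed change-detection column, not a real field from this project).
(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.customer_hash <> s.customer_hash",
        set={"is_current": "false", "end_date": "current_date()"},
    )
    .execute()
)

# Step 2 (simplified): append incoming records as new current versions.
new_rows = (
    updates.withColumn("is_current", F.lit(True))
           .withColumn("start_date", F.current_date())
           .withColumn("end_date", F.lit(None).cast("date"))
)
new_rows.write.format("delta").mode("append").save(dim_path)
```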

Software Engineer

Kastech
Hyderabad
03.2018 - 03.2020
  • Worked on building a centralized data lake on AWS Cloud utilizing primary services such as S3, EMR, Redshift, Athena, and Glue.
  • Hands-on experience migrating datasets and ETL workloads with Python from on-premises systems to AWS Cloud services.
  • Built a series of Spark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.
  • Worked extensively on building and automating data ingestion pipelines, moving terabytes of data from existing data warehouses to the cloud.
  • Worked extensively on fine-tuning Spark applications and providing production support for various pipelines running in production.
  • Worked closely with business teams and data science teams and ensured all the requirements are translated accurately into our data pipelines.
  • Worked on full spectrum of data engineering pipelines: data ingestion, data transformations and data analysis/consumption.
  • Developed AWS Lambda functions using Python and Step Functions to orchestrate data pipelines.
  • Worked on automating the infrastructure setup, including launching and terminating EMR clusters.
  • Created Hive external tables on top of datasets loaded in S3 buckets and created various Hive scripts to produce a series of aggregated datasets for downstream analysis.
  • Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift (a hedged sketch follows this list).
  • Worked on creating Kafka producers using the Kafka Java Producer API to connect to an external REST live-stream application and produce messages to a Kafka topic.
  • Implemented a continuous delivery pipeline with Bitbucket and AWS AMIs.
  • Designed and documented operational problems following standards and procedures using Jira.
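
The sketch below illustrates the streaming pattern referenced above using PySpark Structured Streaming from Kafka to S3 (the Redshift load is omitted). The broker address, topic, schema, and paths are hypothetical placeholders, and the job assumes the spark-sql-kafka connector package is available.

```python
"""Hedged sketch: Kafka-to-S3 stage of a streaming pipeline with Structured Streaming."""
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka_stream").getOrCreate()

# Assumed event schema (placeholder fields).
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "clickstream-events")           # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes; parse the JSON value into typed columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/streams/clickstream/")                # placeholder sink
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/clickstream/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```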

Education

Bachelor of Science - Computer Science

Acharya Nagarjuna University
Guntur, AP

Skills

  • Data pipeline development
  • Python automation
  • PySpark programming
  • SQL and NoSQL expertise
  • REST API design
  • Cloud architecture
  • Data integration and quality assurance
  • Big data analytics
  • CI/CD pipelines

Timeline

Data Engineer

Bank of America
02.2024 - Current

Data Engineer

Wells Fargo
07.2023 - 01.2024

Data Engineer

Groundspeed Analytics
09.2022 - 06.2023

Data Engineer

Optum
03.2020 - 04.2022

Software Engineer

Kastech
03.2018 - 03.2020

Bachelor of Science - Computer Science

Acharya Nagarjuna University