Divya Desham

Dallas

Summary

  • Around 6 years of experience in software development with a strong focus on Big Data, Hadoop, and Spark.
  • Strong expertise in Big Data ecosystem components such as Spark, Hive, Sqoop, HDFS, MapReduce, Kafka, and YARN.
  • Developed production-ready Spark applications using DataFrames, Datasets, Spark SQL, and Spark Streaming.
  • Solid experience with various file formats, including CSV, XML, Parquet, ORC, and JSON.
  • Strong knowledge of NoSQL databases; worked with HBase, Cassandra, and MongoDB.
  • Experience with cloud services including Amazon EMR, S3, EC2, Redshift, and Athena, as well as Azure Databricks and Azure Data Factory.
  • Worked on Spark Streaming and Spark Structured Streaming with Kafka for real-time data processing.
  • Good knowledge of Oracle PL/SQL and shell scripting.
  • Worked extensively in Agile methodology to deliver projects continuously and collaboratively.
  • Strong analytical and problem-solving skills; able to resolve complex technical issues.
  • Seasoned Senior Data Engineer with a background in developing, testing, and maintaining data architectures; strong skills in database management systems, Big Data processing frameworks, data modeling, and warehousing.
  • Led teams in creating innovative data solutions that improved system efficiency and business decision-making, with demonstrated impact through enhanced data availability and accuracy.

Overview

6 years of professional experience

Work History

Senior Data Engineer

Wells Fargo
Texas
11.2024 - Current
  • Optimized pipelines for cost, latency, and traceability, ensuring reproducibility and consistency across environments.
  • Helped optimize Redis caching layers and Aurora PostgreSQL access patterns to reduce query latency for high-traffic APIs; configured read/write traffic splitting using RDS Proxy and ALB listener rules.
  • Developed internal tooling and ETL components in Python to assist with data labeling, normalization, and cataloging across supplier datasets (illustrative sketch after this list).
  • Participated in automating CI/CD workflows and testing environments using Terraform and GitHub Actions.
  • Environment: AWS EC2, Glue, Lambda, RDS, S3, Secrets Manager
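
A minimal, hypothetical sketch of the kind of Python normalization helper referenced above; the column names (supplier_name, country, unit_price) and the value mappings are assumptions, not the actual supplier schema:

```python
# Minimal sketch of a supplier-record normalization helper, assuming a pandas
# workflow and hypothetical column names (supplier_name, country, unit_price).
import pandas as pd

def normalize_suppliers(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize supplier records before they are labeled and cataloged."""
    out = df.copy()
    # Trim whitespace and unify casing on the supplier name used as the catalog key.
    out["supplier_name"] = out["supplier_name"].str.strip().str.upper()
    # Map free-text country values onto short codes (hypothetical mapping).
    out["country"] = (
        out["country"].str.strip().str.title().replace({"Usa": "US", "United States": "US"})
    )
    # Coerce prices to numeric, marking unparseable values as missing instead of failing.
    out["unit_price"] = pd.to_numeric(out["unit_price"], errors="coerce")
    # Drop exact duplicates so the catalog keeps one row per supplier record.
    return out.drop_duplicates()

if __name__ == "__main__":
    raw = pd.DataFrame(
        {"supplier_name": [" acme ", "ACME"], "country": ["usa", "USA"], "unit_price": ["10.5", "bad"]}
    )
    print(normalize_suppliers(raw))
```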

Data Engineer

United Health Group
Texas
04.2023 - 10.2024
  • Collaborated with Business Analysts and SMEs across departments to gather business requirements and identify workable items for further development.
  • Partnered with ETL developers to ensure data was well cleaned and the Hive data warehouse stayed up to date for reporting purposes.
  • Extracted and generated data into CSV files on AWS EC2, stored them in AWS S3, and then structured and loaded them into AWS Redshift.
  • Performed statistical data profiling, including cancel rate, variance, skewness, kurtosis of trades, and runs for each stock daily, grouped into 1-minute, 5-minute, and 15-minute intervals.
  • Used PySpark and Pandas to calculate moving averages and RSI scores for stocks and loaded the results into the data warehouse (illustrative sketch after this list).
  • Designed and implemented Infrastructure as Code (IaC) practices with Terraform to standardize deployment of data lake components and EMR clusters, reducing manual configuration errors and enabling faster environment setup.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, OpenShift, and pair RDDs.
  • Utilized Terraform to manage infrastructure on AWS, creating reusable modules for deploying scalable and secure data environments.
  • Integrated the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
  • Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
  • Designed custom data validation frameworks in Python to ensure data quality and consistency across pipelines (see the sketch after this list).
  • Developed and validated machine learning models, including Ridge and Lasso regression, for predicting total trade amounts.
  • Automated data engineering workflows with Python scripts, improving efficiency and reducing manual interventions by 40%.
  • Leveraged Spark SQL to query structured data efficiently from distributed data stores like HDFS and Amazon S3.
  • Boosted the performance of regression models by applying polynomial transformations and feature selection, and used those methods to select stocks.
  • Environment: Spark, AWS, AWS S3, AWS Redshift, SQL, Snowflake, Jenkins, Git.
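
A minimal PySpark sketch of the moving-average and RSI calculation referenced above; the column names (symbol, trade_date, close), the 14-period window, and the S3 paths are assumptions:

```python
# Minimal sketch of the moving-average / RSI calculation described above.
# Column names, the 14-row window, and the paths are assumptions.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stock-indicators").getOrCreate()

prices = spark.read.parquet("s3a://bucket/prices/")  # hypothetical input path

w_order = Window.partitionBy("symbol").orderBy("trade_date")
w_ma = w_order.rowsBetween(-13, 0)  # trailing 14-row window including the current row

with_indicators = (
    prices
    # Simple 14-period moving average of the closing price.
    .withColumn("ma_14", F.avg("close").over(w_ma))
    # Day-over-day change used for the RSI gain/loss split.
    .withColumn("change", F.col("close") - F.lag("close", 1).over(w_order))
    .withColumn("gain", F.when(F.col("change") > 0, F.col("change")).otherwise(F.lit(0.0)))
    .withColumn("loss", F.when(F.col("change") < 0, -F.col("change")).otherwise(F.lit(0.0)))
    # Simple (non-smoothed) RSI: 100 - 100 / (1 + avg_gain / avg_loss).
    .withColumn("avg_gain", F.avg("gain").over(w_ma))
    .withColumn("avg_loss", F.avg("loss").over(w_ma))
    .withColumn("rsi_14", 100 - 100 / (1 + F.col("avg_gain") / F.col("avg_loss")))
)

with_indicators.write.mode("overwrite").parquet("s3a://bucket/indicators/")  # hypothetical output
```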
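
A minimal sketch of the kind of Python data-validation checks referenced above; the rule set and column names are hypothetical, not the production framework:

```python
# Minimal sketch of rule-based data validation in Python; the rules and
# column names (trade_id, price, symbol) are hypothetical.
import pandas as pd

RULES = {
    "trade_id": {"required": True, "unique": True},
    "price": {"required": True, "min": 0},
    "symbol": {"required": True},
}

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable rule violations found in the frame."""
    problems = []
    for column, rule in RULES.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
            continue
        if rule.get("required") and df[column].isna().any():
            problems.append(f"null values in required column: {column}")
        if rule.get("unique") and df[column].duplicated().any():
            problems.append(f"duplicate values in unique column: {column}")
        if "min" in rule and (df[column] < rule["min"]).any():
            problems.append(f"values below {rule['min']} in column: {column}")
    return problems

if __name__ == "__main__":
    sample = pd.DataFrame({"trade_id": [1, 1], "price": [10.0, -5.0], "symbol": ["AAPL", None]})
    for issue in validate(sample):
        print(issue)
```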

Big Data Developer

GSV SOFT SYSTEM
Hyderabad
08.2019 - 08.2022
  • Developed multiple Spark applications in Python for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Implemented a data pipeline to read data from DB2 using Spark SQL, load it into DataFrames, and write it out as ORC files.
  • Developed a PySpark framework to generate Parquet and CSV files from Hive and Snowflake tables.
  • Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.
  • Integrated Spark with data storage systems, particularly Azure Data Lake and Blob Storage.
  • Designed, built, and implemented large ETL pipelines using PySpark and Azure Data Factory.
  • Implemented logging and error-handling mechanisms in Python code to ensure robust and maintainable data workflows.
  • Developed a JSON flattening framework in Spark driven by the JSON schema (see the sketch after this list).
  • Developed test scripts for unit and integration testing.
  • Good experience with Unix commands.
  • Environment: PySpark, Hive, Sqoop, Python, Azure, Snowflake.
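
A minimal sketch of a schema-driven JSON flattening helper in PySpark, in the spirit of the framework referenced above; the input and output paths are hypothetical:

```python
# Minimal sketch of schema-driven JSON flattening in PySpark: expand structs
# into prefixed columns and explode arrays until the schema is flat.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType

spark = SparkSession.builder.appName("json-flatten").getOrCreate()

def flatten(df: DataFrame) -> DataFrame:
    """Recursively expand struct columns and explode array columns."""
    while True:
        complex_fields = {
            f.name: f.dataType
            for f in df.schema.fields
            if isinstance(f.dataType, (ArrayType, StructType))
        }
        if not complex_fields:
            return df
        name, dtype = next(iter(complex_fields.items()))
        if isinstance(dtype, StructType):
            # Promote each struct field to a top-level column, prefixed with the parent name.
            expanded = [F.col(f"{name}.{c.name}").alias(f"{name}_{c.name}") for c in dtype.fields]
            df = df.select("*", *expanded).drop(name)
        else:
            # Explode arrays so each element becomes its own row; keep empty arrays as null rows.
            df = df.withColumn(name, F.explode_outer(name))

# Usage: read nested JSON and write a flat copy (paths are hypothetical).
nested = spark.read.json("abfss://raw@account.dfs.core.windows.net/events.json")
flatten(nested).write.mode("overwrite").parquet("abfss://curated@account.dfs.core.windows.net/events_flat/")
```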

Education

Bachelor of Technology - Electronics and Communication Engineering

Sri Indu College of Engineering
Hyderabad, India
01.2019

Skills

  • Spark and MapReduce
  • HDFS and Hive
  • HBase and Pyspark
  • Cloudera (CDH) and AWS EC2
  • EMR and S3
  • Redshift and Athena
  • Glue and Step Functions
  • Lambda and S3 Event Notification
  • RDS and Azure HDInsight
  • Azure Databricks and Azure Data Factory
  • Azure SQL DW and Oracle
  • MySQL and MS-SQL Server
  • Cassandra and MongoDB
  • DB2 and Python
  • SQL and Scala
  • PL/SQL and Shell scripting
  • Terraform and Java
  • JavaScript, CSS, HTML
  • Windows, UNIX/Linux, macOS

Personal Information

Visa Status: H4 EAD
