Lavanya M

Dallas, TX

Summary

I am a Senior Data Engineer with 7 years of experience designing and building data solutions that support large-scale, enterprise-grade analytics. In my current role, I am part of a team responsible for developing and maintaining end-to-end ETL pipelines that power critical data platforms within our organization. Our work involves integrating and transforming data from multiple sources, ensuring it is clean, reliable, and accessible for downstream applications. We build and manage data workflows with SQL, Python, Spark, and HQL, and we leverage platforms such as AWS (EMR, S3, Lambda), Azure Data Factory, Snowflake, and Hadoop for cloud-based storage, processing, and orchestration.

My day-to-day responsibilities include writing and optimizing PySpark jobs, orchestrating ETL pipelines, and ensuring seamless data movement between cloud-native and on-premises systems. I work closely with cross-functional teams to ensure data consistency, optimize job performance, and keep data workflows aligned across teams. I also take an active role in implementing data quality checks, improving pipeline reliability, and driving automation to enhance operational efficiency.

Overview

  • 8 years of professional experience
  • 1 certification

Work History

Data Engineer

People Tech Group
05.2018 - Current
  • Developed and deployed ETL pipelines using Python, SQL, and Scala to streamline data ingestion, transformation, and delivery across cloud and on-prem environments.
  • Built and scheduled data integration workflows with Informatica, AWS Glue, and Azure Data Factory to automate critical data processes.
  • Processed and transformed high-volume datasets using Apache Spark and Hadoop, improving job performance and reducing runtime.
  • Managed and queried large datasets using Hive and Pig for data summarization, cleansing, and business reporting.
  • Implemented secure file transfers and ingestion processes using SFTP for loading third-party and internal datasets.
  • Built and orchestrated workflows using Apache Airflow and Autosys, enabling automated and reliable job execution.
  • Created and maintained data models in Snowflake, Amazon Redshift, and BigQuery to support enterprise analytics and reporting tools.
  • Tuned SQL queries and stored procedures across PostgreSQL, MySQL, and SQL Server to reduce processing time and enhance performance.
  • Used Toad and Hue to debug, validate, and optimize queries running on Oracle and Hive environments.
  • Leveraged Git and Jenkins to build CI/CD pipelines for deploying Spark jobs and SQL scripts across dev, QA, and prod environments.
  • Monitored data pipelines and resource usage with AWS CloudWatch, addressing failures and optimizing resource consumption.
  • Worked with Docker containers to package and deploy data processing applications in a portable and consistent environment.
  • Created infrastructure-as-code using Terraform to automate deployment and configuration of cloud resources in AWS and Azure.
  • Scheduled ETL and reporting jobs through Autosys, ensuring time-critical data availability for downstream systems.
  • Developed Google Apps Script automations for integrating and transforming spreadsheet data into Snowflake and BigQuery.
  • Consumed REST APIs and automated data pulls using Python, integrating external data into enterprise data warehouses.
  • Applied indexing, partitioning, and performance tuning strategies in Hive and Snowflake to reduce query latency and improve efficiency.
  • Validated access control policies and RBAC configurations in Snowflake and Hadoop to ensure secure data usage.
  • Collaborated across multiple teams at GM to deliver scalable, high-availability data pipelines supporting multiple departments.
  • Conducted ETL unit testing and data validation to ensure accuracy, completeness, and compliance with enterprise standards.
  • Built secure SFTP-based data pipelines to automate ingestion of partner data files into Hadoop and cloud environments.
  • Created Hive tables and applied partitioning strategies to optimize query performance on high-volume datasets.
  • Used Hue extensively for writing, testing, and debugging Hive and HQL scripts in development and QA environments.
  • Scheduled and monitored production ETL jobs using Autosys, ensuring timely and reliable data delivery across systems.
  • Developed Spark jobs in Python to transform semi-structured data from Hadoop into Snowflake-ready formats.
  • Tuned complex SQL queries to reduce runtime and improve performance of reporting dashboards and analytics tools.
  • Performed root cause analysis and data validation using Toad for Oracle, identifying discrepancies across multiple data sources.
  • Leveraged Spark DataFrames and RDDs for parallel data processing and aggregations across large Hadoop datasets.
  • Designed reusable Python modules to handle common transformation logic, SFTP file handling, and logging across ETL workflows.
  • Integrated Hive with Spark to run large-scale transformations and joins, reducing dependency on traditional batch processing tools.

Software Engineer Intern

Ramp Group
03.2017 - 06.2017
  • Assisted in developing and maintaining Python-based data processing scripts to automate data ingestion and transformation workflows.
  • Contributed to building ETL pipelines that moved data from AWS S3 into Snowflake, supporting analytics and reporting teams.
  • Wrote and optimized SQL queries in Snowflake for data validation, aggregation, and business insights.
  • Supported CI/CD integration by configuring Jenkins jobs for automated deployment of Python applications and ETL scripts.
  • Participated in monitoring and debugging data pipeline failures using Jenkins logs and AWS CloudWatch.
  • Collaborated with senior engineers to design scalable data solutions and implement best practices in code versioning and testing.

Education

Master of Science - MSIT

Lawrence Technological University
Southfield, MI
07-2017

Bachelor of Science - Civil Engineering

Jawaharlal Nehru Technological University (JNTUH)
Hyderabad
06-2015

Skills

  • Programming Languages: Python, SQL, Scala
  • ETL Tools: Informatica, AWS Glue, Azure Data Factory, Apache NiFi
  • Big Data Technologies: Apache Spark, Hadoop, Hive, Pig
  • Cloud Platforms: AWS, Azure, Google Cloud Platform
  • Data Warehousing: Snowflake, Amazon Redshift
  • Databases: PostgreSQL, MySQL, Oracle, SQL Server
  • Database Tools: Toad, Hue
  • Workflow Orchestration: Apache Airflow
  • DevOps & CI/CD: Jenkins, Docker, Kubernetes, Terraform
  • Data Modeling & Design: Snowflake Schema, ER Diagrams
  • Monitoring & Logging: Grafana, CloudWatch

Certification

AWS Certified Developer - Associate
