Lavanya M

Dallas, TX

Summary

I am a Senior Data Engineer with 7 years of experience designing and building data solutions that support large-scale, enterprise-grade analytics. In my current role, I am part of a team responsible for developing and maintaining end-to-end ETL pipelines that power critical data platforms within our organization. Our work involves integrating and transforming data from multiple sources, ensuring it is clean, reliable, and accessible for downstream applications. We build and manage data workflows with SQL, Python, Spark, and HQL, and we leverage platforms such as AWS (EMR, S3, Lambda), Azure Data Factory, Snowflake, and Hadoop for cloud-based storage, processing, and orchestration.

My day-to-day responsibilities include writing and optimizing PySpark jobs, orchestrating ETL pipelines, and ensuring seamless data movement between cloud-native and on-premises systems. I work closely with cross-functional teams to ensure data consistency, optimize job performance, and keep data workflows aligned across teams. I also take an active role in implementing data quality checks, improving pipeline reliability, and driving automation to enhance operational efficiency.

Overview

  • 8 years of professional experience
  • 1 certification

Work History

Data Engineer

People Tech Group
05.2018 - Current
  • Developed and deployed ETL pipelines using Python, SQL, and Scala to streamline data ingestion, transformation, and delivery across cloud and on-prem environments.
  • Built and scheduled data integration workflows with Informatica, AWS Glue, and Azure Data Factory to automate critical data processes.
  • Processed and transformed high-volume datasets using Apache Spark and Hadoop, improving job performance and reducing runtime.
  • Managed and queried large datasets using Hive and Pig for data summarization, cleansing, and business reporting.
  • Implemented secure file transfers and ingestion processes using SFTP for loading third-party and internal datasets.
  • Built and orchestrated workflows using Apache Airflow and Autosys, enabling automated and reliable job execution.
  • Created and maintained data models in Snowflake, Amazon Redshift, and BigQuery to support enterprise analytics and reporting tools.
  • Tuned SQL queries and stored procedures across PostgreSQL, MySQL, and SQL Server to reduce processing time and enhance performance.
  • Used Toad and Hue to debug, validate, and optimize queries running on Oracle and Hive environments.
  • Leveraged Git and Jenkins to build CI/CD pipelines for deploying Spark jobs and SQL scripts across dev, QA, and prod environments.
  • Monitored data pipelines and resource usage with AWS CloudWatch, addressing failures and optimizing resource consumption.
  • Worked with Docker containers to package and deploy data processing applications in a portable and consistent environment.
  • Created infrastructure-as-code using Terraform to automate deployment and configuration of cloud resources in AWS and Azure.
  • Scheduled ETL and reporting jobs through Autosys, ensuring time-critical data availability for downstream systems.
  • Developed Google Apps Script automations for integrating and transforming spreadsheet data into Snowflake and BigQuery.
  • Consumed REST APIs and automated data pulls using Python, integrating external data into enterprise data warehouses.
  • Applied indexing, partitioning, and performance tuning strategies in Hive and Snowflake to reduce query latency and improve efficiency.
  • Validated access control policies and RBAC configurations in Snowflake and Hadoop to ensure secure data usage.
  • Collaborated across multiple teams at GM to deliver scalable, high-availability data pipelines supporting multiple departments.
  • Conducted ETL unit testing and data validation to ensure accuracy, completeness, and compliance with enterprise standards.
  • Built secure SFTP-based data pipelines to automate ingestion of partner data files into Hadoop and cloud environments.
  • Created Hive tables and applied partitioning strategies to optimize query performance on high-volume datasets.
  • Used Hue extensively for writing, testing, and debugging Hive and HQL scripts in development and QA environments.
  • Scheduled and monitored production ETL jobs using Autosys, ensuring timely and reliable data delivery across systems.
  • Developed Spark jobs in Python to transform semi-structured data from Hadoop into Snowflake-ready formats.
  • Tuned complex SQL queries to reduce runtime and improve performance of reporting dashboards and analytics tools.
  • Performed root cause analysis and data validation using Toad for Oracle, identifying discrepancies across multiple data sources.
  • Leveraged Spark DataFrames and RDDs for parallel data processing and aggregations across large Hadoop datasets.
  • Designed reusable Python modules to handle common transformation logic, SFTP file handling, and logging across ETL workflows.
  • Integrated Hive with Spark to run large-scale transformations and joins, reducing dependency on traditional batch processing tools.

Software Engineer Intern

Ramp Group
03.2017 - 06.2017
  • Assisted in developing and maintaining Python-based data processing scripts to automate data ingestion and transformation workflows.
  • Contributed to building ETL pipelines that moved data from AWS S3 into Snowflake, supporting analytics and reporting teams.
  • Wrote and optimized SQL queries in Snowflake for data validation, aggregation, and business insights.
  • Supported CI/CD integration by configuring Jenkins jobs for automated deployment of Python applications and ETL scripts.
  • Participated in monitoring and debugging data pipeline failures using Jenkins logs and AWS CloudWatch.
  • Collaborated with senior engineers to design scalable data solutions and implement best practices in code versioning and testing.

Education

Master of Science - MSIT

Lawrence Technological University
Southfield, MI
07-2017

Bachelor of Science - Civil Engineering

Jawaharlal Nehru Technological University (JNTUH)
Hyderabad
06-2015

Skills

  • Programming Languages: Python, SQL, Scala
  • ETL Tools: Informatica, AWS Glue, Azure Data Factory, Apache NiFi
  • Big Data Technologies: Apache Spark, Hadoop, Hive, Pig
  • Cloud Platforms: AWS, Azure, Google Cloud Platform
  • Data Warehousing: Snowflake, Amazon Redshift
  • Databases: PostgreSQL, MySQL, Oracle, SQL Server
  • Database Tools: Toad, Hue
  • Workflow Orchestration: Apache Airflow
  • DevOps & CI/CD: Jenkins, Docker, Kubernetes, Terraform
  • Data Modeling & Design: Snowflake Schema, ER Diagrams
  • Monitoring & Logging: Grafana, CloudWatch

Certification

AWS Certified Developer - Associate
