Hima Gangasani

Portland, OR

Summary

Experienced Data Engineer adept at leading large-scale data transformation projects utilizing Python, SQL, and ETL techniques. Collaborated on developing resilient data pipelines, improving cloud storage systems, and ensuring data integrity to boost operational efficiencies.

Overview

9 years of professional experience

Work History

Data Engineer

The Pokemon Company International
07.2024 - 07.2025
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability using Databricks.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
  • Worked on the end-to-end migration of data warehousing infrastructure from Snowflake to Databricks, reducing cloud spend through optimized compute and storage configuration.
  • Refactored SQL scripts and ETL pipelines to ensure compatibility with Apache Spark and Delta Lake during the migration process.
  • Designed and implemented a robust data ingestion framework in Databricks using Auto Loader and Delta Live Tables to replace Snowpipe-based ingestion.
  • Worked with AWS services including S3, Redshift, IAM, EC2, and RDS.
  • Applied Infrastructure as Code practices using Terraform for configuration management tasks.

Environment: AWS S3, RDS, EC2, Redshift, IAM, Scala, Snowflake, Spark, SQL, Databricks, Terraform

Data Engineer

Lithia Motors
11.2023 - 04.2024
  • Worked on migrating data from on-prem SQL Server to Snowflake
  • Developing and managing ETL processes using ADF to schedule and orchestrate data workflows, ensuring timely and reliable data processing
  • Utilizing Python and its extensive libraries to develop scalable and efficient data processing scripts, automation workflows, and data analysis tools
  • Designing and implementing scalable data storage and processing solutions using HDFS, Hive, and Snowflake
  • Implementing data processing and analytics tasks using PySpark, Spark, and MapReduce to handle large-scale datasets
  • Designing and implementing data warehousing solutions on Azure Synapse, including schema design, data modeling, and optimizing query performance
  • Experience in writing optimized T-SQL queries for efficient data retrieval, manipulation, and analysis
  • Proficient in developing, optimizing, and maintaining stored procedures that enhance database performance and scalability
  • Designed and implemented data load processes from data sources into Azure Data Lake and Azure Data Warehouse.
  • Developing and managing ETL processes using Apache Airflow to schedule and orchestrate data workflows, ensuring timely and reliable data processing
  • Monitored databases and related systems to verify optimized performance.

Environment: SQL, T-SQL, Azure DevOps, Snowflake, Microsoft SQL Server, Databricks, Azure Data Lake, Azure SQL, ADF, Azure Synapse, Airflow, AWS, PySpark, Python, GitHub

Data Engineer

Nike
03.2022 - 11.2023
  • Designed, implemented, and maintained scalable data pipelines and ETL processes using technologies such as AWS, PySpark, Hadoop, Snowflake
  • Developed and optimized data ingestion, integration, and transformation processes to ensure efficient and reliable data movement across different systems
  • Hands-on experience with Spark and Spark Streaming, applying transformations and actions
  • Skilled in working with various cloud services like AWS EC2, S3, EMR, Athena, Glue, Redshift, Lambda
  • Demonstrated expertise in Transact-SQL (T-SQL), the procedural extension of SQL used in SQL Server for querying, modifying, and managing relational database data
  • Experience in implementing Databricks clusters and environments to process large-scale data
  • Used Databricks notebooks and Delta Lake to process and analyze data for various data tasks
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python
  • Experience in Python, utilizing its extensive libraries and frameworks to develop robust and scalable solutions for data analysis and automation tasks
  • Experience using Snowflake, a cloud-based data warehousing platform, to design and implement scalable and high-performance data solutions
  • Building and managing data warehouses, data lakes, and data marts to support data storage, organization, and retrieval needs
  • Ensuring data quality, integrity, and consistency by implementing data validation and cleansing techniques
  • Monitoring and troubleshooting ETL data pipelines and resolving performance issues using Apache Airflow
  • Writing complex SQL queries, creating stored procedures, and optimizing database performance
  • Implementing automation and orchestration tools like Airflow to schedule, monitor, and manage data workflows
  • Containerizing data engineering processes using Docker for easy deployment and scalability
  • Experienced in translating business requirements into data models, ensuring alignment between technical solutions and organizational objectives
  • Skilled in identifying data quality issues through comprehensive data profiling and analysis techniques, ensuring data meets established standards and requirements.

Environment: HDFS, Hive, AWS, EC2, S3, EMR, Redshift, Glue, Lambda, RDS, GitHub, SQL, T-SQL, Snowflake, Athena, Databricks, SQL Server, PySpark, Spark, Python, Hadoop, MapReduce, Airflow, Docker, Jenkins, JIRA, Confluence

Data Engineer

Providence Health and Services
03.2020 - 03.2022
  • Designing and implementing data pipelines and ETL processes to efficiently extract, transform, and load large volumes of structured and unstructured data
  • Created and managed Snowflake data warehouses, databases, and roles, ensuring optimal performance and secure data access.
  • Developing and maintaining data integration solutions using tools like Informatica PC, SSIS, and Python, ensuring smooth data flow between various systems, databases, and data sources, while maintaining data quality and integrity
  • Used the Control-M workflow engine for job scheduling
  • Experience in writing dynamic SQL queries and scripts using T-SQL to build dynamic SQL statements based on runtime conditions or user input
  • Designing ETL processes using Informatica and SSIS to load data from Flat Files, Oracle, and Excel files to target Snowflake Data Warehouse database
  • Proven ability to collaborate with cross-functional teams including data engineers, analysts, and business stakeholders to develop and refine data models based on evolving requirements
  • In-depth knowledge of Snowflake Database, Schema and Table structures
  • Worked in designing, developing, and deploying custom reports using SSRS to meet business requirements
  • Involved in SQL development, unit testing, and performance tuning, and resolved testing issues based on defect reports.

Environment: SQL, PL/SQL, Snowflake, AWS, EC2, EMR, Redshift, S3, RDS, Informatica PC, SSIS, SSRS, GitHub, TFS, Azure SQL, Azure DevOps, SQL Server, T-SQL, Python, MySQL, Control-M, JIRA, Confluence.

Data Engineer

Capital One
03.2018 - 01.2020
  • Executed end-to-end data processing tasks, including data ingestion, processing, quality checks
  • Designing, implementing, and optimizing data warehousing solutions using Snowflake
  • Developed JSON scripts to deploy data pipelines in Azure Data Factory (ADF), utilizing SQL activities to process and manipulate data effectively
  • Created pipelines in ADF to extract, transform, and load data from diverse sources like Azure SQL and Azure SQL Data Warehouse
  • Experience in designing and developing applications in Spark using Python to compare performance of Spark with Hive and SQL/Oracle
  • Utilized Informatica as ETL tool and wrote stored procedures to extract data from source systems/files, cleanse and transform data, and load it into databases
  • Skilled in using T-SQL statements for defining and managing database objects such as tables, views, indexes, constraints, and triggers
  • Proficient in schema management tasks like creating, altering, and dropping database objects
  • Strong understanding of relational database concepts and SQL querying, enabling effective interaction with databases to validate and refine data models
  • Strong understanding of data governance principles and frameworks, facilitating the establishment of data quality policies, processes, and controls within the organization
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity, and verifying pipeline stability
  • Experienced in performance tuning and optimization of data warehouse processes and queries to meet SLAs and improve overall system performance
  • Imported and exported data between an Oracle database and HDFS/Hive using Sqoop, ensuring seamless data integration.

Environment: Azure SQL, SQL Server, Blob storage, ADF, Oracle, MySQL, T-SQL, PL/SQL, Redshift, Snowflake, Hadoop, HBase, Airflow, Sqoop, HDFS, Hive, Spark, Informatica PC, PySpark, Control-M, Oozie, Pig.

Data Engineer/ETL

JetSweep Inc
12.2016 - 02.2018
  • Imported data from various relational data sources like RDBMS and Teradata to HDFS using Sqoop, ensuring efficient data transfer and integration
  • Designed and implemented incremental imports into Hive tables, ensuring efficient data updates and synchronization
  • Worked on loading and transforming large volumes of structured, semi-structured, and unstructured data, ensuring data quality and adherence to business requirements
  • Utilized Python scripts to analyze data, extracting valuable insights and facilitating data-driven decision-making
  • Imported and exported data between NoSQL databases and HDFS, ensuring seamless data integration and movement
  • Experience in using T-SQL alongside SQL Server Integration Services (SSIS) for data integration, extract-transform-load (ETL) processes, and data migration tasks
  • Migrated ETL jobs to Pig scripts for data transformations, joins, and pre-aggregations before storing data onto HDFS, improving data processing efficiency
  • Exported data from HDFS to RDBMS using Sqoop for report generation and visualization purposes, enabling data analysis and reporting
  • Experience in creating dynamic and interactive visualizations using Tableau to communicate complex data insights effectively
  • Utilized Oozie workflow engine for job scheduling, ensuring timely execution and coordination of data processing tasks.

Environment: Teradata, SQL Server, T-SQL, SSIS, Hadoop, HDFS, Kerberos, HBase, Python, MapReduce, Hive, Oozie, Sqoop, Pig, Flume, Java, JSON, Rest API, Tableau, NoSQL, CDH3, CDH4

Education

Master's in Business Administration -

Campbellsville University

Master of Science in Computer and Information Sciences -

University of Michigan - Flint

Skills

  • Python
  • SQL
  • Hadoop
  • Snowflake
  • PySpark
  • Airflow
  • Databricks
  • Amazon EC2
  • Amazon EMR
  • Athena
  • SQL Server
  • ETL
  • Apache Spark
  • MySQL
  • Oracle
  • Kafka
  • Teradata
  • Apache Hive
  • Amazon Redshift
  • Transact-SQL
