
DINESH GOPATHI

Schaumburg, IL

Summary

I am a Data Engineer passionate about transforming raw data into valuable insights. With a Master's degree in Computer Science and 4 years of Data Engineering experience, I specialize in building and maintaining ETL data pipelines using tools like Apache Spark and AWS services. My expertise includes working with both relational and NoSQL databases to ensure data integrity and performance. I am skilled in automation and data security practices, and thrive in collaborative, agile environments where I can contribute to impactful projects.

Overview

5 years of professional experience

Work History

Data Engineer

AmerisourceBergen
AZ
09.2022 - Current
  • Designed and implemented end-to-end ETL solutions using Python and Scala, ensuring seamless data flow and transformation across diverse data sources.
  • Developed custom Spark-based frameworks for ETL and ELT processes, leveraging Spark SQL and Kafka streaming to handle large-scale data processing with high efficiency and reliability.
  • Managed and optimized relational and NoSQL databases, including MySQL and MongoDB, for storing and querying structured and unstructured data.
  • Worked on the migration of on-premises data infrastructure to AWS, implementing AWS data services such as S3, Glue, and Kinesis for scalable and cost-effective data processing.
  • Implemented data serialization techniques using Parquet and AVRO formats, ensuring efficient storage and retrieval of data in distributed environments.
  • Led initiatives to enhance data security measures on AWS, configuring IAM roles, encryption mechanisms, and managing access control using Lake Formation and KMS.
  • Designed and implemented end-to-end data pipelines, utilizing AWS services such as AWS Glue and S3 to extract, transform, and load data into Snowflake, facilitating real-time analytics and reporting.
  • Utilized Snowflake's compatibility with AWS services like S3 for efficient data loading and storage, and AWS Lambda for serverless computing, optimizing performance and resource utilization.
  • Automated deployment pipelines and continuous delivery processes using Apache Airflow, streamlining software releases and improving development efficiency.
  • Played a key role in all phases of the Software Development Life Cycle (SDLC), from requirements gathering and design to implementation, testing, and deployment.
  • Practiced agile methodologies such as CI/CD and prioritized application resiliency and security, ensuring adherence to project timelines and quality standards.

Data Engineer

XLogic Technologies
India
08.2020 - 07.2021
  • Designed and implemented a scalable ETL framework leveraging Apache Spark, facilitating the extraction, transformation, and loading of large volumes of data.
  • Developed custom Spark jobs in Python/PySpark to efficiently process and cleanse raw data from diverse sources, ensuring data quality and consistency.
  • Integrated MongoDB as a NoSQL database to store semi-structured data such as user-generated content and logs, exploiting its flexible schema to accommodate evolving data structures.
  • Implemented connectors or adapters to convert MySQL data into Parquet or Avro formats for optimized storage and processing in distributed data processing frameworks using Apache Spark.
  • Orchestrated data workflows using Step Functions, automating EMR job execution, ensuring workflow reliability, and enabling event-driven communication through EventBridge for timely data processing and automation.
  • Implemented security measures for S3 data, including encryption and access control using IAM policies.
  • Established CI/CD pipelines using Jenkins to automate software builds, testing, and deployment processes, ensuring rapid and reliable delivery of code changes.
  • Implemented Agile methodologies including Scrum and Kanban, facilitating iterative development cycles, regular sprint planning, and continuous improvement through feedback loops.

Data Engineer

Atlasis Software
India
01.2019 - 07.2020
  • Developed robust ETL pipelines to extract data from diverse sources, transform it using Azure Databricks, and load it into Azure data lakes and data warehouses.
  • Leveraged Azure Data Factory for orchestrating and automating ETL workflows, ensuring seamless data movement and transformation.
  • Implemented Azure SQL Database for relational data storage, optimizing schema design and query performance for efficient data retrieval.
  • Utilized SQL for data manipulation tasks such as filtering, sorting, joining, and aggregating data from multiple tables and datasets.
  • Implemented data lifecycle management policies to optimize storage costs and ensure data availability and durability.
  • Played a key role in all phases of the Software Development Life Cycle (SDLC), from requirements gathering and design to implementation, testing, and deployment.

Education

Master of Science in Computer Science

Northern Arizona University
12.2022

Bachelor in Electronics and Communication Engineering

Institute of Aeronautical Engineering
06.2021

Skills

  • Python, Scala, SQL, Java
  • Apache Spark, Spark SQL, Hadoop
  • AWS Cloud (EC2, S3, Redshift, Lambda, Kinesis, DynamoDB, Glue, Step Functions, EventBridge)
  • Azure Cloud (Azure DevOps, Azure Data Lake, Azure Data Factory, Azure Databricks)
  • Snowflake, Tableau, Power BI
  • MS SQL Server, PostgreSQL, MongoDB, MySQL, Cassandra
