Gopal Reddy

Data Engineer
Denton, TX

Summary

Innovative and detail-oriented data engineer with 4+ years of experience designing and optimizing scalable data pipelines across AWS, Azure, and GCP. Proficient in Python, SQL, Scala, Spark, and Kafka, with expertise in big data processing, ETL, and real-time streaming. Skilled in cloud data warehousing (Snowflake, Redshift, BigQuery) and workflow automation (Apache Airflow, CI/CD). Adept at data modeling, data security, and performance optimization, delivering efficient, high-impact data solutions that drive business intelligence and innovation. Passionate about solving complex data challenges and enabling data-driven decision-making.

Overview

4 years of professional experience
4 languages

Work History

Azure Data Engineer

Lockheed Martin Corporation
01.2024 - Current

  • Designed and optimized scalable data pipelines using Azure Data Lake, Synapse Analytics, and Databricks for efficient data storage and processing.
  • Integrated structured and unstructured data from SQL, MongoDB, Cassandra, PostgreSQL, and MySQL into centralized data lakes and warehouses.
  • Developed and maintained ETL processes with PySpark, Hive, MapReduce, and SQL, transforming raw data into actionable insights (a representative PySpark sketch follows this role's environment list).
  • Implemented real-time data streaming and event-driven architectures using Apache Kafka, Flume, and Zookeeper.
  • Designed high-performance data models, optimized Impala queries over large datasets, and transferred data between Hadoop and relational stores with Sqoop.
  • Managed cloud storage solutions with Azure Data Lake, ensuring security, scalability, and efficiency.
  • Automated workflows with Apache Airflow, Git, Maven, and Azure DevOps, streamlining CI/CD processes.
  • Leveraged Hadoop and PySpark to process and analyze large-scale data efficiently.
  • Ensured data security and compliance by implementing encryption and access controls on Azure.
  • Collaborated with data scientists, analysts, and stakeholders to develop and enhance data-driven solutions.

Environment: Azure Data Lake, Synapse Analytics, Databricks, SQL, MongoDB, Cassandra, PostgreSQL, MySQL, PySpark, Hive, MapReduce, Apache Kafka, Flume, Zookeeper, Impala, Sqoop, Apache Airflow, Git, Maven, Azure DevOps, Hadoop.
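
As a hedged illustration of the ETL pattern referenced above, the short PySpark sketch below shows the extract-transform-load shape typical of this role. It is illustrative only, not code from the position: the JDBC connection details, table name, and ADLS Gen2 path are hypothetical placeholders.

    # Minimal PySpark ETL sketch; connection details, table names, and
    # storage paths are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("example_etl").getOrCreate()

    # Extract: read raw rows from a relational source over JDBC.
    raw = (spark.read.format("jdbc")
           .option("url", "jdbc:postgresql://db-host:5432/ops")  # placeholder
           .option("dbtable", "public.events")                   # placeholder
           .option("user", "etl_user")
           .option("password", "change-me")
           .load())

    # Transform: normalize timestamps, derive a partition column, and
    # drop rows missing required keys.
    clean = (raw
             .withColumn("event_ts", F.to_timestamp("event_ts"))
             .withColumn("event_date", F.to_date("event_ts"))
             .dropna(subset=["event_id", "event_ts"]))

    # Load: write partitioned Parquet into Azure Data Lake Storage Gen2.
    (clean.write.mode("overwrite")
     .partitionBy("event_date")
     .parquet("abfss://curated@examplelake.dfs.core.windows.net/events/"))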

AWS Data Engineer

GEICO
05.2023 - 12.2023

  • Designed and optimized scalable data pipelines using AWS S3, Redshift, and EMR for efficient insurance data processing.
  • Developed and maintained ETL workflows using Python, PySpark, Hive, and SQL, ensuring seamless data transformation into Redshift and Snowflake.
  • Integrated diverse data sources (MySQL, PostgreSQL, MongoDB, Cassandra, HBase) into centralized repositories, ensuring data consistency and integrity.
  • Implemented real-time data streaming using Kafka, Flume, and Zookeeper, enabling continuous monitoring and processing of insurance data.
  • Leveraged Redshift, Snowflake, and Google BigQuery for data warehousing, data modeling, and advanced analytics on insurance datasets.
  • Optimized big data processing using Hadoop, EMR, MapReduce, Impala, and PySpark, supporting data-driven decision-making.
  • Ensured data security and compliance by implementing encryption and access controls in AWS S3, Redshift, and other AWS services.
  • Managed and deployed cloud infrastructure using AWS CloudFormation, EC2, and S3, ensuring scalability and reliability.
  • Automated deployment processes with CI/CD tools like Git, Maven, and AWS CodePipeline, streamlining updates, testing, and releases.
  • Orchestrated data workflows using Apache Airflow, ensuring reliable scheduling and monitoring of analytics jobs (a sample DAG sketch follows this role's environment list).
  • Developed custom ETL solutions for optimized data ingestion and transformation.
  • Collaborated with cross-functional teams to integrate and enhance data engineering solutions.

Environment: AWS S3, Redshift, EMR, Python, PySpark, Hive, SQL, Snowflake, MySQL, PostgreSQL, MongoDB, Cassandra, HBase, Kafka, Flume, Zookeeper, Google BigQuery, Hadoop, MapReduce, Impala, AWS CloudFormation, EC2, Git, Maven, AWS CodePipeline, Apache Airflow.
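
The Airflow orchestration mentioned above can be sketched as a small daily DAG. This is a hypothetical example following Airflow 2.x conventions; the DAG id, schedule, and task bodies are invented for illustration.

    # Hypothetical Airflow 2.x DAG: a daily extract-then-load flow.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_to_s3():
        """Placeholder: pull source extracts into an S3 staging prefix."""

    def load_into_redshift():
        """Placeholder: COPY staged files from S3 into Redshift."""

    with DAG(
        dag_id="daily_claims_etl",  # hypothetical name
        start_date=datetime(2023, 6, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_to_s3",
                                 python_callable=extract_to_s3)
        load = PythonOperator(task_id="load_into_redshift",
                              python_callable=load_into_redshift)
        extract >> load  # load runs only after a successful extract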

GCP Data Engineer

Discover Financial Services
03.2022 - 12.2022

  • Designed and implemented scalable data pipelines using Google Cloud Dataflow, automating ETL processes to handle large volumes of financial transactions.
  • Managed and organized data lakes on Google Cloud Storage (GCS), ensuring scalable and cost-effective storage for structured and unstructured banking data.
  • Developed and optimized data warehouses in BigQuery, enabling advanced financial reporting and customer behavior analysis.
  • Built real-time data processing workflows using Google Cloud Pub/Sub and Apache Kafka, ensuring minimal latency for transaction monitoring.
  • Implemented big data processing solutions with Apache Spark, Hadoop, and Presto, enhancing risk assessment, fraud detection, and customer insights.
  • Automated infrastructure provisioning with Terraform, enabling consistent and repeatable cloud-based data deployments.
  • Streamlined CI/CD pipelines with Jenkins and DevOps practices, accelerating the deployment of data pipelines, ML models, and analytics applications.
  • Collaborated with data scientists and analysts to develop complex queries in BigQuery using SQL, Python, and Java, driving insights into customer transactions and risk management.
  • Designed and optimized batch and stream processing pipelines using Apache Beam on Google Cloud Dataflow, supporting real-time financial decision-making (a streaming sketch follows this role's environment list).
  • Ensured data integrity and stability by developing robust ETL workflows and optimizing database structures.

Environment: Google Cloud Dataflow, ETL, Google Cloud Storage (GCS), BigQuery, Google Cloud Pub/Sub, Apache Kafka, Apache Spark, Hadoop, Presto, Terraform, CI/CD, Jenkins, SQL, Python, Java, Apache Beam.
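
The batch and stream pipelines referenced above follow the shape of the streaming Apache Beam sketch below. Everything named here is an assumption for illustration: the project, Pub/Sub topic, BigQuery table, and the flagging threshold are all made up.

    # Illustrative streaming Beam pipeline; project, topic, table, and
    # threshold are hypothetical.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (p
         | "ReadTxns" >> beam.io.ReadFromPubSub(
               topic="projects/example-proj/topics/txns")
         | "Parse" >> beam.Map(json.loads)  # Pub/Sub payloads arrive as bytes
         | "FlagLarge" >> beam.Filter(lambda t: t.get("amount", 0) > 10000)
         | "WriteBQ" >> beam.io.WriteToBigQuery(
               "example-proj:risk.large_txns",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))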

Data Engineer

Cigna Corporation
03.2021 - 02.2022

  • Designed and implemented scalable data pipelines to collect, process, and store healthcare data from Electronic Health Records (EHRs) and IoT devices using Apache Kafka, Apache Spark, and ETL tools.
  • Ensured data security and HIPAA compliance by implementing encryption and access controls in AWS S3 for secure healthcare data storage.
  • Integrated and managed diverse healthcare data sources, leveraging SQL and NoSQL databases for structured and unstructured data processing.
  • Developed and optimized data warehouses in Amazon Redshift, enabling efficient querying and reporting on large healthcare datasets.
  • Implemented real-time data streaming solutions with Apache Kafka and AWS Kinesis, ensuring minimal latency in processing patient monitoring and operational data (a producer sketch follows this role's environment list).
  • Automated ETL workflows using Apache Airflow and AWS Glue, streamlining complex data pipeline management.
  • Managed cloud infrastructure with Terraform and AWS CloudFormation, ensuring consistent deployments across cloud environments.
  • Collaborated with clinical and IT teams to derive actionable insights, utilizing Tableau, Power BI, and Looker for healthcare analytics.
  • Supported machine learning initiatives by ensuring clean and reliable datasets for TensorFlow-based models, driving advanced analytics and decision-making in healthcare.
  • Optimized ETL processes and database structures, enhancing data integrity, performance, and reliability.

Environment: Apache Kafka, Apache Spark, ETL, AWS S3, SQL, NoSQL databases, Amazon Redshift, AWS Kinesis, Apache Airflow, AWS Glue, Terraform, AWS CloudFormation, Tableau, Power BI, Looker, TensorFlow.
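
The real-time ingestion work above can be sketched with a small Kafka producer using the kafka-python client. The broker address, topic name, and sample event are hypothetical placeholders, not details from the role.

    # Hedged sketch: publish a device reading to Kafka with kafka-python.
    import json

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker:9092"],  # placeholder broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # A made-up vitals reading from a bedside monitor.
    event = {"patient_id": "p-001", "heart_rate": 72,
             "ts": "2021-06-01T12:00:00Z"}

    producer.send("patient-vitals", value=event)  # placeholder topic
    producer.flush()  # block until buffered messages are delivered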


Education

Master's in Data Science

University of North Texas
Denton, Texas
12.2024

Skills

Hadoop, Spark, Kafka, Python, SQL, Scala, Snowflake, Redshift, BigQuery, Apache Airflow, Terraform
