
Siva Varma

Summary

6 years of experience in Data Engineering, specializing in GCP and AWS, with expertise in building scalable data lakes, ETL pipelines, real-time data processing, and cloud-based analytics.
  • Designed and developed centralized data lakes on GCP, leveraging Cloud Storage, Dataproc, BigQuery, and BigTable for efficient data storage and processing
  • Built scalable ETL workflows using Cloud Dataflow, Dataproc with Spark, and Apache Airflow, automating data ingestion and transformation pipelines
  • Implemented real-time data ingestion architectures using Druid and Kafka on GCP, ensuring low-latency processing for analytics and reporting
  • Collaborated with ML engineers to integrate data pipelines with AI/ML models
  • Strengthened data security and compliance by implementing GCP IAM policies, role-based access control, encryption, and data masking techniques
  • Orchestrated multi-source data ingestion pipelines using Cloud Composer and Cloud Dataproc, ensuring seamless integration with diverse data sources
  • Developed scalable, high-performance code in Python and Scala for complex data transformations and workflow automation
  • Designed and maintained AWS-based data solutions, developing data lakes on Amazon S3 and optimizing them with partitioning strategies and lifecycle policies
  • Built and managed ETL pipelines using AWS Glue, Lambda, and Apache Airflow, streamlining automated data transformations and processing
  • Implemented real-time streaming solutions with AWS Kinesis, Spark Streaming, and Apache Kafka, ensuring continuous data availability and processing
  • Migrated multi-terabyte datasets from Oracle to AWS, storing optimized copies in Amazon Redshift for business intelligence and reporting
  • Optimized Redshift clusters, focusing on schema design, query performance tuning, and workload management for enhanced data analytics
  • Secured data on AWS by configuring IAM roles, S3 bucket policies, and AWS KMS encryption, ensuring regulatory compliance
  • Developed real-time dashboards and analytics solutions using Tableau and AWS Athena, improving data accessibility and business insights
  • Automated Tableau dashboard updates using Python and AWS Lambda, reducing manual effort and improving real-time reporting efficiency

Overview

6 years of professional experience

Work History

Data Engineer

Marsh McLennan Agency
Dallas, TX
01.2023 - Current
  • Developed a centralized data lake on Google Cloud Platform (GCP) using key services such as Cloud Storage, Dataproc, BigQuery, and BigTable
  • Created PySpark scripts for data cleansing and enrichment of clickstream data, optimizing real-time analytics performance
  • Implemented real-time data ingestion architectures leveraging Druid on GCP, enhancing transformation and query efficiency
  • Built scalable and fault-tolerant data pipelines using Spark Streaming to handle high-volume data streams, ensuring continuous data availability for business-critical operations
  • Developed and managed ETL and data flow jobs using Apache Airflow on GCP, automating daily incremental loads with various Airflow operators
  • Collaborated with machine learning engineers to integrate data pipelines with ML models
  • Designed and optimized BigQuery schemas, leveraging partitioning and clustering to improve query performance and reduce execution time by 40%
  • Enhanced data security and compliance by implementing GCP IAM policies, role-based access control (RBAC), data masking, and encryption, ensuring secure access and data protection
  • Integrated Google Cloud Search and BigQuery to enable fast, full-text search capabilities across extensive datasets, improving data retrieval efficiency and query performance
  • Worked on large-scale migration of datasets from PostgreSQL to Google BigQuery, optimizing performance, reducing query execution times, and lowering operational costs
  • Designed real-time ingestion pipelines to integrate data from databases, APIs, and streaming services, enhancing analytics and reporting
  • Automated and orchestrated multi-source data ingestion and transformation workflows using Cloud Dataproc
  • Established a comprehensive data governance framework to maintain data integrity and ensure compliance across all data interactions
  • Developed and maintained efficient, high-quality code in Spark, Scala, and Python for complex data transformations, improving system scalability and reliability

Big Data Developer

Digno Solutions
Hyderabad, IN
05.2021 - 07.2022
  • Extracted, transformed, and loaded data from various source systems into AWS storage services using AWS Glue, Amazon EMR, and Amazon S3
  • Built data pipelines in AWS Glue by leveraging connections, jobs, and workflows to extract, transform, and load data from multiple sources, including Amazon RDS, Amazon S3, Amazon Redshift, and PostgreSQL
  • Developed Spark applications in Scala and Spark SQL for data extraction, transformation, and aggregation across various file formats to generate insights into customer behavior
  • Developed end-to-end data ingestion workflows using AWS Glue, integrating them with BMC Control-M scheduling tools to automate and streamline data processing and workflow management
  • Managed and optimized Databricks clusters by handling upgrades, performance monitoring, and workload tuning to improve cost efficiency
  • Wrote complex SQL queries in AWS Redshift to perform advanced transformations, aggregations, and optimizations
  • Worked on migration of data from PostgreSQL to AWS, ensuring data consistency, optimizing performance, and facilitating business analytics and reporting
  • Optimized Spark applications on AWS EMR by fine-tuning batch intervals, parallelism settings, and memory configurations, enhancing processing performance and resource efficiency
  • Worked on orchestrating multi-source data ingestion and transformation workflows using AWS Glue and Amazon CloudWatch, ensuring efficient data processing and monitoring
  • Wrote and maintained high-quality, scalable code in Scala and Python to support complex data transformation processes and enhance system reliability
  • Worked on real-time streaming pipelines utilizing Apache Kafka and Spark Streaming for efficient data processing
  • Developed Tableau reports for real-time claims analysis and financial forecasting, assisting in compliance audits and regulatory reporting
  • Automated Tableau dashboard updates using Python scripts and AWS Lambda, ensuring real-time insights for executive decision-making and reducing manual workload
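The automated daily loads described above typically follow a watermark-based incremental pattern; the sketch below is a minimal, hedged illustration of that pattern in plain Python (the column name updated_at is an assumption, not taken from the actual jobs).

```python
# Minimal sketch of watermark-based incremental loading: keep only rows
# newer than the last stored watermark, then advance the watermark.
def incremental_batch(rows, watermark):
    """Return rows with updated_at > watermark, plus the new watermark."""
    fresh = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_watermark
```

A scheduler such as Airflow or Control-M would persist the returned watermark between runs so each execution processes only the new delta.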

Data Engineer

Amogus Technologies
Hyderabad, IN
04.2019 - 05.2021
  • Engineered and managed data pipelines leveraging AWS Glue and Lambda, automating ETL processes across multiple data sources
  • Designed and optimized data lakes on Amazon S3 by implementing partitioning strategies and lifecycle policies to enhance performance and reduce costs
  • Developed real-time data ingestion workflows utilizing AWS Kinesis Data Streams ensuring low-latency processing
  • Configured and fine-tuned Redshift clusters, focusing on schema design, query performance optimization, and workload management for efficient analytics
  • Worked on orchestrating multi-step data pipelines using AWS Data Pipeline and AWS Step Functions, ensuring reliable workflow execution and efficient data processing
  • Implemented SQL-based querying solutions using AWS Athena and Glue Catalog, enabling efficient serverless analytics on Amazon S3
  • Assisted in debugging and troubleshooting ETL workflows, identifying and resolving data pipeline failures and bottlenecks
  • Wrote and optimized SQL queries for data extraction and transformation within Redshift and S3-based datasets
  • Collaborated with senior engineers to optimize data pipeline performance and improve data processing efficiency
  • Used version control tools such as Git and GitHub for code versioning and team collaboration
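The S3 partitioning strategy mentioned above usually means a Hive-style key layout that Athena and the Glue Catalog can prune; the helper below is an assumed sketch (bucket and table names are placeholders).

```python
# Hypothetical builder for a Hive-style dt=/hour= partitioned S3 prefix,
# the layout Athena and the Glue Catalog read for partition pruning.
from datetime import datetime

def s3_partition_key(prefix: str, table: str, event_time: datetime) -> str:
    """Build a dt=/hour= partitioned object prefix for a data-lake table."""
    return (
        f"{prefix}/{table}/"
        f"dt={event_time:%Y-%m-%d}/hour={event_time:%H}/"
    )
```

Writing objects under such prefixes lets queries that filter on dt/hour scan only the matching partitions, which is the cost and performance win the bullet points describe.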

Skills

Cloud Computing Platforms:

Google Cloud Platform (GCP)

Amazon Web Services (AWS)

GCP Services:

Cloud Storage

BigQuery

BigTable

Dataproc

Cloud Dataflow

Cloud Composer (Apache Airflow)

Pub/Sub

Cloud IAM

Cloud Monitoring

AWS Services:

S3

Athena

Glue Crawler

Glue Catalog

Redshift

Lambda

RDS

EMR

Kinesis

SNS

IAM

CloudFormation

Terraform

CloudWatch

Cost Explorer

Data Warehouses:

Google BigQuery

Amazon Redshift

Programming Languages:

Python

Scala

SQL

Bash

Big Data & Streaming Frameworks:

Apache Spark (PySpark, Spark SQL, Spark Streaming)

Apache Kafka

Apache Beam

Druid

Elasticsearch

Data Pipeline & Orchestration:

Apache Airflow

Cloud Composer

AWS Glue

AWS Data Pipeline

Databases & Storage:

PostgreSQL

MySQL

Amazon RDS

Google Cloud SQL

Version Control & Development Tools:

Git

GitHub

VS Code

Data Formats:

JSON

CSV

Parquet

Avro

ORC

XML

Visualization & Reporting:

Tableau

Amazon QuickSight

Timeline

Data Engineer

Marsh McLennan Agency
01.2023 - Current

Big Data Developer

Digno Solutions
05.2021 - 07.2022

Data Engineer

Amogus Technologies
04.2019 - 05.2021