VasudevaReddy N

Richardson, TX

Summary

Accomplished Data Engineer with 8+ years of experience designing and implementing scalable data pipelines, optimizing ETL frameworks, and building robust, cloud-native platforms. Experienced in delivering end-to-end solutions across diverse domains and technologies, including ETL pipelines, algorithms, databases, and web applications. Expertise spans full-stack development, data engineering, DevOps, cloud computing, and analytics, enabling efficient, innovative solutions for both startups and established corporations.

Overview

8+ years of professional experience

Work History

Data Engineer

OPEL Systems, Inc
07.2024 - Current
  • Built a scalable healthcare data pipeline on AWS using services like S3, Glue, Lake Formation, and Snowflake to efficiently manage claims, policy, and member data.
  • Implemented the Medallion Architecture, leveraging AWS Glue and Snowflake to organize data into bronze, silver, and gold layers for optimized transformation and storage (an illustrative PySpark sketch follows this list).
  • Developed and optimized batch ETL pipelines with AWS Glue and Spark on EMR, ensuring the delivery of clean and reliable data for downstream systems.
  • Managed metadata and access control with AWS Lake Formation, ensuring robust data governance, HIPAA compliance, and secure data encryption.
  • Orchestrated end-to-end data workflows using AWS Step Functions, automating data movement and transformation to streamline complex processes.
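
Illustrative sketch (not project code): a minimal AWS Glue PySpark job showing the bronze-to-silver promotion step described above. The S3 paths, the claim_id business key, and the columns are hypothetical placeholders.

```python
# Hypothetical bronze -> silver promotion step as an AWS Glue PySpark job.
# Bucket paths, the business key, and column names are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw claims data from the bronze layer.
bronze = spark.read.parquet("s3://example-lake/bronze/claims/")

# Deduplicate on the business key, normalize types, and stamp lineage metadata.
silver = (
    bronze
    .dropDuplicates(["claim_id"])
    .withColumn("claim_amount", F.col("claim_amount").cast("decimal(12,2)"))
    .withColumn("processed_at", F.current_timestamp())
)

# Persist to the silver layer, partitioned by service date (assumed column).
(silver.write
    .mode("overwrite")
    .partitionBy("service_date")
    .parquet("s3://example-lake/silver/claims/"))

job.commit()
```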

Data Engineer

SHSU
08.2023 - 05.2024
  • Built scalable ETL pipelines using AWS Glue to process and load banking transactional data for analytics.
  • Architected a Lakehouse on Amazon S3 with the AWS Glue Data Catalog for efficient data storage and management.
  • Optimized queries using Amazon Athena for fast, serverless analytics on large banking datasets.
  • Managed centralized data warehouse in Snowflake for secure, unified storage and reporting.
  • Ensured PCI-DSS/GDPR compliance by implementing encryption, access controls, and data masking for sensitive data.
  • Orchestrated data workflows using AWS Step Functions, ensuring reliability and fault tolerance across processes.
  • Enabled real-time data processing with Apache Kafka, supporting immediate insights and operational decisions.
  • Implemented data quality checks using Amazon Deequ to ensure clean and reliable banking data (see the PyDeequ sketch after this list).
  • Optimized pipeline performance by fine-tuning AWS Glue and Spark Streaming for low-latency, high-throughput processing.
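
Illustrative sketch (not project code): data quality checks of the kind described above, written with PyDeequ, the Python wrapper for Amazon Deequ. The dataset path and column names are hypothetical.

```python
from pyspark.sql import SparkSession

import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationResult, VerificationSuite

# Spark session wired up with the Deequ jar (PyDeequ resolves the coordinate).
spark = (SparkSession.builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())

# Hypothetical transactional dataset; path and columns are placeholders.
df = spark.read.parquet("s3://example-bucket/transactions/")

checks = (Check(spark, CheckLevel.Error, "transaction integrity")
          .isComplete("transaction_id")     # no nulls in the key
          .isUnique("transaction_id")       # no duplicate transactions
          .isNonNegative("amount"))         # amounts must be >= 0

result = VerificationSuite(spark).onData(df).addCheck(checks).run()

# Surface failures as a DataFrame so the pipeline can halt on bad data.
VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)
```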

Product Engineer

LTIMindtree
10.2021 - 12.2022
  • Developed 30+ backend microservices and 30 REST APIs (Flask, Swagger) for the Climanomics Platform; integrated Kafka for real-time data streaming into Snowflake.
  • Built AWS Glue ETL jobs for real-time data transformation and loading into Snowflake; implemented incremental data loads using Postgres WAL and Debezium.
  • Deployed Amazon Deequ for automated data quality checks and custom validation rules in AWS Glue; applied Write-Audit-Publish (WAP) for data integrity and governance.
  • Managed Spark EMR clusters for large-scale data processing; used medallion architecture in Snowflake (bronze, silver, gold) for consistent, reliable data pipelines.
  • Performed SCD Type 2 dimensional modeling in Snowflake for historical data tracking and conducted window-based analysis (a Snowflake MERGE sketch follows this list).
  • Established Master Data Management (MDM) as the single source of truth, centralizing data in Snowflake for consistent, accurate reporting.
  • Implemented OKTA-based JWT authentication for secure access to the Climanomics platform and systems.
  • Optimized ETL pipeline performance in AWS Glue and Spark via memory tuning, adaptive query execution, and dynamic partitioning for improved scalability and cost-efficiency.
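
Illustrative sketch (not project code): the SCD Type 2 close-out step as a Snowflake MERGE issued through the Python connector. Table names, columns, and connection parameters are hypothetical, and the follow-up insert of new row versions is omitted for brevity.

```python
import snowflake.connector

# Hypothetical connection parameters; real credentials would come from a vault.
conn = snowflake.connector.connect(
    account="example_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="DIM",
)

# SCD Type 2: expire the current row when tracked attributes change, and
# insert brand-new keys as current rows. Changed keys get their new version
# appended in a follow-up INSERT (not shown).
MERGE_SQL = """
MERGE INTO DIM_CUSTOMER d
USING STG_CUSTOMER s
  ON d.CUSTOMER_ID = s.CUSTOMER_ID AND d.IS_CURRENT = TRUE
WHEN MATCHED AND (d.ADDRESS <> s.ADDRESS OR d.SEGMENT <> s.SEGMENT) THEN
  UPDATE SET d.IS_CURRENT = FALSE, d.VALID_TO = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
  INSERT (CUSTOMER_ID, ADDRESS, SEGMENT, VALID_FROM, VALID_TO, IS_CURRENT)
  VALUES (s.CUSTOMER_ID, s.ADDRESS, s.SEGMENT, CURRENT_TIMESTAMP(), NULL, TRUE)
"""

cur = conn.cursor()
try:
    cur.execute(MERGE_SQL)
finally:
    cur.close()
    conn.close()
```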

Data Engineer

Legato Health Technologies
11.2020 - 10.2021
  • Designed and implemented an end-to-end automated Step Functions workflow for cloud account setup, application deployment, and ongoing maintenance, successfully managing 110+ applications with minimal manual intervention.
  • Contributed to the strategic planning and execution of AWS Landing Zone projects, ensuring secure, automated, and scalable setups for new AWS environments to meet business needs.
  • Developed 10+ AWS Lambda functions using Boto3 to automate IAM role management (deletion, updates, assignments) and interact with AWS services, significantly improving identity and access management efficiency (a Boto3 sketch follows this list).
  • Integrated Lambda functions with systems like SailPoint, IIQ Server, Distribution List (DL) management, and Hosted Zones, and orchestrated them into a cohesive Step Functions workflow to automate data processing and IAM management tasks.
  • Developed a front-end request form using Flask and JavaScript frameworks, enabling users to select sandbox, silver, or premium AWS account types when submitting provisioning requests.
  • Provisioned POC sandbox accounts using Cloud Foundation and workflows in Google Cloud Platform (GCP), ensuring automation and consistency across multi-cloud environments.
  • Automated infrastructure provisioning using Terraform and Boto3 (Python), enabling seamless deployment pipelines and reducing provisioning time by 50%, eliminating manual errors.
  • Led the migration of on-premises data pipelines to AWS, leveraging EC2, S3, Glue, and Apache Spark for enhanced data processing and storage, optimized workflows, and improved scalability.
  • Designed data processing workflows using AWS Lambda and Step Functions to automate the ingestion, transformation, and loading (ETL) of large datasets, reducing manual oversight.
  • Utilized Apache Spark to process structured and unstructured data, optimizing performance and reducing latency in data pipelines running on AWS.
  • Collaborated with cross-functional teams (Data Engineers, DevOps, and Cloud Architects) to ensure seamless integration between AWS services and on-premises systems for the migration project.
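
Illustrative sketch (not project code): a minimal Lambda handler in the style described above, using Boto3 to detach managed policies and delete an IAM role. The event shape is hypothetical; pagination and inline policies are omitted for brevity.

```python
import boto3

iam = boto3.client("iam")

def lambda_handler(event, context):
    # Hypothetical event shape: {"role_name": "..."}; in the real workflow the
    # role details arrived from an upstream Step Functions state.
    role_name = event["role_name"]

    # Managed policies must be detached before a role can be deleted.
    attached = iam.list_attached_role_policies(RoleName=role_name)
    for policy in attached["AttachedPolicies"]:
        iam.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])

    iam.delete_role(RoleName=role_name)
    return {"status": "deleted", "role": role_name}
```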

Big Data Developer

Infodot Systems Private Limited, Mercedes-Benz
04.2018 - 10.2020
  • Developed ETL pipelines with Azure Databricks and PySpark, integrating data from web APIs and Azure Data Lake, and storing it in Azure SQL Database and Synapse Analytics for scalable processing.
  • Optimized ETL pipelines using Apache Spark's distributed processing, dynamically scaling resources in Azure Databricks, reducing processing time, and leveraging Azure Data Lake Storage for cost efficiency.
  • Implemented data quality checks with Apache Airflow and Great Expectations, ensuring data integrity and consistency across automotive ETL workflows.
  • Built a micro-batch de-duplication pipeline in Azure Databricks using Spark Structured Streaming to reduce daily data load times and enable real-time analytics (a streaming sketch follows this list).
  • Designed fact and dimensional models for vehicle performance and sales data, creating optimized data marts in Azure SQL Database for operational and financial reporting.
  • Managed Schema Evolution and Data Transformation with Delta Lake and Azure Databricks, processing data in Parquet and ORC formats for efficient storage and analytics.
  • Integrated Hive Metastore with Databricks for centralized metadata management, ensuring schema consistency, and improving data governance across Spark jobs.
  • Streamlined data ingestion from legacy systems into Azure Data Lake using Sqoop with incremental loading, enabling continuous data flow for analytics.
  • Orchestrated ETL workflows using Apache Airflow, automating pipeline execution across IoT sensors, dealership systems, and telematics data for actionable insights.
  • Applied advanced analytics in Azure Databricks using Spark SQL aggregation, windowing, and cumulative patterns to generate insights into vehicle performance and customer behavior.
  • Optimized costs with Azure Synapse Analytics, applying data compression, columnar formats, and partitioning strategies for enhanced query performance.
  • Implemented Medallion Architecture in Databricks for structured data layers: Bronze (raw data), Silver (cleaned data), and Gold (aggregated insights), ensuring scalable and efficient data processing and governance.
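
Illustrative sketch (not project code): micro-batch de-duplication with Spark Structured Streaming on Databricks, in the style of the pipeline described above. The ADLS paths, watermark window, and key columns are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided by the Databricks runtime

# Ingest raw telemetry JSON with Auto Loader; all paths are placeholders.
events = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation",
                  "abfss://meta@examplelake.dfs.core.windows.net/schemas/telematics/")
          .load("abfss://raw@examplelake.dfs.core.windows.net/telematics/"))

# Watermarking needs a timestamp column; inferred JSON fields arrive as strings.
events = events.withColumn("event_time", F.col("event_time").cast("timestamp"))

# Micro-batch de-duplication: keep one row per (event_id, event_time) within
# the 10-minute watermark so streaming state does not grow unboundedly.
deduped = (events
           .withWatermark("event_time", "10 minutes")
           .dropDuplicates(["event_id", "event_time"]))

# Append the cleaned stream to a Delta table for downstream analytics.
(deduped.writeStream
    .format("delta")
    .option("checkpointLocation",
            "abfss://chk@examplelake.dfs.core.windows.net/telematics_dedup/")
    .outputMode("append")
    .start("abfss://silver@examplelake.dfs.core.windows.net/telematics/"))
```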

Software Developer

Infodot Systems Private Limited
04.2017 - 03.2018
  • Developed a comprehensive fintech platform for portfolio management using Python, Django, ReactJS, PostgreSQL, Pandas, and D3.js, focusing on a factor-based investment approach.
  • Designed and implemented optimized database schemas with over 200 MSSQL tables and more than 40 stored procedures.
  • Created and integrated 100+ RESTful APIs with the Django REST framework, improving integration speed by 35% and streamlining data processing through custom middleware (a DRF sketch follows this list).
  • Built responsive, interactive user interfaces with ReactJS, HTML, CSS, and JavaScript, and integrated dynamic charts and graphs for enhanced data visualization.
  • Applied advanced statistical methods with Pandas and NumPy for data analysis and backtesting, increasing analysis accuracy by 30% in investment strategy optimization.
  • Implemented automated testing with pytest, and integrated CI/CD pipelines using Jenkins, CodeBuild, and SonarQube, ensuring code quality.
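
Illustrative sketch (not project code): one portfolio endpoint in the Django REST framework style described above. The model, fields, and route are hypothetical and would live inside a configured Django app.

```python
# Hypothetical portfolio endpoint with the Django REST framework; the Holding
# model, its fields, and the /holdings/ route are illustrative placeholders.
from django.db import models
from rest_framework import routers, serializers, viewsets

class Holding(models.Model):
    ticker = models.CharField(max_length=10)
    quantity = models.DecimalField(max_digits=12, decimal_places=4)
    factor_score = models.FloatField()  # factor-based investment signal

class HoldingSerializer(serializers.ModelSerializer):
    class Meta:
        model = Holding
        fields = ["id", "ticker", "quantity", "factor_score"]

class HoldingViewSet(viewsets.ModelViewSet):
    queryset = Holding.objects.all()
    serializer_class = HoldingSerializer

# The router wires the viewset to /holdings/ with full CRUD semantics.
router = routers.DefaultRouter()
router.register(r"holdings", HoldingViewSet)
urlpatterns = router.urls
```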

Education

Master's in Computing and Data Science

Sam Houston State University (SHSU)
05.2024

Undergraduate Degree in Information Technology

RGMCET
04.2017

Skills

  • Data Lakehouse
  • ETL development
  • Apache Iceberg
  • Snowflake
  • MongoDB
  • Databricks
  • Unity Catalog
  • AWS
  • Azure
  • CI/CD
  • Docker
  • NoSQL databases
  • SQL
  • Sqoop
  • Hive
  • Spark
  • Kafka
  • Python
  • Scala
  • Shell Scripting
  • Hadoop ecosystem
  • Swagger
  • Elasticsearch
  • Data modeling
  • PySpark
  • Boto3
  • Metadata management
  • Data governance
  • API development
  • Web development
