Accomplished Data Engineer with 8+ years of experience designing and implementing scalable data pipelines, optimizing ETL frameworks, and building robust, cloud-native platforms. I have delivered end-to-end solutions across diverse domains and technologies, including ETL pipelines, algorithms, databases, websites, and other software products. My expertise spans full-stack development, data engineering, DevOps, cloud computing, and analytics, enabling efficient and innovative solutions for both startups and established corporations.
Overview
8 years of professional experience
Work History
Data Engineer
OPEL Systems, Inc
07.2024 - Current
Built a scalable healthcare data pipeline on AWS, using S3, Glue, and Lake Formation alongside Snowflake to efficiently manage claims, policy, and member data.
Implemented the Medallion Architecture, leveraging AWS Glue and Snowflake to organize data into bronze, silver, and gold layers for optimized transformation and storage (a minimal sketch of this layering follows these highlights).
Developed and optimized batch ETL pipelines with AWS Glue and Spark on EMR, ensuring the delivery of clean and reliable data for downstream systems.
Managed metadata and access control with AWS Lake Formation, ensuring robust data governance, HIPAA compliance, and secure data encryption.
Orchestrated end-to-end data workflows using AWS Step Functions, automating data movement and transformation to streamline complex processes.
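A minimal PySpark sketch of the bronze-to-silver step in this Medallion layering; the bucket paths, column names, and dedup key are illustrative placeholders, not the actual Glue job.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-bronze-to-silver").getOrCreate()

    # Read raw claims landed in the bronze layer (path is a placeholder).
    bronze = spark.read.json("s3://example-healthcare-lake/bronze/claims/")

    # Basic cleansing: normalize types, drop bad records, deduplicate on the claim key.
    silver = (
        bronze
        .withColumn("claim_amount", F.col("claim_amount").cast("decimal(12,2)"))
        .withColumn("service_date", F.to_date("service_date"))
        .filter(F.col("claim_id").isNotNull())
        .dropDuplicates(["claim_id"])
    )

    # Write the curated silver layer as partitioned Parquet for downstream loads into Snowflake.
    silver.write.mode("overwrite").partitionBy("service_date").parquet(
        "s3://example-healthcare-lake/silver/claims/"
    )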
Data Engineer
SHSU
08.2023 - 05.2024
Built scalable ETL pipelines using AWS Glue to process and load banking transactional data for analytics.
Architected a lakehouse on Amazon S3 with the AWS Glue Data Catalog for efficient data storage and management.
Optimized queries using Amazon Athena for fast, serverless analytics on large banking datasets.
Managed centralized data warehouse in Snowflake for secure, unified storage and reporting.
Ensured PCI-DSS/GDPR compliance by implementing encryption, access controls, and data masking for sensitive data.
Orchestrated data workflows using AWS Step Functions, ensuring reliability and fault tolerance across processes.
Enabled real-time data processing with Apache Kafka, supporting immediate insights and operational decisions.
Implemented data quality checks using Amazon Deequ to ensure clean and reliable banking data (a sketch of this quality gate follows these highlights).
Optimized pipeline performance by fine-tuning AWS Glue and Spark Streaming for low-latency, high-throughput processing.
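A minimal sketch of the Deequ-style quality gate referenced above, using the PyDeequ bindings; column names and failure handling are assumptions, and the Spark session is assumed to already have the Deequ jar on its classpath.

    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationSuite, VerificationResult

    # df is a Spark DataFrame of banking transactions; column names are placeholders.
    check = Check(spark, CheckLevel.Error, "transaction quality gate")

    result = (
        VerificationSuite(spark)
        .onData(df)
        .addCheck(
            check.isComplete("transaction_id")   # no missing primary keys
                 .isUnique("transaction_id")     # no duplicate transactions
                 .isNonNegative("amount")        # amounts must be >= 0
        )
        .run()
    )

    # Fail the run if any constraint is violated, keeping bad data out of downstream layers.
    results = VerificationResult.checkResultsAsDataFrame(spark, result)
    if results.filter("constraint_status != 'Success'").count() > 0:
        raise ValueError("Data quality checks failed; see Deequ results for details")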
Product Engineer
LTIMindtree
10.2021 - 12.2022
Developed 30+ backend microservices and 30 REST APIs (Flask, Swagger) for the Climanomics Platform; integrated Kafka for real-time data streaming into Snowflake.
Built AWS Glue ETL jobs for real-time data transformation and loading into Snowflake; implemented incremental data loads using Postgres WAL and Debezium.
Deployed Amazon Deequ for automated data quality checks and custom validation rules in AWS Glue; applied Write-Audit-Publish (WAP) for data integrity and governance (the WAP flow is sketched after these highlights).
Managed Spark EMR clusters for large-scale data processing; used medallion architecture in Snowflake (bronze, silver, gold) for consistent, reliable data pipelines.
Performed SCD Type 2 dimensional modeling in Snowflake for historical data tracking, and conducted window-based analysis.
Established Master Data Management (MDM) as the single source of truth, centralizing data in Snowflake for consistent, accurate reporting.
Implemented OKTA-based JWT authentication for secure access to the Climanomics platform and systems.
Optimized ETL pipeline performance in AWS Glue and Spark via memory tuning, adaptive query execution, and dynamic partitioning for improved scalability and cost-efficiency.
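A simplified PySpark sketch of the Write-Audit-Publish flow referenced above; the staging and published paths and the audit rules are hypothetical, and the production pipeline targeted Snowflake rather than plain Parquet.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("wap-demo").getOrCreate()

    STAGING = "s3://example-climanomics/staging/asset_scores/"
    PUBLISHED = "s3://example-climanomics/published/asset_scores/"

    # 1. WRITE: land the freshly transformed batch in an isolated staging location.
    batch = spark.read.parquet("s3://example-climanomics/silver/asset_scores/")
    batch.write.mode("overwrite").parquet(STAGING)

    # 2. AUDIT: validate the staged data before any consumer can see it.
    staged = spark.read.parquet(STAGING)
    if staged.count() == 0 or staged.filter(F.col("asset_id").isNull()).count() > 0:
        raise ValueError("Audit failed: staged batch is empty or contains null asset_id values")

    # 3. PUBLISH: only after the audit passes, promote the batch to the published location.
    staged.write.mode("overwrite").parquet(PUBLISHED)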
Data Engineer
Legato Health Technologies
11.2020 - 10.2021
Designed and implemented an end-to-end automated step-function workflow for cloud account setup, application deployment, and ongoing maintenance, successfully managing 110+ applications with minimal manual intervention.
Contributed to the strategic planning and execution of AWS Landing Zone projects, ensuring secure, automated, and scalable setups for new AWS environments to meet business needs.
Developed 10+ AWS Lambda functions using Boto3 to automate IAM role management (deletion, updates, assignments) and interact with AWS services, significantly improving identity and access management efficiency (a condensed example follows these highlights).
Integrated Lambda functions with systems like SailPoint, IIQ Server, Distribution List (DL) management, and Hosted Zones, and orchestrated them into a cohesive Step Functions workflow to automate data processing and IAM management tasks.
Developed a front-end form using Flask and JavaScript frameworks, enabling users to request sandbox, silver, or premium AWS accounts through a single self-service form.
Provisioned POC sandbox accounts using Cloud Foundation and workflows in Google Cloud Platform (GCP), ensuring automation and consistency across multi-cloud environments.
Automated infrastructure provisioning using Terraform and Boto3 (Python), enabling seamless deployment pipelines and reducing provisioning time by 50%, eliminating manual errors.
Led the migration of on-premises data pipelines to AWS, leveraging EC2, S3, Glue, and Apache Spark for enhanced data processing and storage, optimizing workflows, and scalability.
Designed data processing workflows using AWS Lambda and Step Functions to automate the ingestion, transformation, and loading (ETL) of large datasets, reducing manual oversight.
Utilized Apache Spark to process structured and unstructured data, optimizing performance and reducing latency in data pipelines running on AWS.
Collaborated with cross-functional teams (Data Engineers, DevOps, and Cloud Architects) to ensure seamless integration between AWS services and on-premises systems for the migration project.
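A condensed sketch of one of the Boto3-based Lambda functions described above (role cleanup); the event shape and cleanup policy are illustrative assumptions.

    import boto3

    iam = boto3.client("iam")

    def lambda_handler(event, context):
        # Detach a role's policies and delete it; the role name arrives in the event payload.
        role_name = event["role_name"]

        # Detach every managed policy before the role can be deleted.
        for policy in iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]:
            iam.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])

        # Remove inline policies as well.
        for policy_name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
            iam.delete_role_policy(RoleName=role_name, PolicyName=policy_name)

        iam.delete_role(RoleName=role_name)
        return {"status": "deleted", "role": role_name}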
Big Data Developer
Infodot Systems Private Limited, Mercedes-Benz
04.2018 - 10.2020
Developed ETL pipelines with Azure Databricks and PySpark, integrating data from web APIs and Azure Data Lake, and storing it in Azure SQL Database and Synapse Analytics for scalable processing.
Optimized ETL pipelines using Apache Spark's distributed processing, dynamically scaling resources in Azure Databricks, reducing processing time, and leveraging Azure Data Lake Storage for cost efficiency.
Implemented data quality checks with Apache Airflow and Great Expectations, ensuring data integrity and consistency across automotive ETL workflows.
Built a micro-batch de-duplication pipeline in Azure Databricks using Spark Structured Streaming to reduce daily data load times and enable real-time analytics (sketched after these highlights).
Designed fact and dimensional models for vehicle performance and sales data, creating optimized data marts in Azure SQL Database for operational and financial reporting.
Managed schema evolution and data transformation with Delta Lake and Azure Databricks, processing data in Parquet and ORC formats for efficient storage and analytics.
Integrated Hive Metastore with Databricks for centralized metadata management, ensuring schema consistency, and improving data governance across Spark jobs.
Streamlined data ingestion from legacy systems into Azure Data Lake using Sqoop with incremental loading, enabling continuous data flow for analytics.
Orchestrated ETL workflows using Apache Airflow, automating pipeline execution across IoT sensors, dealership systems, and telematics data for actionable insights.
Applied advanced analytics in Azure Databricks using Spark SQL, aggregation, windowing, and cumulative aggregation patterns to generate insights into vehicle performance and customer behavior.
Optimized costs with Azure Synapse Analytics, applying data compression, columnar formats, and partitioning strategies for enhanced query performance.
Implemented Medallion Architecture in Databricks for structured data layers: Bronze (raw data), Silver (cleaned data), and Gold (aggregated insights), ensuring scalable and efficient data processing and governance.
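A minimal Spark Structured Streaming sketch of the micro-batch de-duplication described above; the Delta paths, key columns, and watermark interval are placeholder assumptions rather than the production Databricks job.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("telemetry-dedup").getOrCreate()

    # Stream raw telematics events from the bronze Delta table (path is a placeholder).
    raw = spark.readStream.format("delta").load("/mnt/datalake/bronze/telemetry")

    # Drop duplicates within a bounded event-time window so streaming state stays small.
    deduped = (
        raw.withWatermark("event_time", "30 minutes")
           .dropDuplicates(["vehicle_id", "event_id"])
    )

    # Append clean micro-batches to the silver Delta table for real-time analytics.
    query = (
        deduped.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", "/mnt/datalake/_checkpoints/telemetry_dedup")
        .start("/mnt/datalake/silver/telemetry")
    )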
Software Developer
Infodot Systems Private Limited
04.2017 - 03.2018
Developed a comprehensive fintech platform for portfolio management using Python, Django, ReactJS, PostgreSQL, Pandas, and D3.js, focusing on a factor-based investment approach.
Designed and implemented optimized database schemas with over 200 MSSQL tables and more than 40 stored procedures.
Created and integrated 100+ RESTful APIs with the Django REST Framework, improving integration speed by 35% and streamlining data processing through custom middleware (a trimmed example follows these highlights).
Built responsive, interactive user interfaces with ReactJS, HTML, CSS, and JavaScript, and integrated dynamic charts and graphs for enhanced data visualization.
Applied advanced statistical methods with Pandas and NumPy for data analysis and backtesting, increasing analysis accuracy by 30% in investment strategy optimization.
Implemented automated testing with pytest, and integrated CI/CD pipelines using Jenkins, CodeBuild, and SonarQube, ensuring code quality.
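A trimmed Django REST Framework sketch of the API pattern mentioned above; the Holding model and its fields are hypothetical stand-ins for the platform's actual portfolio entities.

    from django.db import models
    from rest_framework import serializers, viewsets, routers

    class Holding(models.Model):
        # Hypothetical portfolio holding; the real schema spanned 200+ tables.
        ticker = models.CharField(max_length=10)
        quantity = models.DecimalField(max_digits=12, decimal_places=4)
        factor_score = models.FloatField()

    class HoldingSerializer(serializers.ModelSerializer):
        class Meta:
            model = Holding
            fields = ["id", "ticker", "quantity", "factor_score"]

    class HoldingViewSet(viewsets.ModelViewSet):
        # CRUD endpoints for holdings, consumed by the ReactJS front end.
        queryset = Holding.objects.all()
        serializer_class = HoldingSerializer

    router = routers.DefaultRouter()
    router.register(r"holdings", HoldingViewSet)
    # router.urls is then included in the project's urls.py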