Rup Roy

New York

Summary

Senior Data Engineer with 8+ years of experience designing and implementing scalable ETL pipelines, real-time data streaming solutions, and cloud-native architectures. Skilled in Python, SQL, Spark, Kafka, and AWS/Azure/GCP services. Experienced in financial services, healthcare, insurance, and defense industries, delivering secure, efficient, and business-aligned data solutions.

Overview

8 years of professional experience

Work History

Senior Data Engineer

MetLife
04.2022 - Current
  • Developed a full-stack React and Node.js web application with a Python/FastAPI backend that transcribes voice input into structured reports, streamlining manual reporting and reducing errors.
  • Led a 6-member team in building a Dockerized, role-based entity management web application with Node.js, ReactJS, and PostgreSQL for operational planning.
  • Optimized SQL reporting workflows, reducing query latency by 50% across actuarial reporting systems.
  • Created Tableau dashboards to provide transparency across 500M+ policyholder records.
  • Implemented CI/CD automation for ETL workflows using Jenkins, Docker, and Kubernetes.
  • Orchestrated workflows in Apache Airflow to automate data ingestion from multiple external sources (APIs, flat files, cloud); a representative DAG is sketched after this list.
  • Collaborated with actuaries and risk teams to design predictive models for fraud detection and claims forecasting.
  • Migrated legacy Oracle-based systems into AWS Redshift, improving scalability and reducing infrastructure costs by 30%.
  • Developed data governance policies ensuring compliance with HIPAA and SOX requirements.
  • Enhanced actuarial models by integrating third-party demographic and financial datasets into existing pipelines.
  • Partnered with business stakeholders to translate requirements into technical specifications.
  • Conducted root-cause analysis for data pipeline failures, implementing automated recovery scripts.
  • Developed automated regression tests for ETL pipelines to validate schema and data quality.
  • Mentored junior data engineers and analysts on SQL tuning and Python best practices.
  • Supported actuarial and finance departments with custom SQL-based risk assessment reports.
  • Monitored and tested application performance to identify potential bottlenecks, developed solutions, and collaborated with developers on implementation.
  • Led the migration of 30+ legacy insurance applications from on-premises data centers to AWS, leveraging AWS Migration Hub, Application Discovery Service, and Database Migration Service (DMS), resulting in 40% cost savings and improved application performance.
  • Designed and implemented a multi-account AWS landing zone using AWS Control Tower, Service Control Policies (SCPs), and IAM roles, ensuring strict compliance with SOX and GDPR requirements.
  • Automated infrastructure provisioning and configuration using Terraform and AWS CloudFormation, reducing deployment timelines by 70%.
  • Enhanced security posture by integrating AWS GuardDuty, Config, and CloudTrail, improving threat detection and compliance monitoring.
  • Conducted post-migration optimization by right-sizing EC2 instances, implementing Auto Scaling, and leveraging AWS Savings Plans, leading to $500K annual savings.
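
To ground the Airflow orchestration above: a minimal sketch of a daily multi-source ingestion DAG (assuming Airflow 2.4+). The DAG id, task names, and helper functions are hypothetical stand-ins, not the production pipeline.

```python
# Minimal sketch of a daily multi-source ingestion DAG (Airflow 2.4+);
# every id and helper below is a hypothetical stand-in.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_api():
    """Pull records from an external REST API (placeholder)."""


def ingest_flat_files():
    """Load flat files from a landing directory (placeholder)."""


def ingest_cloud_storage():
    """Copy objects from cloud storage such as S3 (placeholder)."""


with DAG(
    dag_id="external_source_ingestion",  # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Each source loads independently on the daily schedule.
    PythonOperator(task_id="ingest_api", python_callable=ingest_api)
    PythonOperator(task_id="ingest_flat_files", python_callable=ingest_flat_files)
    PythonOperator(task_id="ingest_cloud_storage", python_callable=ingest_cloud_storage)
```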

Senior Data Engineer

Lockheed Martin
02.2021 - 03.2022
  • Built high-performance ETL pipelines for classified defense systems using Python, SQL, and Apache Spark.
  • Migrated mission-critical engineering workloads from on-premises VMware to Microsoft Azure, including 150+ VMs and high-performance computing clusters, using Azure Migrate and Azure Site Recovery.
  • Designed a secure Azure landing zone with Blueprints, Policy, and Role-Based Access Control (RBAC) to meet NIST 800-53 and ITAR compliance requirements.
  • Integrated on-premises Active Directory with Azure Active Directory (AAD) and implemented Conditional Access Policies for multi-factor authentication (MFA).
  • Built CI/CD pipelines using Azure DevOps for automated deployments, reducing release cycle times by 60%.
  • Implemented Azure Monitor, Log Analytics, and Sentinel for centralized monitoring and proactive threat detection, improving incident response by 35%.
  • Implemented Kafka-based streaming pipelines to support real-time telemetry data processing (see the consumer sketch after this list).
  • Optimized data models in PostgreSQL and Oracle for mission-critical reporting systems.
  • Built CI/CD pipelines with GitLab and Jenkins for automated deployment of ETL workflows.
  • Developed role-based access control (RBAC) for sensitive datasets, ensuring compliance with DoD standards.
  • Created anomaly detection models for aerospace telemetry using Python and scikit-learn.
  • Led migration of legacy batch ETL processes to real-time event-driven architectures.
  • Containerized ETL pipelines using Docker and deployed across Kubernetes clusters for scalability.
  • Developed automated unit and integration tests to validate secure ETL processes.
  • Worked with cybersecurity teams to ensure data pipelines met NIST compliance requirements.
  • Created monitoring dashboards in Grafana to visualize real-time system performance.
  • Collaborated with aerospace engineers to integrate telemetry and sensor data pipelines.
  • Performed database tuning and partitioning to handle billions of telemetry records efficiently.
  • Mentored team members on DevOps practices and secure data engineering standards.
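
To make the telemetry streaming concrete: a minimal consumer sketch using the confluent-kafka client. The broker address, topic, consumer group, and threshold check are hypothetical placeholders standing in for the real scoring logic.

```python
import json

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "group.id": "telemetry-processor",      # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["telemetry"])  # hypothetical topic name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        # Simple threshold standing in for the actual anomaly model.
        if event.get("sensor_value", 0) > 100:
            print(f"anomaly: {event}")
finally:
    consumer.close()
```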

Data Engineer

HCA Healthcare
11.2019 - 01.2021
  • Designed HIPAA-compliant ETL pipelines in AWS for processing electronic health record (EHR) data.
  • Implemented data pipelines using Python, PySpark, and AWS Glue to handle patient care datasets.
  • Spearheaded the migration of a large-scale patient management platform to AWS, ensuring HIPAA compliance throughout the process.
  • Re-architected a legacy monolithic application into microservices using Docker and Amazon EKS, improving scalability and reducing downtime during peak loads by 50%.
  • Implemented CI/CD pipelines with Jenkins and AWS CodePipeline to automate build, test, and deployment workflows across multiple environments.
  • Enhanced data security by integrating AWS KMS for encryption at rest and AWS WAF to protect against web exploits, ensuring compliance with healthcare regulations.
  • Built real-time streaming pipelines with Kafka to monitor hospital device telemetry and alerts.
  • Developed predictive models for patient readmission risk using Python (scikit-learn, Pandas).
  • Created Redshift-based data warehouse to centralize patient and hospital operations data.
  • Orchestrated workflows with Apache Airflow to automate reporting and compliance checks.
  • Developed interactive dashboards in Tableau for executives to monitor patient outcomes.
  • Implemented automated data validation scripts to ensure accuracy of medical reporting.
  • Partnered with doctors and administrators to build analytics solutions improving patient care.
  • Migrated legacy on-prem systems to AWS cloud, reducing operational costs by 35%.
  • Designed de-identification workflows to protect patient privacy in research datasets; a simplified sketch follows this list.
  • Implemented CI/CD pipelines to automate deployment of healthcare analytics applications.
  • Optimized SQL queries and ETL processes, reducing runtime of compliance reports by 60%.
  • Integrated claims and billing datasets with clinical records for financial risk modeling.
  • Trained data analysts on SQL, AWS, and Airflow best practices for healthcare data.
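
A simplified PySpark sketch of the de-identification step noted above; the column names and paths are hypothetical, and the real workflow applied additional safeguards.

```python
# Minimal de-identification sketch: one-way hash the patient key and
# drop direct identifiers before releasing data for research use.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("deidentify").getOrCreate()

df = spark.read.parquet("s3://example-bucket/ehr/")  # hypothetical input path

deidentified = (
    df.withColumn("patient_key", F.sha2(F.col("patient_id").cast("string"), 256))
      .drop("patient_id", "name", "ssn", "clinician_notes")  # hypothetical columns
)

deidentified.write.mode("overwrite").parquet("s3://example-bucket/research/")
```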

Lead Data Engineer

BNY Mellon
10.2017 - 09.2019
  • Architected enterprise-scale ETL pipelines using Apache Spark, SQL, and AWS Glue for financial datasets.
  • Built real-time fraud detection pipelines processing millions of transactions per day using Kafka and Spark.
  • Developed predictive risk scoring models integrated with financial reporting systems.
  • Optimized SQL Server and Redshift queries, reducing query times by 55% for trading analytics.
  • Implemented CI/CD workflows with Jenkins, Git, and Kubernetes for data infrastructure deployments.
  • Created self-service data APIs for internal teams to access financial data securely.
  • Designed scalable data lakes in AWS S3 to consolidate structured and unstructured datasets.
  • Led a hybrid cloud migration strategy, integrating AWS and Azure environments to support global banking operations, ensuring zero downtime for critical applications.
  • Designed a secure multi-cloud network using AWS Transit Gateway, Azure ExpressRoute, and IPSec VPN tunnels, reducing latency for intercontinental operations by 45%.
  • Migrated Oracle and SAP workloads to Azure while integrating real-time analytics pipelines into AWS Redshift and Glue, enabling cross-platform data insights.
  • Established centralized monitoring using Datadog, integrating alerts with ServiceNow for automated incident response, reducing MTTR by 40%.
  • Developed robust disaster recovery and backup solutions leveraging AWS S3 cross-region replication and Azure Backup Vaults, ensuring 99.99% data availability and regulatory compliance.
  • Built Tableau dashboards to provide executives with real-time insights into market risk.
  • Developed automated lineage and metadata tracking systems for financial compliance.
  • Partnered with risk and compliance teams to ensure adherence to SEC and FINRA regulations.
  • Migrated critical risk reporting pipelines from Oracle to Snowflake for improved scalability.
  • Implemented anomaly detection on trade flows using Python machine learning libraries (illustrated after this list).
  • Enhanced data governance processes with role-based access and auditing capabilities.
  • Mentored junior engineers and led Agile sprint planning for data engineering team.
  • Reduced infrastructure costs by 25% through storage optimization and query performance tuning.
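
For illustration only, a minimal sketch of trade-flow anomaly detection with scikit-learn's IsolationForest on synthetic stand-in features; the production models used engineered features from real trade data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-ins for engineered trade features
# (e.g., notional, price deviation, order velocity).
rng = np.random.default_rng(42)
trades = rng.normal(size=(10_000, 3))

model = IsolationForest(contamination=0.01, random_state=42)
labels = model.fit_predict(trades)  # -1 marks suspected anomalies

print(f"flagged {(labels == -1).sum()} of {len(trades)} trades for review")
```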

Education

Computer Science

Uttara Institute of Business And Technology
Dhaka, Bangladesh
04.2016

Skills

  • Programming: Python, SQL, Scala, Java
  • Big Data & Cloud: Apache Spark, PySpark, Hadoop, Kafka, AWS (Glue, Redshift, Lambda, S3, EMR, Kinesis), Azure Data Factory, GCP BigQuery, Databricks
  • Databases: PostgreSQL, Oracle, MySQL, MongoDB, Snowflake, SQL Server
  • DevOps & CI/CD: Docker, Kubernetes, Jenkins, Git, Terraform, Airflow
  • Visualization: Tableau, Power BI, Looker
