RAGHAVENDER SINGH

Georgetown, TX

Summary

  • Seasoned data engineer with 5 years of experience specializing in designing, building, and optimizing scalable data pipelines using technologies such as Hadoop, Spark, Kafka, Airflow, and Hive.
  • Skilled in Python, PySpark, and Scala for developing robust ETL workflows and implementing real-time data processing solutions with Spark Streaming and Kafka.
  • Proficient in cloud platforms (AWS, Azure, GCP), leveraging services such as S3, Redshift, Glue, ADF, Synapse, and Snowflake to architect scalable, cost-effective data solutions.
  • Extensive background in SQL and NoSQL databases (PostgreSQL, Oracle, DynamoDB, Cassandra), with advanced query optimization and data modeling for efficient storage and retrieval.
  • Hands-on experience with CI/CD pipelines, infrastructure automation using Terraform, and workflow orchestration with Airflow to ensure reliable data processing.
  • Adept at developing interactive Power BI dashboards that deliver real-time insights for data-driven decision-making.
  • Proficient in tuning big data frameworks (Hadoop, Spark) for improved performance, scalability, and cost efficiency, enabling faster processing and lower operational costs.
  • Skilled in implementing data integration solutions with Apache Kafka to ensure seamless data flow and real-time stream processing for mission-critical applications.
  • Strong expertise in automating data pipeline testing and monitoring with Apache Airflow and custom Python scripts to ensure data integrity throughout the ETL process.
  • Experienced in data versioning and lineage tracking with Apache Atlas and the Glue Data Catalog, providing transparent, traceable pipelines for compliance and auditing.

Overview

6 years of professional experience

Work History

Senior Data Engineer

Slesha IT INC
05.2024 - Current
  • Developed and implemented automated testing frameworks using Test-Driven Development (TDD) practices to ensure software releases met business requirements and maintained high quality.
  • Led cloud migration of legacy on-prem ETL workloads to Azure Cloud and Snowflake, modernizing enterprise healthcare data systems.
  • Built end-to-end data pipelines using ADF, SHIR, and Snowflake, enabling seamless batch and streaming data processing.
  • Developed automated data testing frameworks for regression, validation, and post-production checks using custom Python scripts.
  • Implemented data quality checkpoints in ADF and Airflow workflows to ensure accuracy, completeness, and timeliness of data.
  • Engineered CI/CD pipelines with GitHub Actions for consistent, version-controlled data deployment workflows.
  • Integrated and transformed real-time data streams via Apache Kafka and ensured reliable delivery with retry mechanisms.
  • Applied RBAC policies using Snowflake access controls and Azure role definitions for secure data access.
  • Designed data encryption strategies for data at rest in Blob Storage and in transit between Snowflake and ADF pipelines.
  • Built SQL-based stored procedures and optimized transformation queries to accelerate data refresh cycles.
  • Modeled data using star schema and snowflake schema techniques to support analytics and reporting.
  • Used Airflow to schedule, monitor, and alert on ETL pipeline executions across cloud and hybrid environments (see the sketch after this list).
  • Designed and maintained metadata documentation, flow diagrams, and data dictionaries for lineage and traceability.
  • Collaborated with cross-functional teams to translate integration requirements into efficient data solutions.
  • Enabled data interoperability by designing RESTful APIs for secure inter-system data sharing.
  • Performed root cause analysis on pipeline failures and data anomalies, improving stability and trust.
  • Designed and implemented row-level access controls in Snowflake using secure views and masking policies.
  • Created alerting systems for job failures, long runtimes, and SLA breaches using Python-based monitoring scripts.
  • Partnered with Dev SecOps to enforce encryption, auditing, and compliance on all data pipelines.
  • Designed reporting solutions with Power BI to provide real-time insights on healthcare operations and trends.
  • Tuned Spark and SQL transformations to reduce data load and processing time across multiple pipelines.
  • Conducted performance benchmarking for Azure Data Factory activities and Snowflake virtual warehouses.
  • Implemented structured testing with unit, integration, and end-to-end test coverage across ETL stages.
  • Documented transformation logic and mappings between source, staging, and reporting layers.
  • Reviewed legacy SSIS processes and translated them into scalable ADF pipelines for cloud readiness.
  • Conducted peer reviews and mentored junior engineers on best practices for cloud-native data engineering.
  • Collaborated with cross-functional teams to identify data requirements and establish best practices.
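A minimal sketch of the Airflow scheduling, data quality checkpoint, and retry pattern described above; the DAG ID, task names, and row counts are hypothetical placeholders rather than details of the actual pipelines.

# Minimal Airflow DAG sketch: scheduled load followed by a data quality
# checkpoint that fails the run (triggering retries/alerts) on an empty load.
# All names and values are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_to_warehouse(**context):
    # Placeholder for the actual ADF/Snowflake load step; returns a row count.
    return 42


def check_row_counts(**context):
    # Data quality checkpoint: pull the upstream row count and fail if empty.
    loaded_rows = context["ti"].xcom_pull(task_ids="load_to_warehouse") or 0
    if loaded_rows == 0:
        raise ValueError("Data quality check failed: zero rows loaded")


with DAG(
    dag_id="example_healthcare_etl",
    start_date=datetime(2024, 5, 1),
    schedule_interval="@daily",
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    catchup=False,
) as dag:
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
    quality_check = PythonOperator(task_id="check_row_counts", python_callable=check_row_counts)
    load >> quality_check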

Data Engineer

Legato Health Technologies
09.2020 - 12.2022
  • Developed and deployed data pipelines for processing large healthcare datasets, ensuring seamless data integration and transformation across systems.
  • Designed and deployed scalable data pipelines using AWS Redshift, Glue, and Lambda to ingest and transform healthcare data.
  • Automated ETL data quality validation using PyTest and JUnit, ensuring every pipeline stage passed schema conformance, NULL checks, and record count validations (see the sketch after this list).
  • Transformed data using complex SQL and Python scripts to generate patient-level datasets for analytics, enabling population health reporting and cost optimization.
  • Created Redshift external tables and materialized views to support high-performance querying and incremental data refresh.
  • Defined and enforced role-based access controls (RBAC) and row-level security policies on Redshift and S3 datasets to protect PHI and maintain HIPAA compliance.
  • Integrated data governance practices by documenting data dictionaries, column-level lineage, and business rules for downstream analytics teams.
  • Tuned Redshift performance using vacuuming strategies, distribution styles, and query plan optimizations to reduce reporting query latency by 40%.
  • Used Terraform to provision reusable Redshift clusters, Glue jobs, and IAM roles across development, staging, and production environments.
  • Wrote metadata-driven ETL logic to handle dynamic column mappings and configurable transformation rules for healthcare provider data.
  • Led the data quality audit initiative to identify and patch anomalies in eligibility and claims datasets, improving overall trust in analytics outputs.
  • Coordinated end-to-end testing across 5+ data sources (Oracle, CSVs, HL7 extracts, etc.) ensuring consistent integration with reporting layers.
  • Developed Python scripts to automate nightly data refresh validations, comparing record counts, date gaps, and duplicate entries.
  • Created custom logs, dashboards, and alerts using CloudWatch and Slack API for proactive issue detection.
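A minimal PyTest sketch of the kind of post-load data quality checks described above (NULL checks, duplicate detection, record counts); the table and column names are hypothetical, and the warehouse query is stubbed with an in-memory DataFrame.

# Illustrative PyTest data quality checks; names and data are placeholders.
import pandas as pd
import pytest


@pytest.fixture
def claims_extract():
    # In practice this would be read from Redshift/S3; stubbed here.
    return pd.DataFrame(
        {"claim_id": [1, 2, 3], "member_id": [10, 11, 12], "amount": [100.0, 250.5, 75.0]}
    )


def test_required_columns_not_null(claims_extract):
    # NULL checks on key identifier columns.
    assert claims_extract["claim_id"].notna().all()
    assert claims_extract["member_id"].notna().all()


def test_no_duplicate_claims(claims_extract):
    # Duplicate-entry check on the primary key.
    assert not claims_extract["claim_id"].duplicated().any()


def test_minimum_record_count(claims_extract):
    # Guard against empty or truncated nightly loads.
    assert len(claims_extract) > 0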

IT Analyst

Legato Health Technologies
05.2019 - 09.2020
  • Integrated RPA bots with SQL Server and Oracle databases to streamline claim intake, member lookup, and provider data validation workflows.
  • Built Python-based input validation and output reconciliation scripts to ensure end-to-end accuracy and consistency of automated data flows (see the sketch after this list).
  • Designed and maintained comprehensive RPA documentation, including flowcharts, configs, exception logs, and runbooks, enhancing transparency and auditability.
  • Created and monitored dashboards in Power BI and SSRS to visualize bot performance KPIs, error rates, and runtime metrics.
  • Enabled secure bot operations by implementing credential vaulting, encrypted payload handling, and access segmentation to protect PHI and meet HIPAA requirements.
  • Conducted structured functional and regression testing of RPA workflows, identifying issues before deployment and maintaining a 98% bot success rate.
  • Collaborated with cross-functional teams to identify automation candidates in reporting, billing, and data integration workflows, aligning RPA initiatives with business goals.
  • Led production support for automated data workflows, troubleshooting failed runs and applying root-cause fixes to maintain SLA compliance.
  • Developed reusable bot components for healthcare-specific needs such as patient ID normalization, claim matching, and timestamp alignment, reducing development time for future automation.
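A minimal sketch of the output reconciliation approach described above, comparing source records against bot output to flag missing rows; the column names and sample data are hypothetical.

# Illustrative reconciliation check: report source rows absent from bot output.
import pandas as pd


def reconcile(source: pd.DataFrame, processed: pd.DataFrame, key: str = "claim_id") -> pd.DataFrame:
    """Return source rows whose key is missing from the processed output."""
    return source[~source[key].isin(processed[key])]


if __name__ == "__main__":
    src = pd.DataFrame({"claim_id": [1, 2, 3]})
    out = pd.DataFrame({"claim_id": [1, 3]})
    print(reconcile(src, out))  # the row with claim_id 2 is flagged as missing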

Education

Master of Science - Computer Science

Kennesaw State University
Marietta, GA
05.2024

Skills

  • Proficient in Python, SQL, Java, Shell, and PL/SQL
  • Data engineering frameworks: Databricks, Snowflake, Airflow
  • Database management: PostgreSQL, MySQL, Oracle
  • RESTful API design expertise
  • AWS Lambda data integration
  • Software testing expertise
  • Cloud platform expertise: AWS, Azure, Google Cloud
  • Data modeling expertise
  • Version control with Git
