Jhansi Bussa

Summary

  • Data Engineer with 3+ years of experience designing, building, and maintaining data pipelines to support data integration, transformation, and reporting needs across enterprise systems.
  • Strong expertise in SQL and Python, with practical experience writing complex queries, optimizing database performance, and automating data workflows.
  • Experienced in working with structured and semi-structured data in relational databases and cloud data warehouses, including SQL Server, Oracle, Snowflake, and Redshift.
  • Proficient in developing scalable ETL solutions with tools such as Apache Airflow, AWS Glue, Azure Data Factory, and Dataflow across AWS, Azure, and GCP environments.
  • Hands-on experience with real-time data streaming and processing using Kafka and Spark Streaming for time-sensitive business applications.
  • Proven ability to analyze and troubleshoot data pipeline issues, perform root cause analysis, and implement solutions that improve data accuracy and performance.
  • Background in implementing data validation and reconciliation processes to ensure consistency and reliability of data across systems.
  • Familiar with CI/CD pipelines, version control, and automated deployments using Jenkins, GitHub Actions, and Terraform.
  • Collaborated with cross-functional teams, supporting multiple projects and aligning data solutions with business and technical requirements.
  • Committed to maintaining data security and compliance with regulatory standards, including HIPAA and GDPR.

Overview

5 years of professional experience
1 Certification

Work History

Data Engineer

Ryan Specialty
Chicago
11.2024 - Current
  • Company Overview: The goal of this project was to enhance the insurance data infrastructure at Ryan Specialty by designing and implementing scalable, secure, and high-performing data pipelines. The initiative aimed to automate data ingestion and processing from diverse sources, improve data quality, and enable advanced reporting capabilities.
  • Designed and developed automated data pipelines to extract, transform, and load data from various internal and third-party insurance systems into a centralized data platform.
  • Built modular ETL processes using Azure Data Factory and SQL Server to support high-volume data workloads and streamline analytics readiness.
  • Created and optimized T-SQL queries for data transformation, cleansing, and performance tuning, improving processing speed and reducing resource usage.
  • Developed batch automation solutions to reduce manual dependencies and enhance data pipeline reliability and maintainability.
  • Monitored and supported production workflows, handling issue resolution and ensuring timely data delivery for underwriting, claims, and compliance teams.
  • Modeled large-scale insurance datasets in Snowflake using star and snowflake schemas, improving query performance and supporting BI initiatives.
  • Integrated Kafka and Spark Streaming for real-time data processing of policy updates and transaction events, enabling faster decision-making.
  • Worked closely with DevSecOps teams to implement CI/CD pipelines using Jenkins and GitHub, ensuring version-controlled and efficient code deployment.
  • Assisted in cloud migration initiatives by transitioning legacy workloads to Google BigQuery, enhancing scalability and reducing processing costs.
  • Conducted thorough data validation and implemented reconciliation checks to maintain data accuracy and integrity across systems.
  • Partnered with business stakeholders and data analysts to gather requirements and deliver reporting solutions using Power BI for actionable insights.
  • Environment: Azure Synapse Analytics, Azure Data Factory, Snowflake, SQL Server, Kafka, Spark Streaming, PySpark, BigQuery, Power BI, T-SQL, Jenkins, GitHub, Kubernetes, Docker, Python, Hive, Elasticsearch, MongoDB, Tableau, Kibana

AWS Data Engineer

CHG Healthcare
Utah
11.2023 - 10.2024
  • Company Overview: This project aimed to enhance CHG Healthcare’s enterprise data platform by building reliable, scalable pipelines to manage the processing of Electronic Health Records (EHR), claims data, billing information, and provider performance metrics.
  • Developed end-to-end data pipelines using Python, AWS Glue, and Lambda to automate ingestion and transformation of structured and semi-structured healthcare data.
  • Designed and maintained modular ETL workflows with AWS Glue Workflows and Step Functions to handle high-volume datasets and reduce operational overhead.
  • Conducted regular batch processing tasks, including job scheduling, monitoring, and root cause analysis to resolve data pipeline failures.
  • Performed data validation, reconciliation, and quality assurance across staging and reporting layers to ensure data consistency and compliance.
  • Managed secure data storage and transformation using Amazon S3, Redshift, and SQL Server, supporting analytical needs for claims and billing systems.
  • Implemented dimensional models including Star Schema and Data Vault 2.0 in Snowflake and Redshift for historical tracking and regulatory reporting.
  • Worked with DevSecOps teams to deploy and maintain CI/CD pipelines using Jenkins and GitHub Actions for consistent and secure code delivery.
  • Supported both batch and real-time data processing using AWS-native services, with a focus on tuning performance and optimizing resource usage.
  • Collaborated with cross-functional teams, including data analysts, developers, and compliance officers, to align technical solutions with healthcare analytics requirements.
  • Provided production support and resolved time-sensitive issues, including during off-hours, to ensure availability of data critical to healthcare operations.
  • Environment: AWS Glue, AWS Lambda, Step Functions, Amazon S3, Redshift, Snowflake, SQL Server, SnapLogic, Python, Jenkins, GitHub Actions, CloudWatch, Athena, EMR, Unix/Linux, Data Vault 2.0, Agile Methodology.

GCP Data Engineer

DMI
India
08.2021 - 07.2023
  • Company Overview: The project at DMI focused on transforming the company’s legacy insurance data infrastructure by migrating on-premise systems to a modern, fully managed Google Cloud Platform (GCP) environment.
  • Developed robust ETL and ELT pipelines using Apache Beam and Cloud Dataflow to process batch and streaming insurance data from APIs, Kafka, flat files, and third-party systems.
  • Migrated large-scale datasets from on-premise Hadoop clusters and traditional warehouses into BigQuery to support modern data analytics with improved performance and scalability.
  • Automated recurring data processing jobs with Cloud Functions, Cloud Scheduler, and Cloud Composer, ensuring high availability and minimizing manual interventions.
  • Managed production workflows, including scheduling, monitoring, and resolving data pipeline failures to maintain uninterrupted data flow across systems.
  • Built streaming pipelines using Pub/Sub and Dataflow to process transactional insurance events such as claims updates, underwriting requests, and policy renewals in real time.
  • Implemented rigorous data validation and reconciliation mechanisms to ensure consistency and accuracy across policy, claims, and customer datasets.
  • Tuned BigQuery queries and optimized transformation logic to improve execution time and reduce resource consumption.
  • Designed containerized microservices with Cloud Run and GKE to process and expose insurance data endpoints in real-time applications.
  • Used Terraform to provision GCP infrastructure consistently across development, testing, and production environments.
  • Supported CI/CD integration through Jenkins and GitHub, ensuring version control and smooth deployment cycles.
  • Engaged with stakeholders, analysts, and development teams to align solutions with reporting requirements and operational goals.
  • Facilitated onboarding of internal users and provided best practices for querying and consuming data efficiently in BigQuery.
  • Environment: Google Cloud Platform (BigQuery, Cloud Dataflow, Pub/Sub, Cloud Functions, Cloud Composer, Cloud Run, GKE, GCS, Dataproc), Apache Beam, Python, SQL, Terraform, Cloud Scheduler, Jenkins, GitHub, Kafka, Snowflake, PySpark, Hive, Oracle DB, Linux

Data Engineer Intern

Sandoz
India
08.2020 - 07.2021
  • Company Overview: This project at Sandoz focused on building a centralized data platform to manage pharmaceutical manufacturing, clinical research, and distribution data for regulatory compliance and internal reporting.
  • Built automated data pipelines using PySpark and Python to process clinical, manufacturing, and sales data.
  • Designed scalable data workflows using AWS Glue and Amazon S3 for batch processing of healthcare datasets.
  • Developed and optimized PL/SQL procedures in Oracle to manage ETL processes for clinical data integration.
  • Created data models in Snowflake and Azure SQL to track drug performance, trial metrics, and compliance reports.
  • Integrated SAP SD data to add customer and product transaction insights to reporting pipelines.
  • Developed Tableau and Power BI dashboards to visualize clinical progress, product trends, and safety events.
  • Performed data validation and reconciliation to ensure accuracy throughout the data processing stages.
  • Applied data security practices like encryption and access control to meet HIPAA and GDPR standards.
  • Supported production workflows by monitoring, troubleshooting, and resolving batch data issues.
  • Collaborated with analysts, QA, and regulatory teams to align data solutions with reporting needs.
  • Environment: AWS Glue, Amazon S3, Azure Data Factory, Snowflake, Azure SQL, Oracle (PL/SQL), SAP SD, Tableau, Power BI, PySpark, Python (Pandas), HiveQL, PrestoSQL, MSSQL, Jenkins, Docker, FTP, Data Lake, Linux, GDPR, HIPAA

Education

Master of Science - Business Analytics

Kent State University

Skills

  • AWS (S3, Glue, Lambda, Redshift, Step Functions, RDS)
  • Azure (Data Lake, Synapse)
  • GCP (BigQuery, Dataflow)
  • AWS Glue
  • ADF
  • Dataflow
  • IBM DataStage
  • Talend
  • Informatica
  • SnapLogic
  • Matillion
  • Apache NiFi
  • FastAPI
  • Custom DQ Frameworks (SQL, Python, Lambda)
  • LangChain
  • OpenAI GPT-4
  • FAISS
  • Haystack
  • Hugging Face
  • RAG Pipelines
  • Snowflake
  • Redshift
  • SQL Server
  • Oracle
  • PostgreSQL
  • MySQL
  • MongoDB
  • Cassandra
  • DB2 (Exposure)
  • HBase
  • ChromaDB
  • Hadoop
  • Elasticsearch
  • DBT
  • Jinja templating
  • Data Vault 2.0
  • Star/Snowflake Schemas
  • SQL optimization
  • Python
  • R
  • Shell (bash)
  • Java
  • SQL
  • PySpark
  • JSON
  • HTML
  • CSS
  • Apache Airflow (Cloud Composer)
  • Spark
  • Kafka
  • Step Functions
  • Oozie
  • Jenkins
  • CloudWatch
  • Power BI
  • Tableau
  • Looker
  • Kibana
  • IBM Cognos (Support)
  • Terraform
  • GitHub Actions
  • CI/CD Pipelines
  • HIPAA/GDPR Compliance
  • SSMS
  • TOAD
  • Azure Data Studio
  • RStudio
  • SoapUI
  • GitHub
  • Visual Studio

Certification

  • Salesforce Certified Administrator (SCA), 01/01/24
  • AWS Cloud, 01/01/21
  • AI & ML Certification, 01/01/22

Personal Information

Title: Data Engineer
