Charan Charamalla

Pleasanton, CA

Summary

  • 9+ years of experience in Data Engineering, building end-to-end data solutions covering ingestion, transformation, modeling, and reporting for large-scale systems across the healthcare, retail, and finance domains.
  • Hands-on experience with big data technologies such as Apache Spark, Hadoop, and Kafka, working on both batch and real-time processing workflows for operational and analytical use cases.
  • Skilled in cloud environments including AWS, Azure, and GCP, with solid experience in services such as S3, Lambda, Glue, Redshift, EC2, Athena, Azure Data Lake, BigQuery, and Data Factory.
  • Developed robust, reusable ETL pipelines using Python, PySpark, and SQL to transform raw data into structured formats and deliver it to data lakes and data warehouses.
  • Proficient in building data models and architecting database solutions, with a strong understanding of normalization, denormalization, schema design, and performance optimization.
  • Built and maintained data workflows that process high-volume transactional data using tools such as Spark and Glue, with a strong focus on reliability, performance, and maintainability.
  • Integrated Apache Kafka for real-time streaming and event-driven data ingestion, setting up topics, producers, and consumers for event-driven architectures and near real-time alerting and analytics.
  • Automated data pipeline deployments and infrastructure changes using Terraform, Jenkins, and GitHub in CI/CD workflows to ensure faster, repeatable deployments.
  • Applied machine learning models within data workflows using AWS SageMaker and Azure ML Studio, helping automate predictions for business operations and reporting.
  • Designed serverless pipelines using AWS Lambda and event-based triggers, building lightweight, scalable solutions for data ingestion and processing.
  • Experienced with data stored in S3, Azure Blob, and ADLS Gen2, with a focus on storage organization, access control, and encryption practices for compliance and security.
  • Implemented and managed data warehouse solutions in Snowflake and Redshift, applying best practices for partitioning, clustering, and query optimization.
  • Used DBT for data modeling and building semantic layers, and Airflow to orchestrate ETL and ELT pipelines with proper scheduling, monitoring, and error handling.
  • Built dashboards in Tableau and Power BI, integrating them with cloud databases to give business users live insights and drill-down capabilities.
  • Extracted and processed data from APIs and various RDBMS sources, including PostgreSQL, MySQL, and SQL Server, for downstream reporting and machine learning tasks.
  • Managed version control with Git, participated in code reviews, and worked in Agile teams using Jira to deliver sprint-based work and collaborate across teams.
  • Applied Spark performance tuning techniques such as caching, partition control, and memory tuning to improve processing speed and cluster efficiency.
  • Proficient in PostgreSQL and other RDBMS for schema design, query optimization, and performance tuning.
  • Skilled in Docker and Kubernetes for containerized deployments and microservice orchestration; designed microservice-based data applications and deployed them to Kubernetes using Docker and Helm.
  • Supported DevOps and monitoring efforts using CloudWatch, ELK, and Prometheus to ensure pipeline visibility, logging, and real-time alerts for failures or delays.
  • Supported data security, compliance, and access control using IAM roles, KMS, and encryption policies across cloud platforms.
  • Maintained documentation in Confluence covering pipeline logic, schedules, ownership, and known edge cases, making transitions and onboarding easier.
  • Regularly collaborated with business stakeholders, data scientists, and engineering teams to translate requirements into scalable data solutions and keep platform design aligned with business goals.

Overview

10 years of professional experience
4 years of post-secondary education

Work History

Senior Data Engineer

Safeway
Pleasanton, CA
04.2023 - Current
  • Built scalable Spark-based ETL pipelines in PySpark to extract structured data from Oracle, SAP, and Teradata, applying schema handling and error logging before landing data into AWS S3 for downstream consumption.
  • Designed and maintained batch and streaming data workflows using AWS Glue, Lambda, and Kinesis to support real-time inventory, pricing, and supply chain use cases.
  • Set up data lake storage structure in S3 with proper partitioning, file formats, and folder management to make data easier to access and query by multiple teams.
  • Implemented CDC-based incremental pipelines using Glue and metadata tracking to handle source updates efficiently and avoid full refreshes.
  • Used DBT to build standardized data models from raw data, including staging, intermediate, and business logic layers for analytics and reporting.
  • Developed PySpark ETL scripts with custom validation rules, schema enforcement, and error handling, ensuring pipeline reliability and maintainability.
  • Optimized Athena query performance through partition tuning, proper file formats (Parquet/ORC), and table design aligned with usage patterns.
  • Created clean and structured tables in Redshift to support business dashboards, views, and scheduled data extracts.
  • Worked with the data science team to prepare cleaned, enriched, and pre-processed datasets for machine learning models, including feature generation.
  • Set up MLflow to automate experiment tracking and keep versioned records of model inputs, outputs, and parameters.
  • Managed orchestration of daily and hourly workflows using Apache Airflow, creating reusable DAG templates and adding alerts for failure detection.
  • Applied data security best practices using IAM role-based access and KMS encryption to safeguard S3 buckets and other cloud resources.
  • Collaborated with DevOps teams to build CI/CD pipelines using Jenkins and GitHub, ensuring smooth deployment of code and infrastructure changes.
  • Created structured logging and centralized monitoring for Spark and Glue jobs using the ELK stack, making it easier to troubleshoot issues.
  • Integrated Kafka producers/consumers with AWS Glue pipelines for real-time inventory updates and event-driven analytics.
  • Deployed containerized Spark jobs on Kubernetes clusters to improve scalability and reduce infrastructure costs.
  • Added support for PostgreSQL data sources, performing schema migration and optimizing ingestion to S3 and Redshift.
  • Assisted with onboarding of new data sources through APIs, SFTP flat file drops, and internal system extracts, building custom ingestion logic where needed.
  • Built utility functions in Python for common ETL tasks like column renaming, type conversion, null checks, and file archiving.
  • Worked closely with business users and analysts to understand data requirements and translate them into models and pipelines.
  • Documented data pipelines, transformation logic, and schedules in Confluence to support transparency and handoffs between teams.
  • Developed validation queries and source-target checks to support QA during UAT cycles and production deployments.
  • Tuned Spark job performance by adjusting partition logic, join strategies, and memory configurations in complex data flows.
  • Defined and tracked SLAs for data freshness and availability using Airflow sensors and alerting policies.
  • Partnered with data governance teams to ensure sensitive data was handled according to internal policies and compliance requirements.
  • Responded to pipeline issues and incidents during support hours by debugging logs, rerunning jobs, or adjusting configurations as needed.
  • Provided inputs on architectural discussions, sharing experience from previous patterns used for ingestion, processing, and modeling at scale.
  • Participated in Agile development cycles, including story grooming, estimations, retrospectives, and sprint planning meetings.
  • Environment: AWS (S3, Glue, Lambda, Kinesis, Redshift, Athena, EMR), Python, PySpark, Apache Spark, Airflow, DBT, MLflow, ELK Stack, Jenkins, GitHub, Oracle, Teradata, SAP, Confluence

Data Engineer

HCA Healthcare
Nashville, TN
09.2021 - 03.2023
  • Designed and implemented scalable ETL pipelines using PySpark on Databricks to process raw clinical, operational, and patient data into structured formats suitable for analysis and reporting across hospital departments. These pipelines supported various use cases such as patient outcomes, physician utilization, and internal compliance tracking.
  • Ingested HL7 and FHIR messages from hospital information systems into AWS S3 buckets, transforming them into flat, queryable structures using Spark SQL to make them accessible for downstream users including clinical data analysts and reporting teams.
  • Developed transformation logic to normalize patient, provider, and encounter data across different EHR platforms. Built mappings to align disparate data sources into a unified model for longitudinal patient analysis and care performance tracking.
  • Created ETL workflows that supported both batch and near real-time ingestion for priority use cases such as emergency admissions, medication administration, and patient flow metrics. Managed job scheduling and recovery logic for late or missing data.
  • Designed staging, intermediate, and curated data layers within Redshift, maintaining schema versioning and consistent naming standards to support analytics, reporting, and machine learning use cases across care delivery teams.
  • Set up orchestration and monitoring of daily workflows using Apache Airflow, developing modular DAGs with error logging, email alerts, and SLA sensors. Airflow served as the central coordination tool for all recurring data jobs across business domains.
  • Developed Python scripts and Spark transformations to calculate risk indicators, readmission probability, and chronic condition flags used in quality-of-care models and dashboards. Worked closely with medical informatics teams to validate clinical logic.
  • Configured AWS Lambda functions to handle file arrival events in S3 and trigger lightweight data quality checks before processing began. Integrated these checks into broader monitoring workflows to improve data trust and accuracy.
  • Built feature-ready datasets used in predictive modeling by handling missing values, encoding categorical variables, and engineering features from lab results, vitals, and medication history. All feature logic was version-controlled and documented.
  • Collaborated with privacy and compliance teams to ensure adherence to HIPAA and internal data handling policies. Set up IAM role-based access, implemented encryption in transit and at rest, and ensured sensitive fields were masked or redacted where needed.
  • Managed validation of incoming datasets by building schema comparison tools and row-level reconciliations to catch upstream data quality issues early. Integrated these checks into Airflow jobs and set up Slack and email alerts for any failed runs.
  • Supported the onboarding of external datasets such as population health statistics and third-party care coordination data, implementing custom ingestion logic and mapping them to internal data models for use in combined analytics views.
  • Participated in solution design sessions with architects and clinical leaders, providing input on data modeling approaches, transformation logic, and system reliability trade-offs. Offered guidance on best practices for scalable and auditable data flows.
  • Maintained detailed documentation in Confluence covering data flow diagrams, field-level transformation rules, job schedules, dependencies, and known limitations or workarounds for each data product.
  • Developed SQL-based dashboards in Tableau to track data pipeline health, row volumes, late arrivals, and critical field completeness across various source systems and time periods.
  • Provided hands-on support during user acceptance testing, building custom validation queries and partnering with QA analysts to track and resolve any mismatches between source and processed data.
  • Assisted clinical analysts and operational teams in writing advanced SQL queries for reporting, ad hoc data pulls, and trend analysis. Provided data dictionary references and helped optimize queries when datasets grew large.
  • Supported the migration of legacy ETL workflows into Databricks, refactoring older logic into reusable PySpark components and improving job runtime consistency and monitoring.
  • Worked with DevOps and data platform engineers to automate deployment pipelines using GitHub and Jenkins. Ensured that each change to ETL logic went through version control, peer review, and deployment validation.
  • Participated in sprint planning, backlog grooming, and retrospectives as part of the Agile team structure. Contributed ideas during planning sessions and shared lessons learned during end-of-sprint reviews to improve team delivery quality.
  • Environment: Databricks, PySpark, AWS (S3, Redshift, Lambda, Kinesis), Apache Airflow, HL7, FHIR, SQL Server, Snowflake, Kafka, Tableau, Jenkins, GitHub, Confluence

Data Engineer

Mayo Clinic
Rochester, MN
09.2019 - 08.2021
  • Migrated legacy MapReduce workflows into optimized Apache Spark jobs, modernizing processing frameworks used for large volumes of healthcare and financial data and significantly improving reliability and maintainability.
  • Built Spark-based data transformation logic to clean, join, and normalize patient billing, insurance, and cost center data for use in regulatory compliance and revenue cycle dashboards.
  • Implemented Sqoop-based incremental ingestion workflows from Oracle into Hive, carefully managing delta loads to ensure freshness of the reporting layer while maintaining historical integrity for audit use cases.
  • Automated periodic and event-driven ingestion from various systems into AWS S3 using Lambda and EMR, eliminating manual interventions and improving the pipeline’s ability to scale with increasing data volume.
  • Built multiple Tableau dashboards used by clinical and financial stakeholders to track budget utilization, claims backlog, and operational KPIs — enabling faster decision-making and trend visibility.
  • Created Snowflake schema models for departmental cost reporting and physician utilization analysis. Focused on building dimensionally structured datasets with clean joins and efficient query performance.
  • Developed and deployed near-real-time data integration solutions leveraging AWS Lambda, SNS, and S3 for handling operational events like patient discharge updates and lab result ingestion.
  • Used Databricks to build and run Python-based ETL pipelines for large-scale data integration involving clinical, lab, and claims data from both internal and third-party providers.
  • Supported IAM-based access controls across multiple AWS services, setting up policies to ensure restricted access to sensitive healthcare data in alignment with HIPAA requirements.
  • Trained and deployed ML models using AWS SageMaker for credit scoring and fraud risk detection, embedding model scoring logic into downstream batch pipelines for periodic risk review processes.
  • Integrated the Drools Rules Engine into the processing layer to enable business rule configuration outside of code, empowering operations teams to update thresholds and logic without developer intervention.
  • Worked with Aurora DB to store financial scoring data and implemented stored procedures to enable high-performance retrieval for near-real-time dashboards.
  • Implemented monitoring and alerting for all ETL pipelines using AWS CloudWatch, capturing job status, runtime anomalies, and unexpected schema changes with real-time alerts.
  • Used Informatica Data Director for MDM workflows, setting up matching and merging rules to deduplicate provider and patient data from various systems and improve data quality.
  • Designed and managed Redshift-based data marts used for long-term archival and analytical queries by finance and reimbursement teams, focusing on query optimization and access patterns.
  • Created SSIS-based workflows and T-SQL scripts to support legacy on-prem pipelines that remained operational during the cloud migration phase.
  • Developed version-controlled Python modules and SQL scripts stored in Git, used pull request workflows, and participated in peer reviews to maintain coding standards and team transparency.
  • Helped implement CI/CD pipelines using Jenkins for continuous deployment of Spark and Glue jobs, streamlining releases and reducing errors during deployment cycles.
  • Assisted data governance and security teams by supporting vulnerability scanning processes using OWASP tools, SonarQube, and Veracode to ensure pipelines and data processes met internal compliance requirements.
  • Collaborated with business analysts and data stewards to gather requirements, clarify transformation logic, and build documentation to support ongoing maintenance and onboarding of new team members.
  • Environment: Apache Spark, MapReduce, Hive, Sqoop, Python, AWS (S3, Lambda, EMR, Aurora, Redshift, CloudWatch, SageMaker, SNS), Tableau, Snowflake, Databricks, Drools Rules Engine, Informatica Data Director, Jenkins, Git, SSIS, T-SQL

Data Engineer

Sherwin Williams
Cleveland, OH
05.2017 - 08.2019
  • Designed and developed ETL workflows using Informatica to extract data from flat files, Excel, and internal reporting systems, transforming and loading it into Oracle data warehouses for enterprise-wide reporting needs.
  • Migrated legacy ETL logic to Informatica-based mappings, leveraging update strategies and workflow dependencies to ensure data consistency across development, QA, and production environments.
  • Worked closely with business stakeholders and database administrators to define data requirements, source-to-target mappings, and validation rules, ensuring that delivered solutions matched reporting expectations.
  • Developed reports and dashboards for Global and Technical Services teams using SSRS, Oracle BI, and Excel — incorporating PivotTables, VLOOKUPs, and Access queries to deliver trend and operational analysis to department managers.
  • Created custom Python scripts to automate manual data pulls and cleaning tasks, and built visualizations using Matplotlib to provide quick insights for non-technical stakeholders.
  • Performed discovery and reverse engineering of undocumented SSIS and SQL Server components, developing metadata dictionaries and lineage maps to support modernization.
  • Participated in Agile project cycles using SCRUM methodology, attending sprint planning meetings, documenting tasks in Jira, and contributing to retrospectives for ongoing project improvement.
  • Designed conceptual, logical, and physical data models using Power Designer, aligning data warehouse architecture with evolving business needs while documenting relationships, keys, and attributes clearly.
  • Analyzed large volumes of Excel and Oracle-based source data to build and test ETL logic, ensuring that business rules were implemented correctly and datasets met completeness and accuracy thresholds.
  • Developed data pipelines using IBM DataStage to integrate various data sources like flat files, SQL Server, and Oracle, applying transformation and cleansing logic to support downstream data marts and analytics layers.
  • Conducted thorough data validation by writing and executing SQL queries to check for duplicates, missing values, mismatches, and referential integrity issues across multiple target systems.
  • Built ETL packages using SSIS to extract data from OLTP systems and load them into star-schema structured OLAP databases. Integrated SSAS cubes to support advanced multidimensional reporting.
  • Supported a database platform upgrade by migrating SQL Server 2008 to 2008 R2, including backup validation, compatibility checks, and reconfiguration of affected SSIS packages and SQL Agent jobs.
  • Developed reusable PL/SQL components including stored procedures, functions, and triggers to automate transformations and reduce code duplication across multiple ETL workflows.
  • Performed backend validation of UNIX-based flat files before ingestion into staging layers, using shell commands and basic scripting to monitor job dependencies and file availability.
  • Used Erwin Model Mart to manage shared logical and physical models, helping coordinate metadata definitions across the data modeling and ETL teams. Also tracked SCD (slowly changing dimension) logic for key subject areas.
  • Designed high-level ETL architecture to guide data movement from operational systems to enterprise data warehouse, defining extraction logic, staging layers, fact/dimension table relationships, and scheduling triggers.
  • Supported QA and testing teams by preparing test cases, validating ETL outputs, and troubleshooting mismatches between source and target using detailed reconciliation queries.
  • Documented ETL flows, technical design documents, and deployment guides for internal teams, making handoffs and support transitions smoother.
  • Worked on data profiling tasks using SQL and Informatica features to identify anomalies and improve data quality ahead of key report releases.
  • Participated in review meetings with data governance teams to align on data definitions, lineage tracking, and shared lookup/reference table logic used across domains.
  • Environment: Informatica, IBM DataStage, Oracle, SQL Server 2008, SSIS, SSRS, PL/SQL, T-SQL, Erwin Model Mart, Power Designer, DB2, Netezza, UNIX, SAS, SPSS, Aginity, Access, Excel, Matplotlib, Python

Data Analyst

Cognizant Technology Solutions (CTS)
Chennai, India
10.2015 - 11.2016
  • Participated in requirements-gathering sessions and translated business requirements into working logical and physical data models for data warehouse, data mart, and OLAP applications.
  • Performed extensive data analysis and data validation on Teradata and designed star and snowflake data models for the enterprise data warehouse using ERwin.
  • Created and maintained the logical data model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, and glossary terms.
  • Integrated data from sources including MS SQL Server, DB2, Oracle, Netezza, and Teradata using Informatica for extraction, transformation, and loading (ETL); also worked on ETL development and data migration using SSIS, SQL*Loader, and PL/SQL.
  • Designed and developed logical and physical data models and metadata to support requirements using ERwin.
  • Used Informatica to populate the new database, transforming and migrating data from the legacy Oracle database.
  • Designed logical data models into dimensional models using star schema methodologies, and performed query tuning and index maintenance to improve performance.
  • Created and maintained data warehouse repositories containing metadata, and wrote and executed unit, system, integration, and UAT scripts for data warehouse projects.
  • Wrote and executed SQL queries to verify that data had been moved from transactional systems to the DSS, data warehouse, and data mart reporting systems in accordance with requirements.
  • Created and modified T-SQL stored procedures and triggers to validate data integrity.
  • Applied data warehouse concepts and dimensional data modeling using the Ralph Kimball methodology.
  • Created standard and complex reports in SSRS to analyze data using slice-and-dice, drill-down, and drill-through capabilities.
  • Developed separate test cases for ETL processes (inbound and outbound) and reporting.
  • Environment: Oracle 9i/10g, Microsoft SQL Server, PL/SQL, T-SQL, SQL, SSRS, Informatica 9.x, Talend Data Quality, ERwin, MS Visio, Rational Rose, data warehouse, OLTP, OLAP, flat files, Windows

Education

Bachelor of Science - Computer Science

Anna University
Chennai, TN
08.2012 - 05.2016

Skills

  • AWS
  • Azure
  • GCP
  • Apache Hadoop
  • Apache Spark
  • Kafka
  • Informatica
  • DBT
  • ERwin
  • PowerDesigner
  • Snowflake
  • Redshift
  • Oracle
  • SQL Server
  • SAP HANA
  • Teradata
  • Python
  • SQL
  • Scala
  • Java
  • Shell Scripting
  • Airflow
  • Terraform
  • Jenkins
  • GitHub
  • AWS SageMaker
  • Snowflake schema design
  • Logical Data Modeling
  • Physical Data Modeling
  • AWS CloudFormation
  • Git
  • CloudWatch
  • Grafana
  • New Relic
  • Prometheus
  • Dynatrace
  • Kubernetes
  • Helm
  • Docker Swarm
  • Tableau
  • Power BI
  • AWS QuickSight

Skills And Technologies

AWS (S3, Lambda, Redshift, EC2, ECR, EMR, Glue, Athena, DynamoDB, Kinesis, CloudWatch), Azure (Data Lake, Data Factory, HDInsight, Databricks), GCP (BigQuery, Dataflow, Pub/Sub, Composer, Dataproc), Apache Hadoop (HDFS, MapReduce, Hive, Sqoop), Apache Spark, Spark Streaming, Kafka, Informatica, DBT, ERwin, PowerDesigner, Snowflake, Redshift, Oracle, SQL Server, SAP HANA, Teradata, Data Modeling (star schema, snowflake schema, logical and physical), Python, SQL, Scala, Java, Shell Scripting, Airflow, Terraform, AWS CloudFormation, Jenkins, Git, GitHub, AWS SageMaker, CloudWatch, Grafana, New Relic, Prometheus, Dynatrace, Kubernetes (container orchestration, microservice management), Helm, Docker Swarm, Tableau (dashboards, interactive reports), Power BI, AWS QuickSight

Personal Information

Title: Senior Data Engineer

Timeline

Senior Data Engineer

Safeway
04.2023 - Current

Data Engineer

HCA Healthcare
09.2021 - 03.2023

Data Engineer

Mayo Clinic
09.2019 - 08.2021

Data Engineer

Sherwin Williams
05.2017 - 08.2019

Data Analyst

Cognizant Technology Solutions (CTS)
10.2015 - 11.2016

Bachelor of Science - Computer Science

Anna University
08.2012 - 05.2016