Harsha Konakalla

Data Engineer
Somerset, NJ

Summary

Data Engineer skilled in developing ETL/ELT pipelines using Azure Data Factory and PySpark. Expertise in data quality frameworks and automation has consistently improved reporting accuracy, reduced errors, and strengthened collaboration across cross-functional teams.

Overview

5 years of professional experience

Work History

Data Engineer

CVS Pharmacy
Irving, Texas
02.2024 - Current
  • Designed and optimized ETL/ELT pipelines using Azure Data Factory, PySpark, and Databricks to process large-scale PBM data including claims, formulary, rebates, and drug pricing.
  • Built data quality frameworks in PySpark/Spark SQL to validate eligibility, pharmacy networks, and rebate calculations, ensuring accuracy in PBM financial reporting.
  • Developed scalable data models in Snowflake and Delta Lake for claims adjudication, drug utilization review, and underwriting analytics.
  • Implemented data ingestion pipelines from multiple sources (EDI, HL7, flat files, APIs) ensuring HIPAA compliance, PHI/PII security, and audit readiness.
  • Automated data validations, test cases, and QA checks across PBM summarization tables, reducing errors in profit & loss (P&L) reporting.
  • Collaborated with underwriters, PBM clients, and clinical teams to translate business rules into SQL, dbt transformations, and scalable data workflows.
  • Integrated monitoring, logging, and alerting using Azure Monitor, Databricks jobs, and CI/CD pipelines (Azure DevOps) for proactive issue resolution.
  • Created metadata-driven frameworks for data lineage, governance, and audit tracking, improving transparency in PBM financial and regulatory reporting.
  • Partnered with QA/UAT teams to deliver regression testing, automation scripts, and GenAI-based validation tools for faster release cycles.
  • Documented and presented data architecture, pipelines, and validation dashboards to leadership, enabling decision-making on formulary, rebates, and network management.
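The rule-driven data quality checks described above can be illustrated with a simplified, self-contained sketch. This is not the production framework; the field names and rules are hypothetical stand-ins for PBM claim validations:

```python
# Minimal metadata-driven data-quality check of the kind described above.
# Field and rule names are hypothetical; real rules would cover eligibility,
# pharmacy networks, and rebate calculations.

def run_checks(rows, rules):
    """Apply each named rule to every row; return rows that fail any rule."""
    failures = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            failures.append((row, failed))
    return failures

# Hypothetical PBM claim records.
claims = [
    {"claim_id": "C1", "member_id": "M10", "paid_amount": 42.50},
    {"claim_id": "C2", "member_id": None,  "paid_amount": -5.00},
]

# Rules are plain predicates keyed by name, so new checks are pure metadata.
rules = {
    "member_present": lambda r: r["member_id"] is not None,
    "non_negative_paid": lambda r: r["paid_amount"] >= 0,
}

bad = run_checks(claims, rules)
print(len(bad), bad[0][1])  # 1 failing row, listing both violated rules
```

Because the rules live in a dictionary rather than in the pipeline code, adding a validation is a metadata change, which is what makes this style of framework easy to audit.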

Data Engineer

Morgan Stanley (Deloitte)
India
05.2021 - 07.2022
  • Developed scalable ETL pipelines using Apache Spark and Python, improving trade and transaction data processing reliability by 27%.
  • Optimized SQL and Hive queries for banking datasets, reducing data retrieval time by 40% and improving reporting efficiency.
  • Automated data ingestion and reconciliation workflows from multiple banking sources via Python and REST APIs, cutting manual effort by 33%.
  • Implemented Change Data Capture (CDC) pipelines in Talend and Oracle DB, ensuring accurate portfolio and trade reporting.
  • Migrated financial datasets to AWS Redshift, optimizing schema for real-time reporting and compliance dashboards.
  • Built Power BI dashboards for investment performance, portfolio analysis, and transaction monitoring to support business decisions.
  • Integrated cross-platform banking data from CRM, trading, and market feeds, enhancing data consistency and analytics accuracy.
  • Conducted data validation, reconciliation, and quality checks using Spark DataFrames and SQL for regulatory and internal audit compliance.
  • Collaborated with business analysts, traders, and finance teams to translate investment banking requirements into actionable ETL workflows and insights.
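The reconciliation workflows mentioned above can be sketched as a source-to-target comparison of row counts and control totals per key. This is a simplified stand-in, not the actual Talend/Oracle pipeline; the dataset and column names are hypothetical:

```python
# Simplified source-to-target reconciliation: compare row counts and
# summed amounts per business key across two systems. Contents are
# hypothetical trade records.

def reconcile(source, target, key, amount):
    """Return keys whose row count or summed amount differ between systems."""
    def summarize(rows):
        out = {}
        for r in rows:
            cnt, tot = out.get(r[key], (0, 0.0))
            out[r[key]] = (cnt + 1, tot + r[amount])
        return out

    src, tgt = summarize(source), summarize(target)
    # A key breaks reconciliation if it is missing on either side or
    # its (count, total) pair disagrees.
    return sorted(k for k in src.keys() | tgt.keys() if src.get(k) != tgt.get(k))

source = [{"book": "EQ", "notional": 100.0}, {"book": "FX", "notional": 50.0}]
target = [{"book": "EQ", "notional": 100.0}]  # FX trade missing downstream

print(reconcile(source, target, "book", "notional"))  # ['FX']
```

Flagging (count, total) pairs per key rather than diffing full rows keeps the check cheap enough to run on every load while still catching dropped or double-loaded trades.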

Data Engineer

Coca-Cola (Atos)
India
12.2017 - 04.2021
  • Developed automated ETL workflows using PySpark, Python, and SQL, reducing pipeline runtime by 25% for daily retail POS and inventory data ingestion.
  • Improved data cleaning and preprocessing efficiency by 40% using Pandas and NumPy, enabling accurate sales, promotions, and warehouse analytics.
  • Optimized SQL queries and database joins in MySQL and MSSQL, increasing reporting performance by 20% for regional and national retail dashboards.
  • Implemented anomaly detection models (clustering & outlier detection) using Scikit-learn, identifying 15% more sales and supply chain inconsistencies proactively.
  • Reduced data duplication by 40% via unique key mappings and hash functions, ensuring consistent transaction and inventory datasets.
  • Designed and delivered interactive dashboards using Power BI and Tableau to monitor KPIs such as sales trends, stock levels, and promotion effectiveness.
  • Built metadata-driven data validation scripts to ensure end-to-end data quality and integrity across ETL pipelines and Delta/Redshift tables.
  • Integrated pipelines with Azure Data Factory and Google BigQuery, improving scalability, reliability, and accessibility for business analysts and operations teams.
  • Collaborated with business stakeholders to define reporting requirements, KPIs, and edge cases, translating them into actionable analytics and automated validations.
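The hash-function de-duplication mentioned above can be sketched with a stable digest over a composite business key. This is an illustrative sketch only; the POS column names are hypothetical:

```python
import hashlib

# Hash-based de-duplication on a composite business key, as in the
# "unique key mappings and hash functions" bullet above. Column names
# are hypothetical POS transaction fields.

def row_hash(row, key_cols):
    """Stable SHA-256 digest of the business-key columns of one record."""
    raw = "|".join(str(row[c]) for c in key_cols)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def dedupe(rows, key_cols):
    """Keep the first occurrence of each business key, drop the rest."""
    seen, unique = set(), []
    for row in rows:
        h = row_hash(row, key_cols)
        if h not in seen:
            seen.add(h)
            unique.append(row)
    return unique

txns = [
    {"store": "S1", "sku": "A", "ts": "2021-01-01T10:00", "qty": 2},
    {"store": "S1", "sku": "A", "ts": "2021-01-01T10:00", "qty": 2},  # duplicate feed
    {"store": "S2", "sku": "A", "ts": "2021-01-01T10:05", "qty": 1},
]

print(len(dedupe(txns, ["store", "sku", "ts"])))  # 2
```

Hashing the key columns rather than storing the raw tuples keeps the seen-set compact and makes the same mapping reusable across tables that share the business key.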

Education

Master of Science - Computer Science

Southeast Missouri State University
Cape Girardeau, MO
12.2023

Bachelor of Technology - Electronics and Communication

Vellore Institute of Technology
Vellore, India
06.2018

Skills

  • Programming & Scripting: Python (Pandas, NumPy, SciPy, Scikit-learn, PyTorch, Matplotlib, Seaborn), Java, SQL, Shell Scripting, HTML, CSS, R
  • Big Data & ETL: Apache Spark, PySpark, Apache Kafka, Apache Airflow, AWS Glue, Talend, Informatica, Alteryx, SSIS, Azure Data Factory, Data Pipeline Orchestration, Real-time Streaming, Batch Processing
  • Databases & Data Warehousing: PostgreSQL, MySQL, Microsoft SQL Server, Oracle, Amazon Redshift, Google BigQuery, Snowflake, Teradata, Data Modeling, ER Diagrams, Sparx EA
  • Cloud Platforms: AWS (S3, Redshift, Lambda, Glue, EC2), Azure (Data Factory, Synapse, Databricks), GCP (BigQuery, Dataflow, Pub/Sub), Cloud Security & IAM
  • Data Visualization & BI: Power BI, Tableau, Looker, QlikView, Dashboard Design, KPI Tracking, Ad-hoc Reporting
  • Machine Learning & AI: Scikit-learn, PyTorch, Pandas, NumPy, Matplotlib, Seaborn, SciPy, Feature Engineering, Model Validation, Predictive Analytics
  • Version Control & CI/CD: Git, GitHub, GitLab, Jenkins, GitHub Actions, CI/CD Pipelines, Build Automation, Deployment Orchestration
  • Monitoring & Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Prometheus, Grafana, Alerting, Logging & Metrics, Data Observability
  • Data Governance & Quality: Data Validation, Data Profiling, Data Lineage, Data Quality Checks, Anomaly Detection, Metadata Management, Compliance (SOX, GDPR)
  • Methodologies: Agile, Scrum, Kanban, SAFe, SDLC, Waterfall, Test-Driven Development, DataOps Practices
  • Project Management & Collaboration: JIRA, Rally, Confluence, Stakeholder Communication, Requirement Gathering, Documentation, Team Collaboration
