Harsha Konakalla

Data Engineer
Somerset, NJ

Summary

Data Engineer skilled in developing ETL/ELT pipelines using Azure Data Factory and PySpark. Expertise in data quality frameworks and automation has consistently improved reporting accuracy, reduced errors, and strengthened collaboration across cross-functional teams.

Overview

5 years of professional experience

Work History

Data Engineer

CVS Pharmacy
Irving, Texas
02.2024 - Current
  • Designed and optimized ETL/ELT pipelines using Azure Data Factory, PySpark, and Databricks to process large-scale PBM data including claims, formulary, rebates, and drug pricing.
  • Built data quality frameworks in PySpark/Spark SQL to validate eligibility, pharmacy networks, and rebate calculations, ensuring accuracy in PBM financial reporting.
  • Developed scalable data models in Snowflake and Delta Lake for claims adjudication, drug utilization review, and underwriting analytics.
  • Implemented data ingestion pipelines from multiple sources (EDI, HL7, flat files, APIs) ensuring HIPAA compliance, PHI/PII security, and audit readiness.
  • Automated data validations, test cases, and QA checks across PBM summarization tables, reducing errors in profit & loss (P&L) reporting.
  • Collaborated with underwriters, PBM clients, and clinical teams to translate business rules into SQL, dbt transformations, and scalable data workflows.
  • Integrated monitoring, logging, and alerting using Azure Monitor, Databricks jobs, and CI/CD pipelines (Azure DevOps) for proactive issue resolution.
  • Created metadata-driven frameworks for data lineage, governance, and audit tracking, improving transparency in PBM financial and regulatory reporting.
  • Partnered with QA/UAT teams to deliver regression testing, automation scripts, and GenAI-based validation tools for faster release cycles.
  • Documented and presented data architecture, pipelines, and validation dashboards to leadership, enabling decision-making on formulary, rebates, and network management.
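The rule-driven data quality checks described above can be illustrated with a simplified, self-contained sketch. This is not the production framework; the field names and rules are hypothetical stand-ins for PBM claim validations:

```python
# Minimal metadata-driven data-quality check of the kind described above.
# Field and rule names are hypothetical; real rules would cover eligibility,
# pharmacy networks, and rebate calculations.

def run_checks(rows, rules):
    """Apply each named rule to every row; return rows that fail any rule."""
    failures = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            failures.append((row, failed))
    return failures

# Hypothetical PBM claim records.
claims = [
    {"claim_id": "C1", "member_id": "M10", "paid_amount": 42.50},
    {"claim_id": "C2", "member_id": None,  "paid_amount": -5.00},
]

# Rules are plain predicates keyed by name, so new checks are pure metadata.
rules = {
    "member_present": lambda r: r["member_id"] is not None,
    "non_negative_paid": lambda r: r["paid_amount"] >= 0,
}

bad = run_checks(claims, rules)
print(len(bad), bad[0][1])  # 1 failing row, listing both violated rules
```

Because the rules live in a dictionary rather than in the pipeline code, adding a validation is a metadata change, which is what makes this style of framework easy to audit.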

Data Engineer

Morgan Stanley (Deloitte)
India
05.2021 - 07.2022
  • Developed scalable ETL pipelines using Apache Spark and Python, improving trade and transaction data processing reliability by 27%.
  • Optimized SQL and Hive queries for banking datasets, reducing data retrieval time by 40% and improving reporting efficiency.
  • Automated data ingestion and reconciliation workflows from multiple banking sources via Python and REST APIs, cutting manual effort by 33%.
  • Implemented Change Data Capture (CDC) pipelines in Talend and Oracle DB, ensuring accurate portfolio and trade reporting.
  • Migrated financial datasets to AWS Redshift, optimizing schema for real-time reporting and compliance dashboards.
  • Built Power BI dashboards for investment performance, portfolio analysis, and transaction monitoring to support business decisions.
  • Integrated cross-platform banking data from CRM, trading, and market feeds, enhancing data consistency and analytics accuracy.
  • Conducted data validation, reconciliation, and quality checks using Spark DataFrames and SQL for regulatory and internal audit compliance.
  • Collaborated with business analysts, traders, and finance teams to translate investment banking requirements into actionable ETL workflows and insights.
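The reconciliation workflows mentioned above can be sketched as a source-to-target comparison of row counts and control totals per key. This is a simplified stand-in, not the actual Talend/Oracle pipeline; the dataset and column names are hypothetical:

```python
# Simplified source-to-target reconciliation: compare row counts and
# summed amounts per business key across two systems. Contents are
# hypothetical trade records.

def reconcile(source, target, key, amount):
    """Return keys whose row count or summed amount differ between systems."""
    def summarize(rows):
        out = {}
        for r in rows:
            cnt, tot = out.get(r[key], (0, 0.0))
            out[r[key]] = (cnt + 1, tot + r[amount])
        return out

    src, tgt = summarize(source), summarize(target)
    # A key breaks reconciliation if it is missing on either side or
    # its (count, total) pair disagrees.
    return sorted(k for k in src.keys() | tgt.keys() if src.get(k) != tgt.get(k))

source = [{"book": "EQ", "notional": 100.0}, {"book": "FX", "notional": 50.0}]
target = [{"book": "EQ", "notional": 100.0}]  # FX trade missing downstream

print(reconcile(source, target, "book", "notional"))  # ['FX']
```

Flagging (count, total) pairs per key rather than diffing full rows keeps the check cheap enough to run on every load while still catching dropped or double-loaded trades.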

Data Engineer

Coca-Cola (Atos)
India
12.2017 - 04.2021
  • Developed automated ETL workflows using PySpark, Python, and SQL, reducing pipeline runtime by 25% for daily retail POS and inventory data ingestion.
  • Improved data cleaning and preprocessing efficiency by 40% using Pandas and NumPy, enabling accurate sales, promotions, and warehouse analytics.
  • Optimized SQL queries and database joins in MySQL and MSSQL, increasing reporting performance by 20% for regional and national retail dashboards.
  • Implemented anomaly detection models (clustering & outlier detection) using Scikit-learn, identifying 15% more sales and supply chain inconsistencies proactively.
  • Reduced data duplication by 40% via unique key mappings and hash functions, ensuring consistent transaction and inventory datasets.
  • Designed and delivered interactive dashboards using Power BI and Tableau to monitor KPIs such as sales trends, stock levels, and promotion effectiveness.
  • Built metadata-driven data validation scripts to ensure end-to-end data quality and integrity across ETL pipelines and Delta/Redshift tables.
  • Integrated pipelines with Azure Data Factory and Google BigQuery, improving scalability, reliability, and accessibility for business analysts and operations teams.
  • Collaborated with business stakeholders to define reporting requirements, KPIs, and edge cases, translating them into actionable analytics and automated validations.
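The hash-function de-duplication mentioned above can be sketched with a stable digest over a composite business key. This is an illustrative sketch only; the POS column names are hypothetical:

```python
import hashlib

# Hash-based de-duplication on a composite business key, as in the
# "unique key mappings and hash functions" bullet above. Column names
# are hypothetical POS transaction fields.

def row_hash(row, key_cols):
    """Stable SHA-256 digest of the business-key columns of one record."""
    raw = "|".join(str(row[c]) for c in key_cols)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def dedupe(rows, key_cols):
    """Keep the first occurrence of each business key, drop the rest."""
    seen, unique = set(), []
    for row in rows:
        h = row_hash(row, key_cols)
        if h not in seen:
            seen.add(h)
            unique.append(row)
    return unique

txns = [
    {"store": "S1", "sku": "A", "ts": "2021-01-01T10:00", "qty": 2},
    {"store": "S1", "sku": "A", "ts": "2021-01-01T10:00", "qty": 2},  # duplicate feed
    {"store": "S2", "sku": "A", "ts": "2021-01-01T10:05", "qty": 1},
]

print(len(dedupe(txns, ["store", "sku", "ts"])))  # 2
```

Hashing the key columns rather than storing the raw tuples keeps the seen-set compact and makes the same mapping reusable across tables that share the business key.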

Education

Master of Science - Computer Science

Southeast Missouri State University
Cape Girardeau, MO
12.2023

Bachelor of Technology - Electronics and Communication

Vellore Institute of Technology
Vellore, India
06.2018

Skills

  • Programming & Scripting: Python (Pandas, NumPy, SciPy, Scikit-learn, PyTorch, Matplotlib, Seaborn), Java, SQL, Shell Scripting, HTML, CSS, R
  • Big Data & ETL: Apache Spark, PySpark, Apache Kafka, Apache Airflow, AWS Glue, Talend, Informatica, Alteryx, SSIS, Azure Data Factory, Data Pipeline Orchestration, Real-time Streaming, Batch Processing
  • Databases & Data Warehousing: PostgreSQL, MySQL, Microsoft SQL Server, Oracle, Amazon Redshift, Google BigQuery, Snowflake, Teradata, Data Modeling, ER Diagrams, Sparx EA
  • Cloud Platforms: AWS (S3, Redshift, Lambda, Glue, EC2), Azure (Data Factory, Synapse, Databricks), GCP (BigQuery, Dataflow, Pub/Sub), Cloud Security & IAM
  • Data Visualization & BI: Power BI, Tableau, Looker, QlikView, Dashboard Design, KPI Tracking, Ad-hoc Reporting
  • Machine Learning & AI: Scikit-learn, PyTorch, Pandas, NumPy, Matplotlib, Seaborn, SciPy, Feature Engineering, Model Validation, Predictive Analytics
  • Version Control & CI/CD: Git, GitHub, GitLab, Jenkins, GitHub Actions, CI/CD Pipelines, Build Automation, Deployment Orchestration
  • Monitoring & Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Prometheus, Grafana, Alerting, Logging & Metrics, Data Observability
  • Data Governance & Quality: Data Validation, Data Profiling, Data Lineage, Data Quality Checks, Anomaly Detection, Metadata Management, Compliance (SOX, GDPR)
  • Methodologies: Agile, Scrum, Kanban, SAFe, SDLC, Waterfall, Test-Driven Development, DataOps Practices
  • Project Management & Collaboration: JIRA, Rally, Confluence, Stakeholder Communication, Requirement Gathering, Documentation, Team Collaboration
