Venkata P

Plano

Summary

  • Software Engineer with 9+ years of experience designing, implementing, and modernizing large-scale cloud-native data platforms across AWS, Azure, and GCP.
  • Proven expertise in Databricks Lakehouse architecture, Delta Lake, Delta Live Tables (DLT), and Unity Catalog for enterprise data governance and lineage.
  • Skilled in Apache Spark, PySpark, Spark SQL, and performance tuning of batch and streaming workloads for high-throughput analytics.
  • Hands-on experience in data pipeline development (ETL/ELT), real-time streaming, and medallion architecture (Bronze/Silver/Gold).
  • Successfully led data migrations from legacy DWH, on-prem Hadoop, Hive, Teradata, and Netezza to cloud-based Databricks Lakehouse solutions.
  • Deep knowledge of cloud-native services: AWS (S3, EMR, Glue, Lambda, Redshift), Azure (ADLS Gen2, Synapse, ADF, Databricks), GCP (GCS, BigQuery, Dataproc, Composer).
  • Expertise in security and governance: RBAC/ABAC, PII masking, column-level security, encryption, and IAM integration for enterprise compliance.
  • Experienced in DevOps and CI/CD: GitHub/GitLab, Azure DevOps, Databricks Repos, Terraform automation, and deployment pipelines.
  • Trusted technical advisor for architecture design, migration strategy, and platform modernization, delivering scalable, high-performance analytics solutions for stakeholders.
  • Strong foundation in data modeling, data mesh concepts, dimensional modeling, and building enterprise-ready, cost-optimized cloud data platforms.

Overview

10 years of professional experience

Work History

Software Developer

PNC Bank
02.2022 - Current

Project: ICREMS (Intelligent Credit Risk Evaluation & Monitoring System)

  • Designed Databricks Lakehouse architecture using medallion (Bronze/Silver/Gold) layers to ingest, curate, and serve enterprise credit-risk datasets at scale.
  • Built Apache Spark batch pipelines for borrower, exposure, and transaction data with optimized partitioning, caching, and AQE.
  • Implemented Delta Lake tables with ACID transactions, schema evolution, and time-travel for regulatory audit and replay use cases.
  • Engineered ingestion frameworks patterned on Delta Live Tables (DLT), including SCD handling, data quality checks, and SLA monitoring.
  • Migrated legacy Oracle and Cassandra credit datasets into a cloud-native Lakehouse backed by AWS S3 and Delta formats.
  • Designed near real-time Spark Structured Streaming pipelines consuming Kafka events for continuous credit monitoring and alerting (see the sketch after this list).
  • Applied Unity Catalog–style governance patterns including RBAC, column-level security, and PII masking across curated datasets.
  • Optimized Spark workloads through file compaction, Z-Ordering, and adaptive query execution, improving analytical query performance.
  • Developed Databricks SQL–ready analytical views for portfolio risk, delinquency trends, and exposure analytics consumed by BI tools.
  • Integrated MLflow-compatible experiment tracking for LLM-assisted borrower scoring and risk classification workflows.
  • Implemented cloud security and IAM integration using AWS IAM roles, encryption at rest/in transit, and audit logging.
  • Acted as data architecture advisor, guiding stakeholders on Lakehouse adoption, migration strategy, cost optimization, and scalability.
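
The following is a minimal PySpark sketch of the streaming-ingestion, Z-Ordering, and time-travel patterns named above. The broker address, topic name, S3 paths, event schema, and version number are hypothetical placeholders, not ICREMS configuration.

    # Sketch: Kafka -> Bronze Delta streaming ingestion, plus Delta
    # maintenance and time travel. All names and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("credit-monitoring-sketch").getOrCreate()

    # Assumed payload for a credit-exposure event (hypothetical fields).
    event_schema = StructType([
        StructField("borrower_id", StringType()),
        StructField("exposure_amount", DoubleType()),
        StructField("event_ts", StringType()),
    ])

    # Consume exposure events from Kafka and land them in a Bronze Delta table.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "credit-exposure-events")     # placeholder topic
        .load()
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    (events.writeStream.format("delta")
        .option("checkpointLocation", "s3://bucket/chk/exposure")  # placeholder
        .trigger(processingTime="1 minute")
        .start("s3://bucket/bronze/exposure"))                     # placeholder

    # Periodic maintenance: compact small files and Z-Order on the join key.
    spark.sql("OPTIMIZE delta.`s3://bucket/bronze/exposure` ZORDER BY (borrower_id)")

    # Time travel: re-read the table as of an earlier version for audit replay.
    audit_df = (spark.read.format("delta")
                .option("versionAsOf", 42)  # hypothetical version number
                .load("s3://bucket/bronze/exposure"))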

Software Developer

CVS
12.2019 - 01.2022

Project: Digital Prior Authorization Platform (DPAP)

  • Designed Lakehouse-style data architecture to process clinical, pharmacy, and claims data across raw, curated, and analytics zones.
  • Built PySpark ETL pipelines to ingest high-volume healthcare datasets into Delta Lake on Azure Data Lake Gen2 (see the sketch after this list).
  • Implemented batch and micro-batch processing for near real-time authorization workflows using Spark SQL.
  • Applied Dimensional Modeling to curated healthcare datasets supporting regulatory and operational analytics.
  • Modernized legacy workflows by migrating from traditional batch systems to Azure Databricks–based pipelines.
  • Implemented data quality rules, schema validation, and lineage tracking to support HIPAA-compliant governance.
  • Designed secure access controls using RBAC, encryption, and integration with Azure AD.
  • Built Databricks SQL–ready datasets and views powering compliance dashboards and utilization reporting.
  • Collaborated on cloud migration strategy for moving on-prem healthcare data platforms to Azure Databricks Lakehouse.
  • Optimized Spark jobs using partition pruning, caching, and efficient join strategies to reduce processing latency.
  • Enabled downstream BI and ML workloads by publishing standardized, governed Delta tables.
  • Partnered with enterprise data teams to define Lakehouse governance standards, cost controls, and operational best practices.
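
Below is a minimal PySpark sketch of the validate-then-publish pattern described above. The ADLS Gen2 paths, field names, and quality rules are illustrative assumptions, not the DPAP implementation.

    # Sketch of a batch ETL step: read raw claims, apply data-quality rules,
    # and publish only validated rows to a curated Delta zone.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dpap-etl-sketch").getOrCreate()

    raw_path = "abfss://raw@account.dfs.core.windows.net/claims/"             # placeholder
    curated_path = "abfss://curated@account.dfs.core.windows.net/claims/"     # placeholder
    quarantine_path = "abfss://curated@account.dfs.core.windows.net/claims_quarantine/"

    claims = spark.read.json(raw_path)

    # Quality rules (hypothetical): required keys present, amounts non-negative.
    is_valid = (F.col("claim_id").isNotNull()
                & F.col("member_id").isNotNull()
                & (F.col("billed_amount") >= 0))

    valid = claims.filter(is_valid)
    quarantine = claims.filter(~is_valid)  # held back for manual review

    # Publish validated rows, partitioned by load date for partition pruning.
    (valid.withColumn("load_date", F.current_date())
        .write.format("delta").mode("append")
        .partitionBy("load_date")
        .save(curated_path))

    quarantine.write.format("delta").mode("append").save(quarantine_path)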

Software Engineer

Shopify
06.2016 - 11.2019

Project: Smart Promotions Engine

  • Designed event-driven data ingestion pipelines producing Databricks-ready JSON and Parquet datasets for analytics use cases (see the sketch after this list).
  • Built scalable product and transaction data models aligned with Lakehouse and Delta modeling principles.
  • Optimized Elasticsearch-backed data pipelines supporting analytics on 10M+ product records.
  • Implemented batch-friendly and streaming-compatible schemas to support downstream Spark SQL analytics.
  • Designed cloud-native data services deployed on Kubernetes, enabling scalable data ingestion and processing.
  • Partnered with data engineering teams to publish curated datasets for growth analytics and forecasting.
  • Applied performance tuning techniques across data ingestion, indexing, and query execution paths.
  • Supported analytics and ML consumers by standardizing data contracts and event schemas.
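
As a sketch of the data-contract idea above: an explicit schema enforced at read time, with output published as partitioned Parquet. Event fields and paths are assumptions, not the actual contract.

    # Sketch: an explicit schema acts as the event data contract, enforced on
    # read (FAILFAST) so producer drift surfaces immediately. Fields and paths
    # below are hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   LongType, TimestampType)

    spark = SparkSession.builder.appName("promo-events-sketch").getOrCreate()

    promo_event_v1 = StructType([
        StructField("event_id", StringType(), False),
        StructField("product_id", LongType(), False),
        StructField("promotion_code", StringType(), True),
        StructField("occurred_at", TimestampType(), False),
    ])

    events = (spark.read.schema(promo_event_v1)
              .option("mode", "FAILFAST")              # reject contract violations
              .json("s3://bucket/raw/promo-events/"))  # placeholder path

    # Partitioned Parquet gives downstream Spark SQL consumers pruning for free.
    (events.withColumn("event_date", F.to_date("occurred_at"))
        .write.mode("append")
        .partitionBy("event_date")
        .parquet("s3://bucket/curated/promo-events/"))  # placeholder path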

Skills

Programming Languages:
Java, Python, SQL, JavaScript, TypeScript, C

Backend Frameworks & APIs:
Spring Boot, Spring MVC, Spring Batch, Spring Security, Hibernate, JPA, RESTful APIs, GraphQL, SOAP, Drools

Frontend Technologies:
React.js, Angular, Angular Material, HTML5, CSS3, Apollo Client

Messaging & Integration:
Apache Kafka, RabbitMQ, Apache Camel, FHIR APIs, REST Assured

Cloud & Deployment:
AWS (ECS, EKS, EC2, S3, API Gateway, IAM, Parameter Store), Azure, GCP, Docker, Kubernetes, Jenkins, Maven, Git

Databases:
PostgreSQL, MySQL, Oracle, MongoDB, Cassandra

Big Data & Distributed Systems:
Hadoop, Hive, Spark

Monitoring & Logging:
Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)

Authentication & Security:
OAuth2, JWT, LDAP, Spring Security, API Gateway throttling, IP filtering

Testing Tools:
JUnit, TestNG, Mockito, Jasmine, REST Assured

Development Methodologies:
Agile, Scrum, TDD, CI/CD, Jenkins Pipelines, Maven

Compliance & Governance:
HIPAA (Healthcare), Financial Regulatory Standards

Operating Systems & Scripting:
Linux, Unix, Shell Scripting
