Vijay Ramaraju Jampana

Plano, TX

Summary

Experienced in designing and optimizing data pipelines to ensure seamless data flow. Uses advanced SQL and Python skills to build and maintain robust data architectures. Track record of implementing scalable solutions that improve data integrity and support informed decision-making.

Overview

6 years of professional experience
1 Certification

Work History

Data Engineer / Data Analyst

M2M Computing Solutions
05.2022 - Current
  • Architected an enterprise-grade data lakehouse on Databricks using Delta Lake and AWS S3 to consolidate 30+ siloed data sources and support analytics, reporting, and ML workloads.
  • Delivered end-to-end consulting by gathering client requirements, translating them into technical architecture diagrams, and facilitating whiteboarding sessions with data engineers and analysts.
  • Developed Spark-based ingestion pipelines for both batch and streaming data (real-time clickstream and POS logs) with custom partitioning, schema evolution handling, and late-arriving data management.
  • Optimized Spark runtime performance, reducing job latency by 55% using caching strategies, broadcast joins, and adaptive query execution tuning.
  • Created reusable reference architectures and implementation blueprints, enabling client teams to accelerate adoption of Delta Lake, streaming, and ML lifecycle workflows.
  • Built and deployed a feature engineering pipeline in Python for fraud detection models, tracked experiments and versions using MLflow, and served models via REST APIs on Databricks endpoints.
  • Designed CI/CD pipelines with GitHub Actions and Terraform for infrastructure provisioning, environment promotion, and automated testing of notebooks and workflows.
  • Consulted with security and compliance teams to implement RBAC, fine-grained access control, and data encryption-at-rest/in-transit for compliance with HIPAA and GDPR regulations.
  • Defined and implemented monitoring and observability dashboards using CloudWatch and Databricks audit logs for pipeline health checks, SLAs, and usage analytics.
  • Integrated third-party tools like Tableau and Power BI via JDBC connectors to support self-service business analytics from curated Delta tables.
  • Designed and deployed Spark-based ETL workflows using PySpark to clean, transform, and enrich datasets.
  • Developed complex T-SQL stored procedures and functions for analytics.
  • Integrated version control using Git and implemented CI/CD practices for automated testing and deployment.
  • Applied star schema and dimensional modeling techniques to optimize Synapse SQL views for Power BI consumption.

Senior Software Engineer

Capgemini
06.2019 - 07.2021
  • Used Azure Data Factory to integrate data from both on-prem (MySQL, Cassandra) and cloud (Blob Storage, Azure SQL DB) sources, applying transformations and loading the results into Azure Synapse.
  • Monitored Spark clusters using Log Analytics and the Ambari Web UI. Transitioned log storage from Cassandra to Azure SQL Data Warehouse and improved query performance.
  • Developed data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL. Also worked with Cosmos DB (SQL API and Mongo API).
  • Developed dashboards and visualizations to help business users analyze data and to provide data insights to upper management, with a focus on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.
  • Migrated large data sets to Databricks (Spark): created and administered clusters, configured data pipelines, and loaded data from ADLS Gen2 into Databricks using ADF pipelines.
  • Created various pipelines to load data from Azure Data Lake into a staging SQL DB and then into Azure SQL DB.
  • Created Databricks notebooks to streamline and curate data for various business use cases, and mounted Blob Storage on Databricks.
  • Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to support streaming analytics in Databricks.

Education

Master's - Business Analytics

University of North Texas
Denton, Texas
05.2023

Skills

  • Languages: C/C++, Java, Python, Scala, SQL, R
  • Cloud: AWS, Azure
  • Technologies & Tools: GitHub Actions, Airflow, Power BI, Tableau, Spark, Hive, PostgreSQL, MySQL, Snowflake, Kubernetes, Docker, Splunk, Kafka
  • ETL development
  • Data warehousing
  • Data modeling
  • Data pipeline design

Certification

  • Certifications: Snowflake SnowPro Certified, Oracle Certified Associate
  • Data Engineering on Udemy
  • Machine Learning and Deep Learning Specialization on Udemy
