Vijitha Battula

Herndon, VA

Summary

Senior Data Engineer with 10+ years of experience across the AWS ecosystem, Spark technologies, and multi-cloud data platforms (AWS, Databricks, Snowflake). Architects ETL processes, optimizes cross-platform data workflows that improve performance by 60%, and builds agentic AI tools using AWS Bedrock that reduce manual processes by 35%. Deep expertise in AI/ML solutions (SageMaker, Bedrock), SAP-to-AWS migrations (100+ workflows), and expert-level technical guidance across AWS BigData services (EMR, Glue, Athena, Redshift, Kinesis, DynamoDB). Strong background in data architecture and pipeline development, with a proven ability to streamline data processes and improve data integrity. Advanced proficiency in SQL and Python, applied to support cross-functional teams and data-driven decision-making.

Work History

Designated Cloud Data Engineer - BigData Services

Amazon Web Services (AWS)

Herndon, Virginia, USA

09.2025 - Current

Led critical event support for Unified Ops/Enterprise customers processing 10TB+ of data daily across AWS BigData services (EMR, Athena, Kinesis, DynamoDB, Glue, Lake Formation, SageMaker, Bedrock, Gen AI), providing engineer-led proactive guidance from design through post-launch retrospectives with a 99.99% event success rate.
Designed and optimized customer data workflows by gathering requirements and analyzing pain points, then architecting scalable ETL pipelines and orchestration solutions using AWS best practices (Glue, Step Functions, MWAA) that improved processing efficiency and reduced operational overhead.
Drove proactive risk mitigation and capacity planning through comprehensive workload discovery, event runbook development, and monitoring validation, reducing incident response time by 40% through pre-event assessments and readiness reviews.
Built agentic AI tools using Amazon Bedrock and Snowflake MCP servers to automate customer workflows and reduce manual processes, enabling intelligent data querying and analysis across multi-cloud data platforms while improving operational efficiency by 35%.
Served as single point of technical contact for BigData workloads, delivering expert troubleshooting and architectural guidance while managing escalations, conducting post-event retrospectives, and ensuring continuous improvement across customer engagements.
Architected cross-platform data integration solutions connecting Databricks to AWS cloud infrastructure, enabling customers to leverage AWS Glue ETL jobs for processing Databricks datasets and loading transformed data into Amazon Redshift for enterprise analytics and business intelligence reporting.
Guided customers in modernizing legacy ETL workflows by migrating on-premises Databricks workloads to AWS, leveraging Glue's serverless architecture to process Databricks data at scale and deliver analytics-ready datasets to Redshift, enabling real-time business insights and data-driven decision making.
Implemented federated data access patterns using AWS Glue to query Snowflake Iceberg tables through Horizon Catalog integration and Databricks Delta Lake tables simultaneously, enabling customers to build unified data lakehouse architectures without data duplication while maintaining governance and security controls.
Designed automated SAP data integration pipelines using AWS Glue to extract data from SAP systems (S/4HANA, ECC, BW) via OData APIs and RFC connections, implementing incremental data transfer with delta tokens and change data capture mechanisms that processed terabytes of SAP data daily with 99.9% accuracy.
Led large-scale SAP-to-AWS data migration for enterprise customers, architecting and executing the migration of 100+ SAP workflows to Amazon S3 using AWS Glue ETL jobs with SAP OData and pyRFC connectors, reducing data extraction time by 50% while ensuring zero data loss during cutover.

BigData/ETL Cloud Engineer

Amazon Web Services (AWS)

Herndon, Virginia, USA

06.2019 - 08.2025

Resolved critical, highly complex customer problems spanning multiple AWS BigData services (EMR, Glue, Athena, DynamoDB, Kinesis, Elasticsearch) by applying advanced troubleshooting techniques across distributed systems, delivering solutions tailored to each customer's technical and business needs.
Drove customer interactions through multiple channels including phone, chat, email, screen shares, and conference calls, working directly with AWS Service architects to reproduce, diagnose, and resolve complex technical issues while maintaining strong customer focus and service excellence.
Provided expert-level technical support for data warehouse solutions (Redshift, BigQuery, Snowflake), search engines (Elasticsearch, Apache Solr), and streaming services (Kafka, Kinesis), performing database performance tuning, troubleshooting, and optimization for enterprise-scale workloads.
Architected enterprise-scale data integration solutions by gathering business requirements and designing real-time streaming architectures using AWS Lambda to extract Salesforce data and load into DynamoDB, ensuring high availability and data consistency.
Led large-scale SAP-to-AWS data migration for enterprise customers, orchestrating the migration of 100+ SAP workflows to Amazon S3 using AWS Glue ETL jobs with SAP OData and pyRFC connectors, implementing enterprise-grade data governance with AWS Secrets Manager, VPC connectivity, and AWS Glue Data Catalog schemas that maintained data lineage and metadata consistency.
Tackled complex data quality challenges by designing automated data validation models and metrics frameworks to capture missing data in real-time integrations, developing admin scripts to reprocess failed records through SQS queues, significantly reducing errors and improving data reliability.
Demonstrated strong Linux/Unix system administration skills with expertise in system monitoring, analysis, and troubleshooting of distributed computing environments and large complex Hadoop clusters using tools like ping, traceroute, tcpdump, and system performance analyzers.
Applied expert understanding of ETL principles within Hadoop ecosystem, including proficiency in Hadoop MapReduce, Zookeeper, HBase, HDFS, Pig, Hive, and Spark for processing datasets over 10TB daily.
Optimized multi-cloud data pipelines by establishing Databricks-to-Redshift and Snowflake-to-Redshift data flows through AWS Glue, implementing incremental data loading strategies and partition pruning techniques that improved query performance by 60% and reduced data transfer costs by 35%.
Enabled seamless Databricks-Snowflake-AWS integration by configuring JDBC/ODBC connections and implementing secure authentication mechanisms, allowing customers to orchestrate end-to-end data workflows across platforms using AWS Glue jobs.
Maintained deep technical expertise across AWS's expanding portfolio of big data analytics services (Redshift, EMR, Glue, Athena, QuickSight) and Generative AI services (Bedrock, Amazon Q Developer), providing expert-level technical support for EMR (Hadoop), DynamoDB (NoSQL), Elasticsearch, Kinesis, and ML/AI solutions.
Implemented end-to-end ML pipelines using Amazon SageMaker for predictive analytics projects, integrating machine learning models with scalable ETL workflows to enable automated decision-making.
Handled high-severity situations requiring immediate resolution with strong multi-tasking skills, managing full application stacks from OS through custom applications while coordinating with cross-functional teams.
Applied strong analysis and troubleshooting skills to Hadoop architecture, administration, and support, maintaining the infrastructure of large, complex systems and clusters.
Applied a deep understanding of networking concepts and protocols (DNS, TCP/IP, DHCP, HTTPS, SSH), security best practices, virtualization technologies (VMware, Xen, hypervisors), and cloud computing fundamentals.

Data Engineer

Walmart

Bentonville, Arkansas, USA

01.2017 - 05.2019

Translated complex data structures into actionable business intelligence, enabling executive leadership to make data-driven strategic decisions across 500+ retail locations through comprehensive analytics and reporting frameworks.
Developed comprehensive Tableau dashboards with interactive visualizations and real-time data analysis capabilities, empowering stakeholders across merchandising, supply chain, and operations teams to make informed decisions with sub-minute data latency.
Engineered Python-based data visualization frameworks leveraging libraries including Matplotlib, Plotly, and Seaborn to transform complex multi-dimensional datasets into actionable insights, driving strategic initiatives that improved inventory management efficiency by 30%.
Architected and implemented efficient ETL processes using Apache Spark on EMR clusters, optimizing data transformation workflows that reduced processing time by 45% and enabled timely business insights for mission-critical analytics workloads.
Designed and implemented scalable HDFS-based data storage architectures with optimized partitioning strategies, ensuring high-throughput data ingestion and retrieval for petabyte-scale data warehouses supporting enterprise analytics.
Designed and deployed multi-region Apache Kafka clusters with disaster recovery capabilities and automated failover mechanisms, achieving 99.99% uptime and ensuring business continuity for mission-critical data streaming operations.
Optimized Kafka broker configurations and tuned performance parameters including batch size, compression codecs, and replication factors, resulting in 40% reduction in message latency and improved throughput for real-time inventory management systems.
Created streaming ETL workflows integrating Apache Kafka, Spark Streaming, and Delta Lake for real-time data warehousing, enabling near-real-time analytics on transactional data with exactly-once processing semantics.
Collaborated with cross-functional teams including data science, business intelligence, and operations to align data strategies and establish data governance frameworks, resulting in improved data quality, standardized metrics definitions, and enhanced project outcomes.
Fostered a collaborative engineering environment by mentoring junior data engineers on Spark optimization techniques, Kafka best practices, and data modeling principles, enhancing team capabilities and promoting knowledge sharing across the organization.

Jr. BigData Engineer

Symbiosys Technologies

Vizag, India

08.2014 - 06.2015

Installed and configured multi-node Hadoop clusters on AWS infrastructure, implementing HDFS architecture with optimized replication factors and rack awareness to ensure high availability and fault tolerance for distributed data processing.
Designed and implemented Hive tables with optimized partitioning and bucketing strategies, creating both managed and external tables to streamline data queries and facilitate quicker insights for business decision-making.
Executed complex MapReduce jobs with custom mappers and reducers to optimize data processing workflows, resulting in substantial improvements in analytics speed and accuracy while reducing computational overhead by 30%.
Orchestrated large-scale data warehouse migration from legacy on-premises systems to AWS Redshift, implementing ETL pipelines that enhanced data retrieval efficiency by 3x and reduced operational infrastructure costs by 35%.
Scheduled and managed Oozie workflows for coordinating Hive and Pig jobs, ensuring timely execution of data processing tasks with automated dependency management and error handling mechanisms.
Managed Hadoop cluster operations including node provisioning, decommissioning, and data backup strategies using HDFS snapshots and distcp utilities, ensuring 99.5% cluster uptime and data durability.
Implemented data ingestion pipelines using Apache Sqoop and Flume to transfer data from relational databases and log files into HDFS, processing terabytes of structured and semi-structured data for analytics workloads.
Optimized Hive query performance through query tuning, indexing strategies, and partition pruning techniques, reducing query execution time by 40% for frequently accessed datasets.
Coordinated with cross-functional teams including database administrators, application developers, and business analysts to align data processing schedules and ensure seamless integration of Hadoop ecosystem with existing enterprise systems.
Fostered a collaborative engineering environment by documenting best practices for Hadoop cluster management, conducting knowledge-sharing sessions, and mentoring team members on MapReduce programming patterns and Hive optimization techniques.

Education

Master of Science - Computer Science

University of Central Missouri

Missouri, MO

12.2016

Bachelor of Science - Computer Science Engineering

Jawaharlal Nehru Technological University

AP, India

2014

Skills

HDFS
MapReduce
Apache Hive
Apache Pig
Apache Sqoop
Apache Spark
Apache Oozie
Apache Presto
Apache Flink
Apache Airflow
Apache Kafka
Spark Streaming
Delta Lake
Apache Iceberg
ETL/ELT
RDBMS
EMR
Athena
Glue
Data Pipeline
AppFlow
MWAA
IAM
S3
RDS
DynamoDB
Kinesis
EC2
Lambda
VPC
EKS
Redshift
CloudFormation
Batch
SageMaker
Bedrock
Amazon Q Developer
Lake Formation
Step Functions
CloudWatch
Secrets Manager
KMS
HealthLake

Databricks
Snowflake
Google BigQuery
Azure Data Factory
Python
SQL
Scala
PySpark
HiveQL
Shell Scripting
SnowSQL
Amazon Redshift
Amazon RDS
Oracle
SQL Server
PostgreSQL
MySQL
HBase
ElastiCache
MongoDB
Tableau
Power BI
Amazon QuickSight
SAP Lumira
RStudio
Matplotlib
Plotly
Eclipse
PyCharm
Visual Studio Code
Jupyter Notebooks
Zeppelin
GitHub
GitLab
CodeCommit
Docker
Kubernetes
Jenkins
Terraform
Linux
Windows
macOS

Personal Information

Title: Data Engineer
