VAMSI KRISHNA BARIGELA MAHESH

New Jersey, NJ

Summary

Experienced Data Engineer with over 3 years of expertise in designing and implementing scalable data solutions. Proficient in data integration, ETL pipeline development, and data modeling, with a strong background in leveraging big data technologies and cloud platforms for processing and analyzing large-scale datasets. Adept at collaborating with cross-functional teams to deliver high-quality data solutions that align with business goals.

Overview

7 years of professional experience
1 Certification

Work History

Data Engineer

Discover Financial Services
04.2024 - Current
  • Engaged in all phases of the Software Development Life Cycle (SDLC), including analysis, design, and development, while collaborating with the team using Agile methodologies
  • Used PySpark to process data from diverse RDBMS and streaming sources, with Snowflake for data warehousing solutions
  • Designed and deployed end-to-end data pipelines and analytics solutions using AWS services, including EMR, EC2, S3, RDS, Lambda, Glue, SQS, and Redshift
  • Developed efficient Spark SQL scripts for data processing and executed complex HiveQL queries on Hive tables
  • Created and managed Hive tables, implementing partitioning, dynamic partitions, and bucketing for optimized data analysis
  • Built data pipelines to extract, transform, and load (ETL) data from multiple sources into Snowflake tables to meet business requirements
  • Configured CI/CD pipelines using Git and Jenkins to streamline the deployment and management of big data architecture on AWS
  • Orchestrated workflows for large-scale data transformations using Apache Airflow and Apache Oozie to schedule and automate Hadoop jobs

Data Engineer

Discover Financial Services
01.2022 - 02.2023
  • Developed ETL pipelines to load, transform, and analyze large structured, semi-structured, and unstructured datasets using Azure Data Factory, Spark SQL, and Hive
  • Ingested data into Azure services, including Azure Data Lake, Blob Storage, and Azure SQL Data Warehouse, and processed data in Azure Databricks for analytics
  • Created pipelines in Azure Data Factory using Linked Services, Datasets, and Pipelines to extract, transform, and load data from diverse sources like Azure SQL and Blob Storage
  • Partnered with data scientists to integrate machine learning models within Spark pipelines, utilizing Spark MLlib for predictive analytics and real-time decision-making
  • Designed and implemented batch and streaming workflows in Spark for high availability and reliability of mission-critical systems
  • Enhanced the company’s big data ecosystem using Hadoop and Spark, enabling efficient processing of petabyte-scale datasets
  • Created and managed RDDs and leveraged DataFrames for efficient manipulation and analysis of structured data
  • Implemented complex Spark SQL queries for data aggregation and analysis, integrating Hive and HBase for effective data storage and retrieval
  • Configured CI/CD pipelines in Azure DevOps for automated build, test, and deployment of applications
  • Automated workflows with Apache Oozie, reducing manual data handling efforts and improving operational efficiency

Data Engineer

Amazon Development Centre
12.2017 - 12.2019
  • Designed and implemented scalable ETL pipelines for both cloud and on-premises environments, focusing on data modeling and data migration
  • Applied expertise in Apache Spark, including Spark Core, Spark SQL, and Spark Streaming, developing applications for data validation, cleansing, transformation, and aggregation
  • Configured Spark Streaming to process real-time data from Apache Kafka and store it in HDFS using Scala and PySpark
  • Built partitioned and bucketed Hive tables in Parquet file formats with Snappy compression for optimized storage and query performance
  • Automated data workflows using Apache Airflow, reducing manual intervention by 40% and ensuring timely data availability for analytics
  • Migrated legacy systems to cloud platforms (AWS and Azure) using AWS CloudFormation and IAM, achieving a 20% reduction in operational costs while improving scalability
  • Created real-time data streaming solutions with Kafka, enabling instant access to critical data for analytics and decision-making
  • Implemented data lakes on AWS S3, leveraging partitioning and optimization techniques to enhance query performance and reduce costs
  • Collaborated with cross-functional teams using Jira, Confluence, and Git, ensuring seamless project execution
  • Designed and managed CI/CD pipelines with Jenkins, AWS CodeBuild, and Azure DevOps, reducing deployment cycles by 50%
  • Migrated complex SSIS packages to Databricks, improving data processing speeds by over 50% while lowering infrastructure costs

Education

Master of Science - Business Analytics

Sacred Heart University
Fairfield, CT
06.2024

Master of Science - Business with International Management

Northumbria University
Newcastle Upon Tyne, United Kingdom
01.2022

Bachelor of Science - Mechanical Engineering

Vardhaman College Of Engineering
Hyderabad, Telangana
04.2017

Skills

  • Programming Languages: Scala, Python, Java, R, SQL, JavaScript
  • Big Data Technologies: Hadoop, Apache Spark, Kafka, HDFS, MapReduce, Sqoop, Hive, Pig, Flume, NiFi, Impala, Zookeeper, Yarn, Cassandra, Snowflake, Apache Flink, Airflow, Cloudera Manager
  • Cloud Platforms: AWS (S3, Lambda, Athena, EMR, Kinesis, Redshift, RDS, Step Functions, CloudWatch, ECS, Elasticsearch, SNS, Route 53, IAM, Glue, CodePipeline, CodeDeploy, SageMaker, QuickSight); Azure (Databricks, Blob Storage, Azure Functions, HDInsight, Stream Analytics, Event Hubs, Logic Apps, Virtual Machines, Azure Service Bus, Synapse Analytics); GCP (comprehensive understanding of data and cloud services)
  • ETL & Data Storage: SSIS, SSAS, PostgreSQL, MySQL, MongoDB, Cassandra, DynamoDB, Redshift
  • Data Visualization: Tableau, Power BI, Amazon QuickSight, Grafana
  • Data Analytics & Processing: Data Manipulation, Data Cleaning, Data Integration, Data Transformation, Data Streaming, Data Pipelining
  • Machine Learning & MLOps: TensorFlow, PyTorch, Scikit-learn
  • DevOps Tools: Docker, Kubernetes, Jenkins, Git, AWS CodeBuild, AWS CodeDeploy
  • Methodologies: Agile, Waterfall, SDLC
  • Security & Networking: VPC configurations, IAM roles, Security Groups, Network Protocols

Certification

  • AWS Certified Data Engineer Associate
  • Microsoft Certified Azure Data Engineer Associate
