Responsible for migrating the on-premises Data Lake to an AWS S3-backed cloud Data Lake
Responsible for building end-to-end data pipelines in cloud infrastructure
Responsible for fine-tuning, troubleshooting, and supporting the enterprise data pipelines at production scale
Wrote Python-based Spark applications to perform various data transformations and other custom event processing
Involved in data cleansing, event enrichment, data aggregation, and data preparation needed for machine learning and reporting
Used Spark SQL to read data from Hive tables and perform data cleansing, validation, transformation, and aggregation per downstream business team requirements (see the Spark SQL sketch after this list)
Deployed applications to Kubernetes, creating and managing Pods
Used build-automation pipelines to publish all microservice builds to the Docker registry in AWS
Automated the resulting scripts and workflows using Airflow orchestration and shell scripting to ensure daily execution in production
Involved in continuous integration of applications using Jenkins
Responsible for loading processed data into data warehouse tables so the business reporting team could build dashboards
Worked with cross-functional data science, software engineering, and analytics teams to design, develop, and execute solutions that derive business insights and solve clients’ operational and strategic problems
Worked on data visualization and analytics with research scientists and business stakeholders
Strong communication, decision-making, and organizational skills, along with analytical and problem-solving abilities for taking on challenging assignments
Optimized data processing by implementing efficient ETL pipelines and streamlining database design
Designed scalable and maintainable data models to support business intelligence initiatives and reporting needs
Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting
Evaluated various tools, technologies, and best practices for potential adoption in the company's data engineering processes
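
A minimal Spark SQL sketch of the Hive-based cleansing, validation, and aggregation flow described above; the database, table, and column names are illustrative assumptions, not the actual production schema:

    from pyspark.sql import SparkSession, functions as F

    # Hive-enabled Spark session (assumes the cluster is configured with a Hive metastore)
    spark = (SparkSession.builder
             .appName("orders-cleansing")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw data from a Hive table (hypothetical schema)
    orders = spark.sql("SELECT * FROM sales.orders")

    cleansed = (orders
                .dropDuplicates(["order_id"])                       # cleansing
                .filter(F.col("order_amount").isNotNull())          # validation
                .withColumn("order_date", F.to_date("order_ts")))   # transformation

    # Aggregate per downstream reporting requirements
    daily_totals = (cleansed
                    .groupBy("order_date", "region")
                    .agg(F.sum("order_amount").alias("total_amount")))

    daily_totals.write.mode("overwrite").saveAsTable("reporting.daily_order_totals")
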
Data Engineer
Smart and Final
08.2023 - 03.2024
Responsible for building end-to-end data pipelines in Azure cloud infrastructure, ensuring efficient data handling and processing
Developed Python-based Spark applications for data transformations and event processing, contributing to the refinement of data analytics and reporting capabilities
Successfully designed, developed, and maintained complex data pipelines, including a 650 TB migration using Azure Data Factory, enhancing system reliability and integrity
Experienced with the Azure cloud platform, managing virtual networks, VMs, and Databricks, and optimizing cloud infrastructure for data engineering tasks
Skilled in automating cloud infrastructure with ARM templates for Function Apps, Key Vaults, virtual networks, etc.
Embodied the principle of full ownership in build and deployment processes
Led the development and implementation of continuous integration and deployment pipelines, incorporating GitHub Actions workflows for automated deployment of infrastructure and applications
Implemented cost-saving strategies in data storage management, transitioning data between hot, cold, and archive tiers and producing significant savings (approximately $200K); see the tiering sketch after this list
Diagnosed and resolved production issues and resource-utilization problems, improving performance and cost efficiency
Created customer-focused data dashboards for analytics and monitoring, utilizing Python scripting for effective data integration and sharing between ADLS and Snowflake
Composed and maintained comprehensive documentation and deployment guides to streamline and standardize the build and release procedures, ensuring best practices and team alignment
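
A minimal sketch of the kind of storage-tiering automation behind the cost-saving work above, using the azure-storage-blob SDK; the container name, age threshold, and target tier are illustrative assumptions:

    from datetime import datetime, timedelta, timezone
    from azure.storage.blob import BlobServiceClient

    # The connection string would normally come from Key Vault or an environment variable
    service = BlobServiceClient.from_connection_string("<connection-string>")
    container = service.get_container_client("raw-landing")  # hypothetical container

    # Demote blobs untouched for 90+ days out of the hot tier to reduce storage cost
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)
    for blob in container.list_blobs():
        if blob.last_modified < cutoff:
            container.get_blob_client(blob.name).set_standard_blob_tier("Cool")
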
Utilized AWS to aggregate cleaned files in Amazon S3 and loaded files into buckets via Amazon EC2 clusters
Developed a data pipeline on AWS to extract data from weblogs and store it in HDFS and migrated data from AWS S3 to HDFS using Kafka
Designed a Data Quality Framework for schema validation and data profiling using Spark (PySpark)
Employed PySpark-SQL to load JSON data, create schema RDDs and DataFrames, and integrate it into Hive Tables, managing structured data with Spark-SQL
Created views and templates with Python and Django’s view controller and templating language, employing MVC architecture to deliver a user-friendly interface
Developed ETL/ELT pipelines using data technologies such as PySpark, Hive, Presto, and Databricks
Applied best practices in data architecture, integration, and governance, including data catalogs, governance frameworks, metadata management, and data quality solutions
Successfully implemented ETL solutions between OLTP and OLAP databases to support Decision Support Systems, with expertise across all SDLC phases
Created Python scripts for managing AWS resources via the Boto3 SDK and AWS CLI, and established CI/CD pipelines using Maven, GitHub, and AWS (see the Boto3 sketch after this list)
Specialized in real-time processing and core job development with Kafka and Spark Streaming, and developed UNIX shell scripts for parameterizing Sqoop and Hive jobs
Extensively imported metadata into Hive using Python and migrated existing tables and applications to AWS
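
A minimal Boto3 sketch of the S3 resource-management scripting referenced above; bucket and prefix names are illustrative, and credentials are assumed to come from the standard AWS credential chain:

    import boto3

    s3 = boto3.client("s3")

    # Copy cleaned weblog files from the raw bucket into a curated bucket (hypothetical names)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="raw-data-lake", Prefix="weblogs/clean/"):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket="curated-data-lake",
                Key=obj["Key"],
                CopySource={"Bucket": "raw-data-lake", "Key": obj["Key"]},
            )
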