VAMSI KRISHNA BHASHYAM

Austin, TX

Overview

7 years of professional experience

Work History

Cloud Data Engineer

JPMorgan Chase
04.2024 - Current
  • Responsible for migrating the on-prem data lake to an AWS S3-backed data lake
  • Responsible for building end-to-end data pipelines in cloud infrastructure
  • Responsible for fine-tuning, troubleshooting, and supporting the enterprise data pipelines at production scale
  • Wrote Python-based Spark applications to perform various data transformations and other custom event processing
  • Involved in data cleansing, event enrichment, data aggregation, and data preparation needed for machine learning and reporting
  • Used Spark SQL to read data from Hive tables and perform data cleansing, validation, transformation, and aggregation per downstream business team requirements (a representative sketch follows this role's summary)
  • Deployed applications to Kubernetes, creating and managing pods
  • Used Build Automation pipelines to drive all microservices builds out to the Docker registry in AWS
  • Automated resulting scripts and workflow using Airflow orchestration and shell scripting to ensure daily execution in production
  • Involved in continuous Integration of applications using Jenkins
  • Responsible for loading processed data to the Data Warehousing table to allow the Business reporting team to build dashboards
  • Worked with cross-functional teams across data science, software engineering, and analytics to design, develop, and execute solutions that derive business insights and solve clients' operational and strategic problems
  • Worked on data visualization and analytics with research scientists and business stakeholders
  • Brought strong communication, decision-making, and organizational skills, along with analytical and problem-solving ability, to challenging assignments
  • Environment: Spark, Kafka, AWS S3, EMR, Redshift, Hive, Snowflake, EC2, Airflow, Jenkins
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
  • Designed scalable and maintainable data models to support business intelligence initiatives and reporting needs.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Evaluated various tools, technologies, and best practices for potential adoption in the company's data engineering processes.
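
A minimal PySpark sketch of the Spark SQL cleansing and aggregation flow described above; the table, columns, and output path are illustrative placeholders rather than the actual production objects.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = (SparkSession.builder
           .appName("cleanse-aggregate-sketch")
           .enableHiveSupport()          # allows reading Hive tables directly
           .getOrCreate())

  # Read a raw Hive table (placeholder name)
  txns = spark.sql("SELECT * FROM raw_db.transactions")

  # Cleansing: de-duplicate, drop null amounts, derive a date column
  cleaned = (txns
             .dropDuplicates(["txn_id"])
             .filter(F.col("amount").isNotNull())
             .withColumn("txn_date", F.to_date("txn_ts")))

  # Aggregation for the downstream reporting/warehouse layer
  daily = (cleaned
           .groupBy("txn_date", "merchant_id")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("txn_count")))

  # Persist partitioned output to the S3-backed data lake (placeholder path)
  (daily.write
        .mode("overwrite")
        .partitionBy("txn_date")
        .parquet("s3a://example-lake/curated/daily_txns/"))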

Data Engineer

Smart and Final
08.2023 - 03.2024
  • Responsible for building end-to-end data pipelines in Azure cloud infrastructure, ensuring efficient data handling and processing
  • Developed Python-based Spark applications for data transformations and event processing, contributing to improved data analytics and reporting capabilities
  • Successfully designed, developed, and maintained complex data pipelines, including a 650 TB migration using Azure Data Factory, enhancing system reliability and integrity
  • Experienced with the Azure cloud platform, managing virtual networks, VMs, and Databricks, and optimizing cloud infrastructure for data engineering tasks
  • Skilled in automating cloud infrastructure with ARM templates for Function Apps, Key Vaults, virtual networks, etc.
  • Took full ownership of build and deployment processes
  • Led the development and implementation of continuous integration and deployment pipelines, incorporating Git Action workflows for automated deployment of infrastructure and applications
  • Implemented cost-saving strategies in data storage management, transitioning between hot, cold, and archive tiers, resulting in significant savings (approximately $200K)
  • Diagnosed and resolved production and resource utilization issues, improving performance and cost efficiency
  • Created customer-focused data dashboards for analytics and monitoring, utilizing Python scripting for effective data integration and sharing between ADLS and Snowflake (see the sketch after this role's summary)
  • Composed and maintained comprehensive documentation and deployment guides to streamline and standardize the build and release procedures, ensuring best practices and team alignment
  • Environment: Spark, Azure, ADF, Function apps, Event-hubs, SQL, VM, Databricks, Git-Action, Snowflake, ADLS
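
A minimal sketch of the ADLS-to-Snowflake loading pattern mentioned above, using the snowflake-connector-python driver; it assumes an external stage over the ADLS container has already been defined, and all connection details, stage, and table names are placeholders.

  import snowflake.connector

  # Connection details are placeholders, not real credentials
  conn = snowflake.connector.connect(
      account="xy12345",
      user="SVC_DATA_LOADER",
      password="***",
      warehouse="LOAD_WH",
      database="ANALYTICS",
      schema="STAGING",
  )
  try:
      cur = conn.cursor()
      # Assumes @ADLS_STAGE is an external stage pointing at the ADLS container
      cur.execute("""
          COPY INTO STAGING.DAILY_SALES
          FROM @ADLS_STAGE/daily_sales/
          FILE_FORMAT = (TYPE = PARQUET)
          MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
      """)
  finally:
      conn.close()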

Big Data Developer

Apple
03.2022 - 08.2023
  • Responsible for building end-to-end data pipelines in cloud infrastructure
  • Handled large datasets of structured, semi-structured, and unstructured data using Hadoop/big data concepts
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to a NoSQL database (sketched below)
  • Troubleshooting Spark applications for improved error tolerance and reliability
  • Involved in creating external Hive tables from the files stored in the S3
  • Optimized Hive tables with partitioning and bucketing for better HiveQL query execution
  • Fine-tuning spark applications/jobs to improve the efficiency and overall processing time for the pipelines
  • Utilized Spark in-memory capabilities to handle large datasets
  • Implemented data quality checks using Spark, flagging records as bad or passable
  • Followed Agile Methodologies while working on the project
  • Worked with version control for source code management, build automation for continuous integration, and Crucible for code reviews
  • Documented operational problems following standards and procedures using JIRA
  • Environment: Spark, Kafka, AWS S3, EMR, Pyspark, Athena, Hive, Snowflake, EC2, Airflow, Jenkins, Docker, GIT
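
A minimal Spark Structured Streaming sketch of the Kafka consumption described above; the broker, topic, schema, and output locations are placeholders, and a parquet sink stands in for the NoSQL store used in production.

  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col, from_json
  from pyspark.sql.types import StringType, StructType, TimestampType

  spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

  # Event schema (placeholder fields)
  schema = (StructType()
            .add("event_id", StringType())
            .add("event_type", StringType())
            .add("event_ts", TimestampType()))

  # Consume from Kafka (placeholder broker and topic)
  raw = (spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load())

  # Kafka values arrive as bytes; cast to string and parse the JSON payload
  events = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), schema).alias("e"))
            .select("e.*"))

  # Parquet sink stands in for the NoSQL store (placeholder paths)
  query = (events.writeStream
           .format("parquet")
           .option("path", "s3a://example-bucket/events/")
           .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
           .start())
  query.awaitTermination()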

Cloud Data Engineer

T-Mobile
03.2021 - 02.2022
  • Utilized AWS to aggregate clean files in Amazon S3 and deployed files into Buckets via Amazon EC2 Clusters
  • Developed a data pipeline on AWS to extract data from weblogs and store it in HDFS and migrated data from AWS S3 to HDFS using Kafka
  • Designed a Data Quality Framework for schema validation and data profiling using Spark (PySpark)
  • Employed PySpark-SQL to load JSON data, create schema RDDs and DataFrames, and integrate it into Hive Tables, managing structured data with Spark-SQL
  • Created views and templates with Python and Django’s view controller and templating language, employing MVC architecture to deliver a user-friendly interface
  • Developed ETL/ELT pipelines using data technologies such as PySpark, Hive, Presto, and Databricks
  • Applied best practices in data architecture, integration, and governance, including Data Catalogs, Governance frameworks, Metadata management, and Data Quality solutions
  • Successfully implemented ETL solutions between OLTP and OLAP databases to support Decision Support Systems, with expertise across all SDLC phases
  • Created Python scripts for managing AWS resources via the Boto3 SDK and AWS CLI, and established CI/CD pipelines using Maven, GitHub, and AWS (see the sketch below)
  • Specialized in real-time processing and core job development with Kafka and Spark Streaming and developed UNIX shell scripts for parameterizing Sqoop and Hive jobs
  • Extensively imported metadata into Hive using Python and migrated existing tables and applications to AWS
  • Environment: Spark, Kafka, AWS S3, EMR, Redshift, Hive, Snowflake, EC2, Airflow, Jenkins
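
A minimal Boto3 sketch of the kind of AWS resource scripting mentioned above; bucket and key names are placeholders, and credentials are assumed to come from the standard AWS CLI/environment configuration.

  import boto3

  # Credentials resolved from the environment / AWS CLI profile
  s3 = boto3.client("s3")

  # Enumerate buckets in the account
  for bucket in s3.list_buckets()["Buckets"]:
      print(bucket["Name"])

  # Upload a processed file into a bucket (placeholder names)
  s3.upload_file("out/part-0000.parquet",
                 "example-bucket",
                 "clean/part-0000.parquet")

  # List objects under the uploaded prefix to confirm the load
  resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="clean/")
  for obj in resp.get("Contents", []):
      print(obj["Key"], obj["Size"])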

Database Engineer

Squircle
03.2018 - 12.2020
  • Gathered requirements for change requests with the business team and created design documents
  • Created database objects like Tables, Views, Sequences, Synonyms, DB Links, Stored Procedures, Functions, Packages, Cursor, Ref Cursor and Triggers
  • Wrote complex SQL Statements, Complex Joins, Co-related Sub-queries, and SQL Statements with Analytical Functions
  • Effectively made use of Table Functions, generated columns, Indexes, Table Partitioning, Collections, and Materialized Views
  • Used Ref Cursors, Indexes, Joins and Exceptions extensively in coding
  • Tuned long-running SQL queries using EXPLAIN PLAN and hints to reduce response time
  • Performed SQL and PL/SQL tuning using tools like EXPLAIN PLAN and SQL TRACE
  • Extensively used Oracle Hints to direct the optimizer to choose an optimum query Execution Plan
  • Extensively used Bulk Collection in PL/SQL Objects for improving the performance
  • Handled errors using Exception Handling extensively for debugging and maintainability
  • Automated Oracle execution using Unix Cron Utility in Unix Environment
  • Responsible for writing Unix shell scripts for loading data using SQL*Loader
  • The Control Files for the tables were created and automated through UNIX shell scripts to perform data load into Oracle tables
  • Used SQL Loader and PL/SQL scripts to load data into the system application
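
The loading above was done with SQL*Loader and PL/SQL scripts; purely as an illustration of the same batched-load idea in Python, a minimal sketch using the python-oracledb driver with bind variables, where the connection details and staging table are placeholders.

  import oracledb

  # Placeholder connection details
  conn = oracledb.connect(user="app_user", password="***", dsn="dbhost/ORCLPDB1")
  cur = conn.cursor()

  # Batched insert with bind variables (stands in for a SQL*Loader control file)
  rows = [(1, "WIDGET", 12.50), (2, "GADGET", 7.25)]
  cur.executemany(
      "INSERT INTO staging_items (item_id, item_name, unit_price) VALUES (:1, :2, :3)",
      rows,
  )
  conn.commit()
  conn.close()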

Education

MS - Computer Science

Lindsey Wilson College
05.2022

B.Com - Computers

Satyabhama University
05.2018

Skills

  • Hadoop
  • Hive
  • Spark
  • Map Reduce
  • Sqoop
  • Python
  • SQL
  • Java
  • Scala
  • Bash
  • PyCharm
  • Tableau
  • Docker
  • Airflow
  • Jenkins
  • Eclipse
  • Git
  • JIRA
  • Oracle
  • MySQL
  • SQL Server
  • Sybase
  • MongoDB
  • Redshift
  • HBase
  • Maven
  • Gradle
  • AWS
  • Azure
  • Snowflake
  • Data Lake

Awards

Third Place Award, Lindsey Wilson College - KY HACK-A-LWC coding competition, Spring 2021
