
Abhimanyu Abhinav

Seattle, USA

Summary

Accomplished Data Engineer with extensive experience at Amazon, specializing in AWS cloud services and big data processing. Successfully led data migrations and optimized ETL frameworks, achieving a 25% increase in query performance. Adept at analytical problem solving and developing secure data infrastructures, ensuring robust data management across diverse platforms.

Overview

9 years of professional experience

Work History

Data Engineer

Amazon
Seattle, USA
07.2021 - Current
  • Project NAWS: Led the data migration effort from a server-hosted MySQL instance to the AWS cloud, along with the dependent BI applications
  • Optimized the migrated tables, reducing data skew and improving partitioning to speed up BI applications
  • Created an end-to-end ETL framework using AWS services such as S3, Redshift, and Aurora MySQL to move large TB-scale datasets across different data storage solutions
  • Migrated the schemas of 8+ teams from an on-premises server-hosted MySQL database to AWS cloud MySQL, and set up the infrastructure for secure user access
  • Masked IP addresses, configured access for worldwide usage, and set up a self-maintaining active user group
  • Developed a database loader tool called Bigfoot to handle big data loads using AWS Lambda and Glue, integrating it with the scalable S3 service to make ETLs more seamless and efficient by reducing data hops and enabling auto-triggered loading (a sketch of the trigger pattern follows this list)
  • Configured monitoring controls and alarms for ETLs
  • Project Candidata: Revamped the data model to handle large volumes of candidate data and application activity
  • Scheduled Python ETLs with Airflow, serving various ML models and BI applications (a minimal DAG sketch follows this list)
  • Collaborated with Data Science/ML teams to develop the data infrastructure servicing the ML processes
  • Redesigned and improved the existing data model, increasing captured candidate web activity by 25% and improving query performance 2x-6x while reducing scanned data size, thus improving the margins on our SLAs
  • Made dimension tables agnostic to source data, improving resilience to upstream failures
  • Used surrogate keys to mask business logic from end users, improving data security and the confidentiality of PII
  • Built Python ETL jobs to handle petabyte-scale batch data processing
  • Leveraged distributed processing in Spark, tuning parameters to handle S3 throttling limits on API consumption (see the tuning sketch after this list)
  • Developed data quality checks for a data consumption framework, using hashing for integrity and row counts for completeness (illustrated after this list)
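
The auto-triggered loading pattern above can be sketched as follows; this is an illustrative outline, not the actual Bigfoot code, and the Glue job name, argument names, and event wiring are hypothetical placeholders:

    import json
    import urllib.parse

    import boto3

    glue = boto3.client("glue")
    GLUE_JOB_NAME = "bigfoot-loader"  # hypothetical job name

    def handler(event, context):
        # Fired by an S3 ObjectCreated notification: start a Glue job
        # run for each newly landed object, passing its location along.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            run = glue.start_job_run(
                JobName=GLUE_JOB_NAME,
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
            print(json.dumps({"object": key, "job_run_id": run["JobRunId"]}))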
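
A minimal Airflow sketch of the scheduling pattern described above; the DAG id, schedule, and task bodies are placeholders, not the production pipeline:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        pass  # placeholder: pull candidate activity from the source

    def transform_load():
        pass  # placeholder: apply business logic, load to the warehouse

    with DAG(
        dag_id="candidate_activity_etl",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="transform_load", python_callable=transform_load)
        extract_task >> load_task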
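
The S3 throttling work amounts to bounding how hard Spark hits the S3 API and retrying throttled requests. A hedged PySpark sketch, with illustrative values rather than the settings actually used in production:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("s3-batch-etl")
        # Cap concurrent S3A connections so tasks stay under request-rate limits
        .config("spark.hadoop.fs.s3a.connection.maximum", "100")
        # Retry throttled (503 SlowDown) requests with backoff instead of failing
        .config("spark.hadoop.fs.s3a.retry.limit", "7")
        .config("spark.hadoop.fs.s3a.retry.interval", "500ms")
        .getOrCreate()
    )

    df = spark.read.parquet("s3a://bucket/input/")  # placeholder path
    # Partition count controls how many tasks read/write S3 concurrently
    df.repartition(200).write.mode("overwrite").parquet("s3a://bucket/output/")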
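
A minimal sketch of those data quality checks, assuming rows are passed in as lists of tuples; the delimiter and hash choice are illustrative:

    import hashlib

    def table_fingerprint(rows):
        # Order-insensitive hash over all rows: the integrity check
        digest = hashlib.md5()
        for line in sorted("|".join(map(str, r)) for r in rows):
            digest.update(line.encode("utf-8"))
        return digest.hexdigest(), len(rows)

    def validate(source_rows, target_rows):
        src_hash, src_count = table_fingerprint(source_rows)
        tgt_hash, tgt_count = table_fingerprint(target_rows)
        # Completeness: same number of rows landed as were sent
        assert src_count == tgt_count, f"row count mismatch: {src_count} != {tgt_count}"
        # Integrity: the content itself is equivalent
        assert src_hash == tgt_hash, "content hash mismatch"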

Data Engineer

Tata Consultancy Services
Mumbai, India
04.2016 - 05.2019
  • Scraped data from diverse platforms, applied business logic, and loaded the data into a Snowflake cloud database
  • Wrote Python programs to analyze and clean complex data from varied sources, extracting large datasets from PADB (ParAccel DB) and loading them into Snowflake via S3
  • Created fact and dimension tables, incorporating SCD logic to generate surrogate keys for the warehouse (a simplified sketch follows this list)
  • Migrated close to 400 tables from the existing platform to Snowflake, improving query performance by 90%-97% for medium-to-large data warehouses and reducing query failures to zero for 50 concurrent users across 300 queries
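
A simplified, in-memory sketch of the SCD load with surrogate keys mentioned above (Type 2 semantics assumed; the real pipeline ran against Snowflake, and the column names here are placeholders):

    from datetime import date

    def scd2_upsert(dim_rows, incoming, next_key):
        # dim_rows: list of dicts with surrogate_key, natural_key, attrs,
        # start_date, end_date, is_current. incoming: natural_key -> attrs.
        current = {r["natural_key"]: r for r in dim_rows if r["is_current"]}
        for natural_key, attrs in incoming.items():
            row = current.get(natural_key)
            if row and row["attrs"] == attrs:
                continue  # unchanged record: nothing to do
            if row:
                row["is_current"] = False  # expire the old version
                row["end_date"] = date.today()
            dim_rows.append({
                "surrogate_key": next_key,  # hides the business key from end users
                "natural_key": natural_key,
                "attrs": attrs,
                "start_date": date.today(),
                "end_date": None,
                "is_current": True,
            })
            next_key += 1
        return dim_rows, next_key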

Education

Master’s Degree - Computer Science

Clemson University
05.2021

Bachelor’s Degree - Engineering (Information Technology)

Oriental College of Technology
05.2015

Skills

  • AWS cloud services
  • Spark
  • Python, SQL, Scala
  • Data processing frameworks
  • Workflow orchestration
  • Data modeling
  • Data warehousing
  • Relational databases
  • NoSQL databases
  • Cloud data platforms
  • Distributed processing, AWS Glue, EMR
  • Scripting and automation
  • Analytical problem solving
  • Continuous integration and delivery
  • Version control systems
  • Infrastructure as code

Academic Projects

Developed a novel algorithm that finds exponential and logistic patterns in big data to improve execution time and efficiency, and benchmarked it on distinct computation platforms (GCP, AWS, and the Apache Spark/MapReduce framework). Implemented sampling with MapReduce to reduce execution time. Predicted exponential patterns for larger datasets from the coefficients calculated by the fitted exponential function, as sketched below.
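
A compact sketch of the core idea, assuming data roughly of the form y ≈ a·e^(b·x) with positive y values: sample the points (standing in for the MapReduce sampling stage), fit the coefficients with a log-linear least-squares fit, then extrapolate the pattern to larger inputs. The function names and sample size are illustrative:

    import random

    import numpy as np

    def fit_exponential(points, sample_size=10_000, seed=42):
        # Sample to cut execution time, as the MapReduce stage did at scale
        random.seed(seed)
        sample = random.sample(points, min(sample_size, len(points)))
        xs = np.array([p[0] for p in sample], dtype=float)
        ys = np.array([p[1] for p in sample], dtype=float)
        # Linearize y = a * exp(b * x) as ln(y) = ln(a) + b * x, then OLS
        b, ln_a = np.polyfit(xs, np.log(ys), 1)
        return np.exp(ln_a), b

    def predict(a, b, x):
        # Extrapolate the fitted exponential pattern to unseen x
        return a * np.exp(b * x)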

Timeline

Data Engineer

Amazon
07.2021 - Current

Data Engineer

Tata Consultancy Services
04.2016 - 05.2019

Master’s Degree - Computer Science

Clemson University

Bachelor’s Degree - Engineering (Information Technology)

Oriental College of Technology