ARJUN SHAH

MILPITAS, US

Summary

Results-driven data engineering professional with a robust background in data architecture and pipeline development. Proven ability to streamline data processes and enhance data integrity through innovative solutions. Demonstrates advanced proficiency in SQL and Python, leveraging these skills to support cross-functional teams and drive data-driven decision-making.

Overview

9 years of professional experience

Work History

Senior Staff Data Engineer

Bill.com
09.2018 - Current
  • One of 20 engineers (out of 1,000+) selected across the organization to build Flink applications for the Agentic AI initiative, delivering real-time, sanitized, and enriched Kafka topics
  • Leading and mentoring a team of 2 junior data engineers, providing guidance on ETL pipeline development, data modeling, and performance optimization to ensure timely project delivery and adherence to best practices
  • Built a data lake in S3 following best practices (columnar storage, compression, partitioning, file consolidation)
  • Migrated various on-prem data sources to the data lake using Glue, with Redshift Spectrum and Athena used for querying
  • Completed the pipeline by loading data into Redshift and exposing various visualization tools on top
  • Built Splunk dashboards to monitor more than 50 daily Glue jobs
  • Built Cloudability dashboards to monitor costs across AWS services
  • Built data pipelines for third-party sources such as Zendesk, Mixpanel, Salesforce, and Marketo, using Glue Python Shell jobs for extraction and Glue Spark jobs for transformation
  • Developed GitLab pipelines for CI/CD
  • Designed and implemented complex dimension and fact data warehouse tables using PySpark
  • Designed and developed real-time pipelines using Kinesis Data Streams and Kinesis Data Firehose
  • Architected and developed dbt pipelines orchestrated with MWAA and deployed via GitLab CI/CD (dbt-redshift and dbt-trino)
  • Transitioned the existing data lake from Delta Lake to Iceberg using Glue and EMR
  • Developed Flink applications with various sources and sinks, including Kinesis, Kafka, and Iceberg

Data Engineer

Shutterfly
02.2016 - 09.2018
  • Designed data models for managing heterogeneous data sets on Hadoop
  • Migrated data pipelines running on an on-premises Hadoop cluster to AWS using services such as EMR, Kinesis Data Streams, and Kinesis Data Firehose
  • Estimated costs of various AWS services and defined tagging strategies, which became an important part of design and decision-making
  • Followed best practices to build a data lake in S3 (columnar storage, compression, partitioning, file consolidation)

Data Warehouse Intern

Shutterfly
12.2015
  • Supported and created new data management methods
  • Implemented a recommendation engine to predict products from historical data using classification

Education

M.S. - Computer Science

San Jose State University
San Jose, CA
05.2014

B.E. - Computer Science Engineering

San Jose State University
San Jose, CA
06.2011

Skills

  • Python, Spark, Oracle, SQL, MySQL, PostgreSQL, AWS (Glue, EMR, S3, Lambda, ECS, Bedrock, Kinesis, IAM, Flink, MWAA, Athena), Iceberg, Kafka, dbt, GitLab, Neo4j, Windows, Unix, Linux, macOS, Starburst, Hadoop, Splunk, Datadog, Tableau

Links

LinkedIn: https://www.linkedin.com/in/arjunsshah/
