ARJUN SHAH

MILPITAS, US

Summary

Results-driven data engineering professional with a robust background in data architecture and pipeline development. Proven ability to streamline data processes and enhance data integrity through innovative solutions. Demonstrates advanced proficiency in SQL and Python, leveraging these skills to support cross-functional teams and drive data-driven decision-making.

Overview

9 years of professional experience

Work History

Senior Staff Data Engineer

Bill.com
09.2018 - Current
  • One of 20 engineers (out of 1,000+) selected across the organization to build Flink applications for the Agentic AI initiative, delivering real-time, sanitized, and enriched Kafka topics
  • Leading and mentoring a team of 2 junior data engineers, providing guidance on ETL pipeline development, data modeling, and performance optimization to ensure timely project delivery and adherence to best practices
  • Built a data lake in S3 following best practices (columnar storage, compression, partitioning, file consolidation)
  • Migrated various on-prem data sources to the data lake using Glue, with Redshift Spectrum and Athena used for querying
  • Completed the pipeline by loading data into Redshift and exposing various visualization tools on top
  • Built Splunk dashboards to monitor more than 50 daily Glue jobs
  • Built Cloudability dashboards to monitor costs across AWS services
  • Built data pipelines for third-party sources such as Zendesk, Mixpanel, Salesforce, and Marketo, using Glue Python Shell jobs for extraction and Glue Spark jobs for transformation
  • Developed GitLab pipelines for CI/CD
  • Designed and implemented complex dimension and fact data warehouse tables using PySpark
  • Designed and developed real-time pipelines using Kinesis Data Streams and Kinesis Data Firehose
  • Architected and developed dbt pipelines orchestrated with MWAA and deployed via GitLab CI/CD (dbt-redshift and dbt-trino)
  • Transitioned the existing data lake from Delta Lake to Iceberg using Glue and EMR
  • Developed Flink applications with various sources and sinks, including Kinesis, Kafka, and Iceberg

Data Engineer

Shutterfly
02.2016 - 09.2018
  • Designed data models for managing heterogeneous data sets on Hadoop
  • Migrated data pipelines running on an on-premises Hadoop cluster to AWS using services such as EMR, Kinesis Data Streams, and Kinesis Data Firehose
  • Estimated costs of various AWS services and defined tagging strategies, which became an important part of design and decision-making
  • Followed best practices to build a data lake in S3 (columnar storage, compression, partitioning, file consolidation)

Data Warehouse Intern

Shutterfly
12.2015
  • Supported and created new data management methods
  • Implemented a recommendation engine to predict products from historical data using classification

Education

M.S. - Computer Science

San Jose State University
San Jose, CA
05.2014

B.E. - Computer Science Engineering

San Jose State University
San Jose, CA
06.2011

Skills

  • Python, Spark, Oracle, SQL, MySQL, PostgreSQL, AWS (Glue, EMR, S3, Lambda, ECS, Bedrock, Kinesis, IAM, Flink, MWAA, Athena), Iceberg, Kafka, dbt, GitLab, Neo4j, Windows, Unix, Linux, macOS, Starburst, Hadoop, Splunk, Datadog, Tableau

Links

LinkedIn: https://www.linkedin.com/in/arjunsshah/
