Passionate about building data products, tools, and pipelines for real-time and low-latency needs. Value efficiency and milestone-driven delivery, building frameworks that balance short- and long-term use cases.
Building high-volume data ingestion from Kafka into ClickHouse, moving processing off Snowflake and enabling merge/upsert and aggregate processing for batch and real-time use cases.
Engineered real-time rideshare metric dashboards by streaming app and GraphQL-based noun events and DynamoDB CDC data into ClickHouse with Flink.
Built a Flink app with lightweight transformations that streams app events into S3 for offline querying with Presto, Hive, Spark, etc., including automated schema evolution and partition handling.
Designed and built a low-latency upsert framework that compacts data based on predefined keys.
Designed a highly efficient versioned-data framework for building snapshots of transactional and dimensional data.
Built a framework for downloading data from third-party APIs such as Google, Stripe, and RingCentral using OAuth2.
Built an EMR web app that other engineers use to call AWS APIs and manage EMR clusters efficiently (create clusters, run jobs, terminate clusters) for ETL/analytics pipelines.
Built a data pipeline for loading app events using Spark/Hive on EMR.
Built a web app to archive source data into S3.
Led and engineered LISTT, a LinkedIn member-segmentation product over dimensional attributes and metrics sourced from Hadoop, used as a search engine by analytics and marketing teams.
Built gator, a snapshot and rolling-window aggregation framework used to compute aggregates incrementally.
Designed and developed data processing of linkedin.com’s impression and page-view data for the LinkedIn executive team using Hadoop MapReduce.
Built an events pipeline for a lead-generation product using Hadoop MapReduce programs.
Developed real-time, intuitive analytics by running Hive queries on HDFS files to support quick decisions on data scrubbing/pixel firing and publisher form allocation, based on visitor traffic quality, prior UVC data by publisher, form fields, and lead feedback data.
Java, Python, Go