ARUN KUSHWAH

San Francisco, CA

Summary

A highly skilled, results-driven Solution Architect and Lead Data Engineer with 9+ years of experience designing and implementing scalable data solutions with AWS and PySpark. Expert in architecting end-to-end cloud-based data platforms and driving complex data engineering projects. Proven ability to lead high-performing teams that deliver innovative solutions meeting business needs while ensuring high performance, scalability, and security. Passionate about advancing data engineering with cutting-edge technologies, with a strong focus on business value, data integrity, and performance. Adept at collaborating with business and technical stakeholders to ensure the successful delivery of complex data-driven solutions.

Overview

10 years of professional experience
1 Certification

Work History

Technical Lead / Principal Software Engineer 2

Saama Technologies
07.2020 - Current
  • Client: Genentech
  • Technologies used: PySpark, SQL, AWS, Python, Airflow, Jira, GitHub
  • Design and deploy data architectures using AWS services such as S3, Redshift, EMR, Lambda, and Glue to support big data processing and analytics at scale
  • Build and optimize ETL pipelines, data lakes, and real-time data processing systems using PySpark and AWS Glue
  • Collaborate with multiple teams, including Data Intelligence, Platform, QA, and Business, taking ownership of the entire data lifecycle from requirements gathering to the end use case
  • Lead a team of 8-10 associates, planning project timelines and managing day-to-day tasks to ensure efficient delivery
  • Enhance system performance through thorough code reviews, debugging, and optimization techniques
  • Work in an Agile environment with daily scrums and sprint planning

Senior Software Engineer

R Systems International
01.2020 - 07.2020
  • Client: VF (Retail)
  • Technologies used: AWS (Lambda, S3, CloudWatch, CodeCommit, CodeBuild, DynamoDB, Glue), PySpark
  • Developed a serverless Lambda function to validate files in formats such as JSON and XML
  • Connected the Lambda function to event triggers so that jobs run automatically when specific events occur
  • Designed and developed a CI/CD pipeline for code migration on AWS
  • Worked with CodeCommit and CodeBuild to generalize the process
  • Worked on ETL pipelines to transform data in various formats, such as CSV, JSON, and Parquet, using PySpark SQL and the PySpark DataFrame API
  • Contributed to building a robust repository of commonly used metrics, called Cookbook, which multiple verticals used to implement standard metrics in just a few lines

Data Analyst

Pentation Analytics Pvt. Ltd
01.2018 - 07.2019
  • Technologies used: Cloudera, Impala, Hue, Hive, Sqoop, Spark, PySpark, Python, Pentaho, Oracle PL/SQL
  • Gathered requirements across various sources to understand the data and its scale for cluster setup
  • Integrated multiple sources of disparate data into cohesive datasets using ETL processes, improving overall analytic capabilities
  • Designed pipelines to pull data from various relational databases into BDL
  • Designed and scheduled jobs using the Pentaho ETL tool and Sqoop for daily incremental data pulls
  • Performed periodic data validation to maintain integrity
  • Created user roles and provided secure access to users
  • Regulatory Reporting and Automated Data Flow (ADF)
  • Automated regulatory reports (returns) filed with the Reserve Bank of India (RBI)
  • Gathered business requirements from stakeholders and implemented a solution that pushes data into ADF in the XBRL format specified by RBI, removing manual intervention
  • Developed fully automated workflows for reports and use cases previously handled manually, saving ~2,000 man-hours per month
  • Customer Dedupe Engine across applications
  • Created a base customer repository from customers across the bank's various applications
  • Created a consolidated Negative List data repository to check new customers against
  • Designed an algorithm that scores customer details to identify whether a customer already exists in the same or a different application, reducing redundancy and saving time in customer onboarding

Hadoop Developer

Hack Planet Technologies Private Limited
07.2015 - 01.2018
  • Technologies used: Hadoop, Hive, Sqoop, PySpark
  • Project purpose: load and process a large volume of structured transaction data from an RDBMS into HDFS and present the processed reports visually, helping management identify the month's top insured customers and offer them additional policies, with the aim of increasing product sales
  • Developed Sqoop scripts to create an incremental data flow between RDBMS tables and Hive relations
  • Created Hive tables to load large sets of structured data coming from SQL Server
  • Developed Sqoop scripts with incremental loads to populate Hive external tables

Education

B.Tech - Computer Science & Engineering

ITM UNIVERSITY
06.2015

Skills

  • AWS, Spark, PySpark, Python, Glue, Athena, SQL, Snowflake, Jira, Airflow, Cloudera, Impala, Hadoop, Sqoop, Hive, GitHub, CI/CD, Docker, Pentaho, Big Data, Data Engineering
  • Excellent communication and relationship management skills

Certification


  • AWS Certified Solutions Architect - Associate
  • Snowflake - SnowPro Core Certified
  • Jun 2023: Astronomer Certification: Apache Airflow Fundamentals

Accomplishments

  • Jan 2022: Saama Execution Excellence Award
  • Q3 2022: Saama Value Champion Award
  • Jan 2023: Saama Execution Excellence Award
