Aayush K

Cloud Data Engineer
Plano, TX
Summary

Experienced Data Engineer with 7 years in the IT industry. Proficient in designing, developing, and implementing data models for enterprise applications. Skilled in data lakes, data warehousing, ETL pipelines, and data visualization. Strong expertise in cloud platforms (AWS and Azure), including services such as EC2, S3, EMR, Redshift, IAM, and Glue. Hands-on with Azure Data Factory, Blob Storage, Databricks, and the Snowflake database. Built pipelines, automated workflows, and utilized CI/CD tools. Proficient in SQL, Python, JIRA, and other business intelligence tools. Seeking a contract position to apply skills in cloud monitoring, deployment, and problem-solving to professional challenges. Accomplished engineer with leadership acumen.

Overview

8 years of professional experience
5 years of post-secondary education

Work History

Texas Health and Human Services Commission
Austin

AWS Data Engineer
10.2022 - Current

Job overview

  • Responsible for managing and operating batch data pipelines on AWS EMR
  • Used PySpark scripts to transform and load data into an S3 data lake, sourced from business application APIs and logs
  • Utilized AWS Kinesis to extract data from API gateways and third-party sources, and leveraged AWS Kinesis Firehose to load the extracted data into S3 storage
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics
  • Wrote Spark-Streaming applications to consume the data from Kafka topics and wrote Spark Python code to process the stream
  • Implemented automated triggers using AWS Lambda to streamline workflow processes
  • Designed, developed, and validated ETL processes in AWS Glue to seamlessly transfer campaign data from external sources, such as ORC/Parquet files, to Amazon Redshift
  • Enabled Amazon SNS on EMR and Glue to receive notifications regarding pipeline executions and failures
  • Worked on multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM)
  • Created Redshift streaming tables in AWS and created and managed S3 buckets using the AWS console and CLI
  • Migrated data from object storage (S3) to Amazon Redshift using COPY commands
  • Created PySpark scripts to move data from S3 into Databricks tables
  • Leveraged AWS Athena for querying S3 data by utilizing AWS Glue Crawler and AWS Glue catalog for metadata management
  • Focused on high availability, fault tolerance, and auto-scaling in AWS CloudFormation
  • Utilized CloudWatch for pipeline monitoring and configured alarms for efficient monitoring
  • Created insightful visualizations and dashboards using Amazon QuickSight
  • Developed PySpark jobs using Python in the test environment for faster data processing and used Spark SQL for querying
  • Automated CI/CD data pipelines using AWS CodePipeline and deployed code to EC2 instances using AWS CodeDeploy
  • Monitored and reported any unusual activities or security incidents related to data vaults, utilizing AWS CloudTrail and CloudWatch to detect and respond to potential security breaches
  • Proficiently used GitHub for version control and JIRA for tracking and updating Epic tasks throughout each sprint, ensuring project milestones were met
  • Created Snowflake stages for each public cloud and used COPY commands to migrate data from object storage to the Snowflake data warehouse.

Disney Streaming Services
Seattle

AWS Data Engineer
04.2022 - 09.2022

Job overview

  • Performed configuration, deployment, and support of cloud services in Amazon Web Services (AWS)
  • Used Airflow to build a task orchestrator on AWS that schedules jobs in the data pipeline
  • Worked on data pipelines to process large data sets and configured lookups for data validation and integrity
  • Created tables, materialized views, and stored procedures in Snowflake and Redshift
  • Created Snowflake authorized views for exposing data to other teams
  • Understanding of CI/CD principles, familiar with version control systems (Git)
  • Developed PySpark Streaming by consuming static and streaming data from different sources
  • Involved in the design, development, and deployment of complex SQL queries against Snowflake, using GitLab
  • Strong understanding of data warehousing concepts, including fact tables, dimension tables, and star/snowflake schema modeling
  • Developed business knowledge of how Disney Streaming Services operates
  • Created Spark clusters and configured high-concurrency clusters using Databricks to speed up the preparation of high-quality data
  • Updated documentation, such as unit testing documents and knowledge base documents, for Disney Streaming Services
  • Developed Airflow DAGs in Python using the Airflow libraries and utilized Airflow to automatically schedule, trigger, and execute the data ingestion pipeline
  • Created and managed Looker dashboards and built data visualizations for the data science team.

TDS Telecom
Wisconsin

Data Engineer
03.2021 - 03.2022

Job overview

  • Archived large, unnecessary data stored in S3 buckets to S3 Glacier Deep Archive for cost optimization
  • Implemented a centralized data lake on Amazon S3, utilizing Glue Crawler to crawl and organize datasets stored in S3
  • Built ETL pipelines to extract unstructured data from DynamoDB, transform it into structured data, and store it in RDS
  • Worked on creating data pipelines with Airflow to schedule PySpark jobs for performing incremental loads and used Flume for weblog server data
  • Created Airflow Scheduling scripts in Python
  • Created PySpark code that uses Spark SQL to generate data frames from the Avro-formatted raw layer and writes them to data service layer internal tables in ORC format
  • Monitored service logs and set up alarms using Amazon CloudWatch
  • Executed SQL queries in Amazon Redshift to fulfill data analysis requirements
  • Developed parallel reports using SQL and Python to validate the daily, monthly, and quarterly reports
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL
  • Leveraged GitHub for version control and implemented CI/CD scripts for seamless migration to Snowflake
  • Utilized GitLab to automate CI/CD scripts and schedule background jobs, ensuring efficient and streamlined processes
  • Actively participated in Agile methodologies, creating and updating user stories for each sprint.

CTDI
Texas

Big Data Engineer
07.2020 - 12.2020

Job overview

  • Created multiple data processing tasks using PySpark that included reading data from external sources, merging data, performing data enrichment, and loading into target data destinations
  • Processed Kafka streams using PySpark on Databricks and saved processed data to Synapse Analytics
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables and Cassandra
  • Participated in documenting Data Migration & Pipeline for smooth transfer of project from development to testing environment and then moving the code to production
  • Working experience with data streaming processes using Kafka, Apache Spark, and Hive
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, and broadcasts, performing effective and efficient joins, transformations, and other operations during the ingestion process itself
  • Worked on tuning PySpark applications to set Batch Interval time, level of Parallelism and memory tuning
  • Implemented near-real-time data processing using StreamSets and the Spark framework
  • Implemented simple to complex transformation on Streaming Data and Datasets
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Hive, Spark, Python, Sqoop, Oozie
  • Built a Python program and executed it on EMR to run data validation between raw source files and Snowflake target tables
  • Coordinated with team and developed framework to generate Daily ad hoc reports and extract data from various enterprise servers using PySpark
  • Performed data cleaning, data pre-processing, and new data generation using RStudio.

McKesson
Texas

Data Engineer
05.2019 - 06.2020

Job overview

  • Conducted data cleansing for unstructured datasets by applying Informatica Data Quality to identify potential errors and improve data integrity and quality
  • Developed PL/SQL triggers and master tables for automatic creation of primary keys
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop
  • Worked on Hive queries and Python Spark SQL to create HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL databases, and a variety of portfolios
  • Loaded data into Spark RDDs and used in-memory data computation to generate the output response, storing datasets in HDFS, Amazon S3, and relational databases
  • Worked on tuning Spark applications to set Batch Interval time, level of Parallelism and memory tuning
  • Implemented near-real-time data processing using StreamSets and the Spark framework
  • Developed Apache Spark jobs using Python in the test environment for faster data processing and used Spark SQL for querying
  • Used Hadoop/Spark Docker containers to validate data loads in test and dev environments
  • Implemented multiple generalized solution models using Google AutoML
  • Extensive expertise using the core Spark APIs and processing data on a Dataproc cluster
  • Built SSH tunnels to Google Dataproc to access the YARN resource manager and monitor Spark jobs.

Inflexion Analytix Pvt. Ltd

Data Analyst
07.2016 - 06.2018

Job overview

  • Developed dashboards to track productivity and expedite remediation of issues
  • Wrote data definition language and data manipulation language SQL commands
  • Created and executed queries against various data sources to provide business information
  • Installed SQL Server and Power BI, and moved customer data provided in CSV format into SQL Server
  • Developed an analytical product for warranty management using Power BI, SQL Server, Azure, and the Power BI gateway
  • Worked on SQL queries against the repository DB to find deviations from the company's ETL standards for user-created objects such as sources, targets, transformations, log files, mappings, sessions, and workflows.

Dynamic IT Solutions Pvt. Ltd

Jr. Data Analyst
05.2015 - 07.2016

Job overview

  • Developed and managed databases that support performance improvement
  • Developed and managed reports on multiple key performance indicators and metrics across Revenue Cycle Management
  • Developed and evaluated network performance criteria and measurement methods
  • Assisted managers in identifying capabilities and processes that drive continuous improvement
  • Analyzed game data by cohort to provide suggestions to the marketing team for improving the performance of acquisition campaigns
  • Created complex SQL queries and scripts to extract, aggregate, and validate data from MS SQL, Oracle, and flat files using Informatica, and loaded it into a single data warehouse repository.

Education

The University of Texas at Dallas
Richardson, TX

Master of Science in Financial Mathematics (STEM)
08.2018 - 12.2020

University of Delhi
New Delhi, India

Bachelor of Commerce with Honors in Business Education
08.2013 - 05.2016

Skills

CloudFront, Route 53, DynamoDB, CodePipeline, EKS, Athena, QuickSight

Timeline

AWS Data Engineer

Texas Health and Human Services Commission
10.2022 - Current

AWS Data Engineer

Disney Streaming Services
04.2022 - 09.2022

Data Engineer

TDS Telecom
03.2021 - 03.2022

Big Data Engineer

CTDI
07.2020 - 12.2020

Data Engineer

McKesson
05.2019 - 06.2020

The University of Texas at Dallas

Master of Science in Financial Mathematics (STEM)
08.2018 - 12.2020

Data Analyst

Inflexion Analytix Pvt. Ltd
07.2016 - 06.2018

Jr. Data Analyst

Dynamic IT Solutions Pvt. Ltd
05.2015 - 07.2016

University of Delhi

Bachelor of Commerce with Honors in Business Education
08.2013 - 05.2016