Aayush K

Cloud Data Engineer
Plano, TX
Summary

Experienced Data Engineer with 7 years in the IT industry. Proficient in designing, developing, and implementing data models for enterprise applications. Skilled in data lakes, data warehousing, ETL pipelines, and data visualization. Strong expertise in cloud platforms (AWS and Azure), including services such as EC2, S3, EMR, Redshift, IAM, and Glue. Hands-on with Azure Data Factory, Blob Storage, Databricks, and the Snowflake database. Built pipelines, automated workflows, and utilized CI/CD tools. Proficient in SQL, Python, JIRA, and other business intelligence tools. Seeking a contract position to apply skills in cloud monitoring, deployment, and problem-solving to professional challenges. Accomplished engineer with leadership acumen.

Overview

8 years of professional experience
5 years of post-secondary education

Work History

Texas Health and Human Services Commission
Austin

AWS Data Engineer
10.2022 - Current

Job overview

  • Responsible for managing and operating batch data pipelines on AWS EMR
  • Used PySpark scripts to transform and load data into an S3 data lake, sourced from business application APIs and logs
  • Utilized AWS Kinesis to extract data from API gateways and third-party sources, and leveraged AWS Kinesis Firehose to load the extracted data into S3 storage
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics
  • Wrote Spark-Streaming applications to consume the data from Kafka topics and wrote Spark Python code to process the stream
  • Implemented automated triggers using AWS Lambda to streamline workflow processes
  • Designed, developed, and validated ETL processes in AWS Glue to seamlessly transfer campaign data from external sources, such as ORC/Parquet files, to Amazon Redshift
  • Enabled Amazon SNS on EMR and Glue to receive notifications regarding pipeline executions and failures
  • Worked on multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM)
  • Created Redshift streaming tables in AWS and created and managed S3 buckets using the AWS console and CLI
  • Migrated data from object storage (S3) to Amazon Redshift using COPY commands
  • Created PySpark scripts to move data from S3 into Databricks tables
  • Leveraged AWS Athena for querying S3 data by utilizing AWS Glue Crawler and AWS Glue catalog for metadata management
  • Focused on high availability, fault tolerance, and auto-scaling in AWS CloudFormation
  • Utilized CloudWatch for pipeline monitoring and configured alarms for efficient monitoring
  • Created insightful visualizations and dashboards using Amazon QuickSight
  • Developed PySpark jobs using Python in the test environment for faster data processing and used Spark SQL for querying
  • Automated CI/CD data pipelines using AWS CodePipeline and deployed code to EC2 instances using AWS CodeDeploy
  • Monitored and reported any unusual activities or security incidents related to data vaults, utilizing AWS CloudTrail and CloudWatch to detect and respond to potential security breaches
  • Proficiently used GitHub for version control and JIRA for tracking and updating Epic tasks throughout each sprint, ensuring project milestones were met
  • Created Snowflake stages for each public cloud and used COPY commands to migrate data from object storage to the Snowflake data warehouse.

Disney Streaming Services
Seattle

AWS Data Engineer
04.2022 - 09.2022

Job overview

  • Performed configuration, deployment, and support of cloud services in Amazon Web Services (AWS)
  • Used Airflow to build a task orchestrator on AWS that schedules jobs in the data pipeline
  • Worked on data pipelines to process large data sets and configured lookups for data validation and integrity
  • Created tables, materialized views, and stored procedures in Snowflake and Redshift
  • Created Snowflake authorized views for exposing data to other teams
  • Understanding of CI/CD principles, familiar with version control systems (Git)
  • Developed PySpark Streaming by consuming static and streaming data from different sources
  • Involved in the design, development, and deployment of complex SQL queries against Snowflake, using GitLab
  • Strong understanding of data warehousing concepts, including fact tables, dimension tables, and star/snowflake schema modeling
  • Developed business knowledge of how Disney Streaming Services operates
  • Created Spark clusters and configured high-concurrency clusters using Databricks to speed up the preparation of high-quality data
  • Updated documentation, such as unit testing documents and knowledge base documents, for Disney Streaming Services
  • Developed Airflow DAGs in Python using the Airflow libraries and utilized Airflow to automatically schedule, trigger, and execute the data ingestion pipeline
  • Created and managed Looker dashboards and built data visualizations for the data science team.

TDS Telecom
Wisconsin

Data Engineer
03.2021 - 03.2022

Job overview

  • Archived large, unnecessary data stored in S3 buckets to S3 Glacier Deep Archive for cost optimization
  • Implemented a centralized data lake on Amazon S3, utilizing Glue Crawler to crawl and organize datasets stored in S3
  • Built ETL pipelines to extract unstructured data from DynamoDB, transform it into structured data, and store it in RDS
  • Worked on creating data pipelines with Airflow to schedule PySpark jobs for performing incremental loads and used Flume for weblog server data
  • Created Airflow Scheduling scripts in Python
  • Created PySpark code that uses Spark SQL to generate data frames from the Avro-formatted raw layer and writes them to data service layer internal tables in ORC format
  • Monitored service logs and set up alarms using Amazon CloudWatch
  • Executed SQL queries in Amazon Redshift to fulfill data analysis requirements
  • Developed parallel reports using SQL and Python to validate the daily, monthly, and quarterly reports
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL
  • Leveraged GitHub for version control and implemented CI/CD scripts for seamless migration to Snowflake
  • Utilized GitLab to automate CI/CD scripts and schedule background jobs, ensuring efficient and streamlined processes
  • Actively participated in Agile methodologies, creating and updating user stories for each sprint.

CTDI
Texas

Big Data Engineer
07.2020 - 12.2020

Job overview

  • Created multiple data processing tasks using PySpark that included reading data from external sources, merging data, performing data enrichment, and loading into target data destinations
  • Processed Kafka streams using PySpark on Databricks and saved processed data to Synapse Analytics
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables and Cassandra
  • Participated in documenting Data Migration & Pipeline for smooth transfer of project from development to testing environment and then moving the code to production
  • Working experience with data streaming processes using Kafka, Apache Spark, and Hive
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, and broadcasts, performing effective and efficient joins, transformations, and other operations during the ingestion process itself
  • Worked on tuning PySpark applications to set Batch Interval time, level of Parallelism and memory tuning
  • Implemented near-real-time data processing using StreamSets and the Spark framework
  • Implemented simple to complex transformation on Streaming Data and Datasets
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Hive, Spark, Python, Sqoop, Oozie
  • Built a Python program and executed it on EMR to run data validation between raw source files and Snowflake target tables
  • Coordinated with team and developed framework to generate Daily ad hoc reports and extract data from various enterprise servers using PySpark
  • Performed data cleaning, data pre-processing, and new data generation using RStudio.

McKesson
Texas

Data Engineer
05.2019 - 06.2020

Job overview

  • Conducted data cleansing for unstructured datasets by applying Informatica Data Quality to identify potential errors and improve data integrity and quality
  • Developed PL/SQL triggers and master tables for automatic creation of primary keys
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop
  • Worked on Hive queries and Python Spark SQL to create HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL databases, and a variety of portfolios
  • Loaded data into Spark RDDs and used in-memory data computation to generate the output response, storing datasets in HDFS, Amazon S3, and relational databases
  • Worked on tuning Spark applications to set Batch Interval time, level of Parallelism and memory tuning
  • Implemented near-real-time data processing using StreamSets and the Spark framework
  • Developed Apache Spark jobs using Python in the test environment for faster data processing and used Spark SQL for querying
  • Used Hadoop/Spark Docker containers to validate data loads in test and dev environments
  • Implemented multiple generalized solution models using Google AutoML
  • Extensive expertise using the core Spark APIs and processing data on a Dataproc cluster
  • Built SSH tunnels to Google Dataproc to access the YARN resource manager and monitor Spark jobs.

Inflexion Analytix Pvt. Ltd

Data Analyst
07.2016 - 06.2018

Job overview

  • Developed dashboards to track productivity and expedite remediation of issues
  • Wrote data definition language and data manipulation language SQL commands
  • Created and executed queries against various data sources to provide business information
  • Installed SQL Server and Power BI, and moved customer data provided in CSV format into SQL Server
  • Developed an analytical product for warranty management using Power BI, SQL Server, Azure, and the Power BI gateway
  • Worked on SQL queries against the repository DB to find deviations from the company's ETL standards for user-created objects such as sources, targets, transformations, log files, mappings, sessions, and workflows.

Dynamic IT Solutions Pvt. Ltd

Jr. Data Analyst
05.2015 - 07.2016

Job overview

  • Developed and managed databases that support performance improvement
  • Developed and managed reports on multiple key performance indicators and metrics across Revenue Cycle Management
  • Developed and evaluated network performance criteria and measurement methods
  • Assisted managers in identifying capabilities and processes that drive continuous improvement
  • Analyzed game data by cohort to provide suggestions to the marketing team for improving the performance of acquisition campaigns
  • Created complex SQL queries and scripts to extract, aggregate, and validate data from MS SQL, Oracle, and flat files using Informatica, and loaded it into a single data warehouse repository.

Education

The University of Texas at Dallas
Richardson, TX

Master of Science in Financial Mathematics (STEM)
08.2018 - 12.2020

University of Delhi
New Delhi, India

Bachelor of Commerce with Honors in Business Education
08.2013 - 05.2016

Skills

CloudFront, Route 53, DynamoDB, CodePipeline, EKS, Athena, QuickSight

Timeline

AWS Data Engineer

Texas Health and Human Services Commission
10.2022 - Current

AWS Data Engineer

Disney Streaming Services
04.2022 - 09.2022

Data Engineer

TDS Telecom
03.2021 - 03.2022

Big Data Engineer

CTDI
07.2020 - 12.2020

Data Engineer

McKesson
05.2019 - 06.2020

The University of Texas at Dallas

Master of Science in Financial Mathematics (STEM)
08.2018 - 12.2020

Data Analyst

Inflexion Analytix Pvt. Ltd
07.2016 - 06.2018

Jr. Data Analyst

Dynamic IT Solutions Pvt. Ltd
05.2015 - 07.2016

University of Delhi

Bachelor of Commerce with Honors in Business Education
08.2013 - 05.2016