Akshay Singh Thakur

Houston, TX

Summary

Data Engineer with four years of professional experience and a Master's in Data Science, skilled in building data pipelines and analytics workflows on AWS, Azure, and Databricks. Delivers solutions that improve data handling and decision-making across industries including insurance and higher education.

Overview

4 years of professional experience
1 certification

Work History

Data Engineer

Alice, LLC
11.2023 - Current
  • Implemented dynamic Apache Airflow DAGs to streamline data retrieval from multiple APIs and loading into Databricks Delta Live Tables, improving data management and accessibility
  • Constructed fault-tolerant pipelines using Apache Kafka and Amazon Kinesis for real-time data ingestion into Redshift, reducing processing latency by 25% and ensuring timely insights
  • Orchestrated a seamless migration from PostgreSQL to AWS Redshift via S3 using Talend, transferring 10TB of data within a stringent 3-week deadline, ensuring data accuracy and governance
  • Leveraged Apache Flink for advanced windowing and event-time processing in conjunction with Databricks' Delta Lake for time series analysis, enabling the identification of customer retention trends and a 10% reduction in churn rate
  • Integrated AWS Lambda with Amazon Simple Queue Service (SQS) for data enrichment workflows, processing over 100,000 messages daily and driving a 20% increase in user engagement through personalized real-time notifications
  • Collaborated with business users, project leaders, and developers to gather requirements and create logical and physical data models based on an OLAP star schema, ensuring alignment with business objectives
  • Ensured GDPR and HIPAA compliance by implementing security controls that protect sensitive personal and financial information in highly scalable AWS environments.

Graduate Assistant

University of Houston
08.2022 - 05.2024
  • Designed and implemented Python-based solutions to extract data from PDF documents containing university student information, optimizing the data retrieval process
  • Developed a Python script using the AWS Textract API that reduced data processing time by 30% through efficient extraction of data from PDFs
  • Streamlined data processing workflows with custom algorithms for data cleansing and normalization, ensuring high data integrity for subsequent storage in a MySQL database
  • Deployed data extraction scripts to AWS through GitLab CI/CD pipelines, improving deployment efficiency and reducing downtime, which is critical for keeping student data accessible
  • Leveraged AWS Lambda for event-driven execution of scripts and AWS API Gateway to create secure API endpoints, enabling real-time processing and access to student data, supporting timely academic and administrative decisions.

Data Engineer

TATA Consultancy Services (TCS)
11.2020 - 12.2021
  • Developed and managed end-to-end data pipelines with Azure Data Factory and Databricks, tailored for insurance data evaluation and extraction
  • Improved data processing efficiency by 15% by optimizing loads into Azure Blob Storage, and used Azure SQL and Azure Synapse Analytics for in-depth gene data analysis
  • Implemented high-throughput real-time data ingestion systems using Azure Event Hubs and Apache Kafka, achieving a rate of 12,000 events per second
  • Utilized Azure Data Lake Storage for scalable and secure data handling
  • Designed and built a predictive analytics model using Azure Machine Learning to forecast insurance claims and detect fraud, and integrated it with the existing data pipelines
  • Collaborated with senior architects to maintain advanced data architectures, ensuring robust documentation and compliance with data governance standards.

Education

Master of Science - Data Science

University of Houston
Houston, TX
05.2001 -

Skills

Python

Certification

AWS Solutions Architect

Accomplishments

  • Named "Best Employee of the Month" for three consecutive months at TCS.
  • Placed third in the Tech Innovators Challenge 2022 with an AI-driven analytics tool built in Python that enhances data processing capabilities for small businesses.
  • Recognized through colleague nominations and votes for exceptional teamwork, collaboration, and contributions to projects.

Projects

UAV-Based Bridge Crack Detection Using Deep Learning
  • Deployed a deep learning framework on UAV-captured imagery to identify structural cracks in bridges, applying the LeNet model for segmentation
  • Tools: Keras, ImageDataGenerator
  • Achieved 84% accuracy on both the training and validation datasets, validating the robustness of the model and the effectiveness of the training approach

Prediction of Downhole Equipment Using Sensor Data
  • Built a binary classification model to predict equipment failures, focusing on oil pipeline leaks, by analyzing data from targeted sensor readings to support accurate failure prediction
  • Tools: ARIMA
  • Used feature ranking to identify the sensors most predictive of failure, directing maintenance efforts more efficiently and improving equipment reliability

Intrusion Detection Prediction Model
  • Built a machine learning pipeline to develop a predictive model that identifies intrusions in computer systems
  • Tools: pandas, Matplotlib, scikit-learn
  • Strengthened system security by proactively detecting potential security breaches, mitigating risk and safeguarding data integrity
