SIDDHESH SHAJI

Mumbai, Maharashtra

Summary

  • 3x AWS-certified Data Specialist with 3 years of hands-on experience in data analysis, engineering, machine learning, and data science. Skilled in a wide range of supervised and unsupervised learning techniques including regression, classification, clustering, time series forecasting, computer vision, and NLP. Proven ability to design and deploy cloud-based solutions, with a focus on Generative AI and Large Language Models (LLMs). Experienced with key AWS services such as SageMaker, EMR, Glue, Lambda, QuickSight, Bedrock, Kafka, and Kinesis. Proficient in Python, Spark, SQL, and R.

Overview

7 years of professional experience
1 Certification

Work History

AWS Data Platform Operations Specialist

Mactores Inc.
Mumbai, India
10.2023 - Current
  • Developed real-time Spark streaming jobs to fetch data from Apache Kafka, running on EMR clusters to process large datasets using various Spark operations, including joins, filters, data manipulation, and caching
  • Optimized the pipeline to ensure efficient processing and wrote the results to Amazon Timestream for real-time analytics and reporting
  • Worked on AWS Glue-based data pipeline jobs triggered by S3 inserts via AWS Lambda, automating data ingestion and processing pipelines
  • Designed and implemented workflows to load data into Apache Iceberg tables, ensuring efficient storage and query performance
  • Designed and implemented real-time dashboards using Amazon QuickSight to visualize data from MSSQL and Timestream databases
  • Engineered and automated Glue ELT jobs to efficiently transfer and transform terabytes of data from Amazon S3 to Redshift, ensuring high scalability and reliability
  • Optimized Redshift query performance by leveraging Workload Management (WLM), configuring distribution and sort keys, implementing Short Query Acceleration (SQA), and fine-tuning query queues for enhanced processing
  • Designed and implemented real-time dashboards in Amazon QuickSight, visualizing complex datasets from Amazon Redshift to support data-driven decision-making
  • Conducted in-depth tuning of configurations to balance workloads effectively, minimize query contention, and improve resource allocation for diverse user groups
  • Implemented dynamic scaling of queues to adapt to changing workloads, ensuring optimal query performance during peak usage times
  • Developed a custom fine-tuned LLM (Llama 3) for translating VerticaDB SQL to DuckDB SQL, leveraging LoRA for efficient training
  • Implemented a Retrieval-Augmented Generation (RAG) system using FAISS/ChromaDB, indexing syntax differences to enhance translation accuracy
  • Integrated Amazon Bedrock (Anthropic Claude) for hybrid AI-powered SQL conversion, using model fallback to maintain output quality
  • Automated post-processing and optimization of translated queries, benchmarking execution speeds between VerticaDB and DuckDB for performance validation
  • Tech Stack: Amazon QuickSight, AWS Lambda, AWS Glue, Amazon EMR, Apache Spark, Apache Iceberg, Apache Kafka, Amazon Timestream, MSSQL, Python, PySpark, Amazon S3, Amazon Redshift, Amazon Bedrock, LLM, Gen AI, Anthropic Claude, DuckDB, Amazon EKS, Amazon RDS, Airflow

Python Programmer Intern

SiAnth Inc.
Alpharetta, USA
03.2023 - 09.2023
  • Leveraged Amazon Connect to design an automated call center system that optimized call flow and improved customer experience
  • Implemented features like IVR (Interactive Voice Response), call routing, and call recording to provide a seamless and personalized experience to customers
  • Engineered an end-to-end data pipeline to collect, process, and store contact trace record (CTR) data from Amazon Connect calls, using Kinesis Firehose, Glue, Lambda, Lex, Athena, and S3 to automate each stage and maintain data quality and reliability
  • Integrated with QuickSight to visualize CTR data and provided stakeholders with actionable insights into call center performance, customer/agent behavior, and service quality
  • Used Contact Lens to build a proof of concept for real-time sentiment analysis of call recordings, extracting sentiment data to identify positive and negative sentiment, sentiment trends, and common themes
  • Tech Stack: Python, AWS (Connect, Glue ETL, Lambda, S3, CloudWatch, CloudTrail), SQL (Athena), Lex chatbots, QuickSight for visualization, Kinesis Firehose for streaming data, and Contact Lens for sentiment analysis

Data Science Intern

Johnson and Johnson (via Kars Etech)
NJ, USA
06.2022 - 10.2022
  • Devised and implemented a custom NLP solution, achieving 98% accuracy by combining cosine similarity, Naive Bayes, and fuzzy matching to address variations in client company names, enhancing data quality and reducing downstream errors
  • Conducted a thorough analysis of industry standards and best practices for forecasting techniques (LSTM, ARIMA, SARIMA, FBProphet, DeepAR, etc.)
  • Used Dataiku DSS as the primary platform for data preprocessing (handling missing values and outliers, aggregation, and visualization) and implemented two demand-forecasting case studies using univariate and multivariate techniques
  • Applied statistical and deep learning-based algorithms, such as ARIMA, a feed-forward neural network, and the AWS-based DeepAR model, to forecast demand
  • Achieved error scores of 11.1% and 7.2% for two case studies respectively, indicating high accuracy and improved decision-making for the business
  • Tech Stack: Python, Dataiku DSS, AWS S3, SQL, NLP, DeepAR, ARIMA, FBProphet, Keras, AutoML

Junior Software Engineer

Majesco Pvt. Ltd.
Navi Mumbai, India
07.2018 - 02.2019
  • Tested and Fixed Defects in the Application: Conducted thorough testing, including manual and automated approaches, to identify and resolve any issues, resulting in a stable and reliable application.
  • Integrated Software Services using eiConsole PilotFish: Seamlessly integrated the EBPP application with various third-party services through the use of the eiConsole PilotFish integration platform.
  • Followed the Agile Project Management Approach: Utilized Agile methodologies to collaborate efficiently with cross-functional teams, deliver incremental updates, and respond to changing requirements effectively.

Education

Master's degree - Data Science

Stevens Institute of Technology
New Jersey, USA
12.2022

Bachelor's degree - Electronics Engineering

R.A.I.T
Navi Mumbai, India
06.2018

Skills

  • Spark streaming
  • Data pipeline
  • AWS
  • Real-time analytics
  • Data visualization
  • Query optimization
  • Machine learning
  • Statistical analysis
  • Gen AI
  • LLMs
  • Deep Learning
  • Python
  • SQL
  • Databricks
  • Dataiku
  • SageMaker

Certification

  • AWS Certified Machine Learning - Specialty
  • AWS Certified Data Analytics - Specialty
  • AWS Certified DevOps Engineer - Professional
