SIDDHESH SHAJI

Mumbai, Maharashtra

Summary

3x AWS-certified Data Specialist with 3 years of hands-on experience in data analysis, engineering, machine learning, and data science. Skilled in a wide range of supervised and unsupervised learning techniques including regression, classification, clustering, time series forecasting, computer vision, and NLP. Proven ability to design and deploy cloud-based solutions, with a focus on Generative AI and Large Language Models (LLMs). Experienced with key AWS services such as SageMaker, EMR, Glue, Lambda, QuickSight, Bedrock, and Kinesis, as well as Apache Kafka. Proficient in Python, Spark, SQL, and R.

Work History

AWS Data Platform Operations Specialist

Mactores Inc.

Mumbai, India

10.2023 - Current

Developed real-time Spark Streaming jobs on Amazon EMR clusters that consume data from Apache Kafka and process large datasets using joins, filters, data manipulation, and caching
Optimized the pipeline to ensure efficient processing and wrote the results to Amazon Timestream for real-time analytics and reporting
Worked on AWS Glue-based data pipeline jobs triggered by S3 inserts via AWS Lambda, automating data ingestion and processing pipelines
Designed and implemented workflows to load data into Apache Iceberg tables, ensuring efficient storage and query performance
Designed and implemented real-time dashboards using Amazon QuickSight to visualize data from MSSQL and Timestream databases
Engineered and automated Glue ELT jobs to efficiently transfer and transform terabytes of data from Amazon S3 to Redshift, ensuring high scalability and reliability
Optimized Redshift query performance by leveraging Workload Management (WLM), configuring distribution and sort keys, implementing Short Query Acceleration (SQA), and fine-tuning query queues for enhanced processing
Built real-time Amazon QuickSight dashboards on complex datasets from Amazon Redshift to support data-driven decision-making
Conducted in-depth tuning of configurations to balance workloads effectively, minimize query contention, and improve resource allocation for diverse user groups
Implemented dynamic scaling of queues to adapt to changing workloads, ensuring optimal query performance during peak usage times
Developed a custom fine-tuned LLM (Llama 3) for translating VerticaDB SQL to DuckDB SQL, leveraging LoRA for efficient training
Implemented a Retrieval-Augmented Generation (RAG) system using FAISS/ChromaDB, indexing syntax differences to enhance translation accuracy
Integrated Amazon Bedrock (Anthropic Claude) for hybrid AI-powered SQL conversion, ensuring high-quality outputs through model fallback
Automated post-processing and optimization of translated queries, benchmarking execution speeds between VerticaDB and DuckDB for performance validation
Tech Stack: Amazon QuickSight, AWS Lambda, AWS Glue, Amazon EMR, Apache Spark, Apache Iceberg, Apache Kafka, Amazon Timestream, MSSQL, Python, PySpark, Amazon S3, Amazon Redshift, Amazon Bedrock, LLM, Gen AI, Anthropic Claude, DuckDB, Amazon EKS, Amazon RDS, Airflow

Python Programmer Intern

SiAnth Inc.

Alpharetta, USA

03.2023 - 09.2023

Leveraged Amazon Connect to design an automated call center system that optimized call flow and improved customer experience
Implemented features like IVR (Interactive Voice Response), call routing, and call recording to provide a seamless and personalized experience to customers
Engineered an end-to-end data pipeline to collect, process, and store contact trace records (CTR) from Amazon Connect calls, using Kinesis Firehose, Glue, Lambda, Lex, Athena, and S3 to automate ingestion and ensure high data quality and reliability
Integrated the pipeline with QuickSight to visualize CTR data, giving stakeholders actionable insights into call center performance, customer/agent behavior, and service quality
Leveraged Contact Lens to design and implement a proof of concept for real-time sentiment analysis of call recordings, identifying positive and negative sentiments, sentiment trends, and common themes
Tech Stack: Python, AWS (Connect, Glue ETL, Lambda, S3, CloudWatch, CloudTrail), SQL (Athena), Lex chatbots, QuickSight for visualization, Kinesis Firehose for streaming data, and Contact Lens for sentiment analysis

Data Science Intern

Johnson and Johnson (via Kars Etech)

NJ, USA

06.2022 - 10.2022

Devised and implemented a custom NLP solution, achieving 98% accuracy by combining cosine similarity, Naive Bayes, and fuzzy matching to address variations in client company names, enhancing data quality and reducing downstream errors
Conducted a thorough analysis of industry standards and best practices related to forecasting techniques (LSTM, ARIMA, SARIMA, FBProphet, DeepAR, etc.)
Used Dataiku DSS as the primary platform for data preprocessing (handling missing values and outliers, aggregation, and visualization) and implemented two demand forecasting case studies using univariate and multivariate techniques
Utilized statistical and deep learning-based algorithms, such as ARIMA, Feed Forward Neural Network, and the AWS-based DeepAR model, for forecasting demand
Achieved error scores of 11.1% and 7.2% on the two case studies, respectively, indicating high accuracy and improved decision-making for the business
Tech Stack: Python, Dataiku DSS, AWS S3, SQL, NLP, DeepAR, ARIMA, FBProphet, Keras, AutoML

Junior Software Engineer

Majesco Pvt. Ltd.

Navi Mumbai, India

07.2018 - 02.2019

Tested and Fixed Defects in the Application: Conducted thorough testing, including manual and automated approaches, to identify and resolve any issues, resulting in a stable and reliable application.
Integrated Software Services using eiConsole PilotFish: Seamlessly integrated the EBPP application with various third-party services through the use of the eiConsole PilotFish integration platform.
Followed the Agile Project Management Approach: Utilized Agile methodologies to collaborate efficiently with cross-functional teams, deliver incremental updates, and respond to changing requirements effectively.

Education

Master's degree - Data Science

Stevens Institute of Technology

New Jersey, USA

12.2022

Bachelor's degree - Electronics Engineering

R.A.I.T

Navi Mumbai, India

06.2018

Skills

Spark streaming
Data pipeline
AWS
Real-time analytics
Data visualization
Query optimization
Machine learning
Statistical analysis

Gen AI
LLMs
Deep Learning
Python
SQL
Databricks
Dataiku
SageMaker

Certification

AWS Certified Machine Learning - Specialty
AWS Certified Data Analytics - Specialty
AWS Certified DevOps Engineer - Professional
