3x AWS-certified Data Specialist with 3 years of hands-on experience in data analysis, engineering, machine learning, and data science. Skilled in a wide range of supervised and unsupervised learning techniques including regression, classification, clustering, time series forecasting, computer vision, and NLP. Proven ability to design and deploy cloud-based solutions, with a focus on Generative AI and Large Language Models (LLMs). Experienced with key AWS services such as SageMaker, EMR, Glue, Lambda, QuickSight, Bedrock, Kafka, and Kinesis. Proficient in Python, Spark, SQL, and R.
Overview
7 years of professional experience
1 Certification
Work History
AWS Data Platform Operations Specialist
Mactores Inc.
Mumbai, India
10.2023 - Current
Developed real-time Spark Streaming jobs on EMR clusters that consume data from Apache Kafka and process large datasets using joins, filters, data manipulation, and caching
Optimized the pipeline to ensure efficient processing and wrote the results to Amazon Timestream for real-time analytics and reporting
Worked on AWS Glue-based data pipeline jobs triggered by S3 inserts via AWS Lambda, automating data ingestion and processing pipelines
Designed and implemented workflows to load data into Apache Iceberg tables, ensuring efficient storage and query performance
Designed and implemented real-time dashboards using Amazon QuickSight to visualize data from MSSQL and Timestream databases
Engineered and automated Glue ELT jobs to efficiently transfer and transform terabytes of data from Amazon S3 to Redshift, ensuring high scalability and reliability
Optimized Redshift query performance by leveraging Workload Management (WLM), configuring distribution and sort keys, implementing Short Query Acceleration (SQA), and fine-tuning query queues for enhanced processing
Designed and implemented real-time dashboards in Amazon QuickSight that visualize complex datasets from Amazon Redshift to support data-driven decision-making
Conducted in-depth tuning of configurations to balance workloads effectively, minimize query contention, and improve resource allocation for diverse user groups
Implemented dynamic scaling of queues to adapt to changing workloads, ensuring optimal query performance during peak usage times
Developed a custom fine-tuned LLM (Llama 3) for translating VerticaDB SQL to DuckDB SQL, leveraging LoRA for efficient training
Implemented a Retrieval-Augmented Generation (RAG) system using FAISS/ChromaDB, indexing syntax differences to enhance translation accuracy
Integrated Amazon Bedrock (Anthropic Claude) for hybrid AI-powered SQL conversion, using model fallback to maintain output quality
Automated post-processing and optimization of translated queries, benchmarking execution speeds between VerticaDB and DuckDB for performance validation
Leveraged Amazon Connect to design an automated call center system that optimized call flow and improved customer experience
Implemented features like IVR (Interactive Voice Response), call routing, and call recording to provide a seamless and personalized experience to customers
Engineered an end-to-end data pipeline to collect, process, and store contact trace record (CTR) data from Connect calls, using Kinesis Firehose, Glue, Lambda, Lex, Athena, and S3 to automate each stage and maintain high data quality and reliability
Integrated with QuickSight to visualize CTR data and provided stakeholders with actionable insights into call center performance, customer/agent behavior, and service quality
Leveraged Contact Lens to design and implement a proof of concept for real-time sentiment analysis of call recordings, extracting sentiment data to identify positive and negative sentiments, sentiment trends, and common themes
Tech Stack: Python, AWS (Connect, Glue ETL, Lambda Functions, S3, CloudWatch, and CloudTrail), SQL (Athena), Lex chatbots, QuickSight for visualization, Kinesis Firehose for streaming data, and Contact Lens for sentiment analysis
Data Science Intern
Johnson and Johnson (via Kars Etech)
NJ, USA
06.2022 - 10.2022
Devised and implemented a custom NLP solution, achieving 98% accuracy by combining cosine similarity, Naive Bayes, and fuzzy matching to address variations in client company names, enhancing data quality and reducing downstream errors
Conducted a thorough analysis of industry standards and best practices for forecasting techniques (LSTM, ARIMA, SARIMA, FB Prophet, DeepAR, etc.)
Used Dataiku DSS as the primary platform for data preprocessing (handling missing values and outliers, aggregation, and visualization) and implemented two demand forecasting case studies using univariate and multivariate techniques
Utilized statistical and deep learning-based algorithms, such as ARIMA, Feed Forward Neural Network, and the AWS-based DeepAR model, for forecasting demand
Achieved error scores of 11.1% and 7.2% for the two case studies, respectively, indicating high accuracy and supporting improved decision-making for the business
Tested and fixed defects in the application: conducted thorough testing, including manual and automated approaches, to identify and resolve issues, resulting in a stable and reliable application
Integrated software services using eiConsole PilotFish: integrated the EBPP application with various third-party services through the eiConsole PilotFish integration platform
Followed the Agile project management approach: used Agile methodologies to collaborate efficiently with cross-functional teams, deliver incremental updates, and respond to changing requirements effectively