
Sandesh Reddy Kothi

Northville

Summary

Dynamic Sr. Data Scientist at DataFactz with expertise in AWS and advanced AI technologies. Achieved 90% accuracy in intelligent document processing and engineered real-time ML pipelines serving 500,000 predictions per hour. Proven ability to optimize data workflows and ensure compliance, demonstrating strong analytical and problem-solving skills.

Overview

6 years of professional experience

Work History

Sr. Data Scientist

DataFactz
Northville
02.2023 - Current
  • Architected and deployed RAG-based intelligent document processing systems using AWS Bedrock and SageMaker, achieving 90% accuracy in domain-specific queries across over 50 million documents.
  • Developed agent-based workflows for automated data classification and PII detection, processing over 10 million records daily with 99.5% accuracy.
  • Implemented advanced re-ranking algorithms with FAISS vector indexing, enhancing LLM output quality by 45% for enterprise search capabilities.
  • Built real-time ML inference pipelines using AWS SageMaker and Lambda functions, serving over 500,000 predictions per hour with sub-50ms latency.
  • Engineered complex ETL workflows with AWS Glue, processing over 2TB of data daily with 99.9% reliability and automated error handling.
  • Established comprehensive PII detection pipelines using AWS Comprehend Medical, ensuring GDPR and HIPAA compliance across all data lakes.
  • Designed scalable data lake architecture on AWS S3 with Lake Formation governance, implementing fine-grained access controls and compliance monitoring.
  • Implemented end-to-end encryption for sensitive data using AWS KMS and tokenization services for PII protection.

Azure Data Engineer

DataFactz
Hyderabad
06.2020 - 07.2022
  • Architected and deployed scalable data lake infrastructure using AWS S3, managing 100TB+ of structured and unstructured data with automated lifecycle policies.
  • Developed high-performance ETL pipelines with Apache Spark and AWS Glue, processing over 5TB daily from 200+ sources with 99.8% reliability.
  • Implemented real-time streaming architectures with Apache Kafka and AWS Kinesis, handling over 2M events per second with sub-second latency.
  • Engineered complex data transformation workflows using Apache Airflow, orchestrating multi-stage pipelines with robust error recovery mechanisms.
  • Established enterprise data warehousing solutions using Snowflake and Amazon Redshift, optimizing table designs and automating performance tuning.
  • Developed NoSQL database solutions leveraging MongoDB and DynamoDB for high-velocity ingestion, employing sharding strategies for sub-10ms query response.
  • Optimized query performance across platforms using advanced SQL techniques, achieving an 80% enhancement in analytical query response times.
  • Implemented comprehensive data governance using AWS Lake Formation and custom frameworks for data lineage tracking and compliance reporting.

Data Engineer

Aarmec Technologies
Bangalore
01.2019 - 05.2020
  • Optimized SQL queries and Java applications, reducing latency and enhancing scalability.
  • Implemented Snowflake models integrated with Tableau and Power BI for interactive dashboards.
  • Automated data ingestion using Apache Airflow and Talend, ensuring clean data delivery.
  • Architected multi-tier AWS applications utilizing EC2, RDS, SQS, and CloudFormation.
  • Developed React components with Redux Promise API for async calls.
  • Deployed Spark and Scala code in GCP's Hadoop cluster to improve data processing.
  • Conducted POCs on Neo4j and BI solutions, enhancing the analytics platform.
  • Leveraged AWS Glue to crawl S3 data lake, populating the Data Catalog.

Education

Master of Science (M.S.) - Business and Data Analytics

University of New Haven
New Haven, CT, USA
05.2024

Bachelor of Technology (B.Tech)

National Institute of Technology
Warangal, Telangana, India
05.2019

Skills

  • Cloud platforms: Azure and AWS
  • Database and query languages: SQL, PL/SQL, and T-SQL
  • Big data tools: Apache Spark and Databricks
  • AWS services: EC2 and S3
  • ETL tools: Informatica and SSIS
  • Data warehousing: Snowflake and Redshift
  • SQL databases: Oracle and MySQL
  • NoSQL databases: Cassandra and MongoDB
  • AI technologies: Generative AI and Transformers
  • Version control: Git
  • BI tools: Tableau and Power BI
  • Operating systems: Linux and Windows

Projects

AI-Generated Text Detection:
  • Engineered an NLP pipeline for AI-generated content classification using TF-IDF and Multinomial Naive Bayes, achieving 95.4% accuracy on unseen data.
  • Applied advanced preprocessing techniques (tokenization, stemming, stop-word and accent removal) to enhance model performance.
  • Developed and deployed an interactive, production-ready Streamlit interface for real-time AI-generated text detection.
  • Aligned with AI trustworthiness trends by contributing to the detection of synthetic text across digital content platforms.
  • Tools Used: Streamlit, Scikit-learn, NLTK, Pandas, NumPy.

Early Detection of Potato Diseases Using CNN:
  • Designed and trained a convolutional neural network (CNN) to classify potato leaf diseases (Late Blight, Early Blight) with 96.4% accuracy.
  • Deployed the model as a web application using Streamlit for real-time disease identification with average prediction latency under 5 seconds.
  • Performed image augmentation (rotation, flipping) to improve model generalization and robustness.
  • Promoted agricultural AI applications by facilitating precision farming and rapid disease response.
  • Tools Used: Python, TensorFlow, Keras, Streamlit, NumPy, Pillow (PIL).
