SAHITHYA MANNARU

Jersey City

Summary

Data Scientist with over 6 years of hands-on experience designing and deploying end-to-end solutions in the financial services and cybersecurity domains. Skilled in building scalable data pipelines, developing intelligent systems with Large Language Models (LLMs) from OpenAI, Hugging Face, and LLaMA, and applying advanced analytics for real-time insights. Proficient in Python, SQL, Informatica ETL, and AWS, with strong capabilities in data engineering, NLP, and GenAI-powered applications. Delivered AI-first solutions for top-tier clients including US Bank, Bank of America, and General Motors. Experienced in Agile environments, collaborating with cross-functional teams and aligning technical delivery with business objectives.

Overview

7 years of professional experience
1 Certification

Work History

AI/ML Data Scientist

Synchreon
10.2023 - Current

Client: US Bank
Project: PitchDeck (MongoDB, FAISS, Hugging Face, MinIO, SERP API, Docker)

  • Implemented file upload support for PDF and Word documents, using OpenAI and PandasAI to extract plots and summaries from the uploaded files
  • Led the evaluation of outputs from language models such as Falcon and Llama, accessed through Hugging Face and LangChain, to adapt the application and broaden the platform's capabilities for users
  • Used MongoDB as the core store for structured data, giving portfolio managers quick access to the critical information behind in-depth sector and stock research reports and supporting well-informed investment decisions
  • Used FAISS (Facebook AI Similarity Search) to add fast, accurate similarity search over stocks based on key features, providing portfolio managers with actionable insights into potential investments and risk management
  • Integrated Hugging Face's Transformers library for NLP tasks such as sentiment analysis of market news and research reports, helping portfolio managers track market and investor sentiment and react quickly to changing conditions
  • Deployed MinIO as cloud-native object storage for the unstructured data behind research reports (documents, multimedia assets, and other critical files), providing secure, scalable storage and simplifying document management and collaboration
  • Applied scikit-learn to sector clustering and stock performance prediction. By identifying patterns in historical data, these models gave portfolio managers predictive analytics to support proactive investment strategies and risk mitigation
  • Used SERP API to automate web scraping and data extraction from financial websites and news portals, reducing manual effort and keeping the data collected for research reports accurate and timely

Client: Bank of America
Project: RiskControl AI (OpenAI, LangChain, Agentic AI, MongoDB, Hadoop, MySQL, MinIO, Informatica ETL)

  • Implemented data ingestion pipelines that collect cybersecurity data from multiple authoritative sources, using scheduled jobs to keep threat intelligence up to date
  • Developed ingestion workflows that fetch application-specific vulnerabilities from various third-party security tools and pass the data through a transformation pipeline before storing it in a centralized data lake and a structured MySQL database for efficient access
  • Designed and deployed a microservices-based API layer that executes business logic and security controls through an integrated rules engine, keeping security automation workflows modular and scalable
  • Engineered NLP pipelines to parse CWE (Common Weakness Enumeration) data and calculate real-time cyber risk scores, enriching threat models with contextual insights
  • Developed a conversational AI bot built on OpenAI LLMs and LangChain that lets enterprise users query risks, explore exposures, and run scenario analysis in natural language
  • Applied data science-driven mapping models to relate vulnerabilities, exposures, assets, and business functions, supporting real-time impact analysis and scenario-based simulations
  • Deployed MinIO for secure object storage of unstructured cybersecurity artifacts, streamlining document management and team collaboration, and integrated the results into the iTRACC platform to extend its capabilities for proactive threat response and regulatory compliance
  • Fostered a culture of innovation within the team, encouraging knowledge sharing and collaboration to drive continuous improvement in AI/ML projects

Data Engineer

General Motors
04.2022 - 04.2023
  • Used Python and Unix shell scripting to process and manipulate large datasets, improving data accuracy by 20% and reducing data cleaning time by 15%
  • Developed and optimized data integration workflows in Informatica ETL to move data between systems, improving data reliability and overall system performance
  • Performed data querying, optimization, and maintenance on Oracle Exadata databases, improving data processing and retrieval speeds by 70%, and worked with database administrators and data engineers to tune SQL queries
  • Worked with cross-functional teams to adopt Agile methodologies across developers, data analysts, and stakeholders, resulting in better project execution, faster delivery, and continuous process improvement
  • Designed, developed, and implemented scalable data pipelines on AWS (Kinesis, S3, EMR, Athena, and Redshift) for real-time processing of petabyte-scale data, enabling the organization to handle large data volumes and derive insights
  • Used Informatica to design and implement ETL processes, including data mappings, workflows, and transformations, reducing data integration time by 30% and improving data accuracy by 20%
  • Implemented and maintained shell scripts to automate routine data tasks such as extraction and loading, reducing manual effort by 40%

Software Engineer

Quantum Technologies Private Limited
08.2018 - 05.2021
  • Executed over 250 Python test scripts to validate critical software components, achieving a 99.8% accuracy rate and improving the application's performance and reliability
  • Engineered SQL queries and database protocols that improved data processing and retrieval speeds by 70%, contributing to a 10% increase in transparency and a 39% improvement in overall organizational efficiency
  • Collaborated with software engineers to integrate new business models into existing systems using Python, improving overall system functionality by 25% and reducing manual workload by 20%
  • Supported the development of 10 internal operating systems, improving backend stability and data flow across platforms, which shortened feedback turnaround time and raised customer satisfaction by 15%
  • Led the migration from on-premises servers to AWS cloud infrastructure (EC2, S3, RDS), improving scalability, optimizing costs, and ensuring high availability
  • Used PySpark to process and analyze large datasets efficiently, producing data-driven insights that supported evidence-based decision-making
  • Used Informatica to design, develop, and optimize data integration workflows across systems, improving data accessibility and ensuring timely information delivery

Education

Master of Science - Computer Science

Stevens Institute of Technology
Hoboken, NJ
05.2023

BTech - Mechanical Engineering

Jawaharlal Nehru Institute of Technology
04.2018

Skills

  • Programming Languages: Python, Java, R, SQL, PL/SQL, Unix Shell Scripting
  • Databases: MySQL, PostgreSQL, Oracle, Oracle Exadata, Snowflake, MongoDB (NoSQL)
  • Cloud Technology: Amazon Web Services (AWS)
  • Big Data: Apache Hadoop, HDFS, MapReduce, Hive, HBase, Spark (PySpark)
  • ML & Data Libraries: NumPy, Pandas, Scikit-learn, TensorFlow, Keras, Matplotlib, Seaborn
  • Web Development: Flask, HTML5, CSS3
  • Scheduling, CI/CD & DevOps Tools: Airflow, GitLab, Jenkins, Kubernetes, Jira, Ansible
  • Data Warehousing & ETL: Prism, data mapping, Informatica ETL
  • Large Language Models: Hugging Face, OpenAI, Llama
  • Other Tools: Power BI, Tableau, Excel, AutoSys

Certification

  • AWS Certified Solutions Architect - Associate
  • IBM Python Certified
  • HackerRank Python Certification
