RISHI ALLA

Ashburn, VA

Overview

1 year of professional experience
1 Certification

Work History

Associate Data Scientist

LinQuest
08.2023 - Current

- Created text classification models to help automate the transformation of raw data into relevant, domain-specific labeled datasets.

- Created pipelines that take in uploaded documents (PDFs, Word documents, OCR'd scanned documents, etc.) and clean them for use in NLP tasks.

- Used NLP methods and packages (spaCy, BERT, Hugging Face transformers, etc.) to extract metadata (titles, keywords, and summaries) from documents uploaded to a custom search engine.

- Used fuzzy matching to find similar entries across multiple datasets and build a master dataset containing all relevant information.

USSF Software Developer Intern

LinQuest
05.2023 - 08.2023

- Decoupled hard-coded data from an investment tool and implemented functionality that lets users select data through a GUI that queries a SQL database.

- Improved an existing Plotly dashboard by making it more user-friendly, allowing users not well versed in code to use the investment tool.

- Created visualizations to show differences in portfolio weights and constraints, and improved existing data visualizations to make them easier for all users to understand.

Education

M.S. - Data Science

George Mason University
Fairfax, VA
05.2024

B.S. - Data Science

George Mason University
Fairfax, VA
05.2023

Skills

  • Python, SQL, R, C, Fortran
  • Docker
  • Spark
  • Plotly
  • AWS
  • NLP: spaCy, BERT, Hugging Face, LangChain
  • Modeling: Support Vector Machines (SVM), Decision Trees, Naive Bayes, Deep Learning (Transformers, Convolutional Neural Networks, LLMs)
  • Python packages, including scikit-learn, pandas, Matplotlib, Seaborn, NumPy, PyTorch, TensorFlow
  • GUI Development with Dash

Certification

  • AWS Solutions Architect Associate Certified (Issued by AWS)
  • DeepLearning.AI TensorFlow Developer (Issued by Coursera)

Projects

Patent Doc Code Classification
- Worked on a model that takes the text of patent descriptions and classifies them by document code.
- Implementing the ability to extract text from images of PDFs using pytesseract.
Airline Delay Model
- Created a regression model in TensorFlow to help predict monthly airline delays caused by factors controllable by airline carriers.
Plant Disease Classification
- Created multiple classification models, including k-nearest neighbors and convolutional neural networks, to help detect disease on plant leaves using a large image dataset.

Timeline

Associate Data Scientist

LinQuest
08.2023 - Current

USSF Software Developer Intern

LinQuest
05.2023 - 08.2023

M.S. - Data Science

George Mason University

B.S. - Data Science

George Mason University