- Created text classification models to help automate the transformation of raw data into relevant, domain-specific labeled datasets.
- Created pipelines that take in uploaded documents (PDFs, Word documents, OCR'd scanned documents, etc.) and clean them for use in NLP tasks.
- Used NLP methods and packages (spaCy, BERT, Hugging Face Transformers, etc.) to extract metadata (titles, keywords, and summaries) from uploaded documents for a custom search engine.
- Used fuzzy matching to find similar entries across multiple datasets and combine them into a master dataset containing all relevant information.
- Decoupled hard-coded data from the investment tool and implemented functionality that lets users select data through a GUI that queries a SQL database.
- Improved the existing Plotly dashboard by making it more user-friendly, allowing users not well versed in code to use the investment tool.
- Created visualizations of differences in portfolio weights and constraints, and improved existing data visualizations to make them easier for all users to understand.
Patent Doc Code Classification
- Worked on a model that takes the text of patent descriptions
and classifies them by document code.
- Implementing the ability to extract text from images of PDFs using pytesseract.
Airline Delay Model
- Created a regression model in TensorFlow to help predict monthly airline
delays caused by factors controllable by airline carriers.
Plant Disease Classification
- Created multiple classification models, including k-nearest neighbors and
convolutional neural networks, to help detect disease on plant leaves using a
large image dataset.