Working as a core member of a global news service's data science team to deliver high-quality content and enrichment for production applications (using Python, NLP, spaCy, FastAPI, Prodigy, transformers, embeddings, NER, relationship extraction, CoreNLP, Spark, Elasticsearch, Solr, AWS microservices).
- Developed a negative-news attribution model that identifies negative news phrases in a document and their attributable relationship to a person or company, using a span categorizer, dependency parsing, spaCy projects, and Prodigy, with an F1 score of 78%. Processed 2 billion records to pre-index results for negative-news search queries in our product Diligence.
- Developed an ingestion pipeline and helped the product team ingest PII data. Built a secure pipeline to retrieve data (100M records) from third-party vendors and preprocessed it to normalize and standardize encrypted data using AWS services, a data lake, and Databricks Unity Catalog. Designed the Solr schema and ingested the data into Solr for the app team to search against.
- Developed a POC project for the business to analyze customer feedback on powering our product with AI: natural-language querying for customers and auto-generated biographies from executive data based on selected features, using the Claude 2 LLM.
- Developed an end-to-end information extraction pipeline for company executives' employment and education history using Spark, spaCy transformers, CoreNLP KBP, custom NER (Title, Education, Duration), and relation extraction to process 20 million documents; optimized the spaCy transformer models with ONNX to increase performance 13-fold.
- Building an entity linking system to resolve commercial companies from a knowledge base using Elasticsearch and features extracted from industry embeddings, entity-description embeddings, and information extraction.
- Kick-started the CI/CD effort and migrated existing REST microservices to the cloud, automating on-commit builds and test runs and setting up SonarQube for code quality and coverage. Designed, implemented, and deployed the Unique Stamper microservice, a RESTful API that generates unique stamps for 20+ million pieces of news content.