Sandeep Pilania

Linkedin.com/in/sandeep-pilania/

Summary

Skilled machine learning professional with 4+ years of experience designing and developing ML models and pipelines, backed by 5 years of software engineering experience. Brings diverse experience in collecting and analyzing data, text mining, and NLP; developing new models and enhancing existing ones across the end-to-end ML workflow; and applying MLOps to productionize solutions. Proficient in collaborating with high-performing teams to develop ML frameworks and solutions that drive business growth.

Overview

12 years of professional experience

Work History

Data Scientist

LEXISNEXIS
Raleigh, NC
07.2018 - Current

Working as a core member of a global news service's data science team to deliver high-quality content and enrichment for production applications (using Python, NLP, spaCy, FastAPI, Prodigy, Transformers, embeddings, NER, relation extraction, CoreNLP, Spark, Elasticsearch, Solr, AWS microservices)

  • Developed a negative news attribution model that identifies negative news phrases in a document and attributes them to a person or company, using a span categorizer, dependency parsing, spaCy projects, and Prodigy; achieved an F1 score of 78%. Processed 2 billion records to pre-index results for negative news search queries in our product, Diligence.
  • Developed an ingestion pipeline to help the product team ingest PII data. Built a secure pipeline to retrieve data (100M records) from third-party vendors, and preprocessed it to normalize and standardize the encrypted data using AWS services, a data lake, and Databricks Unity Catalog. Designed the Solr schema and ingested the data into Solr for the app team to search against.
  • Built a POC for the business to gauge customer feedback on adding AI features to our product: natural language queries for customers and auto-generated executive biographies based on selected features, using the Claude 2 LLM.
  • Developed an end-to-end information extraction pipeline for company executives' employment and education histories using Spark, spaCy transformers, CoreNLP KBP, custom NER (Title, Education, Duration), and relation extraction to process 20 million documents; optimized the spaCy transformer models with ONNX for a 13x performance improvement.
  • Building an entity linking system that resolves commercial companies against a knowledge base using Elasticsearch, with features extracted from industry embeddings, entity description embeddings, and information extraction.
  • Kickstarted the CI/CD effort and migrated existing REST microservices to the cloud, automating on-commit builds and test runs, and set up SonarQube for code quality and coverage. Designed, implemented, and deployed the Unique Stamper RESTful microservice, which generates unique stamps for 20+ million news items.

Software Engineer Intern

LEXISNEXIS
Raleigh, NC
05.2017 - 08.2017

Named Entity Recognition with Machine Learning (using Python, CoreNLP, NLTK, and GloVe embeddings)

  • Extracted 26 entity types from over 30 terabytes of case law documents, achieving 80% precision and 94% recall, and deployed the extraction engine on an AWS EC2 instance.

Sr Software Engineer

Bank of America
India
07.2012 - 07.2016

Application Monitoring and Service Operations Online Portal (using C#, .NET, JavaScript)

  • Developed a web application providing customized dashboards for application, job, and server monitoring, and helping the service operations team automate task management, event management, reporting, and scheduling. The portal helped a team of 50+ people monitor 400+ applications and their associated jobs and servers.

Education

Master of Science - Computer Science

North Carolina State University
Raleigh, NC
05.2018

Bachelor of Technology - Computer Science

VIT University
India
05.2012

Skills

  • Machine Learning Skills: NLP, Deep learning, ML algorithms, Information Retrieval, Text Mining, Clustering, SVM, Embedding Models, MLOps, Parsing techniques (dependency parsing, POS tagging)
  • Enterprise Systems/Platforms: AWS (EC2, ECS, EMR), Jenkins, Spark
  • Programming Languages: Python (TensorFlow, PyTorch, spaCy, Transformers, Pandas, NumPy, scikit-learn, Flask, FastAPI), Scala, SQL, Java
  • Tools and Technologies: Elasticsearch, git-lfs, DVC, git, Spring Boot, sbt, Maven

Personal Projects and Hackathons

  • Text to Search Query: Built a natural-language-to-search-query solution using ChatGPT prompt engineering and Elasticsearch, producing an interactive application where users can search for potential donor profiles and filter the results using only natural language.
  • Query Intent Hackathon: Built a query intent classifier using an ensemble of logistic regression and a bidirectional GRU deep learning model.
  • Global News Hackathon: Built a news plus portal that serves news articles with an ensemble of abstractive and extractive summaries, topic classification, and integrity.
  • Image Classification: Trained a 2-layer convolutional neural network on the CIFAR-10 dataset.
  • SlackBot: Built a chatbot integrated with Slack to collect daily Agile stand-up statuses and provide visualizations of sprint burndown and velocity.
  • Amazon's Next Headquarters: Developed score-based metrics combined with logistic regression in Python to rank candidate cities for Amazon's HQ bid.

Timeline

Data Scientist

LEXISNEXIS
07.2018 - Current

Software Engineer Intern

LEXISNEXIS
05.2017 - 08.2017

Sr Software Engineer

Bank of America
07.2012 - 07.2016

Master of Science - Computer Science

North Carolina State University

Bachelor of Technology - Computer Science

VIT University