Preyaa Atri

Lead Data Science Engineer
New York, NY

Summary

Team-oriented Lead Data Science Engineer seeking to leverage 7+ years of experience building data-intensive applications and tackling challenging architectural and scalability problems. Patient problem solver with an appreciation for clean code.

Overview

7 years of professional experience

Work History

Lead Data Science Engineer

MSC Industrial Supply
01.2022 - Current
  • Develop and maintain data engineering applications.
  • Built a machine learning system that predicts hardware malfunction with more than 80% accuracy.
  • Improved the efficiency of a customer recommendation engine by 33%.
  • Use Generative AI, specifically a Large Language Model (LLM), to build a natural-language-to-code generator.
  • Orchestrate data pipelines with Airflow DAGs (Python) and dbt.
  • Implement SCD Type 1 & Type 2 methodologies using SQL.
  • Built and optimized a data warehouse on Google Cloud Platform using BigQuery, Composer, Cloud Storage, Cloud Functions, and Dataform, reducing costs by 30%.

Data Engineer

ON Q FINANCIAL
08.2020 - 12.2021
  • Create and maintain data pipelines using Python and SQL.
  • Design reliable and scalable data pipelines using AWS EMR.
  • Tune table designs in Amazon Redshift.

Data Science Intern

GRAVY ANALYTICS
05.2019 - 09.2019
  • Collaborate on ETL (Extract, Transform, Load) tasks using Databricks.
  • Transfer data from a database to Elasticsearch using Java.
  • Design visualizations to communicate insights using Tableau.

Associate Data Engineer

AMDOCS DEVELOPMENT CENTRE
07.2016 - 10.2017
  • Integrate Kafka with Spark Streaming, using Scala.
  • Manage HDFS and load unstructured data.
  • Clean and statistically analyze data using Python and R.

Data Analyst

COGNIZANT TECHNOLOGY SOLUTIONS
07.2014 - 07.2016
  • Import and export data using Flume and Kafka.
  • Partition and bucket data in Hive.
  • Migrate Data Warehouse from Oracle to AWS Redshift.
  • Run real-time and batch processing jobs with Spark on AWS EMR.

Education

Master of Science - Data Analytics Engineering

Volgenau School of Engineering, George Mason University
Fairfax, VA

Bachelor of Technology - Computer Science

Institute of Engineering, Bundelkhand University

Skills

  • Big Data Ecosystems - HDFS, Hive, Pig, Sqoop, Flume, Kafka, Oozie, HBase, Spark, Zookeeper
  • Languages/Concepts - Scala, Python, R, Java, SQL, Machine Learning, NLP, HiveQL
  • Databases - MySQL, PostgreSQL, MongoDB, DynamoDB, Oracle 12c
  • Tools & Utilities - Tableau, Power BI, Looker, Control-M, AutoSys, SQL Developer, Elasticsearch and Kibana, Jenkins, Hadoop Framework, Microsoft Azure, Docker, dbt, Databricks
  • AWS/Cloud Services - S3, EC2, Redshift, EMR, Data Lakes, Lambda
  • GCP - BigQuery, Composer, Cloud Storage, Dataform, Dataflow, Vertex AI, Apache Airflow, Pub/Sub, Data Fusion
  • Machine Learning Algorithms - Linear Regression, Logistic Regression, Decision Trees, Random Forest, K-Means Clustering, Support Vector Machines
