Hi, I’m Ananya Mishra

Los Angeles

Summary

Senior data engineer with extensive knowledge of the modern data ecosystem and platform design principles, coding experience across a variety of languages, strong ETL & database management knowledge, and analytics/visualization expertise, dedicated to delivering actionable insights from large and complex datasets.

Overview

8 years of professional experience

Work History

Curative Health

Senior Data Engineer
03.2022 - Current

Job overview

  • Contribute to migration of data warehouse from AWS Redshift to Snowflake (SQL, Terraform, Buildkite)
  • Create, deploy, monitor, and debug ETL pipelines using Prefect workflow automation and Fivetran to bring in data from 3rd party sources and other internal teams (Python, SQL)
  • Develop new models and improve upon existing business objects in conjunction with analyst team using dbt ELT analytics layer (SQL, Jinja)
  • Decommission visualization layer on Metabase and move analytics content to Looker (Python, SQL)

Multiple Companies

Senior Data Engineer (Contractor)
11.2020 - 03.2022

Job overview

  • IoT/Wearable Biometrics Manufacturer (6 months)
    Main projects included rearchitecture of partner data sharing platform on AWS Lambda and upgrades to data sharing REST API on AWS API Gateway, as well as client technical advisement on API usage best practices.
  • Remote Job Recruitment Platform (6 months)
    Primary responsibilities included maintenance of analytics platform on dbt Cloud and migration of Salesforce integration to Hightouch reverse ETL platform for improved consistency. In addition, deployed several auxiliary microservices in support of data platform using Docker/Terraform.
  • Major Online Music Retailer (6 months)
    Assisted in development of ETL pipelines and their migration from legacy Pentaho platform to modern Google Cloud BigQuery tasks.

One Medical

Data Engineer
10.2019 - 11.2020

Job overview

  • Collaborated with Data Engineering team on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability (Python, Airflow)
  • Addressed ad hoc analytics requests and facilitated data acquisitions to support internal projects, special projects and investigations.
  • Contributed to internal activities for overall process improvements, efficiencies and innovation.
  • Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines including HIPAA and SOX.
  • Generated detailed studies on potential third-party data handling solutions, verifying compliance with internal needs and stakeholder requirements.

Grove Collaborative

Data Engineer
12.2017 - 06.2019

Job overview

  • Built cohesive data platform based on Amazon Redshift to integrate analytics across the organization including operations, sales, merchandising, and marketing data
  • Implemented & evangelized Apache Airflow for superior management of ETL tasks and centralization of data transport across the engineering organization
  • Created analytical models based on user behavior data to fuel growth and increase customer retention

Whistle Labs

Data Scientist
07.2016 - 12.2017

Job overview

  • Responsible for device analytics platform and infrastructure
  • Implemented realtime log processing and data visualization architecture utilizing Google Protobuf, Python, AWS Lambda/Redshift/Kinesis, LogEntries, and open source tools such as Apache Airflow and Airbnb's Superset
  • Built platform to monitor manufacturing test processes and prevent anomalies in end-user experience
  • Created dashboards used company-wide and by C-suite to monitor device performance in the field
  • Empowered firmware team to iterate quickly using realtime statistics on device performance characteristics
  • Contributed to open source projects such as Apache Airflow.

MemSQL

Forward Deployed Engineer
11.2015 - 07.2016

Job overview

  • Responsible for conducting proofs-of-concept and demos for enterprise and technology customers
  • Led technical discussions with customer development teams as product expert on MemSQL
  • Integrated product into customer environments by architecting schemas and optimizing SQL for distributed architectures
  • Built realtime data pipelines by leveraging open-source tools such as Apache Kafka and Spark
  • Prototyped future product features by writing Python and bash scripts to automate database operations
  • Demoed business intelligence capabilities of MemSQL by integrating Tableau, Looker, MicroStrategy, etc.
  • Project managed product fixes and enhancement requests, escalating to Engineering when required.

Education

Cornell University, Ithaca, NY

Bachelor of Arts in Physics
2013

University Overview

  • Dean's List
  • Recipient of Cornell Alumni Association of Greater Houston scholarship

Galvanize, San Francisco

Data Science Immersive in Data Science & Engineering
11.2015

University Overview

Underwent a 12-week immersive program (graduated early at the 6-week mark) covering the following topics:

• Exploratory Data Analysis and Software Engineering Best Practices
• Statistical Inference, Bayesian Methods, A/B Testing, Multi-Armed Bandit
• Regression, Regularization, Gradient Descent
• Supervised Machine Learning: Classification, Validation, Ensemble Methods
• Clustering, Topic Modeling (NMF, LDA), NLP
• Network Analysis, Matrix Factorization, and Time Series
• Hadoop, Hive, and MapReduce
• Data Visualization with D3.js, Data Products, and Fraud Detection Case Study

Skills

  • Python (numpy, pandas, nltk, statsmodels, scikit-learn)
  • SQL RDBMS (Snowflake, Redshift, SingleStore, MySQL, PostgreSQL)
  • Infrastructure (Docker, Terraform/grunt, Kubernetes/EKS, Buildkite)
  • UNIX/bash scripting, git, GitHub
  • AWS Ecosystem (Kinesis, Lambda, EC2, ECR, S3, RDS, Firehose, Athena, Glue Catalog, DMS)
  • Data Orchestration/Big Data (Prefect, Dagster, Airflow, Hive, Spark, Pig, Kafka)
  • Analytics, Dimensional Modeling, Machine Learning
  • Scripting, ETL, and Database Management
  • Data Platform Architecture & Scaling
  • Data Security (SOC, HIPAA)

Timeline

Senior Data Engineer

Curative Health
03.2022 - Current

Senior Data Engineer (Contractor)

Multiple Companies
11.2020 - 03.2022

Data Engineer

One Medical
10.2019 - 11.2020

Data Engineer

Grove Collaborative
12.2017 - 06.2019

Data Scientist

Whistle Labs
07.2016 - 12.2017

Forward Deployed Engineer

MemSQL
11.2015 - 07.2016

Cornell University

Bachelor of Arts in Physics

Galvanize

Data Science Immersive in Data Science & Engineering