Andrew Reinke

Data Scientist Engineer
Vancouver, Washington

Summary

Innovative, lifelong learner focused on growth and the discovery of new opportunities. Motto: automate everything, except maybe cooking.

  • United States Citizen
  • New Zealand Permanent Resident

Overview

21 years of professional experience
4 years of post-secondary education

Work History

Senior Data Engineer

Datum Consulting Group
Washington, D.C.
10.2022 - Current
  • Developed Python scripts for AWS Lambda jobs using Docker. Created Python functions to process incoming JSON and pass the data to AWS Aurora Postgres RDS. Uploaded the container images to AWS Elastic Container Registry (ECR) and attached them to AWS Lambda functions. Modified VPC, security group, and subnet settings to allow Lambda to access RDS, S3, and OpenSearch.
  • Wrote scripts to push data from Postgres RDS to AWS OpenSearch (AWS Elasticsearch) using JSON payloads.
  • Wrote Postgres Procedures, called from Python, which allowed multiple CRUD operations to act as a single transaction.
  • Developed Python scripts on Databricks for processing terabyte-sized tables, including matching algorithms to find similar records.
  • Developed PySpark scripts for file ingestion, cleaning, and analytics problems. Later scaled those PySpark scripts by creating AWS Elastic MapReduce (AWS EMR) clusters.
  • Developed EMR Workbooks (Jupyter Lab Notebooks) attached to EMR Clusters.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Used Bash/Linux shell scripting and Python to design and update databases.
  • Employed data cleansing methods that significantly enhanced data quality.
  • Contributed to internal activities for overall process improvements, efficiencies and innovation.
  • Loaded data sets into RDS Aurora and RDS Postgres using psql called from an EC2 instance, including primary key creation.
  • Benchmarked the performance of RDS Aurora against RDS Postgres on identical datasets using the Postgres pgbench utility.
  • Built and configured Athena Lambda Functions to connect to RDS Aurora and RDS Postgres allowing for cross database joins in SQL.
  • Configured RDS Postgres and RDS Aurora to connect to EC2 instances.
  • Developed Python in DataGrip using an SSH Python interpreter connected to a remote EC2 instance, developing locally and saving to GitLab for CI/CD.
  • Connected to AWS Athena, AWS RDS Aurora, and AWS RDS Postgres using the Python libraries PySpark, PyAthena, and Psycopg2 to support ETL processes.
  • Created an Iceberg data lake in AWS Athena using Python calls to Athena, and created Iceberg tables in Athena using an Athena Spark engine workgroup environment.
  • Converted RDS Postgres to RDS Aurora using the Postgres-to-Aurora read replica migration route.
  • Created VPC groups and IAM policies.
  • Created database connections to RDS Aurora and RDS Postgres from DataGrip and PyCharm using an SSH tunnel through an EC2 instance.
  • Automated FTP file ingestion using Python Paramiko and private keys, passing files and directories to Postgres and later to dataframes, which triggered Python functions based on file name and type. Later saved the data to AWS Elastic File System (EFS), which works across availability zones and can be mounted on any EC2 Linux instance in any AZ. This essentially created an unlimited network drive reachable from any EC2 instance.
  • Helped design Entity Relationship Diagrams (ERDs) using Lucidchart.

Senior Data Engineer / Lead Data Scientist

Bill.Com
San Francisco, CA
04.2019 - Current
  • Built API pipelines that automatically applied industry classification codes to sales leads, customers, and vendors. This allowed for targeted sales follow-ups based on momentum within certain industries, resulting in lower marketing costs and better customer acquisition metrics.
  • Extracted LDA data from customers and imported it into Neo4j for customer and lead clustering analysis. The clustering analysis allowed tailored, higher-precision machine learning models to be applied to each cluster for better customer and lead classification.
  • Built APIs to ingest social media data from Reddit and Meltwater, pushing results to Parquet files in AWS S3. Athena tables were then built from the S3 location, and an AWS QuickSight dashboard was created to filter results, using QuickSight calculated fields, filters, and parameters to allow ad hoc searches.
  • Built a Python API to read from FRED (Economic Research, Federal Reserve Bank of St. Louis, Missouri), then pushed the data to AWS QuickSight dashboards for CPI comparisons, industry metrics, etc.
  • Converted Python Pandas scripts to PySpark, reducing processing time from 6 weeks to 2 hours. This involved moving from Python Record Linkage to the Jaro-Winkler algorithm in PySpark's ceja library, including conversion of Pandas transformations to PySpark equivalents.
  • Built Net Present Value (NPV) calculations in Python on each customer's revenue stream, allowing us to identify the most valuable industries.
  • Built and led the program to match Invoice2Go's customers to Bill.com's customers. The Python code (using the recordlinkage library) and pipeline were automated and scaled on SageMaker. This allowed us to de-duplicate customers that appeared across multiple companies.
  • Built and maintained an internal search engine which scanned millions of websites using Python, saving the results into Athena for quickly finding cohorts matching certain criteria.
  • Built financial prediction models in Python using statsmodels SARIMAX (seasonal ARIMA) time series forecasting, including walk-forward analysis, allowing predictions 12 months ahead with 84% accuracy.
  • SageMaker Jupyter Lab notebook development including bash terminal scripting for automated library installs.
  • Python script automation using AWS EC2 instance resulting in 24/7 availability of Google Sheets custom calculations integrated with Athena.
  • ARIMA Forecasting Model in Python with Seasonality Smoothing and Seasonality Parameter Optimization.
  • Created Mixpanel API scripts in Python to push events from the data lake to Mixpanel. This allowed value-added metrics to be displayed alongside and within standard and custom Mixpanel reports.
  • Created Twitter API scripts in Python to pull in Twitter data for certain keywords and network diagrams.
  • Created a t-SNE visualization of multidimensional data to show multiple features and their correlations. This supported marketing and the optimization of certain processes.
  • Created a matching system in Python to match the same customers across dissimilar datasets. After the purchase of another company, customers from the two separate companies needed to be matched to each other. Using Python FuzzyMatcher and RecordLinkage, a pipeline was built to match customers by probability of match.
  • Created Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) models for NLP topic modeling on 80,000 comments.
  • Pushed NLP data from Athena to Slack using Slack's API webhooks, written in Python and scheduled in cron. This allowed near-real-time distribution of customer comments to affected stakeholders.
  • Extracted website reviews using Python Requests.
  • Built Google BigQuery Machine Learning models with Create Model (BoostedTreeClassifier, Auto_ML), including use of Predict for real-time pipelining of predictions.
  • Used GitLab to push Python code to repositories for CI/CD jobs.
  • Built NLP extraction processes in Python using TextBlob to extract noun phrases, parts of speech, sentiment, polarity, subjectivity, and n-grams.
  • Created a machine learning text classification model in SKLearn (Multi-Class Naive Bayes).
  • Created machine learning models in H2O.ai Driverless, including Python H2O.ai run locally on a laptop. Models used: Auto_ML Leaderboard, XGBoost.
  • Created machine learning models using SKLearn, including use of the Pipeline, which streamlines data transformations and splitting.
  • Built pipelines from Athena to Google Sheets and back to Athena allowing Sheets to automatically update.
  • Applied machine learning models (Logistic Regression, Random Forest Classifier) in Python to predict user behavior. This involved AWS Command Line (aws-cli) calls to pull datasets from AWS Athena into Python Pandas DataFrames. Model predictions were uploaded to AWS S3 using Python, allowing Athena tables to self-update.
  • AWS QuickSight dashboard development, including use of calculated fields driven from SPICE pulled from Athena views.
  • AWS Athena SQL query development / analytics including using AWS S3 command line scripting and transformation queries in Python.
  • Extensive Presto SQL development.
  • Heat Maps, Tree Maps, Word Clouds Driven by Data Pulled From AWS Athena.
  • Converted BigQuery SQL to Amazon Redshift SQL, developing extensively in Metabase with AWS Redshift as the big data engine.
  • GCP command-line scripting for BigQuery using bq load and gsutil, including parameterized BigQuery SQL and Bash "code that makes code".
  • BigQuery Data Studio development.
  • AWS QuickSight big data dashboard development for an international payments foreign exchange (FX) dashboard.
  • Created SQL scripts using Google's advanced window analytic functions.
  • Ran sentiment analysis and natural language processing (NLP) on user comments using Python with the libraries Pandas, spaCy, SciPy (spatial), and NLTK / VADER (SentimentIntensityAnalyzer) to reduce manual comment processing, resulting in an 80% reduction in labor and faster Net Promoter Score reporting.
  • Created Google Data Studio and Google Sheets graphs such as GeoMaps, Scatter Plots and Bubble Charts to display and analyze data extracted.
  • Foreign Exchange FX Data Analysis on an ongoing basis including Revenue Calculations, dashboarding, and ongoing spread discrepancies.
  • Extensive plotting using ggplot2 in R and Seaborn.
  • Extracted data from Google BigQuery and ran linear and polynomial regression analysis using R.
  • Extracted raw JSON data from Mixpanel and converted it to Google BigQuery tables using Debian Linux command-line tools such as jq, bq, curl, sed, and Bash.
  • Automated ETL for A/B web experiments so analysts could see experiment results automatically without needing to preprocess their data.
  • Created ETL scripts that extracted data from Mixpanel (MP) using MP's API, scripted in Google Cloud Shell, and converted it to Google BigQuery tables for the Data Analytics teams.
  • Developed Google BigQuery SQL scripts to extract and transform data for Bill.Com's Product Growth Unit Department.
  • Used Google BigQuery API calls and Google Cloud Storage API calls to transfer data from Mixpanel and store it in GCP.
  • SQL / Linux scripting / ETL / AWS Athena / Python / R.
  • Created Mixpanel REST API calls to transfer hundreds of GBs of user experience data from Mixpanel into Google BigQuery for analysis.
  • Used inferential bootstrapping to successfully predict profit and loss in very near-term months.

Senior Data Integration Engineer

Enable Networks Limited
Christchurch, New Zealand
09.2018 - 11.2018
  • Developed Microsoft SSIS ETL scripts to automate the creation of purchase order lines from SQL Server and legacy applications into Microsoft Dynamics NAV.
  • This involved creating XML API calls and handling API responses including errors and exceptions.
  • Developed Microsoft SSIS ETL scripts to automate creation of new leads in Sales Rabbit for door-to-door sales.
  • This involved creating JSON API calls and handling API responses including errors and exceptions.
  • Created SQL Server queries, views, functions and procedures to import, cleanse and move data from multiple external systems into and out of Microsoft SQL Server.

Senior Data Architect / Senior Data Engineer

Parts Quarry LLC
Wilmington, DE
01.2007 - 08.2018
  • Developed a large data warehouse (Kimball Method) which connected to the US Defense Logistics Information System (DLIS).
  • Developed a US Military supply parts business, conceptualizing the company from inception, including remotely managing a team of 15 data entry clerks, 1 contract administrator, and 1 warehouse operator.
  • Developed Microsoft SQL Server / MySQL / MariaDB SQL procedures for automating large, complex data migration ETL processes. The information was refreshed and maintained automatically using Linux Bash scripts and SQL procedures.
  • Created a Linux iptables filtering script that reduced server hacking attempts by 99%. The script counted failed root login attempts by IP (using system logs) and ignored additional login attempts from the same IP for x minutes.
  • Extensive SQL procedure writing, including query tuning and optimization, index analysis and optimization, and slow and general query log analysis.
  • Scripted hot backups of MySQL and MariaDB.
  • Scripted the creation of and maintained multiple Linux distributions, including Debian, CentOS, Ubuntu, and Linux Mint.
  • Created Microsoft SQL Server / MySQL / MariaDB backup and restoration procedures.
  • Developed and maintained remote server backup processes in Linux using rclone/rsync/ssh, allowing for fast server-to-server transfer speeds (Dropbox and Google Drive among others).
  • Designed a mechanism using Linux Bash and SQL to import all open RFQs (Requests for Quote) received by US Military for Foreign Military Sales within NATO.
  • Maintained DNS records, including DKIM and SPF1 records used for email security.
  • Mapped the processes of the data entry department, optimized those processes, and developed a PHP / MySQL web-based data entry system for an overseas data entry team, which reduced data entry costs by over 90% and errors by over 99%.
  • Created a quote requesting system, tied directly to the data warehouse, that automatically faxed and emailed tailored RFQs to specific vendors. This created a workload of 20,000 pages of quotes a month for the overseas data entry team.
  • Developed reports in Microsoft Visual Studio.
  • Optimized profit and revenue by developing a Monte Carlo simulation on historical quotation data. The resulting analysis led to a rebuild of our pricing and sourcing policies, which decreased workload by 80% and allowed the company to keep 95% of net profit compared to pre-optimization.
  • Mapped the processes of the contract administrator, optimized those processes, and developed a PHP / MySQL web-based contract administration management system. This reduced contract administration costs by 60% and eliminated errors.
  • Mapped the processes of the parts packaging department, optimized those processes and recreated the workflow.
  • Developed all the software needed to encode the RFID shipping tags and automate the production of DD250 forms, all required milspec packaging forms, and all shipping labels. This software ensured compliance with government contract regulations and guidelines.
  • Automated invoice generation with the US Defense Department's Wide Area Workflow (WAWF) website using the DLA's sftp flat file upload schema to Ogden, Utah.
  • Developed a batch file creator to import data into and synchronize contract administration system with QuickBooks and MySQL.

BI Developer / Consultant

Information Technologies Group Inc
Anchorage, AK
01.2002 - 01.2007
  • Mapped specific accounting processes for British Petroleum (BP) to facilitate the consolidation of Phillips Petroleum accounting processes in Dallas, Texas from Anchorage, Alaska.
  • Developed and scripted the creation of 500 cloud servers using Amazon AWS, Google Cloud Compute and Digital Ocean used for an affiliate marketing sales company.
  • These processes created 500 separate and unique web carts using X-Cart that worked with Google Shopping.
  • The result scaled a single webcart and website to 500 webcarts and websites for a larger market profile.
  • Conducted needs and requirements analyses with clients for GCI in Anchorage, Alaska.
  • Defined areas of opportunity for efficiency gains, and conceived and implemented innovative solutions to document key personnel processes for faster training of new personnel.
  • Oversaw the elimination of ongoing labor costs for Petroleum Equipment Services, creating automation routines.
  • Analyzed historical data for City of Anchorage's Worker Training Program.
  • Managed a team of data entry clerks that collated all data and created reports.

Algorithmic Trading Systems Developer - (hobby)

-
Anchorage, AK
  • Developed a personal server cluster for backtesting 50 years of historical Chicago Mercantile Exchange Futures data for automated algorithmic trading systems.
  • This allowed for backtesting trading strategies on Live Cattle, Lean Hogs and certain spread trading on the Soybean Crush.
  • Developed and scripted the creation of 1000 Cloud Servers using Digital Ocean, AWS, and Google Cloud Compute.
  • This allowed for the backtesting of 60 years of data to find every possible 1-5 day pattern that led to large movements in the underlying instruments.
  • Developed automated trading systems that are currently trading in the Forex and Equities Markets using Python, Linux Bash, and MariaDB, all in the cloud.
  • Developed automated trading systems using the Interactive Brokers API and the IbPy Python wrapper.
  • Developed automated trading systems using Oanda's API.
  • This included JSON calls, data extraction methods, and stop-market and stop-limit orders.
  • Pushed Polygon.IO data to BigQuery using API calls.
  • This created tables with billions of records available for analysis during market hours.

Education

MBA - Business Management (1 Year)

University of Alaska Anchorage (1 Year)
Anchorage, AK
06.1997 - 06.1997

Bachelor of Arts - Finance

University of Alaska Anchorage
Anchorage, AK
06.1996 - 06.1996

Data Science: Inferential Thinking - Simulation

University of California, Berkeley (edX)
Berkeley, CA
07.2021 - 07.2021

Data Science: Computational Thinking With Python

University of California, Berkeley (edX)
Berkeley, CA
08.2021 - 08.2021

Data Science: Machine Learning And Predictions

University of California, Berkeley (edX)
Berkeley, CA
09.2021 - 09.2021

NLP - Natural Language Processing With Python

Udemy.com
06.2019 - 06.2021

Python For Data Science And Machine Learning

Udemy.com
10.2020 - 10.2021

Data Science And Machine Learning Bootcamp With R

Udemy.com
01.2021 - 01.2021

Python For Financial Analysis & Algorithmic Trades

Udemy.com
05.2020 - 05.2021

Skills

  • Linux
  • Database Development
  • Data Warehouse Development
  • SQL Procedure Writing
  • Linux Cloud Development
  • Python
  • Bash
  • Algorithmic Trading Systems
  • SQL
  • Google BigQuery
  • Google Cloud Shell

Machine Learning Models - scikit

Orange Data Mining

Timeline

Senior Data Engineer

Datum Consulting Group
10.2022 - Current

Data Science: Machine Learning And Predictions

University of California, Berkeley (edX)
09.2021 - 09.2021

Data Science: Computational Thinking With Python

University of California, Berkeley (edX)
08.2021 - 08.2021

Data Science: Inferential Thinking - Simulation

University of California, Berkeley (edX)
07.2021 - 07.2021

Data Science And Machine Learning Bootcamp With R

Udemy.com
01.2021 - 01.2021

Python For Data Science And Machine Learning

Udemy.com
10.2020 - 10.2021

Python For Financial Analysis & Algorithmic Trades

Udemy.com
05.2020 - 05.2021

NLP - Natural Language Processing With Python

Udemy.com
06.2019 - 06.2021

Senior Data Engineer / Lead Data Scientist

Bill.Com
04.2019 - Current

Senior Data Integration Engineer

Enable Networks Limited
09.2018 - 11.2018

Senior Data Architect / Senior Data Engineer

Parts Quarry LLC
01.2007 - 08.2018

BI Developer / Consultant

Information Technologies Group Inc
01.2002 - 01.2007

MBA - Business Management (1 Year)

University of Alaska Anchorage (1 Year)
06.1997 - 06.1997

Bachelor of Arts - Finance

University of Alaska Anchorage
06.1996 - 06.1996

Algorithmic Trading Systems Developer - (hobby)

-