
Fariha Baloch

Erie, USA

Summary

Experienced in complex, operations-focused projects: identifying the key questions behind data challenges, formulating hypotheses, and presenting creative solutions to audiences of all backgrounds. Proficient in clustering, classification, and regression modeling and in statistical analysis using Python, SQL, pandas, and AWS.

Overview

16 years of professional experience

Work History

Data Engineer

Workday
Boulder, Colorado
11.2023 - Current
  • Architected and developed an ETL data pipeline using AWS Glue, Airbyte, AWS RDS, and Tableau that consolidated data from several disjoint sources, enabling teams to build resiliency metrics on one unified platform.
  • Researched the ETL tools available on the market and chose to host Airbyte OSS on an AWS EKS cluster rather than buy licenses for the cloud version, saving the organization a significant amount of money. Learned the tool and wrote source connectors for internal data sources that became part of the unified data pipeline.
  • Wrote infrastructure-as-code in Terraform and ArgoCD, including using Terraform's Helm provider to deploy Fluent Bit for collecting logs and metrics from the EKS cluster. Codifying the infrastructure saves time if the current environment goes offline or duplicate configurations are needed for disaster recovery or other purposes.
  • Worked with internal teams to build Developer Experience metrics in Grafana that show a team's strengths and weaknesses across SDLC phases, including SLO/SLI metrics that help forecast issues in the production pipeline. By tracking SLO violations in its services, one team recently identified an issue hours before the customer reported it and was ready with the root cause.

Data Analyst

Workday
Boulder, Colorado
01.2022 - 11.2023
  • Applied topic modeling with LDA and BERTopic to a set of interview transcripts to reduce the time needed to extract summaries, saving several hours in the data-analysis portion of the project.
  • Collected, analyzed, and visualized data from several teams in the organization using Tableau to identify investment opportunities and support critical, data-driven decisions. The results were used by different organizations to plan future product goals while keeping developers' interests at the forefront.

Data Scientist

Planetary Care
12.2020 - 12.2021
  • Extracted Reddit posts using the PushShift Reddit API to build a sentiment-analysis tool that takes a key phrase, finds sentiment in matching posts, and generates a scatter plot of sentiment intensity over a user-defined time period. The tool was developed with Dash using Plotly graphs and hosted on AWS Elastic Beanstalk so non-technical teams could interact with it easily.
  • Created a tool using the Gensim library that extracts keywords, key phrases, and a summary from any PDF document or website. The code is used to identify and promote relevant literature on regenerative agriculture; it saved the research team time by surfacing only the articles relevant to a customer's need.

Data Scientist

FinGoal (Techstars/MetLife '20)
08.2020 - 12.2021
  • To enrich credit card transaction data and understand a user's preferred qualities in a restaurant, scraped the attributes published on restaurants' Yelp pages using the requests and BeautifulSoup libraries, saving time in the data-collection process compared with manual entry of the fields.
  • The data was used to give customers highly personalized advice based on their favored restaurant attributes.
  • Performed clustering of merchants with similar attributes in credit card transaction data. Summaries from each merchant's Wikipedia page were extracted using the Wikipedia API and transformed into vectors using TF-IDF and Doc2vec models, giving the model extra signal for predicting customer preferences and producing better clustering results.

Product Validation Lead

Intel Corporation
Folsom, CA
01.2017 - 01.2018
  • Led a PCI-E-based SSD project for the validation team, ensuring validation activities were carried out according to the organization's established guidelines.
  • Strategized the influx of work for several validation sub-teams and streamlined the outflow of results to several of Intel's top customers.
  • Presented project progress and product quality indicators to management, keeping the program release on time and mitigating risks as soon as they arose.

QA Lead and Scrum Master

NetApp
Wichita, Kansas
09.2007 - 01.2017
  • Served as Scrum Master for multiple technical projects.
  • Leveraged Agile principles to keep teams on track and help them become self-organized.
  • Helped build new metrics that reflected product health from data collected across all sub-teams, aggregating the results into one visual chart.

Education

Career Track Certification in Data Science

Springboard
10.2020

Certification in Data Analysis

Cornell University
04.2018

PhD in Electrical Engineering

Wichita State University
05.2014

Skills

  • AWS Glue
  • AWS EKS
  • Python
  • SQL
  • ML
  • ETL
  • Terraform
  • Data Analysis
  • Requirements Gathering
  • Key Performance Indicators

Kaggle

www.kaggle.com/fariha23

Personal Information

Title: Data Expert | Scrum Master
