Satya Anusha Sripathi

Plano, TX

Overview

6 years of professional experience

Work History

Senior Data Engineer

8451
07.2019 - 07.2023

Project: clickstream

  • Designed, developed, and maintained software solutions on a Hadoop cluster and its components using Cloudera, HDFS, YARN, PySpark, Airflow, Databricks, Azure, and UNIX shell scripting.
  • Migrated clickstream data from on-prem to Azure storage and automated jobs using Azure Data Factory and Databricks to run on a daily schedule.
  • Extracted, transformed, and loaded data from on-prem to Azure data storage services using a combination of Azure Data Factory, Spark, Spark SQL, and Azure Delta Lake.
  • Built a POC using Delta Live Tables to move from on-prem to Azure, which involved extensive analysis and debugging to apply cloud files and other Databricks techniques and features.
  • Analyzed Spark architecture during the POC, including Spark Core, DataFrames, Spark Streaming, worker nodes, driver memory, executor memory, stages, autoscaling, and the execution hierarchy.
  • Built Delta Live Tables for streaming clickstream data in the Databricks environment, reading files in Delta format and saving them as Parquet in Azure Blob Storage.
  • Retrieved Azure cost usage reports using Power BI, with charts visualizing each job’s average cost on a monthly and fiscal-calendar basis.

Project: CCPA

  • Worked on the California Consumer Privacy Act (CCPA) project to remove private data of Kroger customers from target locations using SHA tokens and encryption techniques.
  • Developed an orchestration process in Airflow, scheduled with cron to run on a daily and weekly basis.
  • Retired projects from Google Cloud Platform (GCP) and ensured that data and workflows were successfully migrated off GCP and Google Cloud Storage (GCS).
  • Built pipelines using NiFi and RabbitMQ services to make data available in different target systems.
Data Engineer

Worldpay
02.2018 - 07.2019

Worldpay Group is a payment processing company. The company provides payment services for mail order and Internet retailers, as well as point of sale transactions.

  • Designed a POC for building data marts in a Hadoop environment to retire legacy PL/SQL code.
  • Ingested transactional data from various source systems (i.e., BPM and FICO) into the Hadoop ecosystem using Hive, Pig, Spark, MapReduce, Impala, and Oozie workflows.
  • Developed Spark code using Scala and Spark SQL/Streaming to transform raw data and load it into HDFS.
  • Implemented Sqoop imports to load encrypted card data from RDBMS sources (SQL Server, DB2) to a Unix server. Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations before storing the data on HDFS, and created Hive tables on top of it.
  • Performed SQL joins among Hive tables to produce input for the Spark batch process. Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Pulled data from Salesforce and applied ETL using IBM DataStage and Informatica, applying data quality rules while loading into the centralized Hadoop platform.
  • Developed workflows in Oozie and automated the tasks using the TWS scheduler.

Graduate Assistant

University of Illinois, Springfield
01.2017 - 12.2017
  • As a graduate assistant, contributed to research and data analysis within the [Academic department] landscape.
  • Administered coursework, graded assignments, and provided constructive feedback.
  • Assisted faculty members with data collection for potential academic publications.
  • Gathered, reviewed, and summarized scientific literature from databases such as SciFinder and PubMed, and produced graphs and performed scientific calculations using MS Excel.

Education

Master of Science - Computer Science

University of Illinois at Springfield
Springfield, IL
12.2017

Bachelor of Science - Computer Science

Jawaharlal Nehru Technological University
Hyderabad
05.2016

Skills

  • Python
  • Spark
  • Shell
  • SQL
  • Spark SQL
  • HiveQL
  • Scala
  • Microsoft Azure
  • Google Cloud Platform
  • Airflow
  • AWS
  • Hadoop ecosystem
  • MongoDB, HBase
  • Git
  • TeamCity, Jenkins
  • Unix
  • IBM DataStage
