Jana Bhogireddy

Jacksonville, FL

Summary

Overall 9 years of professional IT experience, including 5+ years in the Big Data ecosystem and related technologies such as Hadoop HDFS, Spark, Kafka Streaming, Scala, Python, Apache Pig, Hive, Sqoop, HBase, Oozie, and AWS Cloud, and 4+ years in data warehouse implementation. Committed job seeker with a history of meeting company needs through consistent and organized practices. Skilled in working under pressure and adapting to new situations and challenges to best enhance the organizational brand.

Overview

10 years of professional experience

Work History

Data Engineer III

SurveyMonkey
California
05.2022 - Current
  • Designed and built a flagship customer data product on the Snowflake cloud data warehouse using AWS services, Airflow, PySpark, Snowflake, and data modeling concepts, giving the Sales, Marketing, Customer Operations, and Product teams a clear, unified representation of customers to drive business insights
  • Developed a centralized data pipeline to handle GDPR compliance and data anonymization using dynamic conditional masking across the company-wide enterprise data warehouse with Python, DBT, and SQL stored procedures, strengthening the security team's trust by 70%
  • Executed all data initiatives by establishing a semantic layer at daily granularity atop the dimension and fact tables to support the CEO's transformation initiative; this involved curating over 120 key company health metrics into a strategic dashboard for C-suite executives and investors, facilitating data-driven decision-making
  • Ingested real-time billing data from the Stripe APIs and marketing data from the Google Analytics APIs into Snowflake using Fivetran and DBT, configured for recurring 15-minute updates
  • Resolved anomalies flagged by Monte Carlo (data observability) by reverse engineering them to validate their legitimacy, improving data quality by 70%
  • Architected and implemented a multi-dimensional galaxy data model for product usage data in Snowflake using SQLDBM, powered by Fivetran, DBT, and Airflow
  • Built a common AWS cost optimization framework in Python to automatically terminate EMR and EC2 instances when not in use, reducing overall AWS spending by 30%
  • Established a comprehensive Continuous Integration and Continuous Delivery (CI/CD) platform with GitHub Actions to streamline and automate build, test, and deployment workflows
  • Created custom Airflow operators in Python to interact with services such as EMR, EC2, Athena, S3, DynamoDB, and Snowflake; these operators are used by ~30 teams across the enterprise

Data Engineer III

NIKE
Portland, OR
03.2021 - 05.2022
  • Built and implemented the Loyalty Customer Analytics and Segmentation model on the advanced data analytics layer in Databricks Delta Lake
  • Worked with data scientists to design and build analytics tools that fulfill R&D, product, operational and reporting needs throughout the data engineering teams
  • Created pipelines in AWS to extract, transform, and load data from different sources using Databricks, and scheduled the workflows using Logic Apps
  • Created aggregated datasets for Loyalty Customer Lifetime Value (CLV), Customer Retention and Growth (CRG) and Segmentation using Spark, SQL, Python and AWS
  • Implemented BetaGeoFitter (BGF) and GammaGammaFitter (GGF) models, along with RFMA models

Data Engineer III

Florida Blue
Jacksonville, FL
03.2017 - 02.2021
  • Implemented a robust data pipeline using Spark to ingest data from multiple sources, including DB2 and SQL Server databases
  • Wrote Spark code in Scala and Spark SQL/Streaming for faster data processing
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive, and applied complex transformation rules to the data within the Spark environment
  • Used Spark's Scala API to process large-scale data sets efficiently through distributed processing
  • Loaded processed data efficiently into Netezza for further analysis and reporting
  • Developed procedures to extract data from the final processing tables
  • Utilized SFTP to transmit data to external stakeholders, including Welltok, DHCS, and other vendors
  • Built the ETL pipeline through well-organized shell scripts
  • Scheduled and monitored the entire ETL pipeline using Control-M
  • Automated the identification and categorization of members by age using defined business rules, distinguishing members over 65 from those under 65
  • Implemented logic to apply age-related filters on data, ensuring accurate classification of members within the data sets.

Data Engineer

Deutsche Bank
Jacksonville, FL
06.2014 - 03.2017
  • Handled application design, development, customization, and implementation using PySpark transformations in an AWS environment
  • Created a master validation script in Python to check for missing and duplicate values in enterprise data warehouse tables by comparing them against the source database, yielding a 50% increase in data quality
  • Built SalesDataProduct1 and SalesDataProduct2 for commercial analytics teams to plan their marketplace inventory
  • Designed and developed data pipelines using bash scripting to implement SCD1 on dimension tables and SCD2 on fact tables, maintaining historical data alongside current data in the enterprise data warehouse
  • Transformed raw Kafka data into a readable format, enhancing data accessibility and usability
  • Enhanced existing pipelines in the stage 1 and stage 2 environments for downstream consumers
  • Scheduled ETL workflows with Airflow and worked on AWS EMR clusters; developed Spark pipelines with Python modules to ingest and process data from Kafka, ensuring seamless integration

Education

MCA

Jawaharlal Nehru Technological University (JNTU)

Skills

  • Python
  • Bash Scripting
  • SQL
  • Scala
  • Data Engineering: Apache Airflow, DBT, Docker, Shell Scripting, Control-M and MWAA
  • Analytics and Visualization: Tableau, Power BI, Quicksight, Google Analytics
  • Databases: Snowflake, Amazon Redshift, PostgreSQL, SQL Server, MongoDB
  • Big Data: Spark, PySpark, Hadoop, Hive, Kafka
  • Tools: Fivetran, Hightouch, Monte Carlo, SqlDBM, Talend, IBM InfoSphere DataStage, Git, GitHub, Bitbucket, JIRA
  • Cloud: S3, EC2, EMR, Athena, Redshift, DynamoDB, Kinesis, Glue
  • Scripting Languages: Shell Script and JavaScript
  • Data Modeling: SqlDBM
  • Data Security: Encryption, RBAC, and Conditional Data Anonymization

Timeline

Data Engineer III

SurveyMonkey
05.2022 - Current

Data Engineer III

NIKE
03.2021 - 05.2022

Data Engineer III

Florida Blue
03.2017 - 02.2021

Data Engineer

Deutsche Bank
06.2014 - 03.2017

MCA

Jawaharlal Nehru Technological University (JNTU)