Jana Bhogireddy

Jacksonville, FL

Summary

9 years of professional IT experience, including 5+ years in the big data ecosystem and related technologies (Hadoop HDFS, Spark, Kafka Streaming, Scala, Python, Apache Pig, Hive, Sqoop, HBase, Oozie, and AWS cloud) and 4+ years in data warehouse implementation. A history of meeting company needs with consistent and organized practices. Skilled in working under pressure and adapting to new situations and challenges to enhance the organizational brand.

Work History

Data Engineer III

Survey Monkey

California, CA

05.2022 - Current

Designed and built a flagship customer data product on Snowflake Cloud Data Warehouse using AWS services, Airflow, PySpark, and data modeling concepts, giving the Sales, Marketing, Customer Operations, and Product teams a clear representation of our customers to drive business insights
Developed a centralized data pipeline for GDPR compliance and data anonymization, applying dynamic conditional masking across the company-level enterprise data warehouse with Python, DBT, and SQL stored procedures, which strengthened the security team's trust by 70%
Executed all data initiatives supporting our CEO's transformative endeavor by establishing a semantic layer at daily granularity atop our dimension and fact tables, and curated over 120 key company health metrics into a strategic dashboard for C-suite executives and investors to facilitate data-driven decision-making
Ingested real-time billing data from Stripe APIs and marketing data from Google Analytics APIs into Snowflake using Fivetran and DBT, configured for recurring 15-minute updates
Resolved anomalies detected by Monte Carlo by reverse engineering them to validate their legitimacy, improving data quality by 70%
Architected and implemented a multi-dimensional galaxy data model for product usage data in Snowflake using SQLDBM, powered by Fivetran, DBT, and Airflow
Built a common AWS cost optimization framework in Python to terminate all EMR and EC2 instances seamlessly when not in use, which reduced overall AWS spending by 30%
Established a comprehensive Continuous Integration and Continuous Delivery (CI/CD) platform using GitHub Actions to streamline and automate build, test, and deployment workflows
Created custom Airflow operators in Python to interact with services such as EMR, EC2, Athena, S3, DynamoDB, and Snowflake, which are used within the enterprise by ~30 teams.

Data Engineer III

NIKE

Portland, OR

03.2021 - 05.2022

Built and implemented the Loyalty Customer Analytics and Segmentation model on the advanced data analytics layer on Databricks Delta Lake
Worked with data scientists to design and build analytics tools that fulfill R&D, product, operational and reporting needs throughout the data engineering teams
Created pipelines in AWS to extract, transform, and load data from different sources using Databricks, and scheduled the workflows using Logic Apps
Created aggregated datasets for Loyalty Customer Lifetime Value (CLV), Customer Retention and Growth (CRG) and Segmentation using Spark, SQL, Python and AWS
Implemented BetaGeoFitter (BGF), GammaGammaFitter (GGF), and RFMA models.

Data Engineer III

Florida Blue

Jacksonville, FL

03.2017 - 02.2021

Implemented a robust data pipeline using Spark to ingest data from sources including DB2 and SQL Server databases
Wrote Spark code in Scala using Spark SQL and Spark Streaming for faster data processing
Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive
Applied complex transformation rules to the data within the Spark environment
Used Spark's Scala API to process large-scale data sets efficiently in a distributed fashion
Loaded processed data efficiently into Netezza for further analysis and reporting
Developed procedures to extract data from the final processing tables
Utilized SFTP to transmit data to external stakeholders, including Welltok, DHCS, and other vendors
Implemented the ETL pipeline through well-organized shell scripts
Scheduled and monitored the entire ETL pipeline using Control-M
Automated solutions to identify and categorize members based on age using defined business rules, distinguishing between those over 65 and under 65
Implemented logic to apply age-related filters on data, ensuring accurate classification of members within the data sets.

Data Engineer

Deutsche Bank

Jacksonville, FL

06.2014 - 03.2017

Designed, developed, customized, and implemented applications using PySpark transformations in an AWS environment
Created a master validation script in Python to check for missing and duplicate values in enterprise data warehouse tables by comparing them with the source database, which yielded a 50% increase in data quality
Built SalesDataProduct1 & 2 for commercial analytics to plan their marketplace inventory
Designed and developed data pipelines using bash scripting to implement SCD1 on dimension tables and SCD2 on fact tables to maintain historical data with the current data in enterprise data warehouse
Transformed raw Kafka data into a readable format, enhancing data accessibility and usability
Enhanced existing pipelines for the stage 1 and stage 2 environments for downstream consumers
Scheduled ETL workflows with Airflow and worked on AWS EMR clusters
Developed pipelines in Spark with Python modules to ingest and process data from Kafka, ensuring seamless integration.

Education

MCA

JNTU University

Skills

Python
Bash Scripting
SQL
Scala
Data Engineering: Apache Airflow, DBT, Docker, Shell Scripting, Control-M and MWAA
Analytics and Visualization: Tableau, Power BI, QuickSight, Google Analytics
Databases: Snowflake, Amazon Redshift, PostgreSQL, SQL Server, MongoDB
Big Data: Spark, PySpark, Hadoop, Hive, Kafka
Tools: Fivetran, HighTouch, Monte Carlo, SqlDBM, Talend, IBM InfoSphere DataStage, Git, GitHub, Bitbucket, JIRA
Cloud: S3, EC2, EMR, Athena, Redshift, DynamoDB, Kinesis, Glue
Scripting Languages: Shell Script and JavaScript
Data Modeling: SqlDBM
Data Security: Encryption, RBAC, and Conditional Data Anonymization
