Krishna Papana

Houston, TX

Summary

Data Engineer with 3+ years of experience in design and development. Data-driven and analytical innovator with excellent problem-solving skills, business acumen, and a passion for contributing to large-scale data ingestion and research initiatives. Prolific and intuitive collaborator with a track record of partnering with stakeholders, web developers, and database architects.

Work History

Data Engineer

Dish Networks

Englewood, CO

08.2021 - Current

Involved in creating data ingestion pipelines to collect customer and subject data from external sources such as FTP servers and S3 buckets
Involved in migrating the existing Teradata data warehouse to AWS S3-based data lakes
Involved in migrating existing traditional ETL jobs to PySpark and Hive jobs on the new cloud data lake
Wrote complex Spark applications to denormalize datasets and create a unified data analytics layer for downstream teams
Developed a series of data ingestion jobs in Scala to collect data from multiple channels and external applications
Primarily responsible for fine-tuning long-running Spark applications, writing custom Spark UDFs, and troubleshooting failures
Involved in building a real-time pipeline using Kafka and Spark Streaming to deliver event messages from an external REST-based application to a downstream application team
Involved in creating Hive scripts to perform ad hoc data analysis required by business teams
Worked extensively on migrating on-premises workloads to AWS Cloud
Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop
Implemented Spark using Scala and Spark SQL for faster testing and processing of data
Migrated MapReduce jobs to Spark applications built in Scala and integrated with Apache Phoenix and HBase
Wrote various RDD (Resilient Distributed Dataset) transformations and actions using Scala and PySpark
Worked with AWS cloud services such as S3, EMR, Redshift, Athena, and the Glue Metastore

Data Software Engineer

Citi Group

Hyderabad, India

06.2020 - 07.2021

Worked on building a centralized data lake on AWS Cloud using core services such as S3, EMR, Redshift, and Athena
Worked on migrating datasets and ETL workloads from on-premises systems to AWS Cloud services
Built a series of Spark applications and Hive scripts to produce analytical datasets for digital marketing teams
Worked extensively on building and automating data ingestion pipelines, moving terabytes of data from existing data warehouses to the cloud
Worked extensively on fine-tuning Spark applications and providing production support for pipelines
Worked closely with business and data science teams to ensure all requirements were translated accurately into data pipelines
Wrote various RDD (Resilient Distributed Dataset) transformations and actions using Spark with Scala
Worked across the full spectrum of data engineering pipelines: data ingestion, data transformation, and data analysis/consumption
Developed Spark scripts using Python shell commands as required
Implemented Spark using Scala and Spark SQL for faster testing and processing of data
Worked on automating infrastructure setup, including launching and terminating EMR clusters
Created Hive external tables on top of datasets loaded into AWS S3 buckets and wrote Hive scripts to produce a series of aggregated datasets for downstream analysis
Built a real-time streaming pipeline using Kafka, Spark Streaming, and Redshift
Created Kafka producers using the Kafka Java Producer API to connect to an external REST live-stream application and publish messages to a Kafka topic

Education

Master’s in Information Technology

Belhaven University

04.2023

Bachelor of Engineering in Computer Science

Vel Tech University

05.2020

Skills

Big Data Ecosystem: Spark, Hive, HDFS, YARN, Impala, HBase, Sqoop, Airflow, Kafka

Hadoop Distributions: Hortonworks, Cloudera, AWS EMR

NoSQL Databases: HBase, Cassandra, MongoDB

Cloud Services: AWS S3, EMR, Redshift, Athena, Glue Metastore

Programming Languages: Java, Scala, Python

Databases: Oracle, MySQL, PostgreSQL, Teradata

Build Tools: Jenkins, Maven, Ant

Development Methodologies: Agile/Scrum

Websites

https://www.linkedin.com/in/krishna-kousik-reddy-80a178261

Certification

Azure DP-203 Certified

Work Honors And Awards

ON THE SPOT, May 2022, Dish Networks: given to an employee for hard work and excellence
BEST TEAM, May 2021, Citi Group: for remarkable team performance
