
Krishna Papana

Houston, TX

Summary

Data Engineer with 3+ years of experience in the design and development of data pipelines. Data-driven, analytical innovator with excellent problem-solving skills, business acumen, and a passion for contributing to large-scale data ingestion and research initiatives. Prolific, intuitive collaborator with a track record of partnering with stakeholders, web developers, and database architects.

Overview

4 years of professional experience
1 certification

Work History

Data Engineer

Dish Networks
08.2021 - Current
  • Contributed to building data ingestion pipelines that collect customer and subject data from external sources such as FTP servers and S3 buckets
  • Contributed to migrating the existing Teradata data warehouse to an AWS S3-based data lake
  • Contributed to migrating traditional ETL jobs to PySpark and Hive jobs on the new cloud data lake
  • Wrote complex Spark applications to denormalize datasets and build a unified data analytics layer for downstream teams
  • Developed a series of data ingestion jobs in Scala to collect data from multiple channels and external applications
  • Primarily responsible for fine-tuning long-running Spark applications, writing custom Spark UDFs, and troubleshooting failures
  • Contributed to building a real-time pipeline using Kafka and Spark Streaming that delivers event messages from an external REST-based application to the downstream application team
  • Contributed to writing Hive scripts for ad hoc data analysis requested by business teams
  • Worked extensively on migrating on-premises workloads to AWS
  • Developed Scala scripts and UDFs in Spark, using both DataFrames/SQL and RDDs/MapReduce, for data aggregation and queries, and wrote results back to RDBMS through Sqoop
  • Used Scala and Spark SQL to speed up data testing and processing
  • Migrated MapReduce jobs to Scala-based Spark applications integrated with Apache Phoenix and HBase
  • Wrote RDD (Resilient Distributed Dataset) transformations and actions in Scala and PySpark
  • Used AWS cloud services including S3, EMR, Redshift, Athena, and the Glue metastore

Data Software Engineer

Citi Group
06.2020 - 07.2021
  • Built a centralized data lake on AWS using core services such as S3, EMR, Redshift, and Athena
  • Migrated datasets and ETL workloads from on-premises systems to AWS cloud services
  • Built a series of Spark applications and Hive scripts to produce analytical datasets for digital marketing teams
  • Built and automated data ingestion pipelines, moving terabytes of data from existing data warehouses to the cloud
  • Fine-tuned Spark applications and provided production support for multiple production pipelines
  • Worked closely with business and data science teams to ensure requirements were accurately translated into data pipelines
  • Wrote RDD (Resilient Distributed Dataset) transformations and actions in Scala Spark
  • Worked across the full spectrum of data engineering: data ingestion, data transformation, and data analysis/consumption
  • Developed Spark scripts using Python shell commands as required
  • Used Scala and Spark SQL to speed up data testing and processing
  • Automated infrastructure setup, including launching and terminating EMR clusters
  • Created Hive external tables on datasets loaded into AWS S3 buckets and wrote Hive scripts to produce aggregated datasets for downstream analysis
  • Built a real-time streaming pipeline using Kafka, Spark Streaming, and Redshift
  • Created Kafka producers with the Kafka Java Producer API that connect to an external REST live-stream application and publish messages to a Kafka topic

Education

Master’s in Information Technology

Belhaven University
04.2023

Bachelor of Engineering in Computer Science

Vel Tech University
05.2020

Skills

    Big Data Ecosystem: Spark, Hive, HDFS, YARN, Impala, HBase, Sqoop, Airflow, Kafka

    Hadoop Distributions: Hortonworks, Cloudera, AWS EMR

    NoSQL Databases: HBase, Cassandra, MongoDB

    Cloud Services: AWS S3, EMR, Redshift, Athena, Glue Metastore

    Programming Languages: Java, Scala, Python

    Databases: Oracle, MySQL, PostgreSQL, Teradata

    Build and CI Tools: Jenkins, Maven, Ant

    Development Methodologies: Agile/Scrum

Certification

Microsoft Certified: Azure Data Engineer Associate (DP-203)

Work Honors And Awards

  • ON THE SPOT (Dish Networks, May 2022): awarded to an employee for hard work and excellence
  • BEST TEAM (Citi Group, May 2021): awarded for remarkable team performance
