Chetan Chowdary Boyapati

St. Louis, MO

Summary

  • Results-oriented IT professional with extensive experience in all phases of the Software Development Life Cycle (SDLC): data analysis, design, development, testing, and deployment of software systems
  • Strong expertise in the Hadoop ecosystem, including HDFS, Spark, MapReduce, Hive, Pig, YARN, Oozie, Sqoop, Flume, Kafka, and NoSQL databases such as HBase and Cassandra
  • Proven ability to leverage the Spark and Scala APIs for efficient data processing and for comparing Spark's performance with Hive and SQL
  • Well-versed in data migration projects and proficient in both on-premises and cloud environments; adept at creating and orchestrating data pipelines with Oozie and Airflow
  • Experienced with cloud services including Amazon EMR, S3, EC2, Redshift, and Athena on AWS, and GCS, Dataproc clusters, Airflow, BigQuery, and Logging on Google Cloud
  • Solid understanding of distributed systems design, HDFS architecture, and the MapReduce and Spark processing frameworks
  • Skilled in developing highly scalable data transformations using Spark RDDs, DataFrames, Spark SQL, and Spark Streaming; proficient in troubleshooting Spark failures and optimizing long-running Spark applications
  • Excellent knowledge of SQL and proficiency with databases including Oracle, MySQL, and Teradata
  • Strong team player with exceptional communication, analytical, presentation, and interpersonal skills
  • Proficient in Core Java; experienced with JIRA for project tracking, Git for source code management, Jenkins for continuous integration, and Crucible for code reviews

Overview

5 years of professional experience

Work History

Data Engineer

Verizon
12.2020 - Current
  • Build end-to-end data ingestion pipelines and data harmonization components for generating actionable insights for executives and channels
  • Design and develop daily batch processing ETL jobs for data extraction, transformation, and loading
  • Fine-tune Spark jobs, achieving a 40% cost reduction in Dataproc clusters on Google Cloud Platform (GCP) through optimization techniques and efficient resource allocation
  • Implement high-availability ETL framework using Spark to collect data from various sources into Hadoop
  • Develop comprehensive BigQuery (BQ) SQL queries to implement end-to-end data curation logic
  • Create and deploy data processing jobs in Airflow by designing and configuring DAGs (Directed Acyclic Graphs) for efficient workflow management (see the DAG sketch after this list)
  • Design and implement an automated dataflow job within a DAG to transfer 1 TB of data daily from BigQuery to AWS Elasticsearch, ensuring seamless and efficient data synchronization between platforms
  • Develop scalable Spark scripts for capturing and curating data from multiple sources
  • Troubleshoot production issues and coordinate with support teams for code deployment
  • Collaborate with team members to ensure adherence to processes, procedures, and data security protocols
  • Automate Spark jobs using Oozie framework and create coordinators for specific workflows
  • Validate curated output data through multiple comparisons with source data
  • Collaborate with infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability
  • Automate deployments in GitLab using Jenkins CI/CD pipelines
  • Migrate on-premises data to Google Cloud Storage (GCS) and build BigQuery queries over the migrated datasets
  • Utilize GCP Dataproc clusters and GCS buckets for executing Scala-based Spark jobs
  • Migrate customer data hub batch processing jobs from Verizon's on-premises Vgrid platform to Google Cloud
  • Follow agile methodologies, using the Verizon JIRA board to track work, user stories, and tasks
  • Escalate production data issues to the platform team and provide prompt resolution
  • Configure and manage GCS buckets for storage and backup on GCP
  • Generate custom SQL queries for verifying dependencies in daily, weekly, and monthly jobs
  • Develop Spark code and Spark SQL/Streaming scripts for efficient data processing and testing (a PySpark sketch follows this list)
  • Monitor and support daily, weekly, and monthly job schedules, addressing failures and issues
  • Optimize performance of dashboards and workbooks in Tableau
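
A minimal sketch of the daily-batch orchestration pattern described above, assuming hypothetical DAG/task IDs, schedule, and callables (the production Verizon DAGs are not public):

```python
# Hypothetical daily curation DAG; dag_id, task IDs, and the callables are
# illustrative stand-ins, not the production pipeline.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def run_bq_curation(**context):
    """Placeholder: would submit the end-to-end curation SQL to BigQuery."""


def export_to_elasticsearch(**context):
    """Placeholder: would ship curated rows from BigQuery to Elasticsearch."""


with DAG(
    dag_id="daily_curation",  # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    catchup=False,
) as dag:
    curate = PythonOperator(task_id="curate_in_bq", python_callable=run_bq_curation)
    export = PythonOperator(task_id="bq_to_es", python_callable=export_to_elasticsearch)
    curate >> export  # curation must finish before the export runs
```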
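
Likewise, a hedged PySpark sketch of the capture-curate-validate flow described in the Spark bullets; the bucket paths, column names, aggregation logic, and tuning values are illustrative assumptions (the production jobs are Scala-based):

```python
# Illustrative PySpark curation job; all paths and names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily-curation")
    # Example tuning knobs of the kind used to rein in long-running jobs:
    .config("spark.sql.shuffle.partitions", "400")
    .config("spark.dynamicAllocation.enabled", "true")
    .getOrCreate()
)

# Capture raw events from one of several sources (path is hypothetical).
raw = spark.read.parquet("gs://example-bucket/raw/events/")

# Curate: drop bad records, derive fields, aggregate per customer per day.
curated = (
    raw.filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("customer_id", "event_date")
       .agg(F.count("*").alias("events"), F.sum("amount").alias("total_amount"))
)

# Validate curated output against the source before publishing.
expected = raw.filter(F.col("event_ts").isNotNull()).count()
actual = curated.agg(F.sum("events")).first()[0]
assert expected == actual, "curated output does not reconcile with source"

curated.write.mode("overwrite").partitionBy("event_date").parquet(
    "gs://example-bucket/curated/daily/"
)
```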

Hadoop Developer

IBM
05.2018 - 11.2019
  • Worked in the design and development phases of the Software Development Life Cycle (SDLC) using Scrum methodology
  • Gathered requirements and translated business needs into technical design for Hadoop and Big Data solutions
  • Imported and exported data between HDFS and databases using Sqoop
  • Developed data pipeline using Flume, Sqoop, Pig, and Java MapReduce for ingesting behavioral data into HDFS
  • Utilized Maven for building and deploying MapReduce programs to the cluster
  • Created customized BI tools for query analytics using HiveQL
  • Implemented optimized joins and partitioning in Hive to process and analyze data efficiently (a sketch of this pattern follows this list)
  • Leveraged Oozie workflow engine to manage interdependent Hadoop jobs and automate various types of jobs
  • Designed and implemented Cassandra NoSQL database for high-volume user profile data
  • Migrated high-volume OLTP transactions from Oracle to Cassandra
  • Developed MapReduce programs with chained mappers to create data pipelines
  • Utilized Pig as ETL tool for data transformations, event joins, filters, and pre-aggregations
  • Optimized performance of Hive and Pig jobs through tuning and optimization techniques
  • Developed Oozie job flows to automate data extraction from warehouses and weblogs
  • Performed data cleaning on unstructured information using various Hadoop tools
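
A brief sketch of the Hive partitioning and map-side-join pattern mentioned above, expressed via Spark's Hive support to keep the examples in one language; the table and column names are hypothetical:

```python
# Hypothetical re-creation of the Hive pattern: partition the large fact
# table by date, then broadcast the small dimension table so the join runs
# map-side (the Spark SQL analogue of Hive's MAPJOIN hint).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-join-demo")
    .enableHiveSupport()  # lets Spark create and query Hive tables
    .getOrCreate()
)

# Large fact table partitioned by date, so single-day queries read only
# the matching partition (partition pruning).
spark.sql("""
    CREATE TABLE IF NOT EXISTS clicks (user_id STRING, url STRING)
    PARTITIONED BY (dt STRING) STORED AS PARQUET
""")
spark.sql("""
    CREATE TABLE IF NOT EXISTS user_profiles (user_id STRING, segment STRING)
    STORED AS PARQUET
""")

daily = spark.sql("""
    SELECT /*+ BROADCAST(u) */ c.dt, u.segment, COUNT(*) AS clicks
    FROM clicks c JOIN user_profiles u ON c.user_id = u.user_id
    WHERE c.dt = '2019-06-01'   -- prunes to a single partition
    GROUP BY c.dt, u.segment
""")
daily.show()
```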

Education

Master of Science - Computer Science

Saint Louis University
2021

Bachelor of Technology - Computer Science and Engineering

JNTUH, Hyderabad
2019

Skills

  • Big Data Tools: Hadoop Ecosystem (MapReduce, Spark 2.3, Airflow 1.10.8, HBase 1.2, Hive 2.3, Pig 0.17, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0)
  • Programming Languages: SQL, PL/SQL, UNIX shell scripting, Scala
  • Cloud Platform: AWS, Google Cloud
  • Databases: Oracle 12c/11g, Teradata R15/R14
  • OLAP Tools: Tableau, SSAS, Business Objects
  • ETL/Data Warehouse Tools: Informatica 9.6/9.1, Tableau
  • Operating Systems: Windows, Unix, Sun Solaris
