Chetan Chowdary Boyapati

St. Louis, MO

Summary

  • Results-oriented IT professional with extensive experience in all phases of the Software Development Life Cycle (SDLC): data analysis, design, development, testing, and deployment of software systems
  • Strong expertise in the Hadoop ecosystem, including HDFS, Spark, MapReduce, Hive, Pig, YARN, Oozie, Sqoop, Flume, Kafka, and NoSQL databases such as HBase and Cassandra
  • Proven ability to leverage the Spark and Scala APIs for efficient data processing and for comparing Spark's performance with Hive and SQL
  • Well-versed in data migration projects and proficient in both on-premises and cloud environments; adept at creating and orchestrating data pipelines with Oozie and Airflow
  • Experienced with cloud services including Amazon EMR, S3, EC2, Redshift, and Athena on AWS, and GCS, Dataproc clusters, Airflow, BigQuery, and Logging on Google Cloud
  • Solid understanding of distributed systems design, HDFS architecture, and the MapReduce and Spark processing frameworks
  • Skilled in developing highly scalable data transformations using Spark RDDs, DataFrames, Spark SQL, and Spark Streaming; proficient in troubleshooting Spark failures and optimizing long-running Spark applications
  • Excellent knowledge of SQL and proficiency with databases including Oracle, MySQL, and Teradata
  • Strong team player with exceptional communication, analytical, presentation, and interpersonal skills
  • Proficient in Core Java; experienced with JIRA for project tracking, Git for source code management, Jenkins for continuous integration, and Crucible for code reviews

Overview

5 years of professional experience

Work History

Data Engineer

Verizon
12.2020 - Current
  • Build end-to-end data ingestion pipelines and data harmonization components for generating actionable insights for executives and channels
  • Design and develop daily batch processing ETL jobs for data extraction, transformation, and loading
  • Fine-tune Spark jobs, achieving a 40% cost reduction in Dataproc clusters on Google Cloud Platform (GCP) through optimization techniques and efficient resource allocation
  • Implement high-availability ETL framework using Spark to collect data from various sources into Hadoop
  • Develop comprehensive BigQuery (BQ) SQL queries to implement end-to-end data curation logic
  • Create and deploy data processing jobs in Airflow by designing and configuring DAGs (Directed Acyclic Graphs) for efficient workflow management (see the DAG sketch after this list)
  • Design and implement an automated dataflow job within a DAG to transfer 1 TB of data daily from BigQuery to AWS Elasticsearch, ensuring seamless and efficient data synchronization between platforms
  • Develop scalable Spark scripts for capturing and curating data from multiple sources
  • Troubleshoot production issues and coordinate with support teams for code deployment
  • Collaborate with team members to ensure adherence to processes, procedures, and data security protocols
  • Automate Spark jobs using Oozie framework and create coordinators for specific workflows
  • Validate curated output data through multiple comparisons with source data
  • Collaborate with infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability
  • Automate deployments in GitLab using Jenkins CI/CD pipelines
  • Migrate on-premises data to Google Cloud Storage (GCS) and build BigQuery queries over the migrated datasets
  • Utilize GCP Dataproc clusters and GCS buckets for executing Scala-based Spark jobs
  • Migrate customer data hub batch processing jobs from Verizon's on-premises Vgrid platform to Google Cloud
  • Follow agile methodologies, using the Verizon JIRA board to track work, user stories, and tasks
  • Escalate production data issues to the platform team and provide prompt resolution
  • Configure and manage GCS buckets for storage and backup on GCP
  • Generate custom SQL queries for verifying dependencies in daily, weekly, and monthly jobs
  • Develop Spark code and Spark SQL/Streaming scripts for efficient data processing and testing (a PySpark sketch follows this list)
  • Monitor and support daily, weekly, and monthly job schedules, addressing failures and issues
  • Optimize performance of dashboards and workbooks in Tableau
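
A minimal sketch of the daily-batch orchestration pattern described above, assuming hypothetical DAG/task IDs, schedule, and callables (the production Verizon DAGs are not public):

```python
# Hypothetical daily curation DAG; dag_id, task IDs, and the callables are
# illustrative stand-ins, not the production pipeline.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def run_bq_curation(**context):
    """Placeholder: would submit the end-to-end curation SQL to BigQuery."""


def export_to_elasticsearch(**context):
    """Placeholder: would ship curated rows from BigQuery to Elasticsearch."""


with DAG(
    dag_id="daily_curation",  # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    catchup=False,
) as dag:
    curate = PythonOperator(task_id="curate_in_bq", python_callable=run_bq_curation)
    export = PythonOperator(task_id="bq_to_es", python_callable=export_to_elasticsearch)
    curate >> export  # curation must finish before the export runs
```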
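
Likewise, a hedged PySpark sketch of the capture-curate-validate flow described in the Spark bullets; the bucket paths, column names, aggregation logic, and tuning values are illustrative assumptions (the production jobs are Scala-based):

```python
# Illustrative PySpark curation job; all paths and names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily-curation")
    # Example tuning knobs of the kind used to rein in long-running jobs:
    .config("spark.sql.shuffle.partitions", "400")
    .config("spark.dynamicAllocation.enabled", "true")
    .getOrCreate()
)

# Capture raw events from one of several sources (path is hypothetical).
raw = spark.read.parquet("gs://example-bucket/raw/events/")

# Curate: drop bad records, derive fields, aggregate per customer per day.
curated = (
    raw.filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("customer_id", "event_date")
       .agg(F.count("*").alias("events"), F.sum("amount").alias("total_amount"))
)

# Validate curated output against the source before publishing.
expected = raw.filter(F.col("event_ts").isNotNull()).count()
actual = curated.agg(F.sum("events")).first()[0]
assert expected == actual, "curated output does not reconcile with source"

curated.write.mode("overwrite").partitionBy("event_date").parquet(
    "gs://example-bucket/curated/daily/"
)
```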

Hadoop Developer

IBM
05.2018 - 11.2019
  • Worked in the design and development phases of the Software Development Life Cycle (SDLC) using Scrum methodology
  • Gathered requirements and translated business needs into technical design for Hadoop and Big Data solutions
  • Imported and exported data between HDFS and databases using Sqoop
  • Developed data pipeline using Flume, Sqoop, Pig, and Java MapReduce for ingesting behavioral data into HDFS
  • Utilized Maven for building and deploying MapReduce programs to the cluster
  • Created customized BI tools for query analytics using HiveQL
  • Implemented optimized joins and partitioning in Hive to process and analyze data efficiently (a sketch of this pattern follows this list)
  • Leveraged Oozie workflow engine to manage interdependent Hadoop jobs and automate various types of jobs
  • Designed and implemented Cassandra NoSQL database for high-volume user profile data
  • Migrated high-volume OLTP transactions from Oracle to Cassandra
  • Developed MapReduce programs with chained mappers to create data pipelines
  • Utilized Pig as ETL tool for data transformations, event joins, filters, and pre-aggregations
  • Optimized performance of Hive and Pig jobs through tuning and optimization techniques
  • Developed Oozie job flows to automate data extraction from warehouses and weblogs
  • Performed data cleaning on unstructured information using various Hadoop tools
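
A brief sketch of the Hive partitioning and map-side-join pattern mentioned above, expressed via Spark's Hive support to keep the examples in one language; the table and column names are hypothetical:

```python
# Hypothetical re-creation of the Hive pattern: partition the large fact
# table by date, then broadcast the small dimension table so the join runs
# map-side (the Spark SQL analogue of Hive's MAPJOIN hint).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-join-demo")
    .enableHiveSupport()  # lets Spark create and query Hive tables
    .getOrCreate()
)

# Large fact table partitioned by date, so single-day queries read only
# the matching partition (partition pruning).
spark.sql("""
    CREATE TABLE IF NOT EXISTS clicks (user_id STRING, url STRING)
    PARTITIONED BY (dt STRING) STORED AS PARQUET
""")
spark.sql("""
    CREATE TABLE IF NOT EXISTS user_profiles (user_id STRING, segment STRING)
    STORED AS PARQUET
""")

daily = spark.sql("""
    SELECT /*+ BROADCAST(u) */ c.dt, u.segment, COUNT(*) AS clicks
    FROM clicks c JOIN user_profiles u ON c.user_id = u.user_id
    WHERE c.dt = '2019-06-01'   -- prunes to a single partition
    GROUP BY c.dt, u.segment
""")
daily.show()
```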

Education

Master of Science - Computer Science

Saint Louis University
2021

Bachelor of Technology - Computer Science and Engineering

JNTUH, Hyderabad
2019

Skills

  • Big Data Tools: Hadoop Ecosystem (MapReduce, Spark 2.3, Airflow 1.10.8, HBase 1.2, Hive 2.3, Pig 0.17, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0)
  • Programming Languages: SQL, PL/SQL, UNIX shell scripting, Scala
  • Cloud Platform: AWS, Google Cloud
  • Databases: Oracle 12c/11g, Teradata R15/R14
  • OLAP Tools: Tableau, SSAS, Business Objects
  • ETL/Data Warehouse Tools: Informatica 9.6/9.1, Tableau
  • Operating Systems: Windows, Unix, Sun Solaris
