Pallavolu Jayakumar

Charlotte, NC

Summary

Highly skilled Big Data Engineer with 9+ years of experience developing, implementing, and optimizing data pipelines and ETL processes, with an in-depth understanding of business and IT requirements to streamline administration and internal processes, resulting in enhanced automation and operational efficiency. Collaborated with data analysts and leads to develop data pipelines that increase data availability in the Hadoop environment. Strong analytical, leadership, and communication skills, with a commitment to excellence.

Overview

12 years of professional experience

Work History

Java/Big Data Developer

Mitchell Martin Inc.
08.2023 - Current
  • Successfully implemented a POC to migrate the TIAA enterprise applications, consisting of 2 major projects built on the Jenkins Groot Controller, to a federated CI pipeline
  • Assisted the team with cloud optimization by scaling Spark dynamic allocation of executors up and down
  • Set up an execution pipeline in ElectricFlow to test the automatic deployment and execution of jobs
  • Created validation scripts to test connectivity between the Hadoop cloud environment on AWS and Snowflake, pulling data with Sqoop
  • Supported the team with validation in the Prod-fix environment during the migration from on-premises to the AWS cloud environment
  • Developed reports from Hive tables and presented the results in Tableau for visualization (see the sketch below)
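
A minimal sketch of the Hive-to-Tableau reporting pattern described above, assuming a Hive-enabled Spark session; the table and column names are illustrative, not from the actual project:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive-enabled session; assumes Spark is configured against the cluster metastore.
spark = (SparkSession.builder
         .appName("tableau-report-extract")
         .enableHiveSupport()
         .getOrCreate())

# Aggregate a hypothetical Hive table into a small report dataset.
report = (spark.table("analytics.transactions")
          .groupBy("region", "product")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("txn_count")))

# Write a single CSV extract with a header that Tableau can use as a data source.
(report.coalesce(1)
       .write.mode("overwrite")
       .option("header", True)
       .csv("/data/exports/tableau/transactions_report"))
```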

Application Architect

Mitchell Martin Inc.
03.2022 - 08.2023
  • Developed Scala/Spark code for Spark SQL transformations on Hive tables and optimized their performance (a sketch of this pattern follows the list)
  • Partnered with Line of Business (LOB) contacts to create the flow of data from source systems to the Strategy Decision Engine (SDE), the brain of the Collections and Recovery module
  • Participated in designing data pipelines in the Hadoop ecosystem, along with job scheduling (AutoSys) and ETL (IBM DataStage) tools, to support rapidly growing business processes for report generation and predictive/prescriptive modeling for campaign decision engines
  • Worked extensively with AutoSys for workflow scheduling and action triggering
  • Leveraged the Sqoop ingestion framework to read data from RDBMS sources and load it into Hive tables
  • Developed and implemented data pipelines to improve data quality, resulting in an increase in data accuracy
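
The Spark SQL work above was done in Scala; a minimal PySpark sketch of the same transformation pattern, with hypothetical collections tables standing in for the real sources:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sde-feed-transform")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical source tables; names are illustrative only.
spark.table("collections.accounts").createOrReplaceTempView("accounts")
spark.table("collections.payments").createOrReplaceTempView("payments")

# Spark SQL transformation joining account and payment data for the decision engine.
sde_feed = spark.sql("""
    SELECT a.account_id,
           a.delinquency_bucket,
           SUM(p.amount) AS paid_last_90d
    FROM accounts a
    LEFT JOIN payments p
      ON p.account_id = a.account_id
     AND p.payment_date >= date_sub(current_date(), 90)
    GROUP BY a.account_id, a.delinquency_bucket
""")

# Persist back to Hive, partitioned to keep downstream scans cheap.
(sde_feed.write.mode("overwrite")
         .partitionBy("delinquency_bucket")
         .saveAsTable("collections.sde_feed"))
```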

Application Architect

Mitchell Martin Inc.
10.2021 - 02.2022
  • Developed Scala/Spark code to read from a healthcare EDC adapter and download clinical research data in XML format through web API calls
  • Performed data validation on the downloaded XML files using XSD to ensure the attributes matched between the XML and the database
  • Flattened the downloaded XML data using the required schema files and appropriate data types in Spark and stored it in Hive tables (see the sketch below)
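
A sketch of the XML flattening step, assuming the spark-xml package (com.databricks:spark-xml) and a hypothetical ODM-like document structure; shown in PySpark rather than the original Scala:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes spark-xml is on the classpath,
# e.g. --packages com.databricks:spark-xml_2.12:0.17.0
spark = (SparkSession.builder
         .appName("edc-xml-flatten")
         .enableHiveSupport()
         .getOrCreate())

# Read the downloaded clinical XML; the rowTag is hypothetical.
raw = (spark.read.format("xml")
       .option("rowTag", "SubjectData")
       .load("/data/landing/edc/*.xml"))

# Flatten nested elements into typed columns before the Hive load.
# Repeating child elements would need explode() before selecting fields.
flat = raw.select(
    F.col("_SubjectKey").alias("subject_key"),
    F.col("StudyEventData._StudyEventOID").alias("study_event_oid"),
    F.col("StudyEventData.FormData._FormOID").alias("form_oid"),
)

flat.write.mode("append").saveAsTable("clinical.edc_subject_data")
```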

Application Architect

Mitchell Martin Inc.
01.2021 - 10.2021
  • Enhanced a generic file export job in Spark/Java code to read Hive tables and export them to different platforms (a sketch of the pattern follows this list)
  • Partnered with LOB contacts on data flow design and participated in designing data pipelines in the Hadoop ecosystem
  • Set up Zaloni registration files to pull data from RDBMS sources (Oracle and SQL Server) into Hadoop Data Lake Hive tables
  • Designed DataStage job sequence flows to read files from the landing zone and load them into Oracle tables
  • Developed JUnit test classes to check data quality for the heavy transformations performed during the stage table load in Hive
  • Built DataStage jobs to extract data from SQL Server and create flat files for ingestion into Hadoop
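
The original export job was written in Spark/Java; a compact PySpark sketch of the same generic pattern, with all names and defaults illustrative:

```python
import argparse

from pyspark.sql import SparkSession

# Generic Hive-table export: the table, format, and target are passed in,
# so one deployment of the job can serve many feeds.
def export_table(table: str, fmt: str, target: str, delimiter: str = "|") -> None:
    spark = (SparkSession.builder
             .appName(f"export-{table}")
             .enableHiveSupport()
             .getOrCreate())
    writer = spark.table(table).write.mode("overwrite")
    if fmt == "csv":
        writer = writer.option("header", True).option("sep", delimiter)
    writer.format(fmt).save(target)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--table", required=True)
    parser.add_argument("--format", default="csv")
    parser.add_argument("--target", required=True)
    args = parser.parse_args()
    export_table(args.table, args.format, args.target)
```

Keeping the table, format, and target as runtime arguments is what makes the job generic: new feeds need only a new invocation, not new code.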

Big Data Engineer

Populus Group
12.2019 - 06.2020
  • Hands-on experience migrating applications between Hadoop clusters
  • Developed and automated Spark jobs with Oozie actions (see the sketch below)
  • Scheduled Oozie workflows and coordinators for Sqoop imports/exports from various sources
  • Experienced in handling Hadoop jobs in Yahoo's native cluster
  • Identified and executed process improvements related to data processes
  • Able to understand and interpret machine learning code for data analytics
  • Exposure to linear regression and classification models
  • Worked extensively with Screwdriver, Yahoo's CI/CD pipeline tool, for continuous delivery with YAML file support
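
A minimal sketch of a Spark job written to be driven from an Oozie action, assuming the coordinator passes the nominal run date as a command-line argument; paths and table names are hypothetical:

```python
import sys

from pyspark.sql import SparkSession

# Oozie coordinators typically materialize a nominal date and pass it to the
# action as an argument; the job uses it to select one daily partition.
def main(run_date: str) -> None:
    spark = (SparkSession.builder
             .appName(f"daily-load-{run_date}")
             .enableHiveSupport()
             .getOrCreate())
    daily = spark.table("staging.events").where(f"ds = '{run_date}'")
    daily.write.mode("overwrite").parquet(f"/data/curated/events/ds={run_date}")

if __name__ == "__main__":
    main(sys.argv[1])  # e.g. 2020-01-15, supplied by the Oozie coordinator
```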

Hadoop Admin and Support Lead

Eniac Systems Inc.
06.2018 - 12.2019
  • Used Sqoop to ingest data from RDBMS sources into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components
  • Worked on historical data ingestion and incremental load approaches for the daily batch processes using Hadoop utilities (a sketch of an incremental Sqoop import follows this list)
  • Responsible for supporting all test environments and issues during the warranty period post-implementation
  • Involved in creating generic components leveraging the existing capabilities of the IBM IIS suite
  • Used tools/technologies such as Hadoop, Hive, Impala, Sqoop, IBM InfoSphere Information Server (DataStage), Teradata, Oracle, StarTeam, AutoSys, and Unix/Linux scripting
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions
  • Assisted application teams in setting up Prod-fix and Disaster Recovery Hadoop environments for critical applications
  • Implemented version control for all code/scripts so that all SDLC environments stay in sync
  • Implemented real-time data ingestion via Kafka into HBase, streamlined through NiFi
  • Automated the data validation process for critical data loads before handing data to downstream applications
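
A sketch of the incremental-load approach using standard Sqoop CLI flags driven from a Python wrapper; the connection string, table, and check column are placeholders:

```python
import subprocess

# Incremental Sqoop import using the standard --incremental append flags;
# connection details and table names are placeholders, not real endpoints.
def incremental_import(last_value: str) -> None:
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "/user/etl/.pw",  # keep credentials off the command line
        "--table", "ORDERS",
        "--target-dir", "/data/raw/orders",
        "--incremental", "append",
        "--check-column", "ORDER_ID",
        "--last-value", last_value,          # high-water mark from the previous run
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    incremental_import("1000000")
```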

Hadoop/Spark Developer

Collabera
11.2017 - 06.2018
  • Published HDFS/Hive table data to an external system using a custom Kafka producer for continuous updates (see the sketch below)
  • Developed Spark RDD transformations, actions, DataFrames, case classes, and Datasets for the required input data and performed the data transformations using Spark Core
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs with Scala, and worked with SparkContext, Spark SQL, DataFrames, pair RDDs, and Datasets
  • Imported data from several relational databases to HDFS and exported data from HDFS to RDBMS using Sqoop
  • Created Parquet Hive tables with Snappy compression, loaded data, and wrote Hive queries that invoke MapReduce tasks in the backend
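
The producer in this project was custom-built; a minimal stand-in using kafka-python, publishing a hypothetical Hive table as JSON messages:

```python
from kafka import KafkaProducer
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-kafka")
         .enableHiveSupport()
         .getOrCreate())

# Serialize each row of a hypothetical Hive table as a JSON string.
rows = spark.table("exports.customer_updates").toJSON().collect()

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092"],      # placeholder broker
    value_serializer=lambda v: v.encode("utf-8"),
)
for row in rows:
    producer.send("customer-updates", value=row)  # topic name is illustrative
producer.flush()
```

Collecting to the driver is fine for modest tables; a large table would publish from the executors via foreachPartition instead.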

Hadoop Developer

Nemo IT Solutions, Inc.
01.2017 - 09.2017
  • Developed PySpark code to read data from Hive, group the fields, and generate XML files (see the sketch below)
  • Enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs
  • Implemented a REST call to submit the generated CDAs to the vendor website
  • Implemented Impyla to support JDBC/ODBC connections to HiveServer2
  • Enhanced the PySpark code to replace Spark with Impyla
  • Built a data validation dashboard in Solr to display the message records
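
A condensed sketch of the Hive-to-XML-to-REST flow, with a hypothetical observations table and a placeholder vendor endpoint (the real job also zipped the files into CDAs, omitted here):

```python
import os
import xml.etree.ElementTree as ET

import requests
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cda-xml-export")
         .enableHiveSupport()
         .getOrCreate())

os.makedirs("/tmp/cda", exist_ok=True)

# Group a hypothetical Hive table by patient and emit one XML file per group.
for patient_id, records in (spark.table("clinical.observations")
                            .rdd.groupBy(lambda r: r.patient_id)
                            .toLocalIterator()):
    root = ET.Element("ClinicalDocument", attrib={"patientId": str(patient_id)})
    for r in records:
        obs = ET.SubElement(root, "observation")
        obs.set("code", r.code)
        obs.set("value", str(r.value))
    path = f"/tmp/cda/{patient_id}.xml"
    ET.ElementTree(root).write(path, xml_declaration=True, encoding="utf-8")

    # Submit the generated document to the (placeholder) vendor endpoint.
    with open(path, "rb") as fh:
        resp = requests.post("https://vendor.example.com/cda/upload",
                             files={"file": fh}, timeout=60)
    resp.raise_for_status()
```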

Hadoop Developer

Nemo IT Solutions, Inc.
01.2016 - 12.2016
  • Evaluated Spark's performance against Impala on transactional data
  • Used Spark transformations and aggregations to compute min, max, and average on transactional data (see the sketch below)
  • Experienced in migrating data from HiveQL to Spark SQL
  • Loaded data into Spark DataFrames for analysis
  • Used Java to develop a RESTful API for a database utility project
  • Designed a data model in Cassandra (POC) for storing server performance data
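
A minimal sketch of the min/max/average aggregations, assuming a hypothetical transactions table; the same aggregates can be expressed in Impala SQL for the performance comparison:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("txn-aggregates")
         .enableHiveSupport()
         .getOrCreate())

# Min, max, and average amounts per account over a hypothetical table.
stats = (spark.table("finance.transactions")
         .groupBy("account_id")
         .agg(F.min("amount").alias("min_amount"),
              F.max("amount").alias("max_amount"),
              F.avg("amount").alias("avg_amount")))

stats.show(20, truncate=False)
```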

IT Analyst

Serco Global Services
12.2011 - 07.2014
  • Analyzed and prepared detailed specifications and test requirements
  • Coordinated with business analysts and business users to understand project requirements and scope the test strategy
  • Executed test cases based on the BRD and SDD and uploaded results to Quality Center
  • Involved in test case execution and bug creation using the Chromium tool
  • Performed functionality testing per Google testing standards
  • Followed up in real time and continuously with global support teams for critical incident resolution

Education

Master of Science in Computer Science

University of Illinois Springfield
Springfield, IL
07.2015

Skills

  • Big Data/Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Cassandra, Spark, Impala, Impyla, StreamSets, NiFi, Kafka, Zaloni
  • Hadoop Distributions: Cloudera, Hortonworks
  • Languages: Java, Scala, Python
  • BI/Data Visualization Tools: Tableau, QlikView, Power BI, MicroStrategy
  • Enterprise Applications: MS Office Suite
  • Databases: MS SQL Server 2005/2000/7.0, Oracle 9i/10g, Netezza, Teradata
  • Enterprise Data Warehouses: EDW, RMW, ESP
  • Operating Systems: Windows XP/7/8/10, Ubuntu, RHEL
  • Development Tools: Eclipse, IntelliJ, Visual Studio
  • Cloud Computing: Microsoft Azure, AWS
  • CI/CD and Deployment Tools: YAML, Screwdriver, Ansible, Jenkins, Git, Bitbucket
  • File Transfer: SFTP, FTP, NDM, DTS
  • Other Tools: IBM InfoSphere DataStage, AutoSys, Snowflake, ElectricFlow

References

Available upon request
