Pavan Kondepudi

Aurora, IL

Summary

Experienced Enterprise Data Architect with a solid background in designing, creating, and managing large-scale data architectures. Strengths include thorough knowledge of database structures, systems planning, and implementation, coupled with skills in strategic data analysis and modeling. Known for effective leadership in directing teams toward successful project completion while ensuring optimal data integrity and system security. Previous roles have resulted in improved efficiency through streamlining processes and implementing innovative solutions within the enterprise architecture framework.

  • 10+ years of IT experience in Big Data/Hadoop/Azure Development, with special expertise in System Analysis, Architecture, Design, Development, and Implementation.
  • Experienced in Project Management, Project Estimation, Cost Control, Cost Management, and Cost Planning.
  • Collaborated with Scrum teams, Project Managers, and business owners to develop applications.
  • Hands-on experience using Hortonworks Distribution of Hadoop (2.4 and 2.6) and Microsoft Azure.
  • Extensive experience in working with HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Oozie, Zookeeper, Kafka, Spark, HUE, HBase, Ambari, and Phoenix.
  • Experience in data processing, ingestion, storage, querying, and analysis using Spark SQL and Spark Streaming.
  • Design, implement, and manage operations framework (security, automation, incident management, monitoring, and notification) for big data applications.
  • Provide design and technical mentorship to teams and support management in identifying and implementing competency development measures.
  • Experience in Hadoop On-prem to Microsoft Azure migration project.
  • Experienced in Microsoft Azure tools (ADLS, Synapse Analytics, Spark Pool, and Synapse Analytics Workspace).
  • Assist the development team in identifying the root cause of slow-performing jobs and queries.
  • Developed data pipelines using Spark for real-time streaming to store data into HDFS, Hive, and HBase from Kafka.
  • Experience in tuning performance by using Partitioning and Bucketing in Hive.
  • Experience with Talend Open Studio 6.4 and Talend Administration Center (TAC), GitHub, and Jenkins.
  • Experience in implementing Spark using Scala to perform complex data transformations and converting Hive/SQL queries into Spark transformations using Scala.
  • Worked with Spark Streaming and Kafka to load the sensor event data into HDFS.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked with different file formats such as SequenceFile, Avro, ORC, Parquet, and CSV, using various compression techniques.
  • Experience in importing and exporting data from various databases like Teradata, Oracle, and DB2 into HDFS using Sqoop.
  • Experience in creating subscriptions in IBM InfoSphere Data Replication (IIDR) to flow data from DB2 and Oracle to Kafka.
  • Loaded data from different sources (databases and files) into Hive and Kafka using Talend.
  • Experience in developing MapReduce programs using Java and Scala.
  • Knowledge of scheduling and monitoring tools like Oozie, Control-M, and Zena.
  • An excellent team player and self-starter with good communication skills and a proven ability to finish tasks before target deadlines.
  • Extensive knowledge in Software Development Life Cycle, along with software methodologies like Waterfall and Agile.
  • Expertise in Unit Testing, Functional Testing, System Testing, Integration Testing, Regression Testing, User Acceptance Testing (UAT), and Performance Testing.
  • Excellent communication and analytical skills, with the flexibility to learn new tools and technologies to meet the needs of the organization.

Overview

12 years of professional experience
1 Certification

Work History

Enterprise Data Architect

Health Care Service Corporation, HCSC
Chicago, IL
09.2022 - Current
  • Developed a data architecture strategy to ensure the organization's long-term success.
  • Analyzed current and future data needs, identified potential solutions, and created architectural models.
  • Built an enterprise-wide data warehouse by integrating databases from multiple departments.
  • Designed and implemented data governance processes to ensure quality and accuracy of collected data.
  • Identified areas of improvement in existing systems to increase performance and scalability.
  • Provided technical guidance to developers on best practices for developing high-performing queries against large datasets.
  • Conducted research on emerging technologies, industry trends, standards, products, services, protocols, and architectures related to data management.

Interim Data Engineering Manager

Health Care Service Corporation, HCSC
Chicago, IL
06.2022 - 06.2023
  • Supported multiple product teams with Hadoop, Confluent Kafka, Azure Synapse, Azure Databricks, and Denodo Platform needs.
  • Collaborated with Scrum teams, Project Managers, and business owners on platform intake for future projects.
  • Helped teams with platform and technical issues and unblocked dependencies.
  • Participated in hiring team members and allocating resources.
  • Handled project management, project estimation, cost control, cost management, and cost planning.

Sr Application Architect

Health Care Service Corporation, HCSC
Chicago, IL
08.2019 - 09.2022
  • Create design and technical architecture for Microsoft Azure and Hadoop Migration to Azure.
  • Create design and technical architecture for Big Data Batch, Streaming, and Near Real-Time Batch applications.
  • Act as a design and technical subject matter expert and provide guidance to scrum teams and help resolve design and technical roadblocks in setting up the development environment.
  • Involved in end-to-end ownership for solution components that ingest data from various internal and external sources.
  • Lead the team in infrastructure setup phases and performance tuning activities of the system.
  • Delivered applications within the allocated timeline by working with all required parties and following design best practices and tools.
  • Helping to maintain and upgrade enterprise solutions on the big data analytics platform.
  • Prepared design and technical documents that support development of the proposed architectural solution.
  • Design and manage automation scripts, and schedule them with Enterprise Scheduler.

Lead Data Engineer

Health Care Service Corporation, HCSC
Chicago, IL
08.2017 - 08.2019
  • Thoroughly analyzed the business requirements gathered from business partners.
  • Developed Hive HQL to process XML and JSON files by using XML and JSON SerDes.
  • Experience with Talend Open Studio 6.4 and Talend Administration Center (TAC).
  • Extensively created mappings in Talend, using different components.
  • Created Talend Batch job to read the metadata from DB2 and to create Hive HQL.
  • Create a Talend Batch Job to produce and consume the data from Kafka.
  • Build Talend Batch Job to read and load data to Hive.
  • Experience in monitoring and scheduling jobs with Zena and Job Conductor (TAC), and in UNIX scripting.
  • Performed data ingestion from multiple internal clients using Apache Kafka.
  • Worked on custom Pig Loaders and storage classes to work with a variety of data formats, such as JSON and XML file formats.
  • Performed benchmarking of NoSQL databases, such as HBase.
  • Created a data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Hive.
  • Analyzed data with discrepancies through error files and log files for further data processing and cleansing.
  • Integrated Apache Spark with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Spark.
  • Created, altered, and deleted Kafka topics as requirements varied.
  • Used Hive map-side and skew join queries to join multiple tables of a source system and load them into Elasticsearch tables.
  • Developed Java custom code for IBM InfoSphere Data Replication (IIDR).
  • Built a subscription to read messages from DB2 logs and produce the data to Kafka using IBM InfoSphere Data Replication (IIDR).
  • Developed Unix script to monitor the Spark streaming jobs.
  • Solved performance issues in Hive scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
  • Developed Scala code to import metadata into Hive and migrated existing tables and applications to work on Hive.
  • Involved in configuring Phoenix Tables on HBase Tables.
  • Proficient in data modeling with Hive partitioning, indexing, bucketing, and other optimization techniques in Hive.
  • Built a pipeline to read messages from IBM MQ into HDFS using Flume.
  • Used Flume to create fan-in and fan-out multiplexing flows, with custom interceptors for data conversion.
  • Built a pipeline to read messages from Kafka and store them in HBase using Spark.
  • Involved in configuring Spark environment to produce and consume data from Kafka, and developed a pipeline between Kafka and HDFS using Scala.
  • Implemented Spark using Scala and SparkSQL for faster processing of data.
  • Performed interactive querying and aggregate functions using Spark SQL.
  • Responsible for Spark streaming configuration based on the type of input.
  • Involved in Hadoop cluster upgrade (2.4 to 2.6) without any impact on running jobs or data from the previous configuration.
  • Responsible for Production On-Call Support.
  • Development activities were carried out using Agile methodologies.

Big Data Hadoop Developer

Solvent Software
Hyderabad, India
07.2012 - 06.2015
  • Thoroughly analyzed the business requirements gathered from business partners.
  • Used Sqoop to import the data from DB2 into HDFS and loaded it to Hive.
  • Developed multiple MapReduce jobs for data cleaning.
  • Worked on Pig, MapReduce, and Sqoop to develop a data pipeline for moving customer behavioral data and transaction histories into HDFS for further analysis.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Implemented performance optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Developed UDFs to implement business logic in Hadoop.
  • Created a shared container and after-job subroutine to record each job's start and end times in the audit tables.
  • Designed the data transformation layer using MapReduce and Pig scripts (use cases, design documents, DW design using Hive, etc.).
  • Proficient in data modeling with Hive partitioning, indexing, bucketing, and other optimization techniques in Hive.
  • Created Hive UDFs in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs or data.
  • Worked with the Spring Boot framework configured with Kafka.
  • Performed data ingestion from multiple internal clients using Apache Kafka and Flume.
  • Configured and developed code for Kafka and Flume to write data onto HDFS.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Developed workflow in Oozie and ZooKeeper to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Installed Oozie workflow engine to run multiple Hive, Shell Script, Sqoop, Pig, and Java jobs.
  • Set up and managed Kafka for stream processing, including broker and topic configuration and creation.
  • Responsible for loading data from UNIX system to HDFS, configuring Hive, and writing Hive UDFs.
  • Development activities were carried out using the Waterfall methodology.

Education

Master’s in Computer Science - Data Processing Technology

Silicon Valley University
San Jose, CA
12-2017

Bachelor of Technology - Electronics And Communication Engineering

Pondicherry University
Puducherry, India
04-2012

Skills

Azure: ADLS, Synapse Analytics, Spark Pool, Synapse Analytics Workspace, Databricks

Big Data/Hadoop: Hadoop (1.x and 2.x), HDFS, YARN, MapReduce, Pig, HBase, Hive, Sqoop, Flume, Spark, Kafka, Oozie, Zookeeper, HUE, Ambari, Hortonworks, Phoenix, NiFi

Tools/Software: MyEclipse, WinSCP, Visual Studio 2013, PuTTY, Teradata SQL Assistant, SecureCRT, SecureFX, JIRA, GitHub, Jenkins, IBM IIDR

ETL Tools: Talend 6.4, Talend 7.3, TAC Server

Programming: PIG, Linux, Unix Shell Scripting, SQL, PL/SQL, Java, Scala, Python

Databases: HBase, Oracle 11g/10g, SQL, MySQL, Hive, Teradata, DB2

Operating Systems: Windows, Unix and Linux, Azure

Reporting Tool: Microsoft Power BI

Scheduling: Control-M, Oozie, Zena

Methodologies: Agile, Scrum, and Waterfall

Certification

Microsoft Azure Solutions Architect

Timeline

Enterprise Data Architect

Health Care Service Corporation, HCSC
09.2022 - Current

Interim Data Engineering Manager

Health Care Service Corporation, HCSC
06.2022 - 06.2023

Sr Application Architect

Health Care Service Corporation, HCSC
08.2019 - 09.2022

Lead Data Engineer

Health Care Service Corporation, HCSC
08.2017 - 08.2019

Big Data Hadoop Developer

Solvent Software
07.2012 - 06.2015

Master’s in Computer Science - Data Processing Technology

Silicon Valley University

Bachelor of Technology - Electronics And Communication Engineering

Pondicherry University