
Divya Kaki

Haslet, TX

Summary

Data engineer with over 12 years of demonstrated experience across the retail, banking, and healthcare industries in Big Data, Azure Databricks, Scala, Spark, and NiFi development, including 6+ years developing and supporting Hadoop ecosystem components. Dedicated and skilled Azure Databricks data engineer with extensive experience designing, implementing, and optimizing data pipelines and analytics solutions, and proficient in using the Azure Databricks platform to transform raw data into actionable insights. Experienced with on-premise Python/Scala/Spark and Hive development, including 4+ years of Python with Spark in healthcare. Seeking to leverage expertise in data engineering and Azure technologies to contribute to innovative projects and drive business success.

Overview

12 years of professional experience

1 Certification

Work History

Senior Data Engineer

Avaap
06.2023 - 02.2024
  • The Ohio Department of Medicaid’s Demographic and Expenditure Dashboard is populated using enrollment, capitation, claims, and provider data from the department’s Enterprise Data Warehouse (EDW).

Responsibilities:

  • Designed and implemented scalable data pipelines using Azure Databricks, Apache Spark, and other related technologies to process large volumes of data efficiently.
  • Collaborated with cross-functional teams to understand business requirements and translate them into technical solutions.
  • Developed and maintained ETL processes to ingest, clean, transform, and load data from various sources into Azure Databricks.
  • Optimized data workflows for performance, reliability, and cost-effectiveness, leveraging cluster tuning, partitioning strategies, and caching techniques (see the sketch after this list).
  • Implemented data governance and security measures to ensure compliance with regulatory requirements and protect sensitive data.
  • Conducted performance monitoring, troubleshooting, and optimization of Spark jobs to meet SLAs and maintain high availability.
  • Provided technical guidance and mentorship to junior team members, fostering a culture of knowledge sharing and continuous learning.
  • Collaborated with data scientists and analysts to support advanced analytics and machine learning initiatives on the Azure Databricks platform.
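
A minimal PySpark sketch of the kind of Databricks pipeline described above; the storage paths, table names, and partition column are hypothetical placeholders, not the actual project's:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession already exists as `spark`; getOrCreate() reuses it.
spark = SparkSession.builder.appName("enrollment-etl").getOrCreate()

# Hypothetical raw zone in Azure Data Lake Storage.
raw = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/enrollment/")

# Clean and transform: drop rows missing the key, derive a month column.
cleaned = (
    raw.dropna(subset=["member_id"])
       .withColumn("enroll_month", F.trunc("enrollment_date", "month"))
)

# Cache only if the frame is reused by several downstream aggregations.
cleaned.cache()

# Partition output by month so downstream queries can prune files.
(
    cleaned.write.mode("overwrite")
           .partitionBy("enroll_month")
           .format("delta")
           .saveAsTable("edw.enrollment_clean")
)
```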

Technical Skills:

Big Data Technologies: Azure Databricks, Apache Spark, Apache Hive, Apache Impala, HDFS

Programming Languages: Python, Scala, SQL, PySpark

ETL Tools: Apache NiFi, StreamSets

Database Systems: MySQL, Oracle

Data Serialization Formats: Avro, Parquet, JSON, XML

Version Control: Git

Client Reference:

Megan Glenn

773-750-0647

E-mail: megan.glenn@avaap.com

Senior Hadoop Developer

Cotiviti Inc.
01.2022 - 05.2023

Responsibilities:

  • Senior Hadoop Developer at Cotiviti, involved in the backend ingestion platform for the NILE application
  • Responsible for development, enhancement, monitoring, and automation of the Big Data/Hadoop platform, and for creating scripts and programs to keep the application up to date
  • Designed and developed real-time and batch data processing solutions using Azure Databricks, Apache Spark Streaming, and Azure Event Hubs.
  • Implemented data integration solutions to connect Azure Databricks with various data sources and destinations, including Azure Data Lake Storage and Azure SQL Database.
  • Collaborated with the DevOps team to implement CI/CD pipelines for continuous integration and delivery of data engineering artifacts.
  • Conducted performance tuning and optimization of Spark jobs to improve data processing throughput and reduce latency.
  • Designed and implemented data lake architectures to support data exploration, analytics, and reporting requirements.
  • Developed custom UDFs and libraries in Python and Scala to extend the functionality of Azure Databricks and Apache Spark.
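
As an illustration of the custom-UDF work mentioned in the last bullet, here is a minimal PySpark sketch; the masking logic and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-demo").getOrCreate()

def mask_member_id(member_id):
    """Mask all but the last four characters of an identifier."""
    if member_id is None:
        return None
    return "*" * max(len(member_id) - 4, 0) + member_id[-4:]

# Register the Python function as a Spark UDF that returns a string.
mask_udf = F.udf(mask_member_id, StringType())

df = spark.createDataFrame([("M123456789",)], ["member_id"])
df.withColumn("member_id_masked", mask_udf("member_id")).show(truncate=False)
```

Built-in Spark functions are preferred where they exist, since Python UDFs serialize rows out of the JVM; a UDF like this is for logic the built-ins cannot express.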

Client Reference:

Vasudeva Rao Vellala

470-394-7735

E-mail: v.vasudevarao@gmail.com

HMS Healthcare, Cloudera Hadoop Engineer

Cotiviti Inc.
01.2020 - 01.2022

Responsibilities:

● Responsibilities included extensive development migrating the legacy Vitreous application, which resided on PostgreSQL/Java platforms, to on-premise Hadoop, with Python and Spark as the data ingestion platforms, NiFi as the job scheduler, and Hive, the open-source data warehouse, used for data validation.

● Completed a POC on API calls to the geocoder and Google matrix A2C (Access 2 Care) applications; designed an end-to-end workflow in NiFi to call the source API and retrieve geocodes for patient IDs and their locations.

● Developed the PySpark code to perform the complete migration from the legacy systems to Hadoop.
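
A minimal sketch of that migration pattern, assuming a JDBC read from the legacy PostgreSQL system and a write into a Hive staging table; hosts, credentials, and table names are placeholders:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets saveAsTable() register the output in the Hive metastore.
spark = (
    SparkSession.builder.appName("vitreous-migration")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a legacy table over JDBC; the PostgreSQL driver jar must be on the classpath.
legacy = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://legacy-host:5432/vitreous")  # placeholder host
    .option("dbtable", "public.claims")                            # placeholder table
    .option("user", "etl_user")
    .option("password", "***")
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Land the data in Hive as Parquet for the downstream validation queries.
legacy.write.mode("overwrite").format("parquet").saveAsTable("staging.claims")
```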

● Developed Python scripts for quality testing and validation of the application.

● Hands-on with Git repositories for code push/pull and committing code to master, and with Azure DevOps pipelines to deploy the code to DEV and TEST.

● Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Hive.

● Developed workflows in NiFi to automate loading data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce and Hive jobs; created a framework in NiFi to pull data from RDBMS sources

● Developed workflows for all new and existing data sets ingested into a new data lake

● Experience using NiFi to orchestrate the flow of data between different software systems

● Worked with NiFi admins to test and validate new NiFi features and functionality post-upgrade

● Designed workflows to send messages to Azure Service Bus

● Developed workflows to bulk-extract data from multiple sources and load it into the data lake and Hadoop HDFS

● Worked extensively on Hive UDFs and fine-tuning.

● Responsible for building scalable distributed data solutions using Hadoop

● Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
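
A rough PySpark analogue of the aggregation step described above, with a JDBC write standing in for the Sqoop export; the source table, columns, and connection details are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-aggregates").getOrCreate()

events = spark.table("edw.events")  # hypothetical Hive source table

# DataFrame aggregation: one row per day and event type.
daily = (
    events.groupBy("event_date", "event_type")
          .agg(F.count("*").alias("event_count"),
               F.countDistinct("user_id").alias("unique_users"))
)

# Write the result back to the RDBMS over JDBC (the original pipeline
# used Sqoop for this export step).
(
    daily.write.format("jdbc")
         .option("url", "jdbc:oracle:thin:@db-host:1521/ORCL")  # placeholder
         .option("dbtable", "ANALYTICS.DAILY_EVENTS")
         .option("user", "etl_user")
         .option("password", "***")
         .mode("append")
         .save()
)
```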

● Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data in real time.

● Worked on improving the performance and optimization of existing algorithms in Hadoop using Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

● Involved in importing and exporting data from local/external file system and RDBMS to HDFS.

Client Reference:

Santhi Nutakki

Contact title: Tech lead

510-456-8476

Hadoop, NiFi & Scala-Java/Kafka Developer

AAP (Advance Auto Parts)
01.2020 - 01.2021
Responsibilities:

  • My responsibilities included Management Information System (MIS) enhancements and sustenance of the data lakes and pipelines for better insight creation from the data.
  • Ownership of the design and development of Data pipeline jobs from different source systems.
  • Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Pig, Hive, and HBase.
  • Developed workflows in NiFi to automate loading data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce, Pig, and Hive jobs; created a reusable framework in NiFi to pull data from RDBMS, NoSQL, S3, and other sources
  • Developed workflows for all new and existing data sets ingested into a new data lake
  • Experience using NiFi to orchestrate the flow of data between different software systems
  • Worked with NiFi admins to test and validate new NiFi features and functionality post-upgrade
  • Designed workflows to send messages to Azure Service Bus
  • Developed workflows to bulk-extract data from multiple sources and load it into the data lake and Hadoop HDFS
  • Worked extensively on Hive UDFs and fine-tuning.
  • Knowledge of Amazon EC2 Spot integration and Amazon S3 integration
  • Responsible for building scalable distributed data solutions using Hadoop
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in real time (see the sketch after this list).
  • Worked on improving the performance and optimization of existing algorithms in Hadoop using Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications
  • Worked on Storm to handle parallelization, partitioning, and retrying on failures, and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Involved in importing and exporting data from local/external file system and RDBMS to HDFS.
  • Tested the AWS Kafka automation deployment process.
  • Developed the code for dynamic routing (orders and locations): understood the schema and routed orders according to the matching rules
  • Gained an understanding of the Drools rule-engine concept.
  • Set up the environment (AWS, Git) and proper access to the AAP repository
  • Performed code analysis and traced the flow of data to understand the upstream and downstream systems of the project.
  • Created a cluster in the lower environment to work on the data-generation part for code development
  • Completed the Kafka build on the AWS instance, successfully built the bootstrap host/build host, and performed a successful install for the host to come up.
  • Created views/stored procedures in PostgreSQL for the retry framework so that the rules trigger in a particular order.
  • Involved in the design of the DLQ and New Relic monitoring, creating the topics and testing
  • Performed testing and code analysis
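
A minimal sketch of the Kafka-to-HDFS path described in this list, using Spark Structured Streaming (the current counterpart of the streaming APIs mentioned above); brokers, topic, and paths are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Subscribe to a Kafka topic; needs the spark-sql-kafka connector on the classpath.
orders = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "orders")                      # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; cast key and value to strings before parsing downstream.
parsed = orders.select(
    F.col("key").cast("string"),
    F.col("value").cast("string").alias("order_json"),
    "timestamp",
)

# Continuously append the stream to HDFS as Parquet; the checkpoint
# directory lets the query recover after failures.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "hdfs:///data/orders")              # placeholder path
    .option("checkpointLocation", "hdfs:///chk/orders")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```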

Java & Tibco Developer

UCSD
03.2019 - 08.2020

Java & NiFi Developer

Responsibilities:

● Designed and implemented workload distribution in NiFi using Remote Processor Groups for parallelism

● Used NiFi to schedule, automate and monitor Hive, Spark, and Shell scripts

● Created a framework in NiFi to pull data from RDBMS, NoSQL, S3, and other sources for reusability

● Developed workflows for all new and existing data sets ingested into a new data lake

● Experience using NiFi to orchestrate the flow of data between different software systems

● Worked with NiFi admins to test and validate new NiFi features and functionality post-upgrade

● Designed workflows to send messages to Azure Service Bus

● Developed workflows to bulk-extract data from multiple sources and load it into the data lake and Hadoop HDFS

● Developed and enhanced web applications using Core Java, J2EE, JSP, Servlets, JDBC, Struts, Spring, Hibernate, JMS, and XML

● Implemented reusable NiFi flows to process XML, CSV, log, and other file formats and store them in Hive, HDFS, and HBase

● Optimized scripts to improve performance

● Good experience analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.

● Involved in loading data from the edge node to HDFS using shell scripting.

● Configured Oracle Database to store Hive metadata.

● Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
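
A sketch of the Hive DDL behind those concepts, issued through PySpark; the database, columns, bucket count, and location are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-ddl")
    .enableHiveSupport()
    .getOrCreate()
)

# External table: Hive tracks only metadata, so dropping the table keeps the files.
# Partitioning by event_date prunes whole directories at query time; bucketing by
# user_id clusters rows into a fixed number of files, helping joins on that key.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs.web_events (
        user_id  STRING,
        url      STRING,
        duration INT
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION 'hdfs:///data/web_events'
""")
```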

Data Engineer

Pepsico
06.2019 - 02.2020

Key responsibilities:

● Designed workflows to send messages to Azure Service Bus

● Developed and enhanced web applications using Core Java, J2EE, JSP, Servlets, JDBC, Struts, Spring, Hibernate, JMS, and XML

● Implemented reusable NiFi flows to process XML, CSV, log, and other file formats and store them in Hive, HDFS, and HBase

● Worked on web applications with both SOAP and RESTful web services to provide backend support for applications led by cross-functional teams

● Provided constant on-call/remote support to maintain and resolve production problems of medium to critical complexity

● Coordinated with team members, clients, and business analysts for timely delivery of functionality

● Configured additional levels of development and test environments for greater code quality; designed, developed, and implemented solutions to meet business objectives

● Collaborated across teams to analyze and develop system requirements

● Consulted business clients on any clarifications needed on requirements and received user-acceptance feedback

● Performed code reviews, organized daily status calls with the offshore team, and provided timely support for development continuity

● Involved in unit testing code, building and deploying onto cloud platforms, and supporting additional levels of testing, including integration and system testing

● Assisted onsite production support teams when necessary

Client References:

Name: Karen Small

Title: Senior manager

Phone #: 972-965-1746

Company Name: TCS / Pepsico

Java Engineer

Intuit
08.2017 - 02.2019

Responsibilities:

● Designed workflows to send messages to Azure Service Bus

● Developed and enhanced web applications using Core Java, J2EE, JSP, Servlets, JDBC, Struts, Spring, Hibernate, JMS, and XML

● Implemented reusable NiFi flows to process XML, CSV, log, and other file formats and store them in Hive, HDFS, and HBase

● Worked on web applications with both SOAP and RESTful web services to provide backend support for applications led by cross-functional teams

● Provided constant on-call/remote support to maintain and resolve production problems of medium to critical complexity

● Coordinated with team members, clients, and business analysts for timely delivery of functionality

● Configured additional levels of development and test environments for greater code quality; designed, developed, and implemented solutions to meet business objectives

● Collaborated across teams to analyze and develop system requirements

● Consulted business clients on any clarifications needed on requirements and received user-acceptance feedback

● Performed code reviews, organized daily status calls with the offshore team, and provided timely support for development continuity

● Involved in unit testing code, building and deploying onto cloud platforms, and supporting additional levels of testing, including integration and system testing

● Assisted onsite production support teams when necessary.

Java & Tibco Developer

Citi Private Banking (CPB)
11.2015 - 07.2016

Responsibilities:

● Involved in understanding requirements, and in production, UAT, and implementation support

● Involved in development of Java, JSP pages, JavaScript, and Oracle

● Implemented MVC architecture using Hibernate value objects and mapping XML files

● Used the Commons and Log4j logging frameworks

● Worked on unit and integration testing

● Used JavaScript for client-side validations in the JSP and HTML pages

● Used Spring for bean instantiation, annotations, controllers, and request mapping to handle web service requests and responses

● Involved in front-end development using Struts, JSPs, and JSTL

● Used JAXB for marshalling and unmarshalling of work order and billing XML documents, and JAXP for processing

● Developed REST web services to make web service calls simple and easy for the client to access via standard HTTP URIs

● Developed service code using the Apache Camel framework in Java/J2EE

● Designed and developed request and response XML Schema (XSD) documents for web service operations such as Retrieve History

● Developed an intranet web application using J2EE architecture, using JSP to design the user interfaces and Hibernate for database connectivity

● Developed SQL queries in Oracle

● Managed major TIBCO products such as BW/ActiveMatrix BusinessWorks and iProcess

● Developed scripts and automation tools used to build, integrate, and deploy software releases for TIBCO enterprise resources and applications

● Performed administration and configuration of TIBCO topics and queues

Tibco Developer and Administrator

CITI, WINS & HSBC
10.2011 - 08.2015

Responsibilities:

● Involved in understanding requirements, and in production, UAT, and implementation support

● Involved in development of Java, JSP pages, JavaScript, and Oracle

● Implemented MVC architecture using Hibernate value objects and mapping XML files

● Used the Commons and Log4j logging frameworks

● Worked on unit and integration testing

● Used JavaScript for client-side validations in the JSP and HTML pages

● Used Spring for bean instantiation, annotations, controllers, and request mapping to handle web service requests and responses

● Involved in front-end development using Struts, JSPs, JSF, and JSTL

● Used JAXB for marshalling and unmarshalling of work order and billing XML documents, and JAXP for processing

● Developed REST web services to make web service calls simple and easy for the client to access via standard HTTP URIs

● Developed service code using the Apache Camel framework in Java/J2EE

● Designed and developed request and response XML Schema (XSD) documents for web service operations such as Retrieve History

● Developed an intranet web application using J2EE architecture, using JSP to design the user interfaces and Hibernate for database connectivity

● Developed SQL queries in Oracle

● Handled TIBCO administration and configuration activities as part of CTP

● Maintained Staffware-related servers and deployed web services

● Developed web services for new enhancements

● Gathered requirements on handling TIBCO iProcess activities, and worked on their design and development

● Performed administration and configuration of TIBCO topics and queues

Education

Bachelor's in Information Technology

JNTUK
05.2011

Skills

  • Proficient in Azure Databricks, Apache Spark, SQL, Python, Scala, and PySpark
  • Experience with Azure services such as Azure Data Factory, Azure Data Lake Storage, Azure SQL Database, Azure Synapse
  • NiFi architecture, Hive, Impala, Sqoop, HBase
  • Strong understanding of data engineering concepts and best practices
  • Familiarity with cloud-native architectures and microservices
  • Excellent problem-solving and troubleshooting skills
  • Effective communication and collaboration skills
  • Ability to work independently and in a team environment

Certification

Academy Accreditation - Databricks Lakehouse Fundamentals

Timeline

Senior Data Engineer

Avaap
06.2023 - 02.2024

Senior Hadoop Developer

Cotiviti Inc.
01.2022 - 05.2023

HMS Healthcare, Cloudera Hadoop Engineer

Cotiviti Inc.
01.2020 - 01.2022

Hadoop, NiFi & Scala-Java/Kafka Developer

AAP (Advance Auto Parts)
01.2020 - 01.2021

Data Engineer

Pepsico
06.2019 - 02.2020

Java & Tibco Developer

UCSD
03.2019 - 08.2020

Java Engineer

Intuit
08.2017 - 02.2019

Java & Tibco Developer

Citi Private Banking (CPB)
11.2015 - 07.2016

Tibco Developer and Administrator

CITI, WINS & HSBC
10.2011 - 08.2015

Bachelor's in Information Technology

JNTUK

Academy Accreditation - Databricks Lakehouse Fundamentals
