Nani K

Newark, DE

Summary

Results-driven Data Engineer specializing in CCD, ADT, CAnD, and SIU data within the healthcare sector. Skilled in Spark, AWS, and Azure, with a focus on optimizing ETL processes and enhancing data quality using Oracle SQL for Tableau reporting. Collaborates effectively with cross-functional teams to deliver impactful data solutions.

Overview

7 years of professional experience

Work History

Business Analyst/Data Analyst

AmeriHealth Caritas
Newark, DE
06.2023 - Current
  • Developed applications and deployed them on Google Cloud Platform using Dataproc, Dataflow, Composer, BigQuery, Bigtable, Cloud Storage (GCS), and various operators in Airflow DAGs.
  • Migrated existing Hive data pipelines to the GCP platform.
  • Designed and implemented data transformation, ingestion, and curation functions on GCP using GCP-native services and Python.
  • Optimized data pipelines for performance and cost for large scale data lakes.
  • Designed and automated BigQuery tables and Google Cloud Functions to enable reporting, analysis, and modeling.
  • Used Node.js to write custom UDFs in BigQuery and used them in the data pipeline.
  • Used Python scripting to automate tasks and integrate a wide range of technologies across the data pipeline.
  • Developed and supported databases and related ETL processes, including batch and real-time processing.
  • Demonstrated understanding of issue triaging processes in Big Data systems.
  • Configured and managed Apache Airflow, a workflow management platform, and used it to schedule data pipelines.
  • Managed complex ETL batch processes using Hive, wrote Hive UDFs, and used HDFS for storage.
  • Imported and transformed data using Hive and Spark and loaded it into HDFS.
  • Demonstrated expertise in Oracle, PL/SQL and Stored Procedures, optimizing database performance.
  • Wrote and executed complex SQL queries, facilitating data extraction and manipulation.
  • Worked closely with the SME and BA to get an understanding of the business requirements.
  • Worked in Agile Model, moving through the sprint cycle and using the tracking Tool: JIRA.
  • Collaborated with cross-functional teams to gather and document requirements.
  • Created process maps to visualize workflows and enhance operational understanding.
  • Managed projects and served as primary liaison between client and multiple internal groups to clarify goals and meet standards and deadlines.
  • Delivered timely support by tracking issues and communicating resolutions to end users.
  • Analyzed existing systems and processes to identify areas of improvement.
  • Developed reports using SQL queries to track progress against key performance indicators.
  • Working as a Business/Data Analyst for a payer organization: gather requirements from the onsite client partner, analyze them, and create BRD, FRD, RTM, and RFC documents.
  • Groom requirements with Development & QA team.
  • Perform and document gap analysis; create Source-to-Target Mapping (STTM) for the FHIR standard.
  • Perform unit testing and conduct UAT.
  • Perform Data Profiling for FHIR data mapping, parsing CCD data.
  • Learn and understand the NCQA/US Core FHIR Implementation Guide and FHIR profiles.
  • Collaborate with SmileCDR (FHIR repository) to GET/POST FHIR bundles.
  • Test and validate FHIR JSON responses through Postman/SoapUI.
  • Participate in POC for CMS Prior Authorization Final Rule.
  • Troubleshoot and resolve production issues.
  • Worked on relational and dimensional modeling.

Senior Data Engineer

Cigna
Philadelphia, PA
06.2021 - 06.2023
  • Engineered and Administered Informatica platforms for Cloud Services, Big Data Management, Master Data Management, Data Integration and Data Quality.
  • Wrote Python scripts to parse XML documents and load the data into a database, and developed web-based applications using Python, CSS, and HTML.
  • Worked on ETL Migration services by developing and deploying AWS Lambda functions for generating a serverless data pipeline which can be written to Glue Catalog and can be queried from Athena.
  • Developed glue job scripts using python to load data from one bucket to another bucket with optimized techniques. Experience in working on Glue 3.0.
  • Developed step functions, lambda functions, and CloudWatch events to trigger Glue jobs, streamlining data loading from source to destination while enhancing ETL process control.
  • Conducted R&D activities to improve and optimize step functions, reducing the execution time for ETL.
  • Created Athena tables to hold some reference data which would be used for the data model.
  • Expertise in testing REST APIs using Robot Framework and SoapUI.
  • Developed lambda function to control the process of business domain tables.
  • Experience in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
  • Involved in fixing high-priority defects in the ETL process using a Lambda/Redshift data correction framework. Worked with standard Python packages such as boto and boto3 for AWS.
  • Experience working with Python 3.7 when developing Lambda functions. Created a Glue job to sync views from Athena to Redshift. Created a Lambda to check OK files for 7 subject areas from a cross-account AWS source.
  • Involved in creating new Athena tables per the data model. Created customized views per business requirements. Involved in release activities and actively coordinated with the offshore team.
  • Validated SoapUI and RESTful API services.
  • Worked on AWS Batch to load data from on-prem to the AWS landing S3 bucket.
  • Analyzed Test Plans and Test Cases based on Requirements and General Design Documents, involved in both Manual and Automation Testing.
  • Created Glue job to merge CSV data into parquet files and converted Athena tables to Delta Lake tables, optimizing data storage and querying efficiency while working with PostgreSQL on AWS RDS.
  • Experienced in creating and transforming Spark Dataframe using Python in Glue jobs.
  • Developed continuous integration and deployment pipelines using Jenkins to automate software delivery processes.
  • Deployed AWS components to different environments using uDeploy. Developed Jenkins scripts to email Veracode scan status to different teams. Responsible for supporting the ongoing ETL process for the team.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
  • Actively involved in code review, code profiling and bug fixing for improving the performance.
  • Worked on applications and developed them with XML, JSON, XSL (PHP, Django, Python, Rails).
  • Experienced in developing Web Services with Python programming language.
  • Experience in writing Sub Queries, Stored Procedures, Triggers, Cursors, and Functions on MySQL and PostgreSQL database. Involved in unit testing and code coverage activities.
  • Created and managed bucket policies and lifecycle rules for S3 storage per organizational and compliance guidelines.
  • Cleaned and processed third-party spending data into manageable deliverables in specific formats using Excel macros and Python libraries.
  • Proficient with testing REST APIs, Web & Database testing.
  • Analyzed and processed the S3 data from starting stage to the persistence stage by using AWS Athena, Glue crawlers and creating glue jobs. Experienced working in Dev, staging & prod environment.
  • Implemented AWS services like EC2, SQS, SNS, IAM, S3, and DynamoDB to deploy multi-tier advertiser applications with fault tolerance, high availability, and auto-scaling using AWS CloudFormation.
  • Worked on analyzing Data-Integration, Data Mapping, Data Profiling and Data Warehouse access using SQL, ETL process. Involved in preparing Logical Data Models/Physical Data Models.
  • Developed use case diagrams, class diagrams, database tables, and mapping between relational database tables & loading to Hive tables.
  • Created Hive-compatible table schemas on raw data in Data Lake, partitioned by time and product dimensions, enabling efficient analysis and ad-hoc queries using AWS Athena.
  • Worked on Agile (Scrum) Methodology, participated in daily scrum meetings, and was actively involved in sprint planning and product backlog creation.
  • Responsible for performing cleansing, filtering, and comparing existing data with the model and database using excel and data comparison tools.
  • Optimize existing data pipelines and maintain all domain-related data pipelines.
  • Designed & built the efficient and reliable data pipelines to move & transform data (both large & smaller amounts) Worked closely with the SME to get an understanding of the business requirements.
  • Designed and developed large-scale data processing pipelines using PySpark and Spark to handle large datasets and support real-time analytics.
  • Collaborated with decision scientists, marketing, and other cross-functional teams to understand data needs and development requirements and to align on a development strategy for scalable solutions.
  • Working on data validation and data profiling to ensure the accuracy of the data between the warehouse and source systems.
  • Experience in report data validations and cosmetic data validations developed in Cognos, MicroStrategy, and Business Objects.

Data Engineer

JP Morgan Chase
Newark, DE
02.2019 - 05.2021
  • Met with business/user groups to understand the business process; gathered requirements and analyzed, designed, developed, and implemented according to client requirements.
  • Experienced in developing web-based applications using Python, Django, PHP, XML, CSS, HTML, and DHTML. Regularly tuned performance of Spark jobs to improve data processing and retrieval.
  • Designed and implemented a statistical model for predicting the balance outcome of specific pricing actions, within a forecasting framework built to optimize pricing strategy through SVA maximization. Worked on the Cloudera 5.10 Hadoop distribution and Spark 2.3.0.
  • Experience querying Hive tables using Hue. Developed ETL integration patterns using Python on Spark. Hands-on experience with RDDs, Pandas, and PySpark DataFrames.
  • Developed Python scripts for importing and validating model variables and coefficients, ensuring data integrity in JSON format.
  • Executed ETL processes to extract files for external vendors, ensuring data accuracy and integrity.
  • Experience on developing Spark programs to parse the raw data, populate staging tables, and store the refined data in partitioned Hive tables in HDFS.
  • Involved in analyzing business requirements and preparing detailed specifications that follow project guidelines required for project development.
  • Integrated machine learning models with forecast engines, enhancing predictive accuracy of customer behavior analysis.
  • Experience in machine learning methods like standard deviation, regression, resampling.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Worked with Elastic MapReduce (EMR) and set up a Hadoop environment on AWS EC2 instances.
  • Experience programming in Scala for model execution and cash flow as per the Business requirement.
  • Experience with JSON, Parquet, CSV, INI, YAML, and other file formats for job/Spark configurations.
  • Use Jira for bug tracking and Bitbucket to check-in and check-out code changes.
  • Experiences in using build/deploy tools such as Jenkins for continuous integration and deployment.
  • Experienced working in dev, staging, and prod environment.
  • Work with Scrum team in delivering agreed user stories on time for every Sprint.
  • Actively involved in code review, code profiling, and bug fixing for improving the performance.
  • Strong experience in development, and testing phases of software development life cycle.
  • Played a key role in making the PAC product payer-agnostic to support all payers, reducing development effort and delivering significant cost savings for the client.
  • Developed the PAC product & enhancements under Agile methodology covering 44 successful Sprints.
  • In-depth knowledge of debugging issues, identifying root causes, and applying quick fixes for DataStage jobs and Oracle database objects.
  • Extensively involved in medium- and high-complexity DataStage code; built and maintained high-quality deliverables without slippage. Used the Spark DataFrame API in Scala for analyzing data.
  • Extraction of data from different source systems, transforming into suitable format and loading into a data warehouse using DataStage jobs.
  • Worked in the entire lifecycle beginning from Requirements gathering, Data Model Design, ETL Design, Report Design, Testing, and Migration to Production and Postproduction Support.
  • Conducting the peer reviews of the code developed by the other team members.
  • Worked closely with the SME and BA to get an understanding of the business requirements.
  • Worked in Agile Model, moving through the sprint cycle and using the tracking Tool: JIRA.
  • Prepared Functional specification Doc to convert the business requirements into ETL Mappings.
  • Responsible for architecting and developing solutions and successful project delivery & process engagements.
  • Involved in preparing Logical Data Models/Physical Data Models.
  • Identified source systems, their connectivity, and related tables and fields, and ensured data suitability for mapping. Generated DDL and created tables and views in the corresponding layers.
  • Designed & developed logical & physical data model using data warehouse methodologies, including Star schema - Star-joined schema, confirmed dimensions data architecture, data modeling, designing & developing ETL applications using Datastage.
  • Performed unit testing & provided the technical support to the QA Team in case of any defects or failures.

Education

Masters - CS

WIU
Macomb, IL
05-2018

Bachelors - EIE

JNTUK
VR Siddhartha Engineering College
04-2015

Skills

  • HDFS
  • Yarn
  • Spark
  • Hive
  • Pig
  • Spark Streaming
  • Spark SQL
  • Oozie
  • Impala
  • Apache Sqoop
  • Apache Kafka
  • Apache Flume
  • Apache Spark
  • Hadoop
  • Hortonworks
  • Cloudera
  • EC2
  • IAM
  • DynamoDB
  • CloudWatch
  • EMR
  • S3
  • Glue
  • Step functions
  • Lambda
  • Athena
  • Maven
  • Jenkins
  • SBT
  • Ant
  • Eclipse
  • IntelliJ
  • MS Visual Studio
  • NetBeans
  • Java
  • Scala
  • Python
  • SQL
  • PL/SQL
  • JSON
  • Linux Shell Scripting
  • MongoDB
  • MySQL
  • HBase
  • Git
  • SVN
  • Linux
  • Windows
  • Ambari
  • Cloudera Manager
  • Agile/Scrum
  • Data mapping
  • FHIR implementation
  • Data validation
  • CCD
  • ADT
  • CAnD
  • SIU
  • Project management
  • Azure Data Lake
  • Azure Cloud

Timeline

Business Analyst/Data Analyst

AmeriHealth Caritas
06.2023 - Current

Senior Data Engineer

Cigna
06.2021 - 06.2023

Data Engineer

JP Morgan Chase
02.2019 - 05.2021

Masters - CS

WIU

Bachelors - EIE

JNTUK