
Niharika Chukkametta

Rogers, Arkansas

Summary

Software development professional with over 8 years of experience, specializing in data engineering and analytics alongside Java-based application development. Expert in the Spark and Hadoop ecosystems, with strengths in distributed computing, real-time analytics, and building robust data lakes on the AWS Cloud. Proficient in developing production-ready Spark applications, troubleshooting data pipelines, and optimizing processes, with strong skills in Scala, Python, Docker, and version control systems. Track record of meeting deadlines in Agile environments while continuously expanding knowledge of Azure and GCP services, data migration projects, and Snowflake tooling. Eager to contribute to high-impact projects and drive innovation.

Overview

8 years of professional experience
1 Certification

Work History

Big Data Engineer

o9 Solutions, Inc.
07.2022 - Current
  • Developed Spark applications in Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources
  • Used PySpark with Python scripting for data analysis and aggregation; developed against the Spark SQL API to process data held in DataFrames
  • Ran DAGs with Apache Airflow to orchestrate batch jobs efficiently
  • Developed Spark Scala applications using RDDs, DataFrames, and Spark SQL for data aggregation and queries, writing data back into the OLTP system via Spark JDBC
  • Configured Spark Streaming to receive real-time data from Kafka and store the processed stream back to Kafka
  • Implemented live real-time processing using Spark Streaming with Kafka
  • Created Hive tables and loaded and analyzed data using Hive queries
  • Worked extensively with S3 buckets in AWS
  • Built a security framework using AWS Lambda and DynamoDB to provide fine-grained access control for objects in Amazon S3
  • Performed architectural and implementation evaluations of several AWS services, including Amazon EMR, Redshift, and S3
  • Used AWS EMR to move data efficiently between databases and data stores within AWS, including Amazon S3 and Amazon DynamoDB
  • Used SSIS (SQL Server Integration Services) and DTS (Data Transformation Services) packages to import and export databases.
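The grouped aggregation these bullets describe (a Spark SQL `GROUP BY` with a sum) can be sketched in plain Python for illustration; the column names and sample rows below are hypothetical, not taken from any actual project data:

```python
from collections import defaultdict

# Hypothetical sample rows, standing in for records pulled from RDBMS
# and streaming sources (field names are illustrative only).
rows = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 75.5},
    {"region": "east", "amount": 30.0},
]

def aggregate_by_region(records):
    """Group rows by region and sum amounts -- the same logic a Spark SQL
    query such as SELECT region, SUM(amount) ... GROUP BY region expresses."""
    totals = defaultdict(float)
    for row in records:
        totals[row["region"]] += row["amount"]
    return dict(totals)

print(aggregate_by_region(rows))  # {'east': 150.0, 'west': 75.5}
```

In Spark the same logic distributes across executors, but the per-key reduction is identical.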

Data Engineer

Dell Technologies
12.2020 - 06.2022
  • Developed PySpark applications using DataFrames and the Spark SQL API for faster data processing
  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization according to requirements
  • Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data
  • Used Spark for interactive queries and streaming-data processing, integrating with the HBase NoSQL database for huge volumes of data
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Python
  • Utilized a range of Azure cloud services (PaaS and IaaS), including Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake
  • Contributed to the design and development of SnapLogic pipelines that extract data from the data lake to the staging server, followed by Informatica processing for ingestion into Teradata within the EDW
  • Addressed Teradata utility failures and handled SnapLogic, Informatica, and Teradata errors by implementing necessary code overrides
  • Analyzed data profiling results and performed various transformations
  • Created reference tables using Informatica Analyst and Developer tools
  • Developed Python scripts to parse JSON documents and load the data into databases
  • Generated capacity planning reports using Python packages including NumPy and Matplotlib
  • Utilized Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics to extract, transform, and load data from source systems to Azure Storage services
  • Proficient with Snowflake utilities, SnowSQL, and Snowpipe, applying big data modeling techniques with Python
  • Developed ETL pipelines between data warehouses using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
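The "parse JSON and load into a database" task mentioned above follows a simple pattern; here is a minimal, self-contained sketch using the stdlib `sqlite3` module as a stand-in for the actual target database (the payload and table schema are hypothetical):

```python
import json
import sqlite3

# Hypothetical JSON payload; field names are illustrative only.
payload = '[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]'

# In-memory SQLite database stands in for the real warehouse target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Parse the JSON document and bulk-load the records with named parameters.
records = json.loads(payload)
conn.executemany("INSERT INTO items (id, name) VALUES (:id, :name)", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 2
```

Against a production warehouse the connection and SQL dialect change, but the parse-then-`executemany` shape stays the same.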

Senior Data Engineer

Sentara Healthcare
02.2019 - 11.2020
  • Built custom ETL workflows using Spark and Hive to perform data cleaning, transformation, and mapping, ensuring data quality and consistency.
  • Implemented custom Kafka encoders for custom input formats, enabling efficient loading of data into Kafka for real-time processing and analytics.
  • Used HUE to create files and optimize SQL queries in Hive, ensuring efficient data processing.
  • Converted HiveQL queries into Spark transformations using Spark RDDs and Scala, enabling a seamless transition between processing frameworks.
  • Explored Spark to enhance performance and optimize existing Hadoop algorithms, leveraging SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Used PySpark to access Spark libraries through Python scripting for advanced data analysis and processing.
  • Built scalable distributed data solutions on a Hadoop cluster running the Cloudera distribution.
  • Applied normalization and denormalization techniques to enhance performance in both relational and dimensional database environments, contributing to efficient data management and analysis.
  • Utilized TensorFlow to enhance patient care outcomes, optimize operational processes, and drive data-driven decision-making within the organization.
  • Developed interactive dashboards in Power BI, providing executives with real-time insights into patient demographics, treatment outcomes, and operational efficiency.
  • Used Entity-Relationship (ER) modeling techniques to design robust data schemas and improve data integrity across Sentara Healthcare's database systems.
  • Collaborated with business analysts to identify key performance indicators (KPIs) and design visually compelling visualizations to track healthcare quality metrics over time.

Data Engineer

Nordstrom
03.2017 - 01.2019
  • Designed and developed applications on the data lake to transform data to business users' requirements for analytics
  • Managed data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data
  • Conducted data model reviews with team members and captured technical metadata through modeling tools
  • Implemented ETL processes; wrote and optimized SQL queries to extract and merge data from SQL Server databases
  • Supported the upgrade, configuration, and maintenance of Hadoop infrastructure components, including Pig, Hive, and HBase
  • Extracted large volumes of customer data from diverse sources and loaded it into Hadoop HDFS, working with data from mainframes, databases, and server logs
  • Worked with NoSQL databases such as HBase, creating HBase tables to store large sets of semi-structured data from various sources
  • Developed complex MapReduce jobs to perform efficient data transformations
  • Performed data cleaning, pre-processing, and modeling using MapReduce
  • Strong experience writing SQL queries.
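The MapReduce transformations mentioned above all follow the same map/shuffle/reduce shape; this minimal word-count sketch in plain Python shows the model (input lines are made up for illustration; a real job would run the same phases distributed across a Hadoop cluster):

```python
from collections import defaultdict
from itertools import chain

# Tiny hypothetical input, standing in for files stored in HDFS.
lines = ["spark hive spark", "hive hadoop"]

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle phase: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)

pairs = chain.from_iterable(mapper(line) for line in lines)
result = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'spark': 2, 'hive': 2, 'hadoop': 1}
```

Swapping the mapper and reducer bodies yields the cleaning and pre-processing jobs described, without changing the framework shape.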

Data Engineer

Citrix systems
01.2016 - 02.2017
  • Developed data ingestion pipelines using the Talend ETL tool and Bash scripting with big data technologies including Hive, Impala, and Kafka
  • Developed scalable, secure data pipelines for large datasets
  • Gathered requirements for ingesting new data sources, covering life cycle, data quality checks, transformations, and metadata enrichment
  • Collected and aggregated large amounts of log data, staging it in HDFS for further analysis
  • Monitored daily, weekly, and monthly jobs and provided support for failures and issues
  • Preprocessed and transformed data using scikit-learn built-in functions for scaling, encoding, and feature selection.
  • Delivered data engineering services such as data exploration, ad-hoc ingestion, and subject-matter expertise to data scientists using big data technologies
  • Utilized AWS monitoring and logging tools to track system performance, troubleshoot issues, and optimize resource utilization for cost efficiency.
  • Implemented JIL definitions to automate jobs in the production cluster
  • Troubleshot users' analysis bugs (JIRA and IRIS tickets)
  • Worked with the Scrum team to deliver agreed user stories on time every sprint
  • Analyzed and resolved production job failures across several scenarios
  • Implemented UNIX scripts to define use-case workflows, process data files, and automate jobs.
  • Used Power BI data modeling capabilities to transform raw data into meaningful insights, improving decision-making.
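The log collection and aggregation step described above can be sketched in a few lines of Python; the log format below is purely illustrative (not an actual application log), showing the kind of per-severity summary computed before staging data in HDFS:

```python
from collections import Counter

# Hypothetical raw log lines; the format is illustrative only.
log_lines = [
    "2016-05-01 10:00:01 INFO job=ingest status=ok",
    "2016-05-01 10:00:05 ERROR job=ingest status=fail",
    "2016-05-01 10:00:09 INFO job=export status=ok",
]

def level_counts(lines):
    """Count log lines per severity level (third whitespace-separated
    field) -- a typical aggregation run before staging summaries in HDFS."""
    return Counter(line.split()[2] for line in lines)

print(level_counts(log_lines))  # Counter({'INFO': 2, 'ERROR': 1})
```

In production the same summarization would run over files collected from many servers rather than an in-memory list.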

Education

Master's in Computer Technology

Eastern Illinois University
Charleston, Illinois

Graduated in Electronics and Communication Engineering

Lords Institute of Engineering and Technology
Hyderabad, India

Skills

  • Big Data Tools: Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Impala, ZooKeeper, Spark, Kafka, NiFi, Airflow
  • NoSQL: HBase, Cassandra, MongoDB
  • Build and Deployment Tools: Maven, Git, SVN, Jenkins
  • Programming and Scripting: Java, Scala, Python, SQL, Shell Scripting, HiveQL
  • Databases: Teradata, Redshift, Oracle, MySQL, PostgreSQL
  • Web Dev Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript
  • Data Visualization Tools: Power BI, Tableau, ER/Studio, QuickSight
  • Cloud Environments: AWS, Azure, GCP

Certification

  • Certified in Hadoop, Spark & Snowflake - Duke University

