
Niharika Chukkametta

Rogers, Arkansas

Summary

Software development professional with over 8 years of experience, specializing in data engineering and analytics alongside Java-based application development. Expert in the Spark and Hadoop ecosystems, with strengths in distributed computing, real-time analytics, and building robust data lakes on the AWS Cloud. Proficient in developing production-ready Spark applications, troubleshooting data pipelines, and optimizing processes, with strong skills in Scala, Python, Docker, and version control systems. Track record of meeting deadlines in Agile environments while continuously expanding knowledge of Azure and GCP services, data migration projects, and Snowflake tooling. Eager to contribute to high-impact projects and drive innovation.

Overview

8 years of professional experience
1 Certification

Work History

Big Data Engineer

o9 Solutions, Inc.
07.2022 - Current
  • Developed Spark applications in Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources
  • Used PySpark with Python scripting for data analysis and aggregation; developed against the Spark SQL API to process data held in DataFrames
  • Ran DAGs with Apache Airflow to orchestrate batch jobs efficiently
  • Developed Spark Scala applications using RDDs, DataFrames, and Spark SQL for data aggregation and queries, writing data back into the OLTP system via Spark JDBC
  • Configured Spark Streaming to receive real-time data from Kafka and store the processed stream back to Kafka
  • Implemented live real-time processing using Spark Streaming with Kafka
  • Created Hive tables and loaded and analyzed data using Hive queries
  • Worked extensively with S3 buckets in AWS
  • Built a security framework using AWS Lambda and DynamoDB to provide fine-grained access control for objects in Amazon S3
  • Performed architectural and implementation evaluations of several AWS services, including Amazon EMR, Redshift, and S3
  • Used AWS EMR to move data efficiently between databases and data stores within AWS, including Amazon S3 and Amazon DynamoDB
  • Used SSIS (SQL Server Integration Services) and DTS (Data Transformation Services) packages to import and export databases.
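The grouped aggregation these bullets describe (a Spark SQL `GROUP BY` with a sum) can be sketched in plain Python for illustration; the column names and sample rows below are hypothetical, not taken from any actual project data:

```python
from collections import defaultdict

# Hypothetical sample rows, standing in for records pulled from RDBMS
# and streaming sources (field names are illustrative only).
rows = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 75.5},
    {"region": "east", "amount": 30.0},
]

def aggregate_by_region(records):
    """Group rows by region and sum amounts -- the same logic a Spark SQL
    query such as SELECT region, SUM(amount) ... GROUP BY region expresses."""
    totals = defaultdict(float)
    for row in records:
        totals[row["region"]] += row["amount"]
    return dict(totals)

print(aggregate_by_region(rows))  # {'east': 150.0, 'west': 75.5}
```

In Spark the same logic distributes across executors, but the per-key reduction is identical.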

Data Engineer

Dell Technologies
12.2020 - 06.2022
  • Developed PySpark applications using DataFrames and the Spark SQL API for faster data processing
  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization according to requirements
  • Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data
  • Used Spark for interactive queries and streaming-data processing, integrating with the HBase NoSQL database for huge volumes of data
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Python
  • Utilized a range of Azure cloud services (PaaS and IaaS), including Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake
  • Contributed to the design and development of SnapLogic pipelines that extract data from the data lake to the staging server, followed by Informatica processing for ingestion into Teradata within the EDW
  • Addressed Teradata utility failures and handled SnapLogic, Informatica, and Teradata errors by implementing necessary code overrides
  • Analyzed data profiling results and performed various transformations
  • Created reference tables using Informatica Analyst and Developer tools
  • Developed Python scripts to parse JSON documents and load the data into databases
  • Generated capacity planning reports using Python packages including NumPy and Matplotlib
  • Utilized Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics to extract, transform, and load data from source systems to Azure Storage services
  • Proficient with Snowflake utilities, SnowSQL, and Snowpipe, applying big data modeling techniques with Python
  • Developed ETL pipelines between data warehouses using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
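The "parse JSON and load into a database" task mentioned above follows a simple pattern; here is a minimal, self-contained sketch using the stdlib `sqlite3` module as a stand-in for the actual target database (the payload and table schema are hypothetical):

```python
import json
import sqlite3

# Hypothetical JSON payload; field names are illustrative only.
payload = '[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]'

# In-memory SQLite database stands in for the real warehouse target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Parse the JSON document and bulk-load the records with named parameters.
records = json.loads(payload)
conn.executemany("INSERT INTO items (id, name) VALUES (:id, :name)", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 2
```

Against a production warehouse the connection and SQL dialect change, but the parse-then-`executemany` shape stays the same.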

Senior Data Engineer

Sentara Healthcare
02.2019 - 11.2020
  • Built custom ETL workflows using Spark and Hive to perform data cleaning, transformation, and mapping, ensuring data quality and consistency.
  • Implemented custom Kafka encoders for custom input formats, enabling efficient loading of data into Kafka for real-time processing and analytics.
  • Used HUE to create files and optimize SQL queries in Hive, ensuring efficient data processing.
  • Converted HiveQL queries into Spark transformations using Spark RDDs and Scala, enabling a seamless transition between processing frameworks.
  • Explored Spark to enhance performance and optimize existing Hadoop algorithms, leveraging SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Used PySpark to access Spark libraries through Python scripting for advanced data analysis and processing.
  • Built scalable distributed data solutions on a Hadoop cluster running the Cloudera distribution.
  • Applied normalization and denormalization techniques to enhance performance in both relational and dimensional database environments, contributing to efficient data management and analysis.
  • Utilized TensorFlow to enhance patient care outcomes, optimize operational processes, and drive data-driven decision-making within the organization.
  • Developed interactive dashboards in Power BI, providing executives with real-time insights into patient demographics, treatment outcomes, and operational efficiency.
  • Used Entity-Relationship (ER) modeling techniques to design robust data schemas and improve data integrity across Sentara Healthcare's database systems.
  • Collaborated with business analysts to identify key performance indicators (KPIs) and design visually compelling visualizations to track healthcare quality metrics over time.

Data Engineer

Nordstrom
03.2017 - 01.2019
  • Designed and developed applications on the data lake to transform data to business users' requirements for analytics
  • Managed data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data
  • Conducted data model reviews with team members and captured technical metadata through modeling tools
  • Implemented ETL processes; wrote and optimized SQL queries to extract and merge data from SQL Server databases
  • Supported the upgrade, configuration, and maintenance of Hadoop infrastructure components, including Pig, Hive, and HBase
  • Extracted large volumes of customer data from diverse sources and loaded it into Hadoop HDFS, working with data from mainframes, databases, and server logs
  • Worked with NoSQL databases such as HBase, creating HBase tables to store large sets of semi-structured data from various sources
  • Developed complex MapReduce jobs to perform efficient data transformations
  • Performed data cleaning, pre-processing, and modeling using MapReduce
  • Strong experience writing SQL queries.
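The MapReduce transformations mentioned above all follow the same map/shuffle/reduce shape; this minimal word-count sketch in plain Python shows the model (input lines are made up for illustration; a real job would run the same phases distributed across a Hadoop cluster):

```python
from collections import defaultdict
from itertools import chain

# Tiny hypothetical input, standing in for files stored in HDFS.
lines = ["spark hive spark", "hive hadoop"]

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle phase: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)

pairs = chain.from_iterable(mapper(line) for line in lines)
result = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'spark': 2, 'hive': 2, 'hadoop': 1}
```

Swapping the mapper and reducer bodies yields the cleaning and pre-processing jobs described, without changing the framework shape.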

Data Engineer

Citrix systems
01.2016 - 02.2017
  • Developed data ingestion pipelines using the Talend ETL tool and Bash scripting with big data technologies including Hive, Impala, and Kafka
  • Developed scalable, secure data pipelines for large datasets
  • Gathered requirements for ingesting new data sources, covering life cycle, data quality checks, transformations, and metadata enrichment
  • Collected and aggregated large amounts of log data, staging it in HDFS for further analysis
  • Monitored daily, weekly, and monthly jobs and provided support for failures and issues
  • Preprocessed and transformed data using scikit-learn built-in functions for scaling, encoding, and feature selection.
  • Delivered data engineering services such as data exploration, ad-hoc ingestion, and subject-matter expertise to data scientists using big data technologies
  • Utilized AWS monitoring and logging tools to track system performance, troubleshoot issues, and optimize resource utilization for cost efficiency.
  • Implemented JIL definitions to automate jobs in the production cluster
  • Troubleshot users' analysis bugs (JIRA and IRIS tickets)
  • Worked with the Scrum team to deliver agreed user stories on time every sprint
  • Analyzed and resolved production job failures across several scenarios
  • Implemented UNIX scripts to define use-case workflows, process data files, and automate jobs.
  • Used Power BI data modeling capabilities to transform raw data into meaningful insights, improving decision-making.
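The log collection and aggregation step described above can be sketched in a few lines of Python; the log format below is purely illustrative (not an actual application log), showing the kind of per-severity summary computed before staging data in HDFS:

```python
from collections import Counter

# Hypothetical raw log lines; the format is illustrative only.
log_lines = [
    "2016-05-01 10:00:01 INFO job=ingest status=ok",
    "2016-05-01 10:00:05 ERROR job=ingest status=fail",
    "2016-05-01 10:00:09 INFO job=export status=ok",
]

def level_counts(lines):
    """Count log lines per severity level (third whitespace-separated
    field) -- a typical aggregation run before staging summaries in HDFS."""
    return Counter(line.split()[2] for line in lines)

print(level_counts(log_lines))  # Counter({'INFO': 2, 'ERROR': 1})
```

In production the same summarization would run over files collected from many servers rather than an in-memory list.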

Education

Master's in Computer Technology

Eastern Illinois University
Charleston, Illinois

Graduated in Electronics and Communication Engineering

Lords Institute of Engineering and Technology
Hyderabad, India

Skills

  • Big Data Tools: Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Impala, ZooKeeper, Spark, Kafka, NiFi, Airflow
  • NoSQL: HBase, Cassandra, MongoDB
  • Build and Deployment Tools: Maven, Git, SVN, Jenkins
  • Programming and Scripting: Java, Scala, Python, SQL, Shell Scripting, HiveQL
  • Databases: Teradata, Redshift, Oracle, MySQL, PostgreSQL
  • Web Dev Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript
  • Data Visualization Tools: Power BI, Tableau, ER/Studio, QuickSight
  • Cloud Environments: AWS, Azure, GCP

Certification

  • Certified in Hadoop, Spark & Snowflake - Duke University

