
Nidhi Bhuva

Toronto, ON

Summary

A seasoned Data Engineer with leadership skills, extensive experience, and a focus on delivering results. Resourceful and effective at problem-solving, with a track record of meeting tight release schedules. Over 5 years of diverse IT experience developing and implementing applications across big data and mainframe systems.

PROFILE SUMMARY:
  • 5+ years of comprehensive experience as a Data Engineer and Hadoop, Big Data & Analytics Developer
  • Proficient in Hadoop architecture and its ecosystem components, including HDFS, MapReduce, Pig, Hive, Sqoop, and Flume
  • Thorough comprehension of Hadoop daemons (JobTracker, TaskTracker, NameNode, DataNode) as well as MRv1 and YARN architecture
  • Experienced in installing, configuring, managing, supporting, and monitoring Hadoop clusters across distributions including Apache Hadoop, Cloudera, and Hortonworks, and cloud platforms such as AWS and GCP
  • Executed a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL
  • Created Docker images to run Airflow in a local environment for testing ingestion and ETL pipelines
  • Proficient in installing and configuring components of the Hadoop stack, including MapReduce, HDFS, Hive, Pig, Sqoop, Flume, and Zookeeper
  • Examined the impact of changes on existing ETL/ELT processes to ensure timely completion and availability of data in the data warehouse for reporting
  • Experienced in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats to uncover insights into customer usage patterns
  • Extensive experience writing and implementing complex test plans and designing, developing, and executing test scripts for system, data integration, user acceptance (UAT), and regression testing
  • Strong foundation in JCL (Job Control Language); developed, maintained, and optimized many applications for reliable and efficient performance, with expertise in JCL, COBOL, CICS, and DB2 for complete, integrated solutions
  • Worked with source version control tools such as Subversion (SVN), TFS, and Git
  • Designed and executed ATDD/BDD features using Selenium and Cucumber; proficient in writing automation scenarios in Gherkin format
  • Ample knowledge of Apache Kafka and Apache Storm for building data platforms, pipelines, and storage systems, and of search technologies such as Elasticsearch
  • Good knowledge of data marts, OLAP, and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake schema modeling for fact and dimension tables) using Analysis Services
  • Expertise in writing custom Kafka consumer code and modifying existing producer code in Python to push data to Spark Streaming jobs (a minimal sketch follows this summary)
  • Skilled in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features
  • Demonstrated resilience by adapting to shifting project needs, resolving technical difficulties, and sustaining high performance under pressure, contributing to project success and continuity
  • Deep knowledge of data validation: designed and implemented dependable processes to ensure data reliability, precision, and integrity across varied projects, with extensive development, execution, and reporting experience in all phases of data validation
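The Kafka-to-Spark-streaming work above can be illustrated with a generic Spark Structured Streaming consumer. This is only a sketch, not the actual project code: the broker address, topic name, and checkpoint path are assumed placeholders, and it assumes the spark-sql-kafka connector is available on the Spark classpath.

```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming.
# Broker, topic, and checkpoint path are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

# Read raw events from Kafka; key/value arrive as binary and must be cast.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "weblogs")                     # assumed topic
    .option("startingOffsets", "latest")
    .load()
    .select(F.col("value").cast("string").alias("raw_event"))
)

# Write the stream out; a real job would add schema parsing and a durable sink.
query = (
    events.writeStream.format("console")
    .option("checkpointLocation", "/tmp/checkpoints/weblogs")  # assumed path
    .outputMode("append")
    .start()
)
query.awaitTermination()
```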

Overview

6
years of professional experience
1
Certification

Work History

Sr. Test Data Engineer

CareFreeIT
01.2023 - Current
  • Very good understanding of logical and physical data modeling and creation of star schemas for an enterprise data warehouse with multi-dimensional data and data marts
  • Good knowledge of technologies for systems comprising massive amounts of data running in highly distributed mode on Cloudera and Hortonworks Hadoop distributions and Amazon AWS
  • Developed Oozie workflows for scheduling and orchestrating the ETL process
  • Involved in writing Python scripts to automate the process of extracting weblogs using Airflow DAGs
  • Implemented Airflow for scheduling and monitoring workflows and architecting complex data pipelines
  • Practical understanding of data modeling concepts such as star schema and snowflake schema modeling
  • Wrote PySpark scripts to apply hard quality checks on data at the record level and generate reports for end users (a minimal sketch follows this list)
  • Collaborated with cross-functional teams, including data analysts, database administrators, and business stakeholders, to ensure data validation processes align with business requirements and objectives
  • Strong data visualization and reporting skills to present insights from analysis
  • Good knowledge of the architecture and components of Spark; efficient in working with Spark Core, Spark SQL, and Spark Streaming, with expertise in building Spark and Spark-Scala applications for interactive analysis, batch processing, and stream processing
  • Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage
  • Designed and developed NoSQL solutions for all users
  • Managed and maintained Oracle and NoSQL databases in production domain
  • Evaluated system performance and validated NoSQL solutions
  • Consulting on Snowflake data platform solution architecture, design, development, and deployment focused on bringing a data-driven culture across the enterprise
  • Day-to-day responsibilities include developing ETL pipelines in and out of the data warehouse and developing major regulatory and financial reports using advanced SQL queries in Snowflake
  • Staged API and Kafka data (in JSON format) into Snowflake by flattening it for different functional services
  • Created data sharing between two Snowflake accounts and reports in Looker based on Snowflake connections
  • Involved in the project life cycle including the design, development, and implementation of verifying data received in the data lake
  • Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 and text files into AWS Redshift
  • Designed, developed, and implemented complex ETL workflows using Informatica PowerCenter to extract, transform, and load data from various sources into the EDW
  • Configured and optimized Informatica PowerCenter mappings, sessions, and workflows to ensure efficient data processing and high performance
  • Managed the migration of ETL processes from legacy systems to Informatica PowerCenter, ensuring minimal disruption and downtime
  • Analyzed the impact of changes on existing ETL/ELT processes to ensure timely completion and availability of data in the data warehouse for reporting use
  • Designed, built, and managed the ELK (Elasticsearch, Logstash, Kibana) cluster for centralized logging and search functionality for the application
  • Connected Tableau to diverse data sources, including SQL databases, cloud storage (AWS Redshift), and Excel files
  • Ensured seamless data integration and regular updates to maintain data freshness
  • Used Elasticsearch for storing and querying large data in an object-oriented structure and Logstash for filtering tags to visualize the results in Kibana
  • Developed Python code to gather data from HBase and designed the solution for implementation using Spark
  • Built a real time streaming pipeline utilizing Kafka, Spark Streaming and Redshift
  • Responsible for the implementation and management of a data catalog
  • Developed logical and physical data flow models for Informatica ETL applications
  • Added support for AWS S3 and RDS to host static and media files and the database in the Amazon cloud
  • Worked on creation of custom Docker container images, tagging, and pushing of images
  • Created and executed a Job Stream and added job definitions in Control-M
  • Utilized GitHub's robust version control system to manage code changes, track revisions, and maintain a clear history of project development
  • Wrote shell scripts to extract data from Unix servers into Hadoop HDFS for long-term storage
  • Worked extensively on building data pipelines in docker container environment in development phase
  • Environment: Hadoop, Spark, Hive, Native, Teradata, Tableau, Linux, Python, Kafka, Snowflake, AWS S3 buckets, AWS Glue, NiFi, PostgreSQL, AWS EC2, Oracle PL/SQL, AWS stack, development toolkit (JIRA, Bitbucket/Git, ServiceNow, etc.)
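The record-level quality checks mentioned in this role can be illustrated with a generic PySpark sketch. The column names, rules, and S3 paths below are hypothetical examples, not the actual client rules.

```python
# Minimal sketch of record-level data quality checks in PySpark.
# Columns (customer_id, email, amount), rules, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("record_quality_checks").getOrCreate()

df = spark.read.parquet("s3a://example-bucket/staged/transactions/")  # assumed path

# Flag each record with the rules it violates; concat_ws skips null (passing) checks.
checked = df.withColumn(
    "dq_errors",
    F.concat_ws(
        ",",
        F.when(F.col("customer_id").isNull(), F.lit("missing_customer_id")),
        F.when(~F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), F.lit("bad_email")),
        F.when(F.col("amount") < 0, F.lit("negative_amount")),
    ),
)

# Split out rejected records and build a per-rule summary report for end users.
rejected = checked.filter(F.col("dq_errors") != "")
summary = rejected.groupBy("dq_errors").count()

rejected.write.mode("overwrite").parquet("s3a://example-bucket/dq/rejected/")
summary.write.mode("overwrite").option("header", True).csv("s3a://example-bucket/dq/summary/")
```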

Sr. Data Engineer

Equitable Bank
09.2022 - 12.2022
  • Hands-on experience with Azure Storage: storage accounts, Blob storage, and Azure SQL Server
  • Explored the Azure storage accounts like Blob storage
  • Experience in building, deploying, and troubleshooting data extraction for a huge number of records using Azure Data Factory (ADF)
  • Working on service-oriented architecture and experience of the Release Management process with CI/CD pipelines using Azure DevOps
  • Worked on Microsoft Azure services like HDInsight clusters, Blob, ADLS, Data Factory, and Logic Apps, and did a POC on Azure Databricks
  • Set up and configured the Databricks environment, ensuring seamless integration with AWS S3 for data storage and retrieval
  • Developed and optimized Spark jobs on Databricks for processing large datasets, improving ETL performance by 30%
  • Utilized Delta Lake on Databricks for efficient and reliable data storage, ensuring ACID transactions and scalable metadata handling
  • Analyzing data from Celonis and other dashboards and systems
  • Designed, developed, and managed data pipelines using Databricks Notebooks to extract, transform, and load (ETL) data from diverse sources
  • Leveraged Databricks Notebooks to write, debug, and optimize Spark jobs, ensuring efficient data processing and analytics
  • Creating visualization using Celonis to find and define new opportunities
  • Worked on maintaining and extending the existing Celonis data processing pipeline
  • Automated CI/CD pipeline using Jenkins, build-pipeline-plugin, Maven, and GIT
  • Building/Maintaining Docker container clusters managed by Kubernetes
  • Utilization of Kubernetes and Docker for the runtime environment of the CI/CD system to build, test, and deploy
  • Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape and created reports in Looker based on Snowflake
  • Conducted thorough data analysis using tools such as Python, SQL, and Excel to uncover trends, patterns, and anomalies
  • Focused on extracting meaningful insights that aligned with business objectives
  • Consulted on Snowflake data platform solution architecture, design, development, and deployment focused on bringing a data-driven culture across the enterprise
  • Worked on Parquet files, CSV files, Map side joins, bucketing, partitioning for hive performance enhancement and storage improvement
  • Worked on a data pipeline to process large sets of data and configured lookups for data validation and integrity
  • Implemented Airflow for scheduling and monitoring workflows and architecting complex data pipelines
  • Used Elasticsearch not only for powering search but also as part of the ELK stack with Beats for end-to-end logging and monitoring of our systems; participated in problem resolution, change, release, and event management for the ELK stack
  • Worked with Sqoop import and export functionalities to handle large data set transfer between Oracle database and HDFS
  • Involved in submitting and tracking Spark jobs using Dkron
  • Involved in development and testing of both SOAP and REST services using RestAssured and tested both XML and JSON formats
  • Involved in creating Dkron workflow and coordinator jobs to kick off jobs based on time and data availability
  • Developed scripts using Spark which are used to load the data from Hive to Amazon RDS at a faster rate
  • Involved in loading the created SQL table data into Spark-Redis for faster access by a large customer base without taking a performance hit
  • Involved in converting Hive/SQL queries into Spark (RDDs, DataFrames, and Datasets) using Python and Scala (a minimal sketch follows this list)
  • Experience in creating microservices using Scala programming
  • Knowledge of handling Hive queries using Spark SQL that integrates Spark environment
  • Implemented test scripts to support test driven development and continuous integration
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings
  • Knowledge of Azure DevOps and its process of creation of the tasks, pull requests, Git repositories
  • Environment: Spark, Scala, Hadoop, Hive, Sqoop, Play framework, Jenkins, NiFi, Azure Blob, ADLS, Databricks, Azure stack.
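The Hive/SQL-to-Spark conversion work in this role can be illustrated with a generic PySpark sketch; the table and column names (sales, region, amount) are hypothetical examples, not the bank's actual schema.

```python
# Minimal sketch of converting a Hive/SQL aggregation into the Spark DataFrame API.
# Table and column names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("hive_to_dataframe")
    .enableHiveSupport()
    .getOrCreate()
)

# Original Hive/SQL form of the query.
sql_result = spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    WHERE amount > 0
    GROUP BY region
""")

# Equivalent DataFrame form, which composes better in Python code.
df_result = (
    spark.table("sales")
    .filter(F.col("amount") > 0)
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
)

df_result.show()
```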

Hadoop Developer

UnitedHealth Group
01.2022 - 07.2022
  • Involved in the complete life cycle of a Hadoop implementation project, specializing in, but not limited to, writing Pig queries, Hive queries (HQL), and Sqoop jobs to pull the log files
  • Gathered business requirements in meetings for successful implementation and POC (proof of concept) of Hadoop and its ecosystem
  • Worked on Hadoop cluster scaling from 4 nodes in the development/test environment to up to 200 nodes in production (2 edge nodes, 3 master nodes, and 200 data nodes)
  • Developed some machine learning algorithms using Mahout for data mining for the data stored in HDFS
  • Experience working on Python scripts
  • Experience in creating and designing the AWS cloud formation templates
  • Working experience with AWS services: Lambda, S3, EC2
  • Adept in statistical programming languages like Python and R, including big data technologies like Hadoop, HDFS, Spark, and Hive
  • Experience in deploying elastic beanstalk applications to various environments on AWS
  • Developed Scalding (Scala), Hive, and Java/Python MapReduce applications for analytics and machine learning at scale (a minimal Python streaming sketch follows this list)
  • Wrote SQL code to connect a process to Celonis
  • Understanding of execution solutions in Celonis
  • Used Flume extensively in gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS)
  • Worked with Oozie Work-flow manager to schedule Hadoop jobs (cleanup jobs) to remove the duplicate log data in HDFS
  • Worked with Sqoop to load genomic research and experimental results data from an Oracle database to the Hadoop Distributed File System (HDFS)
  • Extensively used Hive/HQL or Hive queries to query or search for a particular string in Hive tables in HDFS
  • Implemented "Hive Collector Sink" which uses "Collector Sink" interface but takes Hive table as an extra argument to load data in HDFS to Hive table
  • Involved in Hadoop NameNode metadata backups and load balancing as part of cluster maintenance and monitoring
  • Used File System Check (FSCK) to check the health of files in HDFS
  • Configured log4j so that the audit log is written to a separate file and is not mixed with the NameNode's other log entries
  • Monitored Nightly jobs to export data out of HDFS to be stored offsite as part of HDFS backup
  • Used Pig for analysis of large data sets and brought data back to HBase by Pig
  • Scheduled, monitored, and debugged various MapReduce nightly jobs using Oozie workflows
  • Worked with various Hadoop Ecosystem tools like Sqoop, Hive, Pig, Flume, Oozie
  • Involved in End User Training, Launch and Adoption of Hadoop system
  • Environment: Hadoop 2.2.0, Sqoop 1.4.4, MySQL database, Oozie, Flume, Hive, Pig, Java, Eclipse Kepler.
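The Python MapReduce work in this role can be illustrated with a generic Hadoop Streaming word-count sketch. The script, file name, and invocation below are assumptions for illustration, not the actual UnitedHealth Group application; on a cluster the two phases would typically be wired up with the Hadoop Streaming jar's -mapper and -reducer options.

```python
# Minimal Hadoop Streaming word-count sketch in Python (generic example).
import sys


def run_mapper():
    # Emit one (word, 1) pair per token read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")


def run_reducer():
    # Hadoop Streaming delivers keys sorted, so counts accumulate per word.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")


if __name__ == "__main__":
    # Run as "python wordcount.py map" for the map phase or
    # "python wordcount.py reduce" for the reduce phase.
    run_mapper() if sys.argv[1:] == ["map"] else run_reducer()
```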

Sr. Software Developer

WebFacial Technologies
07.2021 - 12.2021
  • As part of the overall Card Services product offering, GE provides several consumer card services reporting functions
  • At the heart of this reporting system is the Collection Data Warehouse
  • The Collection Data Warehouse consists of several tables that house several years' worth of cardholder account and transactional data
  • Business partner First Data Corporation (FDR) provides a significant portion of GE Capital's consumer account management services
  • Provided ongoing support and maintenance for JCL scripts and COBOL applications, ensuring their functionality aligns with evolving business requirements and system updates
  • Analyzed the existing ETL SAS process
  • Worked closely with data modelers on Erwin
  • Prepared functional and technical documents
  • Created new streams for handling the new feeds
  • Designed and developed Ab Initio graphs for processing account, statement, and transactional data
  • Worked on performance improvement of Ab Initio graphs
  • Worked on enhancement of graphs to incorporate new requirements and client additions
  • Made metadata changes to handle the new columns and new feeds
  • Prepared various test cases for programs
  • Performed end-to-end testing of the process to verify the impacts caused by the enhancements.

Software Developer

Growmore Infotech
01.2019 - 06.2021
  • Involved in various phases of Software Development Life Cycle (SDLC/SCRUM)
  • Worked with various types of controllers like SimpleFormController, AbstractController, and the Controller interface
  • Integrated Spring DAO for data access using Hibernate; used HQL for querying databases
  • Developed UI modules using HTML, JSP, JavaScript, and CSS
  • Implemented the logging mechanism using Log4j framework
  • Experience working with OOP
  • Designed and developed batch processing using multi-threading to process payments
  • Used Eclipse as the IDE for developing the J2EE application
  • Involved in writing ANT scripts to build the application
  • Involved in production support and fixed the issues based on the priority
  • Developed Stored Procedures, Triggers and Functions in Oracle
  • Used Concurrent Versions System (CVS) as the source control tool to keep track of the system state
  • Created and Configured Connection pools in WebSphere Application Server
  • Used JUnit for debugging, testing, and maintaining the system state
  • Environment: Java, JSP, WebSphere Application Server, HTML, ANT, JUnit, CVS, Eclipse, Oracle.

Education

Bachelor of Technology - Computer Science

Gujarat Technological University
India

Associate Degree - Computer Science

Conestoga College
Canada

Skills

  • TECHNICAL SKILLS:
  • Programming Languages: Java, Scala, Python, Shell Scripting
  • Big Data Ecosystem: Spark, Hive, HBase, Sqoop, Oozie, ELK, Storm, Flume, Pig, Kafka, Zookeeper, Play2, MapReduce, Celonis, Akka
  • Cloud: Snowflake, AWS EMR, EC2, S3, RDS, Dataproc, Dataflow, Azure Data Factory, Blob Storage, Azure Data Lake, Data Processing
  • DBMS: SQL Server, MySQL, PL/SQL, GraphQL, Oracle, database modelling, Teradata, PostgreSQL
  • NoSQL Databases: Cassandra, MongoDB
  • IDEs: Eclipse, Visual Studio; Version Control
  • Monitoring/Reporting Tools: Wrike, Whatagraph, DashThis
  • Operating Systems: Windows, Unix, Linux, Solaris, CentOS
  • Frameworks: MVC, Struts, Power BI, Maven, JUnit, Log4j, ANT, Tableau, Qlik, Splunk, Aqua Data Studio
  • ETL Tools: Databricks Lakehouse Platform, Fivetran
  • J2EE Technologies: Spring, Servlets, J2SE, JSP, JDBC
  • Methodologies: Agile, Waterfall, BDD, TDD, ATDD

Certification

Project #1: Jan 2023 to date, Client: CareFreeIT

Additional Information

  • Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content
  • Wrote Python modules to load/extract asset data from the MySQL source database
  • Designed and implemented GraphQL schemas to define types, queries, mutations, and subscriptions
  • Used Python scripts to update the content of the database and manipulate files
  • Experienced in data processing and analysis using Spark, HiveQL, and SQL
  • Extensive experience in writing user-defined functions (UDFs) in Hive and Spark (a minimal sketch follows this list)
  • Strong experience in database design and writing complex SQL queries and stored procedures
  • Experience working both independently and collaboratively in a fast-paced, unstructured environment to solve challenges and deliver high-quality outcomes
  • Designed and implemented machine learning and deep learning models using PyTorch
  • Prior experience designing software solutions to expand big data platform capabilities
  • Knowledge of the different tools and frameworks used in the Hadoop ecosystem (MapReduce, YARN, Pig, Hive, HBase, Zookeeper, Sqoop) as well as NoSQL
  • Developed high-performance parallel computing applications using CUDA C/C++ to leverage GPU capabilities
  • Collaborated with data scientists and analysts to develop interactive and reusable Databricks Notebooks for exploratory data analysis and visualization
  • Integrated Databricks Notebooks with visualization tools like Tableau to create real-time dashboards and reports, providing stakeholders with actionable insights
  • Documented data workflows and best practices within Databricks Notebooks to facilitate knowledge sharing and collaboration among team members
  • Designed and developed RESTful and SOAP APIs to facilitate seamless communication between internal and external systems
  • Knowledge of ETL and relational database systems and how to create and optimize them
  • Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitation, report generation, and standardization
  • Used ELK (Elasticsearch, Logstash, and Kibana) for name search patterns for a customer
  • Used Elasticsearch not only for powering search but also as part of the ELK stack with Beats for end-to-end logging and monitoring of our systems
  • Responsible for designing and deploying new ELK clusters (Elasticsearch, Logstash, Kibana, Beats, Kafka, Zookeeper, etc.)
  • Extensive experience working with AWS Cloud services and AWS SDKs, including AWS API Gateway, Lambda, S3, IAM, and EC2
  • Experienced in monitoring Hadoop clusters using Cloudera Manager and the web UI
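The Spark UDF work mentioned above can be illustrated with a generic PySpark sketch. The masking rule, column names, and sample rows are hypothetical examples, not production logic.

```python
# Minimal sketch of a Spark UDF; the masking rule and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_example").getOrCreate()

@F.udf(returnType=StringType())
def mask_email(email):
    # Keep the domain, hide the local part: "jane.doe@example.com" -> "***@example.com"
    if email is None or "@" not in email:
        return None
    return "***@" + email.split("@", 1)[1]

df = spark.createDataFrame(
    [("1", "jane.doe@example.com"), ("2", None)],
    ["id", "email"],
)

df.withColumn("masked_email", mask_email(F.col("email"))).show(truncate=False)
```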

Timeline

Sr. Test Data Engineer

CareFreeIT
01.2023 - Current

Sr. Data Engineer

Equitable Bank
09.2022 - 12.2022

Hadoop Developer

UnitedHealth Group
01.2022 - 07.2022

Sr. Software Developer

WebFacial Technologies
07.2021 - 12.2021

Software Developer

Growmore Infotech
01.2019 - 06.2021

Bachelor of Technology - Computer Science

Gujarat Technological University

Associate Degree - Computer Science

Conestoga College