
Nidhi Bhuva

Toronto, ON

Summary

A seasoned Data Engineer with leadership skills, extensive experience, and a focus on delivering results. Resourceful and effective at problem-solving, with a track record of meeting tight release schedules. Over 5 years of diverse IT experience developing and implementing applications across big data and mainframe systems.

PROFILE SUMMARY:
  • 5+ years of comprehensive experience as a Data Engineer and Hadoop, Big Data & Analytics Developer
  • Proficient in Hadoop architecture and its ecosystem components, including HDFS, MapReduce, Pig, Hive, Sqoop, and Flume
  • Thorough comprehension of Hadoop daemons (JobTracker, TaskTracker, NameNode, DataNode) as well as MRv1 and YARN architecture
  • Experienced in installing, configuring, managing, supporting, and monitoring Hadoop clusters across distributions including Apache Hadoop, Cloudera, and Hortonworks, and cloud platforms such as AWS and GCP
  • Executed a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL
  • Created Docker images to run Airflow in a local environment for testing ingestion and ETL pipelines
  • Proficient in installing and configuring components of the Hadoop stack, including MapReduce, HDFS, Hive, Pig, Sqoop, Flume, and Zookeeper
  • Examined the impact of changes on existing ETL/ELT processes to ensure timely completion and availability of data in the data warehouse for reporting
  • Experienced in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats to uncover insights into customer usage patterns
  • Extensive experience writing and implementing complex test plans and designing, developing, and executing test scripts for system, data integration, user acceptance (UAT), and regression testing
  • Strong foundation in JCL (Job Control Language); developed, maintained, and optimized many applications for reliable and efficient performance, with expertise in JCL, COBOL, CICS, and DB2 for complete, integrated solutions
  • Worked with source version control tools such as Subversion (SVN), TFS, and Git
  • Designed and executed ATDD/BDD features using Selenium and Cucumber; proficient in writing automation scenarios in Gherkin format
  • Ample knowledge of Apache Kafka and Apache Storm for building data platforms, pipelines, and storage systems, and of search technologies such as Elasticsearch
  • Good knowledge of data marts, OLAP, and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake schema modeling for fact and dimension tables) using Analysis Services
  • Expertise in writing custom Kafka consumer code and modifying existing producer code in Python to push data to Spark Streaming jobs (a minimal sketch follows this summary)
  • Skilled in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features
  • Demonstrated resilience by adapting to shifting project needs, resolving technical difficulties, and sustaining high performance under pressure, contributing to project success and continuity
  • Deep knowledge of data validation: designed and implemented dependable processes to ensure data reliability, precision, and integrity across varied projects, with extensive development, execution, and reporting experience in all phases of data validation
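The Kafka-to-Spark-streaming work above can be illustrated with a generic Spark Structured Streaming consumer. This is only a sketch, not the actual project code: the broker address, topic name, and checkpoint path are assumed placeholders, and it assumes the spark-sql-kafka connector is available on the Spark classpath.

```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming.
# Broker, topic, and checkpoint path are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

# Read raw events from Kafka; key/value arrive as binary and must be cast.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "weblogs")                     # assumed topic
    .option("startingOffsets", "latest")
    .load()
    .select(F.col("value").cast("string").alias("raw_event"))
)

# Write the stream out; a real job would add schema parsing and a durable sink.
query = (
    events.writeStream.format("console")
    .option("checkpointLocation", "/tmp/checkpoints/weblogs")  # assumed path
    .outputMode("append")
    .start()
)
query.awaitTermination()
```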

Overview

6
years of professional experience
1
Certification

Work History

Sr. Test Data Engineer

CareFreeIT
01.2023 - Current
  • Very good understanding of logical and physical data modeling and creation of star schemas for an enterprise data warehouse with multi-dimensional data and data marts
  • Good knowledge of technologies for systems comprising massive amounts of data running in highly distributed mode on Cloudera and Hortonworks Hadoop distributions and Amazon AWS
  • Developed Oozie workflows for scheduling and orchestrating the ETL process
  • Involved in writing Python scripts to automate the process of extracting weblogs using Airflow DAGs
  • Implemented Airflow for scheduling and monitoring workflows and architecting complex data pipelines
  • Practical understanding of data modeling concepts such as star schema and snowflake schema modeling
  • Wrote PySpark scripts to apply hard quality checks on data at the record level and generate reports for end users (a minimal sketch follows this list)
  • Collaborated with cross-functional teams, including data analysts, database administrators, and business stakeholders, to ensure data validation processes align with business requirements and objectives
  • Strong data visualization and reporting skills to present insights from analysis
  • Good knowledge of the architecture and components of Spark; efficient in working with Spark Core, Spark SQL, and Spark Streaming, with expertise in building Spark and Spark-Scala applications for interactive analysis, batch processing, and stream processing
  • Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage
  • Designed and developed NoSQL solutions for all users
  • Managed and maintained Oracle and NoSQL databases in production domain
  • Evaluated system performance and validated NoSQL solutions
  • Consulting on Snowflake data platform solution architecture, design, development, and deployment focused on bringing a data-driven culture across the enterprise
  • Day-to-day responsibilities include developing ETL pipelines in and out of the data warehouse and developing major regulatory and financial reports using advanced SQL queries in Snowflake
  • Staged API and Kafka data (in JSON format) into Snowflake by flattening it for different functional services
  • Created data sharing between two Snowflake accounts and reports in Looker based on Snowflake connections
  • Involved in the project life cycle including the design, development, and implementation of verifying data received in the data lake
  • Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 and text files into AWS Redshift
  • Designed, developed, and implemented complex ETL workflows using Informatica PowerCenter to extract, transform, and load data from various sources into the EDW
  • Configured and optimized Informatica PowerCenter mappings, sessions, and workflows to ensure efficient data processing and high performance
  • Managed the migration of ETL processes from legacy systems to Informatica PowerCenter, ensuring minimal disruption and downtime
  • Analyzed the impact of changes on existing ETL/ELT processes to ensure timely completion and availability of data in the data warehouse for reporting use
  • Designed, built, and managed the ELK (Elasticsearch, Logstash, Kibana) cluster for centralized logging and search functionality for the application
  • Connected Tableau to diverse data sources, including SQL databases, cloud storage (AWS Redshift), and Excel files
  • Ensured seamless data integration and regular updates to maintain data freshness
  • Used Elasticsearch for storing and querying large data in an object-oriented structure and Logstash for filtering tags to visualize the results in Kibana
  • Developed Python code to gather data from HBase and designed the solution for implementation using Spark
  • Built a real time streaming pipeline utilizing Kafka, Spark Streaming and Redshift
  • Responsible for the implementation and management of a data catalog
  • Developed logical and physical data flow models for Informatica ETL applications
  • Added support for AWS S3 and RDS to host static and media files and the database in the Amazon cloud
  • Worked on creation of custom Docker container images, tagging, and pushing of images
  • Created and executed a Job Stream and added job definitions in Control-M
  • Utilized GitHub's robust version control system to manage code changes, track revisions, and maintain a clear history of project development
  • Wrote shell scripts to extract data from Unix servers into Hadoop HDFS for long-term storage
  • Worked extensively on building data pipelines in docker container environment in development phase
  • Environment: Hadoop, Spark, Hive, Native, Teradata, Tableau, Linux, Python, Kafka, Snowflake, AWS S3 buckets, AWS Glue, NiFi, PostgreSQL, AWS EC2, Oracle PL/SQL, AWS stack, development toolkit (JIRA, Bitbucket/Git, ServiceNow, etc.)
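The record-level quality checks mentioned in this role can be illustrated with a generic PySpark sketch. The column names, rules, and S3 paths below are hypothetical examples, not the actual client rules.

```python
# Minimal sketch of record-level data quality checks in PySpark.
# Columns (customer_id, email, amount), rules, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("record_quality_checks").getOrCreate()

df = spark.read.parquet("s3a://example-bucket/staged/transactions/")  # assumed path

# Flag each record with the rules it violates; concat_ws skips null (passing) checks.
checked = df.withColumn(
    "dq_errors",
    F.concat_ws(
        ",",
        F.when(F.col("customer_id").isNull(), F.lit("missing_customer_id")),
        F.when(~F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), F.lit("bad_email")),
        F.when(F.col("amount") < 0, F.lit("negative_amount")),
    ),
)

# Split out rejected records and build a per-rule summary report for end users.
rejected = checked.filter(F.col("dq_errors") != "")
summary = rejected.groupBy("dq_errors").count()

rejected.write.mode("overwrite").parquet("s3a://example-bucket/dq/rejected/")
summary.write.mode("overwrite").option("header", True).csv("s3a://example-bucket/dq/summary/")
```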

Sr. Data Engineer

Equitable Bank
09.2022 - 12.2022
  • Hands-on experience with Azure Storage: storage accounts, Blob storage, and Azure SQL Server
  • Explored the Azure storage accounts like Blob storage
  • Experience in building, deploying, and troubleshooting data extraction for a huge number of records using Azure Data Factory (ADF)
  • Working on service-oriented architecture and experience of the Release Management process with CI/CD pipelines using Azure DevOps
  • Worked on Microsoft Azure services like HDInsight clusters, Blob, ADLS, Data Factory, and Logic Apps, and did a POC on Azure Databricks
  • Set up and configured the Databricks environment, ensuring seamless integration with AWS S3 for data storage and retrieval
  • Developed and optimized Spark jobs on Databricks for processing large datasets, improving ETL performance by 30%
  • Utilized Delta Lake on Databricks for efficient and reliable data storage, ensuring ACID transactions and scalable metadata handling
  • Analyzing data from Celonis and other dashboards and systems
  • Designed, developed, and managed data pipelines using Databricks Notebooks to extract, transform, and load (ETL) data from diverse sources
  • Leveraged Databricks Notebooks to write, debug, and optimize Spark jobs, ensuring efficient data processing and analytics
  • Creating visualization using Celonis to find and define new opportunities
  • Worked on maintaining and extending the existing Celonis data processing pipeline
  • Automated CI/CD pipeline using Jenkins, build-pipeline-plugin, Maven, and GIT
  • Building/Maintaining Docker container clusters managed by Kubernetes
  • Utilization of Kubernetes and Docker for the runtime environment of the CI/CD system to build, test, and deploy
  • Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape and created reports in Looker based on Snowflake
  • Conducted thorough data analysis using tools such as Python, SQL, and Excel to uncover trends, patterns, and anomalies
  • Focused on extracting meaningful insights that aligned with business objectives
  • Consulted on Snowflake data platform solution architecture, design, development, and deployment focused on bringing a data-driven culture across the enterprise
  • Worked on Parquet files, CSV files, Map side joins, bucketing, partitioning for hive performance enhancement and storage improvement
  • Worked on a data pipeline to process large sets of data and configured lookups for data validation and integrity
  • Implemented Airflow for scheduling and monitoring workflows and architecting complex data pipelines
  • Used Elasticsearch not only for powering search but also as part of the ELK stack with Beats for end-to-end logging and monitoring of our systems; participated in problem resolution, change, release, and event management for the ELK stack
  • Worked with Sqoop import and export functionalities to handle large data set transfer between Oracle database and HDFS
  • Involved in submitting and tracking Spark jobs using Dkron
  • Involved in development and testing of both SOAP and REST services using RestAssured and tested both XML and JSON formats
  • Involved in creating Dkron workflow and coordinator jobs to kick off jobs based on time and data availability
  • Developed scripts using Spark which are used to load the data from Hive to Amazon RDS at a faster rate
  • Involved in loading the created SQL table data into Spark-Redis for faster access by a large customer base without taking a performance hit
  • Involved in converting Hive/SQL queries into Spark (RDDs, DataFrames, and Datasets) using Python and Scala (a minimal sketch follows this list)
  • Experience in creating microservices using Scala programming
  • Knowledge of handling Hive queries using Spark SQL that integrates Spark environment
  • Implemented test scripts to support test driven development and continuous integration
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings
  • Knowledge of Azure DevOps and its process of creation of the tasks, pull requests, Git repositories
  • Environment: Spark, Scala, Hadoop, Hive, Sqoop, Play framework, Jenkins, NiFi, Azure Blob, ADLS, Databricks, Azure stack.
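The Hive/SQL-to-Spark conversion work in this role can be illustrated with a generic PySpark sketch; the table and column names (sales, region, amount) are hypothetical examples, not the bank's actual schema.

```python
# Minimal sketch of converting a Hive/SQL aggregation into the Spark DataFrame API.
# Table and column names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("hive_to_dataframe")
    .enableHiveSupport()
    .getOrCreate()
)

# Original Hive/SQL form of the query.
sql_result = spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    WHERE amount > 0
    GROUP BY region
""")

# Equivalent DataFrame form, which composes better in Python code.
df_result = (
    spark.table("sales")
    .filter(F.col("amount") > 0)
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
)

df_result.show()
```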

Hadoop Developer

UnitedHealth Group
01.2022 - 07.2022
  • Involved in the complete life cycle of a Hadoop implementation project, specializing in, but not limited to, writing Pig queries, Hive queries (HQL), and Sqoop jobs to pull the log files
  • Gathered business requirements in meetings for successful implementation and POC (proof of concept) of Hadoop and its ecosystem
  • Worked on Hadoop cluster scaling from 4 nodes in the development/test environment to up to 200 nodes in production (2 edge nodes, 3 master nodes, and 200 data nodes)
  • Developed some machine learning algorithms using Mahout for data mining for the data stored in HDFS
  • Experience working on Python scripts
  • Experience in creating and designing the AWS cloud formation templates
  • Working experience with AWS services: Lambda, S3, EC2
  • Adept in statistical programming languages like Python and R, including big data technologies like Hadoop, HDFS, Spark, and Hive
  • Experience in deploying elastic beanstalk applications to various environments on AWS
  • Developed Scalding (Scala), Hive, and Java/Python MapReduce applications for analytics and machine learning at scale (a minimal Python streaming sketch follows this list)
  • Wrote SQL code to connect a process to Celonis
  • Understanding of execution solutions in Celonis
  • Used Flume extensively in gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS)
  • Worked with Oozie Work-flow manager to schedule Hadoop jobs (cleanup jobs) to remove the duplicate log data in HDFS
  • Worked with Sqoop to load genomic research and experimental results data from an Oracle database to the Hadoop Distributed File System (HDFS)
  • Extensively used Hive/HQL or Hive queries to query or search for a particular string in Hive tables in HDFS
  • Implemented "Hive Collector Sink" which uses "Collector Sink" interface but takes Hive table as an extra argument to load data in HDFS to Hive table
  • Involved in Hadoop NameNode metadata backups and load balancing as part of cluster maintenance and monitoring
  • Used File System Check (FSCK) to check the health of files in HDFS
  • Configured log4j so that the audit log is written to a separate file and is not mixed with the NameNode's other log entries
  • Monitored Nightly jobs to export data out of HDFS to be stored offsite as part of HDFS backup
  • Used Pig for analysis of large data sets and brought data back to HBase by Pig
  • Scheduled, monitored, and debugged various MapReduce nightly jobs using Oozie workflows
  • Worked with various Hadoop Ecosystem tools like Sqoop, Hive, Pig, Flume, Oozie
  • Involved in End User Training, Launch and Adoption of Hadoop system
  • Environment: Hadoop 2.2.0, Sqoop 1.4.4, MySQL database, Oozie, Flume, Hive, Pig, Java, Eclipse Kepler.
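The Python MapReduce work in this role can be illustrated with a generic Hadoop Streaming word-count sketch. The script, file name, and invocation below are assumptions for illustration, not the actual UnitedHealth Group application; on a cluster the two phases would typically be wired up with the Hadoop Streaming jar's -mapper and -reducer options.

```python
# Minimal Hadoop Streaming word-count sketch in Python (generic example).
import sys


def run_mapper():
    # Emit one (word, 1) pair per token read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")


def run_reducer():
    # Hadoop Streaming delivers keys sorted, so counts accumulate per word.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")


if __name__ == "__main__":
    # Run as "python wordcount.py map" for the map phase or
    # "python wordcount.py reduce" for the reduce phase.
    run_mapper() if sys.argv[1:] == ["map"] else run_reducer()
```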

Sr. Software Developer

WebFacial Technologies
07.2021 - 12.2021
  • As part of the overall Card Services product offering, GE provides several consumer card services reporting functions
  • At the heart of this reporting system is the Collection Data Warehouse
  • The Collection Data Warehouse consists of several tables that house several years' worth of cardholder account and transactional data
  • Business partner First Data Corporation (FDR) provides a significant portion of GE Capital's consumer account management services
  • Provided ongoing support and maintenance for JCL scripts and COBOL applications, ensuring their functionality aligns with evolving business requirements and system updates
  • Analyzed the existing ETL SAS process
  • Worked closely with data modelers on Erwin
  • Prepared functional and technical documents
  • Created new streams for handling the new feeds
  • Designed and developed Ab Initio graphs for processing account, statement, and transactional data
  • Worked on performance improvement of Ab Initio graphs
  • Worked on enhancement of graphs to incorporate new requirements and client additions
  • Made metadata changes to handle the new columns and new feeds
  • Prepared various test cases for programs
  • Performed end-to-end testing of the process to verify the impacts caused by the enhancements.

Software Developer

Growmore Infotech
01.2019 - 06.2021
  • Involved in various phases of Software Development Life Cycle (SDLC/SCRUM)
  • Worked with various types of controllers like SimpleFormController, AbstractController, and the Controller interface
  • Integrated Spring DAO for data access using Hibernate; used HQL for querying databases
  • Developed UI modules using HTML, JSP, JavaScript, and CSS
  • Implemented the logging mechanism using Log4j framework
  • Experience working with OOP
  • Designed and developed batch processing using multi-threading to process payments
  • Used Eclipse as the IDE for developing the J2EE application
  • Involved in writing ANT scripts to build the application
  • Involved in production support and fixed the issues based on the priority
  • Developed Stored Procedures, Triggers and Functions in Oracle
  • Used Concurrent Versions System (CVS) as the source control tool to keep track of the system state
  • Created and Configured Connection pools in WebSphere Application Server
  • Used JUnit for debugging, testing, and maintaining the system state
  • Environment: Java, JSP, WebSphere Application Server, HTML, ANT, JUnit, CVS, Eclipse, Oracle.

Education

Bachelor of Technology - Computer Science

Gujarat Technological University
India

Associate Degree - Computer Science

Conestoga College
Canada

Skills

  • TECHNICAL SKILLS:
  • Programming Languages: Java, Scala, Python, Shell Scripting
  • Big Data Ecosystem: Spark, Hive, HBase, Sqoop, Oozie, ELK, Storm, Flume, Pig, Kafka, Zookeeper, Play2, MapReduce, Celonis, Akka
  • Cloud: Snowflake, AWS EMR, EC2, S3, RDS, Dataproc, Dataflow, Azure Data Factory, Blob Storage, Azure Data Lake, Data Processing
  • DBMS: SQL Server, MySQL, PL/SQL, GraphQL, Oracle, database modelling, Teradata, PostgreSQL
  • NoSQL Databases: Cassandra, MongoDB
  • IDEs: Eclipse, Visual Studio; Version Control
  • Monitoring/Reporting Tools: Wrike, Whatagraph, DashThis
  • Operating Systems: Windows, Unix, Linux, Solaris, CentOS
  • Frameworks: MVC, Struts, Power BI, Maven, JUnit, Log4j, ANT, Tableau, Qlik, Splunk, Aqua Data Studio
  • ETL Tools: Databricks Lakehouse Platform, Fivetran
  • J2EE Technologies: Spring, Servlets, J2SE, JSP, JDBC
  • Methodologies: Agile, Waterfall, BDD, TDD, ATDD

Certification

Project #1: Jan 2023 to date, Client: CareFreeIT

Additional Information

  • Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content
  • Wrote Python modules to load/extract asset data from the MySQL source database
  • Designed and implemented GraphQL schemas to define types, queries, mutations, and subscriptions
  • Used Python scripts to update the content of the database and manipulate files
  • Experienced in data processing and analysis using Spark, HiveQL, and SQL
  • Extensive experience in writing user-defined functions (UDFs) in Hive and Spark (a minimal sketch follows this list)
  • Strong experience in database design and writing complex SQL queries and stored procedures
  • Experience working both independently and collaboratively in a fast-paced, unstructured environment to solve challenges and deliver high-quality outcomes
  • Designed and implemented machine learning and deep learning models using PyTorch
  • Prior experience designing software solutions to expand big data platform capabilities
  • Knowledge of the different tools and frameworks used in the Hadoop ecosystem (MapReduce, YARN, Pig, Hive, HBase, Zookeeper, Sqoop) as well as NoSQL
  • Developed high-performance parallel computing applications using CUDA C/C++ to leverage GPU capabilities
  • Collaborated with data scientists and analysts to develop interactive and reusable Databricks Notebooks for exploratory data analysis and visualization
  • Integrated Databricks Notebooks with visualization tools like Tableau to create real-time dashboards and reports, providing stakeholders with actionable insights
  • Documented data workflows and best practices within Databricks Notebooks to facilitate knowledge sharing and collaboration among team members
  • Designed and developed RESTful and SOAP APIs to facilitate seamless communication between internal and external systems
  • Knowledge of ETL and relational database systems and how to create and optimize them
  • Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitation, report generation, and standardization
  • Used ELK (Elasticsearch, Logstash, and Kibana) for name search patterns for a customer
  • Used Elasticsearch not only for powering search but also as part of the ELK stack with Beats for end-to-end logging and monitoring of our systems
  • Responsible for designing and deploying new ELK clusters (Elasticsearch, Logstash, Kibana, Beats, Kafka, Zookeeper, etc.)
  • Extensive experience working with AWS Cloud services and AWS SDKs, including AWS API Gateway, Lambda, S3, IAM, and EC2
  • Experienced in monitoring Hadoop clusters using Cloudera Manager and the web UI
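The Spark UDF work mentioned above can be illustrated with a generic PySpark sketch. The masking rule, column names, and sample rows are hypothetical examples, not production logic.

```python
# Minimal sketch of a Spark UDF; the masking rule and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_example").getOrCreate()

@F.udf(returnType=StringType())
def mask_email(email):
    # Keep the domain, hide the local part: "jane.doe@example.com" -> "***@example.com"
    if email is None or "@" not in email:
        return None
    return "***@" + email.split("@", 1)[1]

df = spark.createDataFrame(
    [("1", "jane.doe@example.com"), ("2", None)],
    ["id", "email"],
)

df.withColumn("masked_email", mask_email(F.col("email"))).show(truncate=False)
```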

Timeline

Sr. Test Data Engineer

CareFreeIT
01.2023 - Current

Sr. Data Engineer

Equitable Bank
09.2022 - 12.2022

Hadoop Developer

UnitedHealth Group
01.2022 - 07.2022

Sr. Software Developer

WebFacial Technologies
07.2021 - 12.2021

Software Developer

Growmore Infotech
01.2019 - 06.2021

Bachelor of Technology - Computer Science

Gujarat Technological University

Associate Degree - Computer Science

Conestoga College