TRIBHUVAN C

Harrison

Summary

Accomplished Data Engineer with over 5 years of experience leveraging Python, R, SQL, AWS, and Tableau to drive data-driven decision-making and enhance healthcare outcomes.

  • Designed and implemented scalable data lake solutions using AWS Lake Formation, Azure Data Lake, and Google Cloud Storage, enabling 40% faster data processing and real-time analytics for business decision-making.
  • Skilled in creating detailed JIRA stories, Visio diagrams, and PowerPoint presentations to facilitate project execution.
  • Adept at working with AI/ML modules, risk management frameworks, and backend SQL-based systems in banking and financial environments.
  • Engineered distributed data pipelines with Apache Spark, Databricks, and Hadoop HDFS, ensuring efficient processing of terabyte-scale datasets with advanced fault-tolerance mechanisms.
  • Proficient in Python, SQL, and TypeScript, with expertise in version control systems such as Git.
  • Experienced in designing and maintaining ETL pipelines using Apache Spark, AWS Glue, and Airflow.
  • Developed and optimized serverless ETL workflows using AWS Glue, Azure Data Factory, and GCP Dataflow, achieving high data quality and seamless integration of structured and unstructured data sources.
  • Proficient in database design, administration, and optimization using NoSQL platforms (MongoDB, DynamoDB) and SQL databases (SQL Server, Snowflake, PostgreSQL) to ensure efficient, scalable data storage solutions.
  • Expert in T-SQL development, including writing complex stored procedures, triggers, views, and performance tuning queries to optimize transaction processing in SQL Server.
  • Implemented event-driven architectures with Apache Kafka, AWS Kinesis, and Azure Event Hubs, enabling real-time data streaming and analytics pipelines for critical applications.
  • Enhanced data governance and schema management with Delta Lake, Apache Iceberg, and Hive Metastore, ensuring reliable and versioned data pipelines.
  • Automated data ingestion pipelines using PySpark, Airflow DAGs, Terraform, and SSIS, reducing operational overhead and improving workflow efficiency.
  • Optimized SQL query performance through advanced techniques like indexing, partitioning, and query refactoring in SQL Server, Redshift, and BigQuery, ensuring minimal query latency.
  • Built dynamic data visualization dashboards with Tableau, Power BI, and Python libraries (Matplotlib, Plotly, Seaborn) to support actionable insights and informed decision-making.
  • Applied Natural Language Processing (NLP) and Transformer-based models (BERT, GPT) to process unstructured text data, enabling advanced analytics in sentiment analysis and document automation workflows.
  • Designed and deployed cloud-native architectures using AWS S3, Google BigQuery, Azure Synapse Analytics, and Terraform, enabling seamless data pipeline integration across hybrid cloud environments.
  • Architected and optimized data warehouse solutions with Snowflake and Azure Synapse, implementing dimensional modeling, partitioning, and OLAP for large-scale analytical operations.
  • Collaborated with DevOps teams to establish CI/CD pipelines for data engineering workflows using GitLab CI, Jenkins, Docker, Kubernetes, and Ansible, improving deployment reliability and scalability.
  • Built real-time stream processing systems using Apache Flink, Spark Streaming, and Kafka Streams, ensuring sub-second latency for time-sensitive data analytics.
  • Developed end-to-end machine learning model pipelines with tools like MLflow, TensorFlow Extended (TFX), and Kubeflow, integrating predictive capabilities into production environments.

Overview

6 years of professional experience
1 Certification

Work History

Data Engineer

Accurate Healthcare
New York
08.2024 - 12.2025
  • Built and maintained automated ETL pipelines using Python, SQL, and AWS Glue, reducing data processing time by 35%.
  • Collaborated with data scientists and analysts to architect optimized data structures, enhancing analytics and reporting capabilities by 30% across healthcare datasets.
  • Spearheaded the adoption and implementation of AWS Glue for scalable ETL workflows, streamlining data integration processes and reducing ETL execution times by 35%.
  • Created and maintained comprehensive JIRA stories to streamline developer workflows, ensuring clear and actionable requirements.
  • Leveraged big data frameworks such as Apache Spark, Hadoop, and Databricks to process and analyze datasets exceeding 1TB, cutting data processing times by 50% and supporting advanced analytics.
  • Established robust version control workflows using Git and SVN, fostering seamless collaboration, improving code quality, and enhancing team productivity by 40%.
  • Utilized development environments like Jupyter Notebook, Eclipse, and Spyder for efficient data exploration, ETL pipeline development, and data model evaluation in healthcare-focused projects.
  • Designed and deployed scalable, secure data storage solutions using Amazon S3, ensuring high availability, integrity, and fault tolerance for critical healthcare applications.
  • Built automated data pipelines to transform raw data into structured formats suitable for reporting and machine learning, improving data accessibility and operational workflows.
  • Ensured compliance with data governance and security policies by implementing role-based access controls and encryption standards for sensitive healthcare data.
  • Conducted performance tuning for ETL workflows and big data solutions, optimizing resource utilization and achieving substantial gains in operational efficiency.

Research Intern – Data Scientist

NJIT
05.2023 - 10.2023
  • Engineered scalable data ingestion pipelines on Azure HDInsight using Azure Data Factory and Spark SQL, enabling seamless transformation and integration of structured and unstructured data into Azure Synapse Analytics for advanced analytics.
  • Designed and implemented Hadoop MapReduce workflows in Java for comprehensive data cleaning and preprocessing, ensuring high data quality and optimized performance in large-scale distributed environments.
  • Assisted in AI/ML research initiatives, integrating statistical modeling into business workflows.
  • Installed, configured, and managed Hadoop ecosystem components, including MapReduce and HDFS, while monitoring and maintaining cluster performance using Cloudera Manager to ensure system reliability and efficiency.
  • Leveraged Terraform to automate the provisioning and versioning of Azure infrastructure, streamlining resource management and enhancing operational consistency and scalability.
  • Optimized the performance of Spark SQL queries in Azure HDInsight, ensuring faster data processing and reduced latency in real-time and batch workloads.
  • Collaborated with cross-functional teams to design robust ETL workflows on Azure, leveraging best practices in cloud-based data architecture and security.
  • Ensured adherence to industry standards in data governance and cloud infrastructure management by implementing role-based access controls and policy-driven compliance solutions.

Project Engineer

Wipro Technology
02.2020 - 03.2023
  • Designed and implemented end-to-end ETL pipelines using Apache NiFi, Apache Spark, and Hadoop ecosystem tools, enabling efficient data transformation, real-time processing, and integration across multiple data sources, reducing processing time by 30%.
  • Built scalable data architectures on cloud platforms (AWS, Azure, GCP) using AWS Glue, Azure Data Factory, and Google Dataflow, automating data ingestion and streamlining data processing workflows to enhance data accessibility and availability.
  • Created structured JIRA stories to support developer execution of business requirements.
  • Developed and maintained ETL pipelines using Apache Spark, AWS Glue, and Azure Data Factory, enhancing data integration workflows.
  • Developed real-time data streaming solutions leveraging Apache Kafka and Apache Spark Streaming, ensuring low-latency processing and real-time analytics for business-critical applications, improving decision-making capabilities.
  • Optimized data storage and management solutions by implementing data lakes and data warehouses on AWS S3 and Azure Data Lake, ensuring high availability, fault tolerance, and efficient storage of large datasets.
  • Enhanced query performance and reduced storage costs by applying data partitioning, indexing, and compression techniques within large-scale data environments, supporting efficient big data analytics.
  • Engineered and maintained automated data pipelines using Python, Scala, SQL, and Spark SQL, ensuring the efficient extraction, transformation, and loading (ETL) of data into both structured and unstructured databases for business intelligence and reporting.
  • Collaborated with cross-functional teams to define, design, and implement optimized data models and frameworks that supported seamless data integration, enhanced reporting, and improved business analytics capabilities.
  • Ensured data governance, security, and compliance by implementing data access control policies, encryption, and auditing mechanisms across cloud and on-premises data systems, safeguarding sensitive business information.
  • Applied machine learning techniques to automate data quality checks within the data pipeline, enhancing predictive analytics and driving actionable insights that informed strategic business decisions.
  • Implemented and managed CI/CD pipelines for data engineering workflows using Jenkins and GitLab, ensuring continuous integration, testing, and deployment of data solutions within an agile environment, streamlining data pipeline development and maintenance.

Education

Master of Science - Data Science

New Jersey Institute of Technology
Newark, USA
05.2024

Bachelor of Technology - Computer Engineering

KMIT
Hyderabad, India
07.2022

Skills

  • Apache Spark
  • Databricks
  • Hadoop HDFS
  • Apache Flink
  • Kafka
  • AWS Glue
  • Azure Data Factory
  • GCP Dataflow
  • C
  • Python
  • R
  • SQL
  • T-SQL
  • Java
  • TypeScript
  • JavaScript
  • Scala
  • PowerShell
  • SAS
  • SQL Server
  • PostgreSQL
  • Snowflake
  • MySQL
  • Oracle
  • MongoDB
  • DynamoDB
  • SSIS
  • Alteryx
  • Airflow DAGs
  • Azure Synapse Pipelines
  • Informatica
  • JIRA
  • Visio
  • PowerPoint
  • Docker
  • Kubernetes
  • AWS Redshift
  • AWS Athena
  • AWS SageMaker
  • AWS S3
  • Azure Synapse
  • Azure Data Lake
  • Azure Event Hubs
  • Google Cloud Platform BigQuery
  • AWS IAM
  • EC2
  • SES
  • CloudFront
  • Lambda
  • Route53
  • VPC
  • CloudTrail
  • CloudWatch
  • SNS
  • SQS
  • Amazon Kinesis Data Streams
  • Dimensional Modeling
  • OLAP/OLTP Systems
  • Delta Lake
  • Apache Iceberg
  • Hive Metastore
  • Apache Hadoop
  • MapReduce
  • Hive
  • Pig
  • Kafka Streams
  • Spark Streaming
  • Pandas
  • NumPy
  • Matplotlib
  • SciPy
  • Scikit-Learn
  • Seaborn
  • PyTorch
  • TensorFlow
  • ggplot2
  • Plotly
  • Keras
  • LangChain
  • OpenCV
  • Visual Studio Code
  • Jupyter Notebook
  • PyCharm
  • Tableau
  • Power BI
  • Machine Learning
  • Deep Learning
  • NLP
  • Feature Engineering
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • MLflow
  • Kubeflow
  • Transformer Models
  • BERT
  • GPT
  • ARIMA
  • GARCH Models
  • Statistical Testing
  • Predictive Modeling
  • Git
  • GitHub
  • Jenkins
  • GitLab CI/CD
  • Terraform
  • Ansible
  • Data Lake Management
  • Schema Management
  • Data Versioning
  • Advanced Query Optimization
  • Confluence
  • Windows
  • MacOS
  • Linux
  • A/B Testing
  • Hypothesis Testing
  • Critical Thinking
  • Communication Skills
  • Presentation Skills
  • Problem-Solving
  • Corporate Tax
  • Query
  • Siebel CRM
  • Salesforce
  • Intermediate to Advanced SQL
  • Data Domain Expertise
  • Clinical Reporting Interpretation
  • Automation with R or Python

Certification

  • Azure Data Scientist Associate, 02.2023 - 02.2025
  • Azure Power BI Data Analyst, 03.2024 - 03.2026

Personal Information

Relocation: Open to relocation
