Dibu Benjamin

Dallas, USA

Summary

Data engineer with 12+ years of experience in designing, building, and maintaining data infrastructure. Expertise in Fivetran, DBT, Snowflake, and AWS, with a strong record of automating ETL processes and developing scalable data models. Proven ability to optimize data warehouse performance and collaborate effectively with analysts, scientists, and stakeholders. Strong problem-solving skills and clear communication of complex technical concepts.

Overview

13 years of professional experience
1 Certification

Work History

Senior Data Analytical Engineer

Baylor Scott & White
Dallas, TX
10.2023 - Current
  • Designed and implemented scalable, efficient data pipelines on the Azure cloud platform using services such as Azure Data Factory and Azure Databricks.
  • Developed and maintained data models using DBT (Data Build Tool), ensuring consistency and accuracy of analytical datasets.
  • Collaborated with cross-functional teams to gather requirements, design data solutions, and deliver insights to stakeholders.
  • Optimized Snowflake data warehouse performance through efficient schema design, query tuning, and resource utilization management.
  • Developed, tested, and scheduled data validation audits using SQL Stored Procedures, Views, and Functions to ensure data accuracy and completeness.
  • Analyzed large and complex datasets to identify trends, patterns, and anomalies, driving actionable business insights.
  • Implemented data governance policies and best practices to ensure data quality, security, and compliance with regulatory requirements.
  • Developed end-to-end ETL processes using Python and Snowflake, ensuring data accuracy, integrity, and availability.
  • Developed RESTful APIs to facilitate data access and integration with external systems.
  • Leveraged Python for data processing, analysis, and building robust backend systems.
  • Conducted performance tuning and optimization of data pipelines to improve throughput and reduce latency.
  • Designed and implemented complex DBT models to transform raw data into actionable insights, leveraging advanced SQL techniques and best practices.
  • Designed and implemented DBT custom macros and Jinja functions to automate reusable transformations, improving code modularity and reducing duplication across multiple data models.
  • Assisted in building scalable data warehousing solutions using Kimball's approach, focusing on dimensional modeling for various business processes.
  • Developed DBT functions for dynamic model configuration, enabling parameterized logic for incremental models, partitioning, and dynamic schema generation based on environment.
  • Integrated DBT reusable functions to standardize audit columns, metadata tracking, and data quality checks across multiple fact and dimension models.
  • Implemented snapshotting strategies in DBT to enable historical analysis and data versioning, improving data integrity and auditability.
  • Worked closely with data analysts and data scientists to understand their requirements and provide them with access to clean and reliable data.
  • Built a Streamlit app to collect configuration information as metadata.
  • Used Snowpipe for data extraction and transformation.
  • Created dimension and fact tables aligned with Kimball's principles to enhance the organization's reporting and analysis capabilities.

Data Analytical Engineer

Tinuiti
05.2022 - 10.2023
  • Configured and monitored Stitch connectors to extract data from diverse sources such as databases, APIs, and cloud services, ensuring data accuracy and consistency.
  • Bolstered the engineering capabilities of the Data & Analytics team by designing a DBT development workflow that enables analysts without software engineering backgrounds to build models and leverage the power of the data warehouse.
  • Designed and developed data models for analytical purposes, implementing star and snowflake schemas on Redshift for efficient data retrieval.
  • Developed and maintained ETL workflows using AWS Glue/Apache Airflow for data ingestion, transformation, and loading into Redshift.
  • Designed and built end-to-end data pipelines using modern data stack technologies, including Snowflake, dbt, and Airflow.
  • Developed and optimized complex SQL queries and stored procedures in PostgreSQL, leading to a 30% improvement in data retrieval times for reporting.
  • Utilized Fivetran to establish seamless, real-time data syncing between various data sources and the data warehouse, enabling the organization to make data-driven decisions with up-to-date information.
  • Managed Fivetran connectors and configurations to extract, transform, and load data from sources such as SaaS applications, databases, and cloud platforms into the data warehouse.
  • Developed custom transformation logic within Fivetran to preprocess data during the loading process, aligning it with specific business requirements and data modeling standards.
  • Utilized advanced SQL knowledge to query and optimize databases such as RDS and DynamoDB, handling large-scale datasets.
  • Developed complex ETL workflows in Glue ETL to transform raw data from various sources into curated datasets, ensuring consistency and reliability for downstream analytics.
  • Leveraged Python packages such as pandas (DataFrames) for data manipulation, UDF creation, API integrations, and performance optimization.
  • Implemented best practices for data lineage, metadata management, and data quality checks using AWS Glue Catalog, ensuring data governance and compliance standards.
  • Employed Kubernetes to orchestrate and manage containerized applications, enhancing scalability and resource utilization within the data infrastructure.
  • Implemented data quality checks and validation processes to ensure the accuracy and consistency of data.
  • Integrated Redshift with other AWS services, such as S3, Lambda, and CloudWatch, to build end-to-end data pipelines.
  • Developed streaming applications that accept messages from Amazon Kinesis and built efficient pipelines for moving data between BigQuery and AWS Redshift.
  • Imported customer data into Python using pandas and performed data analyses, identifying patterns that informed key company decisions.

Senior Data Analytical Engineer

Bestow
02.2020 - 08.2021
  • Designed, implemented, and managed automated ETL processes using Fivetran, resulting in a 30% increase in data extraction efficiency from diverse sources.
  • Ensured data accuracy and efficiency in transformation and loading into the Snowflake data warehouse, reducing data processing time by 25%.
  • Utilized dbt to develop and maintain complex data models supporting key business requirements, enhancing reporting capabilities by 40%.
  • Implemented dbt best practices, including model versioning, documentation, and testing, improving overall data reliability and transparency.
  • Developed and maintained a comprehensive dbt project structure, ensuring organized and reusable SQL code, which facilitated easier maintenance and scalability.
  • Automated data quality checks and validation processes using dbt tests, significantly reducing the incidence of data errors and inconsistencies.
  • Created custom dbt macros to streamline repetitive tasks and enhance code reusability, improving development efficiency.
  • Maintained and optimized the Snowflake data warehouse, achieving a 20% improvement in query performance and data retrieval times.
  • Implemented best practices for data warehousing, including effective schema design, indexing, and query optimization.
  • Managed and maintained PostgreSQL databases, ensuring high availability and efficient data processing.
  • Designed and implemented Snowflake clustering and partitioning strategies, resulting in a 35% reduction in query response times.
  • Managed Snowflake access controls and security policies, ensuring compliance with organizational and regulatory standards.
  • Developed automated scripts to monitor and manage Snowflake usage and performance, reducing operational costs by 15%.
  • Utilized Snowflake's time travel and cloning features to ensure data recovery and backup solutions, enhancing data reliability and availability.
  • Designed and managed data infrastructure on AWS, including setting up and optimizing Amazon Redshift clusters for scalable data warehousing.
  • Integrated Snowflake with various AWS services such as S3 for data storage and Lambda for serverless data processing, streamlining workflows.
  • Utilized AWS Glue for data cataloging and ETL operations, improving data discoverability and transformation efficiency.
  • Monitored ETL pipelines and data workflows to ensure smooth and efficient operations, reducing downtime by 15%.
  • Troubleshot and resolved issues in ETL processes and the data warehouse promptly, maintaining high system reliability.

Senior BI Consultant

Saipem
Chennai, India
09.2017 - 12.2019
  • Led the data warehouse redesign project and was a key contributor to data modeling and development strategies that meet the data and analytics demands of today's enterprise.
  • Implemented marketing strategies that resulted in a 12% growth in sales revenue. Implemented an analytics module that increased reporting/analytics utilization by an estimated 25%. The project required expertise in ELT, security implementation, Python scripting, and data modeling design.
  • Developed Python scripts to parse and analyze XML data from various sources, including web services and databases.
  • Utilized libraries such as ElementTree and xml to extract and transform XML data, and integrated the results into data visualization tools like Tableau and Power BI for easy interpretation and reporting.
  • Used the ETL tool Informatica PowerCenter to transfer data from different source systems to the warehouse.
  • Played a key role in migrating on-premises ETL processes to AWS cloud infrastructure, resulting in reduced operational costs and increased scalability.
  • Migrated on-premises data to Redshift.
  • Collaborated with application developers to design and optimize NoSQL database schemas based on application requirements.
  • Built monitoring dashboards that supported alerting, which increased visibility and improved the data quality of the reports provided to stakeholders.
  • Developed data pipelines covering extract, transform, and load processes.

Data Analyst

CCC
Dubai, UAE
01.2015 - 08.2017
  • Developed creative dashboards and reports for the Finance / Marketing department using Tableau and OBIEE.
  • Independently developed and configured end-to-end data pipelines and built dashboards using tools such as Informatica and Tableau.
  • Used Tableau with R to create reports for internal teams and external clients, applying mathematical models and statistical techniques to derive insights of value to the business.
  • Performed data exploration and data cleaning in R, using packages such as plyr, ggplot2, and plotly for data manipulation and visualization and dplyr for data wrangling.
  • Collaborated with team members to collect and analyze data from several different sources using Python.
  • Improved the performance of the Informatica Mappings and helped in tuning the daily Incremental ETL.
  • Worked as a Data Warehouse Developer implementing end-to-end project requirements from ETL to data modeling and report development.
  • Proposed technical feasibility solutions for new functional designs and suggested options for performance improvement of technical objects.
  • Created presentations, wrote and reviewed reports based on recommendations and findings, and presented status to senior management.

Junior Data Analyst

Airon Technical Solutions
India
10.2012 - 10.2014
  • Assisted in the design and development of ETL processes, contributing to more efficient data extraction and loading routines.
  • Supported data modeling and transformation tasks, ensuring data accuracy and consistency.
  • Participated in maintaining and optimizing the data warehouse, learning best practices in schema design and query optimization.

Education

Masters - Computer Engineering

University of Central Florida

Bachelors - Applied Electronics & Instrumentation Engineering

MG University
Kerala, India

Skills

  • Python
  • SQL
  • PL/SQL
  • C
  • Oracle SQL
  • PostgreSQL 9.3
  • MySQL
  • SQL-Server
  • AWS Redshift
  • Snowflake
  • BigQuery
  • Teradata
  • MS Office (Word/Excel/PowerPoint/Visio/Outlook)
  • Crystal Reports XI
  • SSRS
  • Cognos 7.0/6.0
  • Tableau
  • Microsoft Power BI
  • Qlik Sense
  • SSIS
  • SSAS
  • Business Intelligence Development Studio (BIDS)
  • OBIEE
  • Alteryx
  • SAP ADM
  • Talend
  • Informatica PowerCenter
  • IBM InfoSphere DataStage 8.5
  • Fivetran
  • Airbyte
  • AWS
  • GCP
  • Azure

Certification

  • Google Data Analytics Professional
  • Oracle Business Intelligence Foundation Suite 11g

Personal Information

Title: Data Engineer
