RAHUL SINGH

Canton, MA

Summary

Experienced and highly skilled Data Architect / Data Engineer with a solid track record of 12+ years in data architecture, integration, ETL development, cloud migration, administration, and engineering. Adept at migrating diverse databases and ETL workflows to platforms such as Teradata Vantage, Snowflake, AWS, Azure, and GCP. As a versatile professional in roles including Data Engineer, ETL Developer, Data Architect, and Technical Manager, I possess a deep understanding of the entire software and data lifecycle. Meticulous in managing and leading technical teams while architecting solutions that precisely align with business requirements and implementing them with technical excellence.

Overview

12 years of professional experience
1 certification

Work History

Senior Data Engineer/Data Architect

Verizon
Boston, MA
02.2024 - Current
  • Worked on multiple pilot POC projects to help Verizon move its Teradata data warehouse from on-premises to the BigQuery platform on GCP.
  • Implemented ETL processes to streamline the import of data from various sources into the BigQuery warehouse.
  • Optimized data pipelines to reduce costs by 30%, while ensuring data integrity and accuracy.
  • Used BigQuery Data Transfer Service to move data from the Teradata database to BigQuery.
  • Designed and implemented data pipelines using GCP services such as Dataflow, Dataproc, and Pub/Sub.
  • Developed and maintained data ingestion and transformation processes using tools such as Apache Beam and Apache Spark.
  • Worked on proof-of-concept migration projects to move Teradata data marts to Snowflake.
  • Used Snowflake stored procedures, Snowpipe, and streams to replicate the ETL workflows.
  • Implemented Snowflake's security features, including role-based access control (RBAC), encryption, and multi-factor authentication, to ensure data privacy and compliance with industry regulations.
  • Configured network policies and Virtual Private Snowflake (VPS) for secure data access.
  • Orchestrated end-to-end data pipelines using Snowflake, integrating with tools like Apache Airflow and Data Build Tool (DBT) for workflow automation and data transformation.
  • Utilized Snowflake tasks and stored procedures for efficient and automated data processing workflows.
  • Configured auto-scaling policies to dynamically adjust warehouse size based on workload demands in Snowflake.
  • Implemented and managed Snowflake's data sharing functionality to securely and efficiently share data across different accounts, streamlining collaboration and data exchange.
  • Leveraged materialized views in Snowflake for performance optimization in shared datasets.
  • Converted SQL extracts in Teradata to Snowflake stored procedures.
  • Used Snowpark Container Services to run DBT models securely.
  • Performed analysis using Snowpark to identify the highest purchases, purchase history, and cost breakdowns.
  • Utilized Snowflake's time travel feature for seamless data history tracking and point-in-time recovery, ensuring data consistency and compliance with regulatory requirements.
  • Configured fail-safe mechanisms to prevent data loss and maintain data integrity during system failures or accidental changes.
  • Implemented and managed version control systems, ensuring a streamlined and controlled deployment process for data platform enhancements using GitLab.
  • Deployed GitLab CI/CD pipelines to automate testing and deployment, minimizing downtime and maximizing reliability.

Senior Data Engineer/Data Architect

Tredence
Boston, MA
09.2023 - Current
  • Led and managed a team of Data Engineers across various locations, designing and implementing data pipelines for efficient data ingestion, transformation, and delivery.
  • Served as a Subject Matter Expert on cloud data architectures (AWS, Azure, and GCP) and cloud data warehouses including Snowflake, Teradata Vantage, and BigQuery.
  • Implemented best practices for data engineering processes, resulting in improved reliability and scalability.
  • Employed strategic SQL techniques for seamless querying and manipulation of complex financial datasets critical for investment decision-making.
  • Ensured data accuracy and reliability for the fund's quantitative models and analysis.
  • Developed, implemented, and maintained data pipelines using SnowSQL, Python, and Airflow, ensuring timely and accurate delivery of data into the Snowflake data warehouse from heterogeneous on-premises and cloud systems, which helped the Investment & Capital Markets team meet its SLAs.
  • Managed Snowflake data warehouse cloud infrastructure on both Azure and AWS, optimizing performance and cost efficiency.
  • Utilized SQL and Tableau for data analysis and visualization, providing actionable insights to stakeholders.
  • Applied advanced Python programming skills, with a focus on PySpark and Pandas, to process and analyze large datasets efficiently on the Databricks and Snowflake platforms.
  • Developed custom algorithms using SQL and PySpark to meet the quantitative and analytical needs of the hedge fund, publishing the results in Databricks dashboards to provide the team with live feeds.
  • Specialized in integrating alternative data sources into the Snowflake data warehouse using Snowpipe, contributing to a more holistic view of market trends and opportunities.
  • Engineered and fine-tuned Databricks clusters to balance performance and cost, utilizing auto-scaling and custom configurations for efficient resource allocation.
  • Implemented dynamic cluster management to adapt to varying workloads and optimize processing times in both Snowflake and Databricks.
  • Orchestrated data workflows by designing and scheduling Databricks jobs, optimizing parallel processing to handle large datasets efficiently.
  • Utilized Databricks notebooks for collaborative and reproducible analysis, incorporating version control for code management.
  • Implemented Delta Lake to enhance data reliability, ACID compliance, and versioning within Databricks, ensuring data consistency for critical business processes.
  • Utilized Snowflake's time travel feature for seamless data evolution and point-in-time snapshots of the data.
  • Designed and implemented real-time data processing using Databricks Structured Streaming, ensuring low-latency and high-throughput processing for time-sensitive applications.
  • Integrated streaming pipelines with Delta Lake and Delta Live Tables workflows for reliable, transactional streaming analytics.
  • Designed and implemented multi-cluster Snowflake warehouses to efficiently handle concurrent workloads, optimizing query performance and resource utilization.
  • Implemented Snowflake's security features, including role-based access control (RBAC), encryption, and multi-factor authentication, to ensure data privacy and compliance with industry regulations.

Data Architect/Engineering Manager

Resideo Technologies
Canton, MA
09.2022 - 08.2023
  • Analyzed SQL scripts and designed solutions for implementation using the Spark framework in Python and Scala.
  • Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory, Spark SQL, and U-SQL Azure Data Lake Analytics.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
  • Implemented data integration and synchronization solutions between Snowflake and Databricks, ensuring seamless data transfer and synchronization across platforms
  • Worked with Terraform templates and modules to automate Azure IaaS virtual machines and deployed virtual machine scale sets in the production environment.
  • Leveraged advanced skills in Databricks to optimize and scale the platform for increased efficiency.
  • Demonstrated expertise in Python, with a focus on PySpark and Pandas, for efficient data processing and analysis within the Databricks environment.

Senior Data Architect

Teradata
Atlanta, GA
10.2020 - 09.2022
  • Worked on the migration plan for moving on-premises Teradata to Snowflake on the Azure, AWS, and GCP platforms (multiple telecom clients).
  • Led the migration of a Teradata on-premises data warehouse to Snowflake on AWS and Azure for a telecom client, optimizing query performance, improving data latency for enhanced business insights, and reducing infrastructure costs by 20%.
  • Designed, created, and implemented both single- and multi-cluster warehouses in Snowflake (AWS and Azure) as part of the workload migration from on-premises Teradata for a telecom client.
  • Created databases, schemas, tables, and event triggers, and used functions to flatten JSON when loading tables in Snowflake.
  • Created pipelines using Snowpipe to load historical data into the Snowflake stage as part of the migration.
  • Led the design and implementation of a real-time data processing system using Kafka and AWS Kinesis, resulting in a 30% reduction in data latency, which helped the Finance and Business teams make insightful decisions.
  • Leveraged the free trial version of Fivetran to replicate a subset of data across Snowflake environments during the discovery phase and provided recommendations to the business and architecture leadership teams.
  • Designed and Implemented DBT pipelines to transform raw data into analytics-ready datasets within Snowflake environment, ensuring data accuracy and consistency through modularized transformation pipelines.
  • Optimized DBT performance in Snowflake by implementing caching strategies, incremental model builds, and query optimization techniques, enhancing overall efficiency and scalability of data transformation.
  • Migrated an entire Teradata database to Snowflake on GCP using Airflow data pipelines as part of discovery and POC for a telecom client.
  • Worked on solutions to store data files in Google Cloud Storage buckets on a daily basis using Dataproc and processed the bucket files in Snowflake to load the data into tables.
  • Created CI/CD pipelines in Azure DevOps, Jenkins, and GitLab for continuous integration and continuous delivery across cloud platforms including Azure, AWS, and GCP.

Associate Data Architect

Wells Fargo
Charlotte, NC
03.2020 - 10.2020
  • Worked on encrypting all sensitive and non-sensitive customer data using a third-party encryption algorithm in Teradata.
  • Performed ER and dimensional modeling and designed data models using the ERwin Data Modeler tool.
  • Manipulated data using pivot tables, pivot charts, and macros in Excel.
  • Extensive experience using FastLoad, FastExport, TPT, TPump, MultiLoad, and BTEQ.
  • Directed development of project scope, including estimates, budgets, and schedules.

Lead Data Engineer

CGI
Lafayette, LA
06.2017 - 03.2020
  • Configured the Teradata database for customer messages and stored the healthcare information in Teradata tables.
  • Worked on a discovery POC to migrate a 5 TB Teradata database to Teradata Vantage Cloud on the AWS platform as part of a cloud initiation project.
  • Created Teradata NOS (Native Object Store) objects and used them in Teradata views for the business replication model.
  • Worked on Teradata CIM (Customer Interaction Manager) and RTIM (Real-Time Interaction Manager) to create marketing campaigns and capture user responses by channel.
  • Performed performance tuning using Teradata Viewpoint and Teradata Active System Management to filter poorly running queries.
  • Performed code reviews using the Teradata Statistics Wizard.

Data Engineer

Teradata Corporation
Mumbai, Maharashtra
05.2012 - 05.2017
  • Worked closely with the Data Integration team to develop ETL solutions using various ETL tools (Informatica, DataStage, and Teradata Tools and Utilities).
  • Worked on various migration projects as part of the Migration COE.
  • Migrated DB2 data marts and subject area codes to the Teradata environment using DataStage ETL and the Teradata database.
  • Loaded and unloaded Teradata tables using MultiLoad, FastLoad, FastExport, and BTEQ export utility scripts.
  • Performed performance tuning for 22 BTEQ scripts to run within the SLA time window.
  • Loaded tables from Oracle to Teradata using the FastClone tool to complete the history load for the ODS and data mart layers.
  • Installed Teradata QueryGrid to unload and load data from HDFS to Teradata.
  • Monitored the database through Teradata Viewpoint.

Education

Bachelor of Engineering - Information Technology

Mumbai University
Mumbai
06.2011

Skills

  • Teradata Database 12-17.20 (TPT, BTEQ, TTU, Stored Procedures)
  • Teradata Vantage
  • ETL (DataStage, Informatica)
  • Business Intelligence & Dashboard (Tableau & Looker)
  • CI/CD (GitLab, Jenkins)
  • Python (PySpark, Pandas, Boto3)
  • RDBMS (DB2, SQL Server, Oracle, Teradata)
  • Cloud Migration
  • AWS (S3, Glue, SNS, SQS, Lambda functions)
  • Databricks (Delta Tables, Delta Live Tables, Dashboards & Unity Catalog)
  • Azure (Azure Data Factory, Azure Synapse, Terraform on Azure, ADLS Gen2, Blob Storage, Data Explorer)
  • Snowflake (Snowpipe, Stage, Stream, Task, Snowpark, Stored Procedures)
  • GCP (Cloud Storage buckets, Dataproc)
  • BigQuery (query development, BQ DTS)
  • Data Build Tool (DBT), Fivetran, Airflow
  • Kafka Streaming
  • Data Modeling (Dimensional, Data Vault 2.0)
  • Data Quality Management
  • SQL (Analytical, Transactional)
  • Big Data Technologies (Hadoop, Hive, MapReduce, Apache Spark)

Certification

  • Teradata Certified Professional
  • Teradata Certified SQL Developer
  • Teradata Certified Specialist
  • Amazon Web Services Solutions Architect
  • Databricks Lakehouse Fundamentals
