RAHUL SINGH

Canton, MA

Summary

Experienced and highly skilled Data Architect / Data Engineer with a track record of 11+ years in data architecture, integration, ETL development, cloud migration, administration, and engineering. Adept at migrating diverse databases and ETL workflows to platforms such as Teradata Cloud, Snowflake, AWS, and Azure SQL DB. Versatile across roles including Data Engineer, ETL Developer, and Data Architect, with a deep understanding of the entire software and data lifecycle. Meticulous in architecting solutions that align precisely with business requirements and implementing them with technical excellence.

Overview

12 years of professional experience
1 Certification

Work History

Senior Data Architect / Engineer

Tredence
Boston, MA
09.2023 - Current
  • Served as a Data Engineer within an investment team, developing a deep understanding of financial data requirements and industry-specific nuances.
  • Employed strategic SQL techniques for seamless querying and manipulation of complex financial datasets critical for investment decision-making.
  • Ensured data accuracy and reliability for the fund's quantitative models and analysis.
  • Applied advanced Python programming skills, with a focus on PySpark and pandas, to process and analyze large datasets efficiently on the client's Databricks and Snowflake platforms.
  • Developed custom algorithms to meet the specific quantitative and analytical needs of the hedge fund.
  • Specialized in integrating alternative data sources, contributing to a more holistic view of market trends and opportunities.
  • Applied innovative techniques to incorporate non-traditional datasets into the fund's analytical models.
  • Engineered and fine-tuned Databricks clusters to balance performance and cost, utilizing auto-scaling and custom configurations for efficient resource allocation.
  • Implemented dynamic cluster management to adapt to varying workloads and optimize processing times.
  • Orchestrated data workflows by designing and scheduling Databricks jobs, optimizing parallel processing to handle large datasets efficiently.
  • Utilized Databricks notebooks for collaborative and reproducible analysis, incorporating version control for code management.
  • Implemented Delta Lake to enhance data reliability, ACID compliance, and versioning within Databricks, ensuring data consistency for critical business processes.
  • Utilized time travel and schema evolution features for seamless data evolution.
  • Designed and implemented real-time data processing using Databricks Structured Streaming, ensuring low-latency, high-throughput processing for time-sensitive applications (see the streaming sketch after this list).
  • Integrated streaming pipelines with Delta Lake for reliable, transactional streaming analytics.
  • Designed and implemented multi-cluster Snowflake warehouses to handle concurrent workloads efficiently, optimizing query performance and resource utilization (a configuration sketch follows this list).
  • Configured auto-scaling policies to dynamically adjust warehouse size based on workload demands.
  • Implemented and managed Snowflake's data sharing functionality to securely and efficiently share data across different accounts, streamlining collaboration and data exchange.
  • Leveraged materialized views for performance optimization in shared datasets.
  • Utilized Snowflake's time travel feature for seamless data history tracking and point-in-time recovery, ensuring data consistency and compliance with regulatory requirements.
  • Configured fail-safe mechanisms to prevent data loss and maintain data integrity during system failures or accidental changes.
  • Implemented Snowflake's security features, including role-based access control (RBAC), encryption, and multi-factor authentication, to ensure data privacy and compliance with industry regulations.
  • Configured network policies and Virtual Private Snowflake (VPS) for secure data access.
  • Orchestrated end-to-end data pipelines using Snowflake, integrating with tools like Apache Airflow and Data Build Tool (DBT) for workflow automation and data transformation.
  • Utilized Snowflake tasks and stored procedures for efficient and automated data processing workflows.
  • Demonstrated a deep understanding of financial markets and instruments, acquired through extensive experience as a Data Engineer in hedge fund environments.
  • Collaborated closely with quantitative analysts and portfolio managers to align data solutions with the fund's investment strategies.
  • Implemented and managed version control in GitLab, ensuring a streamlined and controlled deployment process for data platform enhancements.
  • Deployed GitLab CI/CD pipelines to automate testing and deployment, minimizing downtime and maximizing reliability.
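
A minimal PySpark sketch of the Structured Streaming pattern described above, assuming a Databricks-style environment; the Auto Loader source, storage paths, and table names are illustrative placeholders rather than production code.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # On Databricks a session named `spark` already exists; built here for completeness.
    spark = SparkSession.builder.appName("streaming-to-delta").getOrCreate()

    # Read a stream of JSON events via Auto Loader (a Databricks-specific source).
    events = (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/market_events/")  # hypothetical mount point
    )

    # Drop records with no price before landing the data.
    clean = events.where(col("price").isNotNull())

    # Append to a Delta table; the checkpoint gives exactly-once delivery.
    query = (
        clean.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", "/mnt/checkpoints/market_events/")
        .toTable("analytics.market_events")  # hypothetical target table
    )
    query.awaitTermination()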
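
A sketch of the multi-cluster warehouse and scheduled-task setup described above, using snowflake-connector-python; the warehouse, schema, and task names are hypothetical.

    import os
    import snowflake.connector

    # Credentials are placeholders; real values would come from a secret store.
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        role="SYSADMIN",
    )
    cur = conn.cursor()

    # Multi-cluster warehouse that scales out under concurrent load.
    cur.execute("""
        CREATE WAREHOUSE IF NOT EXISTS analytics_wh
          WAREHOUSE_SIZE = 'MEDIUM'
          MIN_CLUSTER_COUNT = 1
          MAX_CLUSTER_COUNT = 4
          SCALING_POLICY = 'STANDARD'
          AUTO_SUSPEND = 300
          AUTO_RESUME = TRUE
    """)

    # Scheduled task that refreshes a reporting table every hour.
    cur.execute("""
        CREATE TASK IF NOT EXISTS refresh_positions
          WAREHOUSE = analytics_wh
          SCHEDULE = '60 MINUTE'
        AS
          INSERT INTO reporting.positions_hourly
          SELECT * FROM staging.positions
    """)
    cur.execute("ALTER TASK refresh_positions RESUME")

    cur.close()
    conn.close()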

Senior Data Architect

Resideo Technologies
Canton, MA
09.2022 - 08.2023
  • Analyzed existing SQL scripts and designed solutions to implement them with the Spark framework in Python and Scala
  • Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory, Spark SQL, and U-SQL (Azure Data Lake Analytics); see the ingestion sketch after this list
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) and processed it in Azure Databricks
  • Created pipelines in ADF using linked services and datasets to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse
  • Implemented data integration and synchronization solutions between Snowflake and Databricks, ensuring seamless data transfer across platforms
  • Worked with Terraform templates and modules to automate Azure IaaS virtual machines and deployed virtual machine scale sets in the production environment
  • Leveraged advanced Databricks skills to optimize and scale the platform for increased efficiency
  • Demonstrated expertise in Python, with a focus on PySpark and pandas, for efficient data processing and analysis within the Databricks environment
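
An illustrative PySpark sketch of the ADLS ingestion and Spark SQL transformation pattern above; the storage account, container, and table names are assumptions, and authentication is presumed to be handled by the cluster's service principal.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("adls-ingest").getOrCreate()

    # Source path in ADLS Gen2 (placeholder account and container).
    source = "abfss://raw@mystorageacct.dfs.core.windows.net/sales/*.csv"

    df = (
        spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv(source)
    )

    # A Spark SQL aggregation standing in for the original SQL scripts.
    df.createOrReplaceTempView("sales_raw")
    daily = spark.sql("""
        SELECT order_date, SUM(amount) AS total_amount
        FROM sales_raw
        GROUP BY order_date
    """)

    # Land the result as a Delta table for downstream consumers.
    daily.write.format("delta").mode("overwrite").saveAsTable("curated.daily_sales")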

Senior Data Architect

Teradata
Atlanta, GA
10.2020 - 09.2022
  • Worked on the migration plan from on-premises Teradata to Snowflake on Azure and AWS
  • Designed, created, and implemented both single- and multi-cluster warehouses in Snowflake as part of workload migration from on-premises Teradata data warehouse environments
  • Created databases, schemas, tables, and event triggers, and used the FLATTEN function to load JSON data into Snowflake tables
  • Created pipelines in Snowflake to load history data as part of the migration from on-premises Teradata to Snowflake databases
  • Designed data pipelines using Fivetran and dbt to load and transform data in Snowflake tables
  • Migrated an entire Teradata database to Snowflake on GCP using data pipelines orchestrated in Airflow (see the DAG sketch after this list).
  • Built solutions to store data files in Google Cloud Storage buckets daily using Dataproc and processed the bucket files in Snowflake to load the tables
  • Spearheaded the migration of an on-premises data warehouse to Snowflake on AWS, optimizing query performance and reducing infrastructure costs by 40%
  • Led the design and implementation of a real-time data processing system using Kafka and AWS Kinesis, resulting in a 30% reduction in data latency and improved business insights
  • Created CI/CD pipelines in Azure DevOps, Jenkins, and GitLab for continuous integration and delivery across cloud platforms including Azure, AWS, and GCP
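
A minimal Airflow DAG sketch of the Teradata-to-Snowflake migration pipelines described above; the table list, schedule, and copy logic are simplified placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical tables to migrate; a real run would read this from metadata.
    TABLES = ["customers", "orders", "order_items"]

    def copy_table(table_name: str) -> None:
        """Extract one Teradata table and load it into Snowflake (sketch only)."""
        # A real implementation would use teradatasql for the extract and
        # snowflake-connector-python (or an external stage) for the load.
        print(f"copying {table_name} from Teradata to Snowflake")

    with DAG(
        dag_id="teradata_to_snowflake_history_load",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        for table in TABLES:
            PythonOperator(
                task_id=f"copy_{table}",
                python_callable=copy_table,
                op_kwargs={"table_name": table},
            )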

Associate Data Architect

Wells Fargo
Charlotte, NC
03.2020 - 10.2020
  • Worked on encryption of sensitive and non-sensitive customer data in Teradata using a third-party encryption algorithm
  • Performed ER and dimensional modeling and designed data models using the Erwin Data Modeler tool
  • Manipulated data using pivot tables, pivot charts, and macros in Excel
  • Used Teradata utilities extensively, including FastLoad, FastExport, TPT, TPump, MultiLoad, and BTEQ
  • Directed development of project scope, including estimates, budgets, and schedules.

Lead Data Engineer

CGI
Lafayette, LA
06.2017 - 03.2020
  • Configured the Teradata database for customer messages and stored healthcare information in Teradata tables
  • Worked on Teradata CIM (Customer Interaction Manager) and RTIM (Real-Time Interaction Manager) to create marketing campaigns and capture user responses by channel
  • Performed performance tuning using Teradata Viewpoint and Teradata Active System Management to filter poorly performing queries
  • Tuned long-running queries to improve their performance
  • Performed code reviews using the Teradata Statistics Wizard.

Data Engineer

Teradata Corporation
Mumbai, Maharashtra
05.2012 - 05.2017
  • Worked closely with the data integration team to develop ETL solutions using tools such as Informatica, DataStage, and Teradata Tools and Utilities
  • Migrated DB2 data marts and subject-area code to the Teradata environment using DataStage ETL and the Teradata database
  • Loaded and unloaded Teradata tables using MultiLoad, FastLoad, FastExport, and BTEQ export utility scripts
  • Tuned 22 BTEQ scripts to run within the SLA time window
  • Loaded tables from Oracle to Teradata using the FastClone tool to complete the history load for the ODS and data mart layers
  • Installed Teradata QueryGrid to unload and load data between HDFS and Teradata
  • Monitored the database through Teradata Viewpoint.

Education

Bachelor of Science - Information Technology

Mumbai University
06.2011

Skills

  • Teradata Database 12-17.20 (TPT, BTEQ, TTU, Teradata stored procedures)
  • ETL (DataStage, Informatica)
  • Dashboards (Tableau, Looker)
  • SQL & Advanced SQL
  • AWS (S3, Glue, SNS, SQS, Lambda)
  • Azure Databricks (Delta Tables, Delta Live Tables, Unity Catalog)
  • CI/CD (GitLab, Jenkins)
  • Python (PySpark, Boto3)
  • Unix shell scripting
  • RDBMS (DB2, SQL Server, Oracle, Teradata)
  • Erwin Data Modeler
  • Performance tuning (SQL, Spark)
  • Cloud migration
  • Azure (Data Factory, Synapse, ADLS Gen 2, Blob Storage, Data Explorer), Terraform
  • Snowflake (Snowpipe, stages, streams, tasks, Snowpark, stored procedures)
  • Data Build Tool (dbt), Fivetran
  • Apache Spark, Scala

Certification

  • Teradata Certified Professional
  • Teradata Certified SQL Developer
  • Teradata Certified Specialist
  • Amazon Web Services Solutions Architect
  • Databricks Lakehouse Fundamentals

References

References and supporting documentation can be provided upon request.

Interests

Playing table tennis, cricket, and soccer; listening to music.

Timeline

Senior Data Architect / Engineer

Tredence
09.2023 - Current

Senior Data Architect

Resideo Technologies
09.2022 - 08.2023

Senior Data Architect

Teradata
10.2020 - 09.2022

Associate Data Architect

Wells Fargo
03.2020 - 10.2020

Lead Data Engineer

CGI
06.2017 - 03.2020

Data Engineer

Teradata Corporation
05.2012 - 05.2017

Bachelor of Science - Information Technology

Mumbai University