
DINESH SURAM

Dallas, Texas

Summary

A seasoned Data Engineer with extensive experience in ETL development, data warehousing, and cloud technologies, primarily with Snowflake, AWS, and Azure. Demonstrated expertise in building robust data infrastructure, optimizing data processing, and ensuring high data quality across multinational environments. Proficient in a variety of technical tools including Python, SQL, and Spark, with a strong background in both Agile and Scrum methodologies. Holds a Master of Science in Data Science from The University of Texas at Dallas and a Bachelor of Technology in Chemical Engineering from the National Institute of Technology, Warangal. Certified in AWS Cloud and Snowflake, with a proven track record of leading data-driven projects to successful completion. Eager to leverage analytical skills and technological proficiency to contribute to innovative data solutions.

Overview

10 years of professional experience
2 Certifications

Work History

ETL Developer/Data Scientist

Toyota Financial Services
2022.10 - 2024.08
  • Created high-level and detailed-level designs
  • Conducted design reviews and design verification
  • Created final work estimates
  • Created a new data platform in AWS using services such as EC2, S3 buckets, and IAM roles
  • Planned and charted the program, working with architects and data steward managers on program execution
  • Worked extensively in cross-functional, cross-vendor, and offshore teams
  • Formulated the plan to assess and capture usable lineage of data attributes in the existing system and templated the requirements procedure
  • Developed Data Vault Models tailored to various use cases, designing specific models for each of the five countries involved in the project.
  • Designed and developed jobs that extracted data from source databases (Mexico, Brazil, Canada, Colombia, Puerto Rico) using DB connectors, and utilized the IBM Sterling tool for the MFT process to securely transfer files from various countries for data integration
  • Implemented Snowflake's cloud data warehousing solution to consolidate data silos into a single source of truth, enhancing data accessibility and integrity for real-time analytics across the organization
  • Designed and executed data migration to Snowflake, utilizing its scalable compute and storage capabilities to optimize data processing speeds and reduce infrastructure costs
  • Leveraged Snowflake's unique architecture to perform complex SQL queries on large datasets without impacting performance, resulting in a 50% decrease in query execution time
  • Utilized Snowflake's Time Travel and Zero-Copy Cloning features to improve data recovery processes and support efficient environment management for development, testing, and production workflows
  • Developed and optimized ETL pipelines using Snowflake's native capabilities, such as Snowpipe for continuous data ingestion and Streams for real-time data processing (Colombia, Puerto Rico)
  • Implemented role-based access control within Snowflake to enhance data security and compliance with regulatory requirements, ensuring that sensitive data is protected, and access is audited
  • Conducted regular performance tuning and optimization of Snowflake environments, achieving significant cost savings by optimizing data storage and compute resources
  • Created automated data transformation scripts using Snow SQL to facilitate complex data manipulation tasks, reducing manual effort and minimizing the risk of errors
  • Integrated Snowflake with various BI tools and platforms such as Tableau and PowerBI, enabling advanced data visualization and analytics capabilities for business users
  • Provided training and support to team members on best practices for using Snowflake, enhancing team productivity and ensuring efficient use of the platform across the organization
  • Used Snowflake zero-copy cloning to clone databases for DEV and QA environments
  • Developed ingestion frameworks for generating and executing SnowSQL scripts in Python (Canada, Brazil, Mexico); a minimal sketch of this pattern follows this section
  • Deployed ETL Process for Production implementation, resolving issues, monitoring performance, performance tuning of SQL queries, Data Transformation, Data Validation, Data Modeling, mapping documentation
  • Worked on the ingestion process to load data into S3 buckets on a daily basis
  • End-to-end implementation, maintenance, optimization, and enhancement of the application
  • Created SQL queries, performed performance tuning, and used the FLATTEN table function to produce lateral views of VARIANT, OBJECT, and ARRAY columns
  • Worked in an Agile/Scrum environment
  • Created Python scripts to generate user-specific reports and email them on a schedule
  • Extensively used materialized views for designing Fact tables and Dim tables
  • Ensured that operational and analytical data warehouses can support all business requirements for business reporting
  • Developed Unix shell scripts and worked on Python scripts for controlled execution of DataStage jobs
  • Shared sample data with customers for UAT by granting access
  • Extensively worked on dimensional modeling and data loads into dimension and fact tables
  • Deployed the ETL process for UAT implementation, change management, and ETL testing
  • Led the team and ensured on-time delivery with high quality and minimal defects, in keeping with Toyota Financial Services' quality standards
  • Maintained data pipelines to support the development of risk models, including Logistic Regression, Random Forest, and XGBoost, which were used to predict customer defaults and assess financial risks
  • Collaborated with data scientists to build Random Forest models for generating risk scores, integrating data from five countries
  • Contributed to identifying key features like income level and credit history that significantly influenced risk prediction
  • Supported the implementation of XGBoost models for predicting adverse financial events
  • Provided the data infrastructure and conducted data cleaning processes, which improved model performance by 30%
  • Assisted in hyperparameter tuning of XGBoost models by analyzing the distribution of the underlying risk data, contributing to a reduction in false positives
  • Engineered datasets for Logistic Regression models that classified customers into risk categories
  • Provided clean, normalized data and managed missing values, leading to a significant improvement in classification accuracy
  • Contributed to feature selection by analyzing transactional patterns, which directly enhanced the model's ability to predict high-risk customers
  • Aggregated and pre-processed time-series data to forecast future risk trends using ARIMA models
  • Collaborated with data scientists to identify seasonal patterns and cyclical trends that were critical to accurate forecasting
  • Verified production code, supported the first three production executions, and transitioned the code and process to the maintenance/support team
  • Coordinated with business partners, analytical teams, and stakeholders to provide status reporting
  • Actively participated in team meetings, day-to-day calls, meeting reviews, status calls, and batch reviews
  • Environment: Unix/Linux, Snowflake, Jenkins, Shell, Python, Artifactory, ECR, Autosys, Git, OpenShift
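
A minimal sketch of the zero-copy cloning and FLATTEN work referenced above, run as SnowSQL from Python. It is illustrative only, not the actual ingestion framework: the account settings, database names (PROD_DB, QA_DB), table (RAW_PAYMENTS), and VARIANT column (payload) are assumptions.

# Hypothetical sketch: clone a database for QA and lateral-view a VARIANT column.
import os

import snowflake.connector  # pip install snowflake-connector-python


def get_connection():
    """Open a Snowflake connection from environment variables (hypothetical names)."""
    return snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        role="SYSADMIN",
        warehouse="ETL_WH",
    )


def clone_for_qa(cur):
    # Zero-copy clone: a metadata-only copy that adds no storage until data diverges.
    cur.execute("CREATE OR REPLACE DATABASE QA_DB CLONE PROD_DB")


def flatten_contracts(cur):
    # LATERAL FLATTEN expands each element of a VARIANT array into its own row.
    cur.execute(
        """
        SELECT t.source_country,
               f.value:contract_id::STRING    AS contract_id,
               f.value:balance::NUMBER(18, 2) AS balance
        FROM PROD_DB.RAW.RAW_PAYMENTS t,
             LATERAL FLATTEN(input => t.payload:contracts) f
        """
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = get_connection()
    try:
        cur = conn.cursor()
        clone_for_qa(cur)
        rows = flatten_contracts(cur)
        print(f"Flattened {len(rows)} contract rows")
    finally:
        conn.close()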

ETL Developer/Data Engineer

Reliance Industries Ltd.
2014.07 - 2020.10
  • Coordinated with Business Analysts (BAs) across seven petrochemical sites and two refinery clusters to gather business requirements and evaluate design scope and technical feasibility
  • Analyzed the complexity and technical impact of requirements against the existing design and discussed further refinement of requirements with Business Analysts
  • Created high-level and detailed-level designs
  • Conducted design reviews and design verification
  • Created final work estimates
  • End-to-end implementation, maintenance, optimizations, and enhancement of the application
  • Built predictive models to predict the likelihood of equipment failure, reducing O&M costs by 12% annually
  • Performed code reviews and presented code and design to the Technical Review Board
  • Designed and developed jobs that extract data from source databases (Oracle, DB2, and Teradata) using DB connectors
  • Involved in creating SQL queries, performance tuning and creation of indexes
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Spark, Sqoop, Pig, Zookeeper and Flume
  • Designed and developed data warehouse and Business Intelligence architecture
  • Designed the ETL process from various sources into Hadoop/HDFS for analysis and further processing of data modules
  • Designed and created Azure Data Factory (ADF) pipelines extensively to ingest data from relational and non-relational source systems to meet business functional requirements
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL, and Azure Data Lake Analytics
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks
  • Created and provisioned numerous Databricks clusters for batch and continuous streaming data processing, and installed the required libraries on the clusters
  • Developed ADF pipelines to load data from on-premises systems to Azure cloud storage and databases
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks
  • Worked extensively with SparkContext, Spark SQL, RDD transformations and actions, and DataFrames
  • Developed custom ETL solutions, batch processing and real-time data ingestion pipeline to move data in and out of Hadoop using PySpark and shell scripting
  • Created Spark RDDs from data files and then performed transformations and actions to other RDDs
  • Created Hive tables with dynamic and static partitioning, including buckets, for efficiency
  • Also created external tables in Hive for staging purposes
  • Loaded Hive tables with data, wrote Hive queries that run on MapReduce, and created a customized BI tool for management teams to perform query analytics using HiveQL
  • Wrote UDFs in Scala and PySpark to meet specific business requirements
  • Developed Spark applications using Spark SQL on EMR for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming data to uncover insights into customer usage patterns
  • Utilized Spark's in-memory capabilities to handle large datasets
  • Used broadcast variables in Spark, along with effective and efficient joins, transformations, and other capabilities for data processing (a minimal sketch follows this section)
  • Experienced in working with EMR cluster and S3 in AWS cloud
  • Created Hive tables and loaded and analyzed data using Hive scripts
  • Implemented partitioning (both dynamic and static partitions) and bucketing in Hive
  • Involved in continuous integration of the application using Jenkins
  • Led the installation, integration, and configuration of Jenkins CI/CD, including installation of Jenkins plugins
  • Implemented a CI/CD pipeline with Docker, Jenkins, and GitHub, virtualizing the Dev and Test servers with Docker and automating environment configuration through containerization
  • Installed, configured, and administered the Jenkins CI tool using Chef on AWS EC2 instances
  • Performed code reviews and was responsible for design, code, and test sign-off
  • Worked on designing and developing the Real-Time Tax Computation Engine using Oracle, StreamSets, Kafka, and Spark Structured Streaming
  • Validated data transformations and performed End-to-End data validations for ETL workflows loading data from XMLs to EDW
  • Extensively utilized Informatica to create the complete ETL process and load data into the database used by Reporting Services
  • Created Tidal Job events to schedule the ETL extract workflows and to modify the tier point notifications
  • Environment: Python, SQL, Oracle, Hive, Scala, Power BI, Azure Data Factory, Data Lake, Docker, Mongo DB, Kubernetes, PySpark, SNS, Kafka, Data Warehouse, Sqoop, Pig, Zookeeper, Flume, Hadoop, Airflow, Spark, EMR, EC2, S3, Git, GCP, Lambda, Glue, ETL, Databricks, Snowflake, AWS Data Pipeline.
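
As a flavor of the Spark and Hive work listed above, here is a minimal PySpark sketch of a broadcast join followed by a dynamically partitioned Hive write. It is illustrative only; the paths, table name (analytics.equipment_readings_enriched), and columns (equipment_id, site) are assumptions rather than the actual pipeline.

# Hypothetical sketch: broadcast a small dimension against a large fact table,
# then write the result into a dynamically partitioned Hive table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder.appName("equipment-readings-etl")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow Hive dynamic partitioning so partition values come from the data itself.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

# Large fact data (sensor readings) and a small lookup of equipment metadata.
readings = spark.read.parquet("hdfs:///data/raw/equipment_readings/")
equipment_dim = spark.read.parquet("hdfs:///data/dim/equipment/")

# Broadcasting the small dimension to every executor avoids shuffling the fact table.
enriched = readings.join(broadcast(equipment_dim), "equipment_id", "left")

# Each distinct value of the partition column becomes its own Hive partition.
(
    enriched.write.mode("overwrite")
    .partitionBy("site")
    .saveAsTable("analytics.equipment_readings_enriched")
)

spark.stop()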

Education

Master of Science in Data Science

The University of Texas at Dallas, Richardson, TX
12.2023
  • Scholarship: Dean's Excellence Scholarship Recipient
  • GPA: 3.71

Bachelor of Technology - Chemical Engineering

National Institute of Technology Warangal, India
12.2014

GPA: 7.0

Skills

  • Big Data Ecosystem:
    HDFS, Spark, MapReduce
    Hive, Pig, Sqoop
    Flume, HBase, Kafka Connect
    Impala.
  • Data Engineering Tools:
    Airflow, Zookeeper
    Amazon Web Services (AWS)
    Cloudera CDP, Hortonworks HDP
    Apache Hadoop 1x/2x
  • Programming Languages:
    Python, Scala
    Pig Latin, HiveQL
    Shell Scripting
    SQL Development
  • Databases:
    MySQL, PostgreSQL
    MS SQL Server, Snowflake
    MongoDB, Cassandra
    Aginity Workbench
    SQL Data Warehousing
  • ETL/BI Tools:
    Power BI, Tableau
    Informatica
    Data Modeling
    Data Lineage
    Data Transformation
    Real-time Processing
  • Version Control:
    GIT, SVN
    Bitbucket
  • Cloud Technologies:
    EC2, S3, Lambda
    SQS, SNS, EMR
    CodeBuild, CloudWatch
    Azure HDInsight, Databricks
  • Operating Systems:
    Windows (XP/7/8/10)
    Linux (Unix, Ubuntu)
    Mac OS

Certification

  • Certified - AWS Cloud Practitioner (Amazon Web Services)
  • Certified - SnowPro Core Certification

Competitions & Leadership Experience

Elected Students' Council President by 5,000 students in college elections; successfully conducted a tech & cultural fest with 10k+ footfall.

Projects

  • Machine Learning - Seoul Rental Bike Prediction (June 2014): Implemented a linear regression model to predict the rented bike count using the gradient descent algorithm with a batch update rule; built models with linear regression, logistic regression, SVM, decision trees, neural networks, K-Means, and EM, achieving 92% confidence. Experimented with various parameters for linear regression and used a neural network package for the classification problem, varying the number of layers, number of nodes, and activation functions (tanh, sigmoid). A minimal gradient-descent sketch follows this section.
  • Big Data Analysis using Hadoop (June 2014): Processed 1M rows into HDFS from a local server, created Hive and Pig tables, and ran SQL queries; imported data into Tableau using ODBC drivers for data visualization and executed regression analysis in R, attaining 85% accuracy. Implemented data integration across three platforms (Cloudera, Tableau, and R Studio/Rserve) to assess driver behavior and mitigate potential road accidents.
  • Forecasting Time Series Data (Python, Deep Learning, Machine Learning): Conducted a time-series analysis of Kaggle's web traffic dataset using a diverse ensemble of models, including SARIMAX, RNN, LSTM, AdaBoost Regressor, Gradient Boost Regressor, and Random Forest Regressor; fine-tuned and optimized the models to improve precision in forecasting web traffic for the subsequent year, with performance evaluated against RMSE.
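
The bike-count project above centered on batch gradient descent for linear regression; a minimal sketch of that training loop follows. The data here is synthetic, and the feature names, learning rate, and epoch count are illustrative assumptions rather than the original project code.

# Hypothetical sketch: linear regression fit with full-batch gradient descent.
import numpy as np


def batch_gradient_descent(X, y, lr=0.1, epochs=2000):
    """Fit weights w minimizing mean squared error with full-batch updates."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        error = X @ w - y
        gradient = (X.T @ error) / n_samples  # gradient of 0.5 * MSE
        w -= lr * gradient                    # batch update rule
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for the bike data: a bias column plus two standardized
    # features (e.g. temperature, hour of day) with a noisy linear response.
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
    true_w = np.array([200.0, 35.0, -12.0])
    y = X @ true_w + rng.normal(scale=10.0, size=500)

    w_hat = batch_gradient_descent(X, y)
    print("learned weights:", np.round(w_hat, 2))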

Languages

English
Native or Bilingual
Hindi
Native or Bilingual
Telugu
Native or Bilingual

Timeline

ETL Developer/Data Scientist - Toyota Financial Services
2022.10 - 2024.08
ETL Developer/Data Engineer - Reliance Industries Ltd.
2014.07 - 2020.10
The University of Texas at Dallas - Master of Science in Data Science
National Institute of Technology Warangal - Bachelor of Technology, Chemical Engineering
  • Certified - AWS Cloud Practitioner (Amazon Web Services)
  • Certified - SnowPro Core Certification