Sankari Danesh

Summary

Certified AWS Cloud Practitioner with over 9 years of hands-on experience in data and quality engineering, specializing in on-premises and big data technologies. Expertise includes designing dimensional and fact models, developing data acquisition and quality frameworks, and executing cloud data migrations across commercial insurance, entertainment and media, and digital advertising sectors. Proven track record as a primary contact for analytics and business teams, successfully driving data quality initiatives while supporting senior management in making informed, data-driven decisions. Committed to fostering a collaborative environment by enhancing co-workers' business acumen and understanding of data, while adhering to agile methodologies for effective product and program delivery.

Overview

10 years of professional experience
2 Certifications

Work History

Senior Data Engineer

Paccar Solutions
07.2024 - Current
  • Project: PowerTrain – Fleet Health Management
  • Domain: AWS, Snowflake, MS-SQL
  • Technology: AWS, SQL, Python, PySpark
  • Roles and Responsibilities:
  • Data Architecture & Design:
  • Designed scalable, cloud-based data architecture for Fleet Health Management application, leveraging AWS services such as Redshift, Glue, and S3.
  • Developed schema and optimized data models in Amazon Redshift to support analytics, real-time metrics, and historical data comparisons.
  • Implemented ETL pipelines for extracting, transforming, and loading high-volume fleet data from diverse sources into Redshift.
  • Glue Job Development:
  • Built and orchestrated Glue jobs using Apache Spark to process trillions of records, ensuring efficient handling of fleet information, failure modes, and dealer locations.
  • Optimized Glue job performance by configuring Data Processing Units (DPUs) and partitioning large datasets for parallel processing.
  • Ensured data integrity by implementing error-handling mechanisms and data validation in Glue pipelines.
  • API Development:
  • Designed and implemented RESTful APIs to serve data to the Fleet Health Management application, enabling real-time fleet information access for end users.
  • Integrated APIs to provide insights on frequently failing components, recommended dealer locations for repairs, and metrics comparisons with similar fleets.
  • Secured API endpoints using IAM roles and implemented robust authentication and authorization mechanisms.
  • Fleet Analytics & Insights:
  • Developed data pipelines to aggregate and analyze fleet performance metrics, identifying patterns and trends in component failures and vehicle performance.
  • Generated reports and dashboards to compare metrics such as mileage, idle time, and fuel efficiency across similar fleets for benchmarking.
  • Collaboration & Documentation:
  • Worked closely with product managers, data scientists, and DevOps teams to define data requirements and system specifications.
  • Documented architecture diagrams, ETL workflows, and API design for internal and external stakeholders.
  • Conducted knowledge-sharing sessions to onboard team members to the Fleet Health Management architecture and workflows.
  • Roles: Senior Data Engineer

Data Solution Engineer

American Family Insurance
09.2023 - 12.2023

  • Designed, implemented, and maintained data pipelines for efficient data ingestion, transformation, and loading using PySpark on EC2 instances after cluster configuration was completed in the EMR environment.
  • Developed an automated data validation framework to identify and resolve data quality issues during and after the ingestion process, before data is delivered to business users.
  • Created and read DynamoDB configuration parameters for ETL execution and automation script execution.
  • Monitored AWS infrastructure and pipelines to ensure smooth and reliable data flow, promptly addressing any issues or bottlenecks.
  • Validated ETL data pipelines to ensure accuracy, completeness, and adherence to business requirements and data standards.
  • Created event-driven data pipelines using AWS Lambda.
  • Created data models and data mappings for different stages of data within the data warehouse, including the creation of dimensional and fact tables using SQL and Python.
  • Used AWS Glue, Step Functions, EC2, and data pipelines for efficient data processing and management.
  • Generated ad hoc reports using AWS Glue to meet the data needs of end users, providing them with actionable insights.
  • Ensured data validation across all layers of the data ecosystem to maintain data integrity and consistency.
  • Collaborated with teams outside of the Commercial Data Platform (CDP) to support end-to-end testing, spanning application systems to Tableau system integration.
  • Played a key role in the architectural design of the CDP data model, providing expertise and insights for optimal data management and performance.
  • Prepared a comprehensive list of scenarios covering all layers of data to support thorough testing and validation processes.
  • Conducted report validations in Tableau, ensuring the accuracy and integrity of the visualized data.
  • Created and published live data sources and extracts in Tableau, utilizing custom SQL and data tables to meet the reporting requirements of end users.

Data Engineer - Automation

Disney Streaming Services
10.2021 - 03.2023
  • Project: Data Activation - DAF
  • Domain: AWS
  • Technology: AWS, Python, Snowflake
  • Responsibilities:
  • Enhanced data feeds and supported workflows in Informatica and ActiveBatch schedules.
  • Developed SQL queries to support data integration within the existing framework, ensuring compatibility with the new DAF framework.
  • Created JSON files and formulated SQL statements to facilitate seamless data feeds aligned with the new DAF framework.
  • Designed and implemented Directed Acyclic Graphs (DAGs) in Airflow to schedule and manage data processing jobs and streamline data feeds.
  • Utilized Databricks notebooks to compare files across different S3 locations, enabling efficient data analysis and identification of discrepancies.
  • Worked with a MySQL database to manage watermark settings.
  • Designed PySpark SQL queries in Databricks to be executed against Snowflake datasets.
  • Developed an automation framework using Python and Pytest to validate data feeds migrated via a Java application, ensuring accuracy and efficiency in the validation process.
  • Defined test strategies and test plans for highly complex scenarios, aiming to automate the testing process and streamline operations.
  • Demonstrated expertise in writing SQL queries, ranging from simple to complex, to extract and analyze data in Snowflake, ensuring efficient data retrieval.
  • Optimized Snowflake SQL queries for effective data processing.
  • Utilized Python for loading and processing large files in S3, enabling effective data validation and verification procedures.
  • Hands-on experience with automated code deployment to lower and higher environments using Jenkins and GitLab CI/CD pipelines; used Git repositories to manage and track code changes.
  • Experience in automatically deploying code changes to staging and pre-production environments.
  • Roles: Data Engineer - Automation

Data Solutions Engineer

Homesite Insurance
10.2019 - 10.2021
  • Project: Commercial Data Platform
  • Domain: AWS
  • Technology: AWS, SQL, Python, Tableau
  • Responsibilities:
  • Designed, implemented, and maintained data pipelines for efficient data ingestion, transformation, and loading using PySpark on EC2 instances after cluster configuration was completed in the EMR environment.
  • Developed an automated data validation framework to identify and resolve data quality issues during and after the ingestion process, before data is delivered to business users.
  • Created and read DynamoDB configuration parameters for ETL execution and automation script execution.
  • Monitored AWS infrastructure and pipelines to ensure smooth and reliable data flow, promptly addressing any issues or bottlenecks.
  • Validated ETL data pipelines to ensure accuracy, completeness, and adherence to business requirements and data standards.
  • Created event-driven data pipelines using AWS Lambda.
  • Created data models and data mappings for different stages of data within the data warehouse, including the creation of dimensional and fact tables using SQL and Python.
  • Used AWS Glue, Step Functions, EC2, and data pipelines for efficient data processing and management.
  • Generated ad hoc reports using AWS Glue to meet the data needs of end users, providing them with actionable insights.
  • Ensured data validation across all layers of the data ecosystem to maintain data integrity and consistency.
  • Collaborated with teams outside of the Commercial Data Platform (CDP) to support end-to-end testing, spanning application systems to Tableau system integration.
  • Played a key role in the architectural design of the CDP data model, providing expertise and insights for optimal data management and performance.
  • Prepared a comprehensive list of scenarios covering all layers of data to support thorough testing and validation processes.
  • Conducted report validations in Tableau, ensuring the accuracy and integrity of the visualized data.
  • Created and published live data sources and extracts in Tableau, utilizing custom SQL and data tables to meet the reporting requirements of end users.
  • Roles: Data Solutions Engineer

Big Data Engineer

Avvo Inc
07.2015 - 10.2019
  • Project: Avvo Business Services
  • Domain: ETL & Big Data
  • Technology: Hadoop (Hive, Impala), Datadog, Python
  • Responsibilities:
  • Created and maintained data pipelines.
  • Extracted and loaded data into the Hadoop data warehouse environment.
  • Helped analyze complex data sets and ideate on the underlying business problems.
  • Performed end-to-end data migration testing from Netezza to Hadoop.
  • Tested a variety of reports, including those that informed major business decisions.
  • Served as Scrum Master for some of the sprints.
  • Monitored data against business metrics for errors and data mismatches using Datadog.
  • Created and deployed workflows for the monitoring metrics.
  • Participated in design meetings and contributed inputs that shaped design changes.
  • Emerged as an SME across the business services data.
  • Tested across all layers of the data model.
  • Prepared automated scripts to run regression tests.
  • Monitored business metrics using Tableau reports.
  • Helped improve product processes through efficient data monitoring techniques.
  • Tested the Avvo app on mobile devices, including the data flow from the mobile app to the warehouse.
  • Roles: Big Data Engineer

Education

Master's - Computer Science

University of Madras

Bachelor of Computer Science

University of Madras

Skills

  • Experienced with AWS services including Glue, DynamoDB, and Lambda
  • Skilled in data processing with Hive, Impala, and Sqoop
  • Proficient in ETL development and data transformation
  • Snowflake data warehouse expertise
  • DAG job creation and scheduling
  • Proficient in CI/CD pipeline deployment
  • Streamlined job execution using DAGs
  • Experienced with Parquet, JSON, and CSV files
  • Tableau data visualization expertise
  • Expertise in maintaining data quality
  • Proficient in Google Analytics
  • Experienced in collaborative team environments
  • Analytical problem-solving skills
  • Experienced in optimizing data workflows with Snowflake SQL
  • Adept at generating JSON files and formulating SQL statements
  • Data quality assurance
  • Automation framework development using Python and Pytest
  • Test strategy development
  • Data loading and processing in S3 using Python
  • Data warehouse development and support
  • Expertise in big data technologies including Hadoop, Spark, and Google BigQuery
  • Data governance implementation
  • Proficient in using defect tracking tools like JIRA and Bugzilla
  • Skilled in aligning project inter-dependencies and managing milestones

Certification

  • AWS Cloud Practitioner, 2021-01-01
  • Certified Scrum Master (CSM), 2022-01-01

Timeline

Senior Data Engineer

Paccar Solutions
07.2024 - Current

Data Solution Engineer

American Family Insurance
09.2023 - 12.2023

Data Engineer - Automation

Disney Streaming Services
10.2021 - 03.2023

Data Solutions Engineer

Homesite Insurance
10.2019 - 10.2021

Big Data Engineer

Avvo Inc
07.2015 - 10.2019

Master's - Computer Science

University of Madras

Bachelor of Computer Science

University of Madras