Saisandesh Devireddy

Chicago, IL

Summary

Experienced Data Engineer with a comprehensive background in designing and implementing data solutions for complex business environments. Proven expertise in leading ETL architecture development, source-to-target mapping, and managing data warehouse projects from conception to execution. Skilled in utilizing tools like Erwin for logical and physical data modeling, collaborating closely with stakeholders to ensure alignment with business objectives. Proficient in managing large-scale data processing and storage using technologies such as Apache Spark, Sqoop, and AWS Glue, with a focus on optimizing performance and reliability. Experienced in cloud-native technologies including AWS, GCP, and Azure, with a strong foundation in database management systems and data visualization tools. Dedicated to driving business success through innovative data-driven solutions and strategic insights.

Overview

7 years of professional experience

Work History

Senior Data Engineer

Charles Schwab
Chicago, IL
02.2023 - Current
  • Designed and developed data pipelines using Azure Data Factory (ADF), leveraging Linked Services, Datasets, and Pipelines to facilitate ETL processes across various data sources, including Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
  • Automated pipeline deployment using JSON scripts and Cosmos Activity for data processing.
  • Utilized Pig and Hive for synchronizing structured and unstructured data based on business requirements.
  • Developed and deployed Hadoop ecosystem solutions involving MapReduce, Spark, Hive, Pig, Sqoop, HBase, Oozie, and Impala.
  • Employed Apache Kafka and Zookeeper to implement distributed messaging queues, integrated with Cassandra.
  • Managed data ingestion using Apache Sqoop to move data between HDFS and RDBMS.
  • Engineered real-time data processing solutions using Azure Stream Analytics, Azure Event Hub, and Service Bus Queue.
  • Created and optimized Spark applications using Scala and SparkSQL/Streaming for efficient data processing.
  • Processed large datasets with Databricks, employing Python notebooks for data transformation and aggregation.
  • Developed and optimized database objects (Stored Procedures, UDFs, Triggers, Indexes, Views) using T-SQL in both OLTP and Data Warehouse environments.
  • Designed data models and created Hive external tables to support data scientists' analytical needs.
  • Migrated on-premises Oracle ETL processes to Azure Synapse Analytics, ensuring seamless integration and performance optimization.
  • Utilized Azure Databricks and DataFrames to transform and load data into Parquet and SQL tables.
  • Integrated various NoSQL databases (HBase, Cassandra) and implemented real-time data pipelines using Kafka and Spark Streaming.
  • Automated Azure services remediation using PowerShell scripts and JSON templates.
  • Leveraged Terraform for managing infrastructure as code and orchestrating deployments on Kubernetes.
  • Developed ETL workflows using SQL Server Integration Services (SSIS) for data extraction, transformation, and loading from multiple sources.
  • Designed end-to-end ETL/ELT applications with Azure Synapse Analytics for comprehensive data processing solutions.
  • Implemented Big Data Analytics and Advanced Data Science techniques using Azure Databricks, Hive, Hadoop, Python, PySpark, and Spark SQL.
  • Applied machine learning algorithms to identify trends, patterns, and discrepancies in petabytes of data.
  • Created report models from cubes and relational data warehouses for generating ad hoc and chart reports.
  • Benchmarked and optimized Hadoop/HBase clusters for internal use.
  • Troubleshot and resolved errors in HBase Shell/API, Pig, Hive, and MapReduce workflows.
  • Ensured data security and compliance with industry standards while handling sensitive and large-scale data sets.

Senior Data Engineer

Fidelity Investments
Durham, NC
02.2021 - 01.2023
  • Designed and developed data pipelines using AWS Glue, leveraging connections, crawlers, and jobs to facilitate ETL processes across various data sources, including Amazon RDS, S3, and Redshift.
  • Automated pipeline deployment using CloudFormation templates and AWS Step Functions for data processing.
  • Utilized Apache Pig and Hive on Amazon EMR for synchronizing structured and unstructured data, based on business requirements.
  • Developed and deployed Hadoop ecosystem solutions on AWS involving MapReduce, Spark, Hive, Pig, Sqoop, HBase, Oozie, and Impala.
  • Employed Amazon MSK (Managed Streaming for Apache Kafka) and Zookeeper to implement distributed messaging queues integrated with Amazon DynamoDB.
  • Managed data ingestion using Apache Sqoop on Amazon EMR to move data between HDFS and RDBMS.
  • Engineered real-time data processing solutions using AWS Kinesis Data Analytics, Kinesis Data Streams, and SQS (Simple Queue Service).
  • Created and optimized Spark applications using Scala and SparkSQL/Streaming on Amazon EMR for efficient data processing.
  • Processed large datasets with AWS Glue and EMR, employing Python notebooks for data transformation and aggregation.
  • Developed and optimized database objects (Stored Procedures, UDFs, Triggers, Indexes, Views) using PL/pgSQL in both OLTP and Data Warehouse environments.
  • Designed data models and created Hive external tables on Amazon EMR to support data scientists' analytical needs.
  • Migrated on-premises Oracle ETL processes to Amazon Redshift, ensuring seamless integration and performance optimization.
  • Utilized AWS Glue and DataFrames to transform and load data into Parquet and Redshift tables.
  • Integrated various NoSQL databases (HBase, DynamoDB) and implemented real-time data pipelines using MSK (Kafka) and Spark Streaming.
  • Automated AWS services remediation using AWS Lambda functions and CloudFormation templates.
  • Leveraged Terraform for managing infrastructure as code and orchestrating deployments on Amazon EKS (Elastic Kubernetes Service).
  • Developed ETL workflows using AWS Data Pipeline and AWS Glue for data extraction, transformation, and loading from multiple sources.
  • Designed end-to-end ETL/ELT applications with Amazon Redshift for comprehensive data processing solutions.
  • Implemented Big Data Analytics and Advanced Data Science techniques using AWS Glue, Hive on EMR, Hadoop on EMR, Python, PySpark, and Spark SQL on EMR.
  • Created report models from Amazon QuickSight and relational data warehouses for generating ad hoc and chart reports.
  • Benchmarked and optimized Hadoop/HBase clusters on Amazon EMR for internal use.
  • Troubleshot and resolved errors in HBase Shell/API, Pig, Hive, and MapReduce workflows on AWS EMR.
  • Ensured data security and compliance with industry standards while handling sensitive and large-scale data sets using AWS IAM and KMS.

Big Data Engineer

Virgin Pulse
Providence, RI
11.2019 - 01.2021
  • Developed dashboards and visualizations to help business users analyze data and to provide data insights to upper management, with a focus on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.
  • Knowledge of U-SQL and how it can be used for data transformation as part of a cloud data integration strategy.
  • Migrated large data sets to Databricks (Spark): created and administered clusters, configured data pipelines, and loaded data from ADLS Gen2 into Databricks using ADF pipelines.
  • Created a Linked Service to land data from an SFTP location in Azure Data Lake.
  • Created various pipelines to load the data from Azure Data Lake into Staging SQLDB, followed by Azure SQL DB.
  • Wrote test scripts for inbound and outbound ETL processes and for interfaces with other systems and streams.
  • Designed and developed a new solution to process the NRT data by using Azure Stream Analytics, Azure Event Hub, and Service Bus Queue.
  • Created Databricks notebooks to streamline and curate the data for various business use cases and mounted blob storage on Databricks.
  • Developed streaming pipelines using Apache Spark with Python.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
  • Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to run streaming analytics in Databricks.
  • Expertise in Python and Scala, including user-defined functions (UDFs) for Hive and Pig written in Python.
  • Implemented the data warehousing solution in Azure Synapse Analytics.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL Activity.
  • Used Informatica as an ETL tool to transfer the data from source to staging and staging to target.
  • Experience with Azure technologies, such as Storage solutions, Azure Blob storage, Azure Data Lake Storage gen2, and Azure SQL Database.
  • Hands-on experience in developing SQL scripts for automation purposes.
  • Created builds and releases for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
  • Built automated regression scripts in Python to validate ETL processes across multiple databases, including Oracle, SQL Server, Hive, and MongoDB.
  • Worked with complex SQL views, stored procedures, triggers, and packages in large databases across various servers.

ETL Developer

FICO
India
07.2017 - 10.2019
  • Designed and customized data models for the Data Warehouse, facilitating real-time data integration from multiple sources.
  • Led the development of ETL architecture and Source to Target mapping, ensuring efficient data loading into the Data Warehouse.
  • Utilized Erwin for logical and physical data modeling in STAR SCHEMA for the Data Warehouse database.
  • Actively engaged with users to gather new requirements and address existing issues, ensuring alignment with business objectives.
  • Conducted business analysis and technical design sessions to develop requirements documents and ETL specifications.
  • Developed, tested, and implemented processes for loading initial data into the Enterprise Data Warehouse (EDW).
  • Coordinated and prioritized multiple projects, estimating, scheduling, and tracking ETL projects throughout the SDLC.
  • Led ETL Team in analyzing, designing, and developing ETL strategies and processes, providing guidance and mentoring to team members.
  • Extracted data from WEBCRM Databases, Siebel source systems, and Oracle ERP Systems into the EDW for various projects.
  • Normalized data formats, replaced missing values with defaults, standardized values, and mapped attributes to create a master list.
  • Troubleshot and tuned SQL using EXPLAIN PLAN to optimize performance.
  • Developed ETL objects, including mappings, sessions, mapplets, and workflows, to efficiently load data into the EDW.
  • Designed and implemented Data Marts to support mission-critical Data Warehouse applications.
  • Developed complex mappings and mapplets in Informatica using various transformations.
  • Created standard and reusable mappings and mapplets utilizing transformations such as expression, aggregator, joiner, source qualifier, router, and lookup.
  • Managed repository migrations, user creations, and permissions for testing personnel and developers.
  • Installed and administered Informatica versions 8.1, 8.6, and 9.1 at the server level.
  • Implemented various Data Transformations including Slowly Changing Dimensions.
  • Maintained the Data Warehouse Administration Console (DAC), scheduled DAC Jobs, and monitored Siebel Jobs and EIM Jobs.
  • Collaborated closely with Siebel database administrators, loading data from the Enterprise database for dimension data.
  • Worked with EIM tables, Base Table Loads, and conducted troubleshooting analysis.
  • Performed optimization at both the mapping level and session level in Informatica.
  • Collaborated with DBAs, Architects, and System Administrators to create and maintain Informatica Architecture.
  • Reviewed ETL process design and ETL design documents, uploading them to the central repository for future access.
  • Environment: Informatica 9.1/8.1.1/8.6.1, Informatica Power Exchange 9.1/8.6/8.1.1, Informatica Data Quality, Oracle 10g/11g, SQL Server 2005/2000, T-SQL, Shell Scripts, Unix Korn Shell, Java, Toad, Erwin 4.0, Siebel, OBIEE, DAC, Pervasive ETL Tool.

Education

Bachelor of Science - COMPUTER SCIENCE

AMITY UNIVERSITY
MUMBAI
05-2017

Skills

  • Data Technologies: MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, Kafka, Zookeeper, Yarn, Spark MLlib
  • Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Teradata, Cosmos
  • Frameworks: Django, Flask, Hadoop, Apache Spark, BigQuery, Apigee
  • Programming Languages: Python, PySpark, Scala, R, Shell Scripting, Java
  • Web Services: AWS, GCP, Azure, Snowflake, Apache Tomcat, WebLogic
  • Web Technologies: CSS, HTML, XHTML, AJAX, XML, JSON, JavaScript
  • Visualization/Reporting: Tableau, Power BI, Looker, SSIS, SSRS, SSAS
  • Development Tools: Databricks, R Studio, PyCharm, Jupyter Notebook, Sublime Text, IntelliJ, Eclipse, NetBeans, Visual Studio, Heroku, Docker, Brackets
  • Version Control: Git, GitHub, SVN, CVS
  • Methodologies: Agile (Scrum), Waterfall

Timeline

Senior Data Engineer

Charles Schwab
02.2023 - Current

Senior Data Engineer

Fidelity Investments
02.2021 - 01.2023

Big Data Engineer

Virgin Pulse
11.2019 - 01.2021

ETL Developer

FICO
07.2017 - 10.2019

Bachelor of Science - COMPUTER SCIENCE

AMITY UNIVERSITY