
SAISANDESH DEVIREDDY

CHICAGO, IL

Summary

Experienced Data Engineer with a proven track record in designing and implementing data solutions for complex business environments. Adept at leading ETL architecture development, source-to-target mapping, and managing data warehouse projects from conception to execution. Skilled in utilizing tools like Erwin for logical and physical data modeling, as well as collaborating closely with stakeholders to gather requirements and ensure alignment with business objectives. Proficient in managing large-scale data processing and storage using technologies such as Apache Spark, Sqoop, and AWS Glue, with a focus on optimizing performance and reliability. Experienced in cloud-native technologies including AWS, GCP, and Azure, with a strong foundation in database management systems and data visualization tools. Dedicated to driving business success through innovative data-driven solutions and strategic insights.

Overview

9 years of professional experience

Work History

Senior Data Engineer

Fidelity Investments
Durham, NC
03.2022 - Current
  • Designed and developed data pipelines using Azure Data Factory (ADF), leveraging Linked Services, Datasets, and Pipelines to facilitate ETL processes across various data sources, including Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
  • Automated pipeline deployment using JSON scripts and Cosmos Activity for data processing.
  • Utilized Pig and Hive for synchronizing structured and unstructured data based on business requirements.
  • Developed and deployed Hadoop ecosystem solutions involving MapReduce, Spark, Hive, Pig, Sqoop, HBase, Oozie, and Impala.
  • Employed Apache Kafka and Zookeeper to implement distributed messaging queues integrated with Cassandra.
  • Managed data ingestion using Apache Sqoop to move data between HDFS and RDBMS.
  • Engineered real-time data processing solutions using Azure Stream Analytics, Azure Event Hub, and Service Bus Queue.
  • Created and optimized Spark applications using Scala and SparkSQL/Streaming for efficient data processing.
  • Processed large datasets with Databricks, employing Python notebooks for data transformation and aggregation (see the PySpark sketch after this list).
  • Developed and optimized database objects (Stored Procedures, UDFs, Triggers, Indexes, Views) using T-SQL in both OLTP and Data Warehouse environments.
  • Designed data models and created Hive external tables to support data scientists' analytical needs.
  • Migrated on-premises Oracle ETL processes to Azure Synapse Analytics, ensuring seamless integration and performance optimization.
  • Utilized Azure Databricks and Data Frames to transform and load data into Parquet and SQL tables.
  • Integrated various NoSQL databases (HBase, Cassandra) and implemented real-time data pipelines using Kafka and Spark Streaming.
  • Automated Azure services remediation using PowerShell scripts and JSON templates.
  • Leveraged Terraform for managing infrastructure as code and orchestrating deployments on Kubernetes.
  • Developed ETL workflows using SQL Server Integration Services (SSIS) for data extraction, transformation, and loading from multiple sources.
  • Designed end-to-end ETL/ELT applications with Azure Synapse Analytics for comprehensive data processing solutions.
  • Implemented Big Data Analytics and Advanced Data Science techniques using Azure Databricks, Hive, Hadoop, Python, PySpark, and Spark SQL.
  • Applied machine learning algorithms to identify trends, patterns, and discrepancies in petabytes of data.
  • Created report models from cubes and relational data warehouses for generating ad hoc and chart reports.
  • Benchmarked and optimized Hadoop/HBase clusters for internal use.
  • Troubleshot and resolved errors in HBase Shell/API, Pig, Hive, and MapReduce workflows.
  • Ensured data security and compliance with industry standards while handling sensitive and large-scale data sets.
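
Illustrative sketch for the Databricks transformation/aggregation bullet above: a minimal PySpark job that cleanses sample records and writes a daily aggregate as partitioned Parquet. The column names, sample rows, and output path are assumptions for demonstration only; the production pipelines read from and wrote to ADLS/Blob Storage and SQL tables as described.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily_aggregation_sketch").getOrCreate()

    # Sample raw records standing in for data landed in the lake (illustrative only).
    raw = spark.createDataFrame(
        [("A100", "2024-05-01 09:30:00", 1500.0),
         ("A100", "2024-05-01 14:10:00", -250.0),
         ("A200", "2024-05-01 10:05:00", 900.0)],
        ["account_id", "event_ts", "amount"])

    # Light cleansing plus a daily aggregate per account.
    daily = (raw.filter(F.col("amount").isNotNull())
                .withColumn("event_date", F.to_date("event_ts"))
                .groupBy("account_id", "event_date")
                .agg(F.sum("amount").alias("total_amount"),
                     F.count("*").alias("event_count")))

    # Write the curated result as Parquet, partitioned by date (local path used only for the sketch).
    daily.write.mode("overwrite").partitionBy("event_date").parquet("/tmp/curated/daily_totals")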

Big Data Engineer

UBS Financial Services
Weehawken, NJ
01.2020 - 02.2022
  • Led Requirement Gathering, Design, and Deployment of applications using Scrum (Agile) methodology.
  • Developed Spark sorting applications in Java for data stored on AWS.
  • Wrote Java-based Kafka and Spark streaming programs for real-time data processing (a PySpark equivalent is sketched after this list).
  • Implemented Spark Streaming to micro-batch incoming data and designed batch processing jobs in Apache Spark for increased speed.
  • Developed Spark applications for data validation, cleansing, transformation, and custom aggregation.
  • Utilized Snowflake to build Power BI dashboards and reports for structured data analysis.
  • Created interactive reports and dashboards in Power BI, scheduling reports and representing metrics and heatmaps.
  • Identified appropriate cloud-native technologies for big data flow development and maintenance.
  • Managed version control using GitHub and implemented CI/CD Pipelines in Azure DevOps environments with Jenkins.
  • Utilized JIRA for issue tracking and added algorithm selection options for data and address generation.
  • Developed big data web applications using Agile methodology in Scala, combining functional and object-oriented programming.
  • Designed Logical and Physical data modeling for various data sources.
  • Developed ETL jobs to extract data from Salesforce replica and load it into Redshift data mart.
  • Integrated Cassandra as a distributed metadata store for network entity resolution.
  • Queried and analyzed data from DataStax Cassandra for searching, sorting, and grouping.
  • Migrated data from Teradata Systems to Hortonworks HDInsight cluster on Azure.
  • Monitored AWS resources and applications using CloudWatch, creating alarms and notifications.
  • Created Spark clusters and imported data using AWS EMR and EC2 instances.
  • Integrated jobs using Informatica PowerCenter Designer and Workflow Manager.
  • Provisioned Databricks clusters, notebooks, jobs, and autoscaling.
  • Implemented custom functions in Java for Hive to process complex data.
  • Maintained a data dictionary with metadata about the database schema.
  • Migrated data to GCP using Transfer Appliance and Cloud Data Transfer services.
  • Continuously optimized MongoDB performance, including query tuning and resource allocation.
  • Environment: Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, Snowflake, MongoDB, Cassandra
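
Illustrative sketch for the Kafka and Spark streaming bullet above. The original programs were written in Java; this PySpark equivalent only shows the pattern (subscribe to a topic, parse JSON, validate, aggregate), and the broker address, topic name, and schema are assumptions. Running it requires the spark-sql-kafka package and a reachable Kafka broker.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("orders_stream_sketch").getOrCreate()

    # Expected JSON payload (illustrative schema).
    schema = (StructType()
              .add("order_id", StringType())
              .add("symbol", StringType())
              .add("quantity", DoubleType()))

    # Subscribe to a Kafka topic (broker and topic names are placeholders).
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "orders")
              .load())

    # Parse, validate, and keep a running total per symbol.
    totals = (stream.select(F.from_json(F.col("value").cast("string"), schema).alias("o"))
              .select("o.*")
              .filter(F.col("quantity") > 0)
              .groupBy("symbol")
              .agg(F.sum("quantity").alias("total_quantity")))

    # Console sink used here purely for demonstration.
    (totals.writeStream.outputMode("complete")
           .format("console")
           .option("checkpointLocation", "/tmp/checkpoints/orders")
           .start()
           .awaitTermination())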

Azure Engineer

Virgin Pulse
Providence, RI
10.2017 - 12.2019
  • Developed dashboards and visualizations in SQL Server Reporting Services (SSRS) and Power BI to help business users analyze data and to provide insights to upper management.
  • Applied knowledge of U-SQL for data transformation as part of a cloud data integration strategy.
  • Migrated large data sets to Databricks (Spark): created and administered clusters, loaded data, and configured data pipelines that load data from ADLS Gen2 into Databricks using ADF.
  • Created Linked Services to land data from SFTP locations into Azure Data Lake.
  • Created pipelines to load data from Azure Data Lake into a staging SQL DB and then into Azure SQL DB.
  • Wrote test scripts for inbound and outbound ETL processes and interfaces with other systems and streams.
  • Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hub, and Service Bus Queue.
  • Created Databricks notebooks to streamline and curate data for various business use cases and mounted Blob Storage on Databricks.
  • Developed streaming pipelines using Apache Spark with Python.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Ingested data in mini-batches and performed RDD transformations on them with Spark Streaming for streaming analytics in Databricks.
  • Leveraged expertise in Python and Scala; wrote user-defined functions (UDFs) for Hive and Pig using Python.
  • Implemented the data warehousing solution in Azure Synapse Analytics.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (a minimal PySpark UDF sketch follows this list).
  • Developed JSON scripts for deploying ADF pipelines that process data using the SQL activity.
  • Used Informatica as the ETL tool to move data from source to staging and from staging to target.
  • Worked with Azure storage technologies including Azure Blob Storage, Azure Data Lake Storage Gen2, and Azure SQL Database.
  • Developed SQL scripts for automation purposes.
  • Created build and release pipelines for multiple projects (modules) in production using Visual Studio Team Services (VSTS).
  • Built automated regression scripts in Python to validate ETL processes across databases such as Oracle, SQL Server, Hive, and MongoDB.
  • Worked with complex SQL views, stored procedures, triggers, and packages in large databases across multiple servers.
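
Minimal PySpark UDF sketch referenced above. The masking rule, column names, and sample row are assumptions for illustration only; the real UDFs implemented business-specific logic.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf_sketch").getOrCreate()

    @F.udf(returnType=StringType())
    def mask_member_id(member_id):
        # Keep the last four characters and mask the rest (hypothetical rule).
        if member_id is None:
            return None
        return "*" * max(len(member_id) - 4, 0) + member_id[-4:]

    df = spark.createDataFrame([("M123456789", "active")], ["member_id", "status"])
    df.withColumn("member_id_masked", mask_member_id("member_id")).show()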

ETL Developer

FICO
INDIA
07.2015 - 08.2017
  • Designed and customized data models for the Data Warehouse, facilitating real-time data integration from multiple sources.
  • Led the development of ETL architecture and Source to Target mapping, ensuring efficient data loading into the Data Warehouse.
  • Utilized Erwin for logical and physical data modeling in STAR SCHEMA for the Data Warehouse database.
  • Actively engaged with users to gather new requirements and address existing issues, ensuring alignment with business objectives.
  • Conducted business analysis and technical design sessions to develop requirements documents and ETL specifications.
  • Developed, tested, and implemented processes for loading initial data into the Enterprise Data Warehouse (EDW).
  • Coordinated and prioritized multiple projects, estimating, scheduling, and tracking ETL projects throughout the SDLC.
  • Led ETL Team in analyzing, designing, and developing ETL strategies and processes, providing guidance and mentoring to team members.
  • Extracted data from WEBCRM Databases, Siebel source systems, and Oracle ERP Systems into the EDW for various projects.
  • Normalized data formats, replaced missing values with defaults, standardized values, and mapped attributes to create a master list.
  • Troubleshot and tuned SQL using EXPLAIN PLAN to optimize performance.
  • Developed ETL objects including mappings, sessions, mapplets, and workflows to efficiently load data into the EDW.
  • Designed and implemented Data Marts to support mission-critical Data Warehouse applications.
  • Developed complex mappings and mapplets in Informatica using various transformations.
  • Created standard and reusable mappings and mapplets utilizing transformations such as Expression, Aggregator, Joiner, Source Qualifier, Router, and Lookup.
  • Managed repository migrations, user creations, and permissions for testing personnel and developers.
  • Installed and administered Informatica versions 8.1, 8.6, and 9.1 at the server level.
  • Implemented various data transformations including Slowly Changing Dimensions (a conceptual PySpark sketch of SCD Type 2 follows this list).
  • Maintained the Data Warehouse Administration Console (DAC), scheduled DAC Jobs, and monitored Siebel Jobs and EIM Jobs.
  • Collaborated closely with Siebel database administrators, loading data from the Enterprise database for dimension data.
  • Worked with EIM tables, Base Table Loads, and conducted troubleshooting analysis.
  • Performed optimization at both the mapping level and session level in Informatica.
  • Collaborated with DBAs, Architects, and System Administrators to create and maintain Informatica Architecture.
  • Reviewed ETL process design and ETL design documents, uploading them to the central repository for future access.
  • Environment: Informatica 9.1/8.1.1/8.6.1, Informatica Power Exchange 9.1/8.6/8.1.1, Informatica Data Quality, Oracle 10g/11g, SQL Server 2005/2000, TSQL, Shell Scripts, Unix Korn Shell, JAVA, Toad, Erwin 4.0, Siebel, OBIEE, DAC, Pervasive ETL Tool.
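
Conceptual sketch for the Slowly Changing Dimensions bullet above. The original implementation was built as Informatica mappings; this PySpark version only illustrates the Type 2 idea (expire the old row, append a new current row), and all column names and sample rows are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

    # Current dimension rows and the latest source extract (sample data).
    dim = spark.createDataFrame(
        [(1, "Chicago", "2023-01-01", None, 1)],
        "customer_id INT, city STRING, start_date STRING, end_date STRING, is_current INT")
    incoming = spark.createDataFrame([(1, "Durham")], "customer_id INT, city STRING")

    # Rows whose tracked attribute changed against the current dimension version.
    changed = (incoming.alias("s")
               .join(dim.filter("is_current = 1").alias("d"), "customer_id")
               .filter(F.col("s.city") != F.col("d.city"))
               .select("customer_id", "s.city"))

    # Expire the old versions and append new current versions.
    expired = (dim.join(changed.select("customer_id"), "customer_id", "left_semi")
               .withColumn("is_current", F.lit(0))
               .withColumn("end_date", F.current_date().cast("string")))
    new_rows = (changed
                .withColumn("start_date", F.current_date().cast("string"))
                .withColumn("end_date", F.lit(None).cast("string"))
                .withColumn("is_current", F.lit(1)))

    unchanged = dim.join(changed.select("customer_id"), "customer_id", "left_anti")
    updated_dim = unchanged.unionByName(expired).unionByName(new_rows)
    updated_dim.show()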

Education

Bachelor of Science - Computer Science

AMITY UNIVERSITY
MUMBAI
05-2015

Skills

  • Data Technologies: MapReduce, HDFS, Sqoop, PIG, Hive, HBase, Oozie, Flume, Kafka, Zookeeper, Yarn, Sparklib
  • Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Teradata, Cosmos
  • Frameworks: Django, Flask, Hadoop, Apache Spark, BigQuery, Apigee
  • Programming Languages: Python, PySpark, Scala, R, Shell Scripting, Java
  • Web Services: AWS, GCP, Azure, Snowflake, Apache Tomcat, WebLogic
  • Web Technologies: CSS, HTML, XHTML, AJAX, XML, JSON, JavaScript
  • Visualization/Reporting: Tableau, Power BI, Looker, SSIS, SSRS, SSAS
  • Development Tools: Databricks, R Studio, PyCharm, Jupyter Notebook, Sublime Text, IntelliJ, Eclipse, NetBeans, Visual Studio, Heroku, Docker, Brackets
  • Version Control: Git, GitHub, SVN, CVS
  • Methodologies: Agile (Scrum), Waterfall

Timeline

Senior Data Engineer

Fidelity Investments
03.2022 - Current

Big Data Engineer

UBS Financial Services
01.2020 - 02.2022

Azure Engineer

Virgin Pulse
10.2017 - 12.2019

ETL Developer

FICO
07.2015 - 08.2017

Bachelor of Science - Computer Science

AMITY UNIVERSITY