Overall 9+ years of professional IT experience, including 5+ years in Data Engineering and 4+ years in Data Warehousing.
- Experienced data professional with a strong background in end-to-end management of ETL data pipelines, ensuring scalability and smooth operations.
- Proficient in query optimization and indexing strategies to improve data retrieval efficiency.
- Skilled in writing SQL queries, including DDL, DML, and various database objects, for data manipulation and retrieval.
- Expertise in integrating on-premises and cloud-based data sources using Azure Data Factory, applying transformations, and loading data into Snowflake.
- Strong knowledge of data warehousing techniques, including data cleansing, Slowly Changing Dimension (SCD) handling, surrogate key assignment, and change data capture (CDC) for Snowflake modeling.
- Experienced in designing and implementing scalable data ingestion pipelines using tools such as Apache Kafka, Apache Flume, and Apache NiFi.
- Proficient in developing and maintaining ETL/ELT workflows with technologies such as Apache Spark, Apache Beam, and Apache Airflow for efficient data extraction, transformation, and loading.
- Skilled in implementing data quality checks and cleansing techniques to ensure data accuracy and integrity throughout the pipeline.
- Experienced in building and optimizing data models and schemas in Apache Hive, Apache HBase, and Snowflake for efficient data storage and retrieval for analytics and reporting.
- Strong proficiency in developing ELT/ETL pipelines using Python and Snowflake SnowSQL.
- Skilled in creating ETL transformations and validations using Spark SQL/Spark DataFrames with Azure Databricks and Azure Data Factory.
- Collaborative team member, working closely with Azure Logic Apps administrators and DevOps engineers to monitor and resolve issues in process automation and data processing pipelines.
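The change data capture and surrogate key handling mentioned above can be sketched in plain Python; this is a minimal illustration only, and the table shape, `customer_id` key, and `customer_sk` column are hypothetical names, not taken from any specific project:

```python
import itertools

def capture_changes(source_rows, target_rows, key="customer_id"):
    """Compare source and target snapshots and classify each source row
    as an insert (new key) or an update (key exists, attributes differ),
    mirroring the delta detection a CDC load performs."""
    target_by_key = {row[key]: row for row in target_rows}
    inserts, updates = [], []
    for row in source_rows:
        existing = target_by_key.get(row[key])
        if existing is None:
            inserts.append(row)       # key not in target: new record
        elif existing != row:
            updates.append(row)       # key present but attributes changed
    return inserts, updates

def assign_surrogate_keys(rows, start=1):
    """Attach a warehouse-generated surrogate key to each incoming row."""
    counter = itertools.count(start)
    return [{**row, "customer_sk": next(counter)} for row in rows]
```

In a real Snowflake load, the same classification would typically be expressed as a `MERGE` statement or a stream on the staging table; the sketch only shows the logic.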
- Experienced in optimizing Azure Functions code to extract, transform, and load data from diverse sources.
- Strong experience in designing, building, and maintaining data integration programs within Hadoop and RDBMS environments.
- Proficient in implementing CI/CD frameworks for data pipelines using tools such as Jenkins, ensuring efficient automation and deployment.
- Skilled in executing Hive scripts through Hive on Spark and Spark SQL to address various data processing needs.
- Collaborative team member, ensuring data integrity and stable data pipelines while working jointly on ETL tasks.
- Strong experience in using Kafka, Spark Streaming, and Hive to process streaming data, developing robust pipelines for ingestion, transformation, and analysis.
- Proficient in writing Spark Core and Spark SQL scripts in Scala to accelerate data processing.
- Experienced in using JIRA for project reporting and task management, ensuring efficient project execution within Agile methodologies.
- Actively participated in Agile ceremonies, including daily stand-ups and PI Planning, demonstrating effective project management skills.
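The kind of streaming aggregation such Kafka/Spark Streaming pipelines compute can be illustrated with a small pure-Python sketch of a tumbling-window event count; the event shape (`ts`, `type`) and the 60-second window are hypothetical choices for the example, not details from any actual pipeline:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group timestamped events into fixed, non-overlapping windows and
    count events per (window_start, event_type) pair, analogous to what
    a Spark Streaming job computes before writing results to Hive."""
    counts = defaultdict(int)
    for event in events:
        # Floor the epoch timestamp to the start of its window.
        window_start = (event["ts"] // window_seconds) * window_seconds
        counts[(window_start, event["type"])] += 1
    return dict(counts)
```

In Spark itself this would be a `groupBy(window(...), col("type")).count()` over a streaming DataFrame; the sketch only captures the windowing logic.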
Azure Administrator (AZ-104)