Sainath Mandadi

Irving, Texas

Summary

Data Engineer with 5+ years of experience designing and developing enterprise-level, low-latency, fault-tolerant data platforms. Skilled in building distributed streaming data pipelines and analytics data stores with Spark, Flink, and Kafka. Proficient in Python, SQL, Scala, and cloud platforms (AWS, Azure), with a strong focus on CI/CD practices, security, and collaboration in agile environments. Experienced in supporting scalable cloud-based solutions, optimizing data storage and processing, and analyzing large-scale distributed systems for high-quality insights. Applies SDLC methodologies and agile practices to deliver robust solutions to complex data engineering challenges, using visualization tools and cloud-based data architectures to ensure efficient data management and analysis across diverse technological environments.

Overview

5+ years of professional experience

Work History

Data Engineer

Elevance
12.2023 - Current
  • Led migration from on-premises to AWS, implementing ETL processes with AWS Glue, managing S3 buckets, and optimizing data workflows using Apache Airflow and EMR clusters
  • Developed and maintained distributed streaming data pipelines using Spark Streaming, integrating with Kafka to handle real-time data in a fault-tolerant environment (a representative pattern is sketched after this list)
  • Collaborated with security and infrastructure teams to adhere to application resiliency standards and ensure compliance
  • Built analytics data stores and contributed to CI/CD practices, enhancing scalability and operational performance on AWS
  • Designed and developed robust data pipelines using Spark, Scala, and PySpark, handling diverse data formats (Avro, Parquet, JSON) and integrating with Kafka for real-time streaming
  • Implemented advanced analytics solutions, including AI algorithms and Machine Learning models, to optimize data processing and improve data accuracy in large-scale enterprise environments
  • Established a comprehensive metadata management framework using Collibra, enhancing data lineage tracking and compliance monitoring across the organization
  • Orchestrated end-to-end data solutions, from ingestion to visualization, utilizing technologies such as Hadoop, Snowflake, AWS Redshift, and Tableau, while ensuring data quality through custom validation scripts
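
A minimal sketch of the streaming pattern described above, assuming placeholder broker, topic, schema, and S3 paths rather than actual Elevance resources:

  # Illustrative sketch only: Kafka -> Spark Structured Streaming -> S3,
  # with checkpointing for fault tolerance. All names are placeholders.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import from_json, col
  from pyspark.sql.types import StructType, StructField, StringType, TimestampType

  spark = SparkSession.builder.appName("streaming-ingest").getOrCreate()

  event_schema = StructType([
      StructField("event_id", StringType()),
      StructField("payload", StringType()),
      StructField("event_ts", TimestampType()),
  ])

  events = (
      spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
      .option("subscribe", "events")                     # placeholder topic
      .load()
      .select(from_json(col("value").cast("string"), event_schema).alias("e"))
      .select("e.*")
  )

  # The checkpoint directory lets the query recover after a failure or restart.
  query = (
      events.writeStream
      .format("parquet")
      .option("path", "s3a://example-bucket/events/")            # placeholder path
      .option("checkpointLocation", "s3a://example-bucket/chk/")
      .trigger(processingTime="1 minute")
      .start()
  )

Checkpointing plus Kafka's committed offsets is what gives this style of pipeline its fault tolerance: on restart, Spark resumes from the last recorded offsets instead of reprocessing or dropping data.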

Data Engineer

US Bank
11.2022 - 12.2023
  • Developed and implemented CI/CD pipelines using AWS CodePipeline, AWS Glue, and Databricks on AWS for big data solutions
  • Developed real-time data processing applications using Scala, Python, and Apache Spark Streaming, integrating with various sources like Kafka and JMS for efficient data handling
  • Implemented and optimized ETL processes using PySpark, Hive, and Spark SQL, creating DataFrames and performing complex transformations to meet business requirements (a representative pattern is sketched after this list)
  • Built distributed data computing systems with Spark and Kafka, developing real-time streaming applications and ensuring application resiliency
  • Partnered with agile teams to implement CI/CD and application scaling solutions across testing and production environments on AWS
  • Engaged in code reviews, automated testing, and performance tuning for robust, low-latency data solutions
  • Utilized AWS services including S3, EC2, and CloudFormation for scalable infrastructure, while implementing infrastructure as code (IaC) for automated environment deployment and testing
  • Created and optimized Spark clusters using Azure Databricks to accelerate high-quality data preparation, developing Spark applications in Scala for seamless Hadoop transitions
  • Designed and implemented comprehensive testing frameworks to validate data integrity, application performance, and security compliance across cloud migrations and data processing pipelines
  • Migrated data analytics platforms from Azure to AWS, designing migration strategies for large-scale data warehouses and establishing secure data pipelines while optimizing cost and performance
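
A minimal sketch of the PySpark/Spark SQL ETL pattern referenced above; the table, column, and bucket names are invented for illustration:

  # Hedged example: batch ETL with DataFrame transformations and Spark SQL.
  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.appName("batch-etl").getOrCreate()

  # Placeholder source path and columns.
  raw = spark.read.parquet("s3a://example-bucket/raw/transactions/")

  # Typical cleanup transformations: dedupe, normalize types, filter bad rows.
  cleaned = (
      raw.dropDuplicates(["txn_id"])
         .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
         .filter(F.col("amount") > 0)
  )

  # Spark SQL handles the aggregation step against a temp view.
  cleaned.createOrReplaceTempView("txn")
  daily = spark.sql("""
      SELECT account_id,
             to_date(txn_ts) AS txn_date,
             SUM(amount)     AS daily_total,
             COUNT(*)        AS txn_count
      FROM txn
      GROUP BY account_id, to_date(txn_ts)
  """)

  daily.write.mode("overwrite").partitionBy("txn_date").parquet(
      "s3a://example-bucket/curated/daily_totals/"  # placeholder target
  )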

Data Engineer

Allstate Solutions Private Limited
05.2020 - 06.2022
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a PySpark equivalent of this pattern is sketched after this list)
  • Developed multiple POCs using Scala and deployed them on the Yarn cluster; compared the performance of Spark with Hive and SQL/Teradata
  • Wrote Hive queries for data analysis to meet business requirements
  • Constructed and optimized ETL workflows for distributed data systems, utilizing Spark and NoSQL (DynamoDB) for large-scale data processing
  • Designed and implemented data warehousing solutions using Redshift and Snowflake for scalable data insights
  • Collaborated on agile development cycles, focusing on application performance, scalability, and security compliance
  • Automated jobs that pulled data from the FTP server and loaded it into Hive tables using Oozie workflows
  • Created Hive tables, queried them with HiveQL, and performed data analysis using Hive and Pig
  • Created data pipelines for copying, moving, and transforming data with custom Azure Data Factory pipeline activities for on-cloud ETL processing
  • Used Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory, Azure SQL Database, and Azure SQL Data Warehouse to provide analytics and reports that improved marketing strategies
  • Contributed to all phases of the project's reference-data approach to MDM, creating a data dictionary and mapping sources to targets in the MDM data model
  • Defined UDFs in Pig and Hive to capture customer behavior
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig
  • Created Hive external tables on MapReduce output before applying partitioning and bucketing
  • Worked with ETL tools including Talend Data Integration, Talend Big Data, Pentaho Data Integration, and Informatica
  • Wrote fully parameterized Databricks code and ADF pipelines for efficient code management
  • Used the Oozie scheduler to automate pipeline workflows and orchestrate MapReduce data-extraction jobs
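
A hedged sketch of the Hive-to-Spark conversion mentioned above. The original work used Scala and RDDs; a PySpark DataFrame equivalent is shown here, with placeholder table and column names:

  # Original HiveQL (illustrative):
  #   SELECT customer_id, COUNT(*) AS visits
  #   FROM web_logs
  #   WHERE action = 'view'
  #   GROUP BY customer_id;
  from pyspark.sql import SparkSession, functions as F

  spark = (SparkSession.builder
           .appName("hive-to-spark")
           .enableHiveSupport()
           .getOrCreate())

  # Same query expressed as Spark transformations, which lets the optimizer
  # plan the filter and aggregation instead of running a Hive MapReduce job.
  visits = (
      spark.table("web_logs")                  # placeholder Hive table
           .filter(F.col("action") == "view")
           .groupBy("customer_id")
           .agg(F.count("*").alias("visits"))
  )

  # Placeholder target table.
  visits.write.mode("overwrite").saveAsTable("analytics.customer_visits")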

Education

Master of Science - Computer Science

Campbellsville University
Louisville, KY
12.2023

Skills

  • Hadoop
  • Spark
  • PySpark
  • Hive
  • HDFS
  • MapReduce
  • Sqoop
  • Kafka
  • Flume
  • Impala
  • Oozie
  • MapR
  • Apache NiFi
  • Apache Pig
  • YARN
  • Informatica
  • Zookeeper
  • HBase
  • Python
  • Java
  • Scala
  • SQL
  • PL/SQL
  • Linux shell scripts
  • Flink
  • Snowflake
  • Redshift
  • Data Warehousing
  • Real-time Data Processing
  • NoSQL
  • DynamoDB
  • AWS
  • Azure
  • Application Resiliency Standards
  • Agile
  • CI/CD Practices
  • Automated Testing
  • Security Compliance
  • MySQL
  • MongoDB
  • PostgreSQL
  • DB2
  • MS-SQL Server
  • Azure SQL DB
  • Teradata
  • Cassandra
  • Oracle DB
  • Tableau
  • SSRS
  • Crystal Reports
  • Power BI
  • Domo
  • Microsoft Azure
  • Data Factory
  • Databricks
  • SQL DB
  • Synapse Analytics
  • EC2
  • S3
  • RDS
  • EMR
  • Glue
  • Scrum
  • Waterfall
  • Data Migration
  • Data Cleansing
  • ETL Processes
  • Data Profiling
  • Performance Tuning
  • Data Modeling
  • ETL development
  • Big Data Processing
  • Data Pipeline Design
  • NoSQL Databases
  • API Development
  • Machine Learning
  • Scripting Languages
  • Metadata Management
  • Hadoop Ecosystem
  • Data Security
  • SQL Expertise
  • Critical Thinking
  • Data Visualization
  • Effective Communication
  • Key Performance Indicators
  • Data Acquisitions
  • Security Protocols
  • Written Communication
  • Organizational Skills
  • Multitasking
  • Self Motivation
  • Professionalism
  • Problem-Solving
  • Recovery planning
  • Secure Data Retention
  • Team building
  • Data Mining
  • Attention to Detail
  • Interpersonal Skills
  • Load Balancing
  • Structure designs
  • RDBMS Design
  • Time Management
  • Task Prioritization
  • Decision-Making
  • Data Quality Assurance
  • Spark Framework
  • Data Curation
  • Real-time Analytics
  • Data Governance
  • Data pipeline control
  • Data integration
  • Database Design
  • SQL Programming
  • Relational databases
  • Storage virtualization
  • Data Analysis
  • Technology leadership work streams
  • Business Intelligence
  • SQL and Databases
  • Backup and recovery
  • RDBMS
  • Risk Analysis
  • Database Administration
  • Advanced analytics
  • SQL transactional replications
  • Data Analytics
  • Team Collaboration
  • Adaptability
  • Analytical Skills
  • Data warehousing expertise
  • Database Development
  • Analytical Thinking
  • XML Web Services
  • Data repositories
