
Dinesh Reddy Vydhyula

Cleveland, OH

Summary

Experienced Data Engineer with over 5 years designing, deploying, and optimizing large-scale data solutions and enterprise applications. Proven expertise in the Hadoop ecosystem (HDFS, MapReduce, Hive, Sqoop, Flume), cloud-based ETL (AWS Glue, Azure Data Factory, IDMC/IICS), and real-time data pipelines using Spark, Kafka, and StreamSets. Adept at building and maintaining data lakes, data warehouses (AWS Redshift, Snowflake, Azure SQL), and data catalogs that ensure high data accuracy, quality, and accessibility. Skilled in automation with Python, Airflow, and Terraform, and in managing both relational (Oracle, SQL Server, MySQL) and NoSQL (MongoDB, Cassandra, HBase) databases. Solid grounding in machine learning algorithms and statistical tooling (Pandas) for data analysis, as well as CI/CD and DevOps practices for streamlined deployment. Demonstrates advanced proficiency in SQL and Python, leveraging these skills to support cross-functional teams, strengthen data integrity, and drive data-driven decision-making. Enthusiastic about cloud adoption, open-source data engineering, and ML- and automation-driven solutions.

Overview

5 years of professional experience
1 certification

Work History

Azure Data Engineer

Market Digital
04.2023 - Current

  • Designed and implemented ETL processes in AWS Glue, migrating and transforming data from multiple sources (e.g., S3 and text files) into AWS Redshift, improving data accuracy by 30% and reducing reporting time by 20%
  • Developed scalable data integration solutions in IDMC/IICS, integrating cloud and on-premises systems with robust error logging, recovery, and alerting, leading to a 25% reduction in processing errors and enhancing uptime by 15%
  • Created and optimized ETL pipelines in Snowflake using Python and SnowSQL, supporting real-time and batch processing and achieving a 40% improvement in data processing efficiency
  • Automated data ingestion and transformation workflows via PySpark, Spark SQL, and StreamSets across multi-cloud environments, improving processing speed by 50% and reducing manual intervention by 60%
  • Built a real-time streaming pipeline with Kafka, Spark Streaming, and Redshift (sketched below), increasing data ingestion speed by 35% and enabling near-instantaneous data insights, boosting decision-making by 20%
  • Developed SQL-based solutions using ranking and aggregation functions in Spark and Snowflake environments, leading to a 30% reduction in query execution time and improving reporting accuracy by 25%
  • Established data warehousing and data lake architectures on AWS Redshift, Azure SQL, and Hadoop, reducing data storage costs by 20% while enhancing data retrieval speeds by 40%
  • Managed Kubernetes clusters, Docker containers, and Apache Airflow workflows, achieving 99.9% uptime for data operations, and reduced task completion time by 45% through efficient orchestration and scheduling
  • Processed high volumes of structured and semi-structured data across multiple sources, implementing transformations and machine learning algorithms in Spark, resulting in a 25% increase in data processing capabilities and improved data quality by 30%
  • Configured and maintained a data catalog, streamlining metadata management across SQL Server, Oracle, and Azure, enhancing data accessibility by 40% and improving query resolution times by 30%
  • Environment: Hadoop, Spark, Hive, Tableau, Linux, Python, Kafka, AWS S3, AWS Glue, StreamSets, AWS EC2, Oracle PL/SQL, development toolkit (JIRA, Bitbucket/Git, ServiceNow, etc.)
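A minimal, illustrative sketch of the Kafka-to-Spark streaming pipeline referenced above. This is not code from the engagement: the broker address, topic name, event schema, and S3 paths are all hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("orders-stream").getOrCreate()

    # Hypothetical schema for the events arriving on the topic
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read the Kafka topic as a streaming DataFrame (placeholder broker/topic)
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "orders")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Land micro-batches in S3 as Parquet, ready for a bulk COPY into Redshift
    (events.writeStream.format("parquet")
        .option("path", "s3a://example-bucket/orders/")
        .option("checkpointLocation", "s3a://example-bucket/checkpoints/orders/")
        .start()
        .awaitTermination())

Staging Parquet in S3 and COPYing into Redshift is a common pattern because Redshift ingests columnar files in bulk far more efficiently than row-by-row JDBC writes.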

AWS Data Engineer

iCreditWorks
07.2022 - 03.2023
  • Architected and optimized AWS data infrastructure using S3, Redshift, Glue, and Lambda (a representative Glue job is sketched below), enhancing data processing speed by 40% and reducing data retrieval time by 30%, enabling faster decision-making
  • Designed and implemented scalable ETL pipelines integrating data from SQL, NoSQL, and ServiceNow, improving data accessibility by 35% and enhancing data quality by 25% across systems
  • Enforced data governance and security policies through IDMC/IICS frameworks, achieving a 99% compliance rate and minimizing data security risks
  • Managed large-scale data migrations to cloud platforms using IDMC/IICS, reducing migration times by 20% and supporting seamless data transition for enterprise data assets
  • Collaborated with data scientists and analysts to support ML workflows on AWS SageMaker, implementing data transformations that improved model accuracy by 15% and reduced data preparation time by 25%
  • Implemented real-time data processing workflows with Amazon Kinesis and Apache Kafka, reducing data latency by 35% and enabling real-time insights via interactive Power BI dashboards, driving 20% faster business decisions
  • Automated resource provisioning and management across AWS services using Terraform for Infrastructure as Code, reducing manual intervention by 60% and cutting infrastructure setup time by 5
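A simplified, illustrative sketch of a Glue-style PySpark ETL job like the ones described above. Database, table, and bucket names are hypothetical, and the field mapping is trimmed to two columns.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_ctx = GlueContext(SparkContext())
    job = Job(glue_ctx)
    job.init(args["JOB_NAME"], args)

    # Read the source table registered in the Glue Data Catalog (hypothetical names)
    src = glue_ctx.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="payments")

    # Rename and cast fields to match the warehouse schema
    mapped = ApplyMapping.apply(
        frame=src,
        mappings=[("id", "string", "payment_id", "string"),
                  ("amt", "double", "amount", "double")])

    # Load into Redshift through a catalog connection, staging via S3
    glue_ctx.write_dynamic_frame.from_catalog(
        frame=mapped, database="dw_db", table_name="payments_fact",
        redshift_tmp_dir="s3://example-bucket/glue-tmp/")

    job.commit()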

Azure Data Engineer

Bank of America
08.2021 - 06.2022
  • Designed and implemented upgrade/downgrade scripts in SQL to filter and validate data, improving data quality by 30% and reducing data retrieval errors by 25%
  • Optimized Azure Storage and SQL Server solutions, including blob storage and data warehousing in Azure SQL Data Warehouse, enabling a 40% increase in data processing efficiency
  • Built and troubleshot high-volume data extractions with Azure Data Factory (ADF), leading to a 35% reduction in data transfer times and enhancing data accessibility across departments
  • Migrated legacy databases to Azure using Azure Data Factory, SSIS, and PowerShell, achieving a 45% cost reduction through efficient lift-and-shift strategies
  • Developed end-to-end data solutions (storage, integration, processing, and visualization) in Azure, supporting analytics with 99% uptime and enhancing operational insights by 50% through Power BI and SSRS dashboards
  • Automated daily data ingestion from web services into Azure SQL DB using Python and ADF (a simplified version is sketched below), reducing manual intervention by 60% and improving data availability for analytics by 20%
  • Deployed complex SQL and T-SQL queries, stored procedures, and CTEs to optimize business intelligence processes, enhancing Power BI report generation speed by 30%
  • Developed streaming and ETL pipelines in Databricks and Azure Event Hubs to analyze IoT data in real time, improving processing speeds by 45% and providing actionable insights on efficiency metrics
  • Environment: Azure SQL, Azure Storage Explorer, Azure Storage, Azure Blob Storage, Azure Backup, Azure Files, Azure Data Lake Storage, SQL Server Management Studio 2016, Visual Studio 2015, VSTS, Power BI, PowerShell, C#/.NET, SSIS, DataGrid, ETL (Extract, Transform, Load), Business Intelligence (BI)
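A simplified sketch of the daily web-service-to-Azure-SQL ingestion described above; the endpoint URL, connection string, and staging table are placeholders, and credentials would normally come from a secret store rather than the script.

    import pyodbc
    import requests

    API_URL = "https://example.com/api/daily-metrics"  # hypothetical endpoint

    rows = requests.get(API_URL, timeout=30).json()

    # Placeholder Azure SQL connection string
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=example.database.windows.net;DATABASE=analytics;"
        "UID=loader;PWD=<from-key-vault>")
    cur = conn.cursor()

    # Idempotent daily load: clear the staging table, then bulk-insert
    cur.execute("TRUNCATE TABLE stg.daily_metrics")
    cur.executemany(
        "INSERT INTO stg.daily_metrics (metric_date, name, value) VALUES (?, ?, ?)",
        [(r["date"], r["name"], r["value"]) for r in rows])
    conn.commit()

In production, ADF would typically schedule this script (or an equivalent pipeline) and handle retries and alerting.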

Spark/Big Data Engineer

Mindtree
07.2019 - 07.2021

  • Designed a data lake workflow in the Hadoop ecosystem, enabling seamless integration with Tableau for reporting, improving data accessibility for stakeholders by 40% and enhancing reporting efficiency by 30%
  • Developed Source-to-Target Mappings (STM) for tables based on business requirements, ensuring accurate data transformation and increasing data consistency by 25% across reports
  • Optimized data redundancy in Snowflake by loading real-time data into HDFS with Kafka, and developed PySpark and Spark SQL code on Amazon EMR, achieving a 35% reduction in data processing time
  • Created and managed Hive tables on HDFS to store Parquet-formatted data (sketched below), increasing data storage efficiency by 30% and reducing query response times by 20% on a Cloudera Hadoop cluster
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation, enabling efficient handling of various file formats (XML, JSON, CSV) and boosting data processing capacity by 40%
  • Utilized AWS S3 as a storage layer for HDFS and implemented Flume for log data ingestion, enhancing data ingestion speed by 25% and improving data availability for processing
  • Collaborated with cross-functional teams using Bitbucket, Confluence, and Jira, maintaining Agile practices and achieving a 95% on-time project delivery rate while ensuring alignment with evolving requirements
  • Environment: Spark, PySpark, Spark SQL, Hive, Pig, Flume, AWS CLI, AWS EMR, AWS S3, REST API, shell scripting, Git
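An illustrative sketch of landing raw data as Parquet in a partitioned Hive table, as described above; the input path, database, table, and partition column are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("hive-parquet-load")
             .enableHiveSupport()
             .getOrCreate())

    # Raw JSON landed in HDFS by the Kafka/Flume ingestion (placeholder path)
    df = spark.read.json("hdfs:///data/raw/clicks/")

    # Store as Parquet in a partitioned Hive table for fast downstream queries
    (df.write.mode("append")
       .partitionBy("event_date")
       .format("parquet")
       .saveAsTable("analytics.clicks"))

Parquet's columnar layout plus partition pruning on event_date is what drives the storage and query-time savings noted above.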

Education

Master of Science - Computer Science

Western Illinois University
Macomb, IL
05.2023

Skills

  • Hadoop
  • Spark
  • Hive
  • Impala
  • Kafka
  • Airflow
  • Cloudera
  • Scala
  • Python
  • SQL
  • Shell Scripting
  • JavaScript
  • SQL Server
  • MySQL
  • Teradata
  • Snowflake
  • AWS
  • Azure
  • GCP
  • Tableau
  • Power BI
  • Excel
  • Linux
  • Ubuntu
  • Red Hat
  • Microsoft Windows
  • SSIS
  • SSRS
  • CI/CD

Certification

Microsoft Azure Fundamentals

Languages

English: Professional Working
Telugu: Native or Bilingual
Hindi: Limited Working
