
Akhil Reddy

Sunnyvale, CA

Summary

Data Engineer specializing in designing and maintaining robust data management systems, with expertise in optimizing data retrieval to improve system efficiency and support data-driven decision-making through machine learning. Proficient in big data technologies including Hadoop and Apache Spark, along with SQL database management and data visualization tools such as Power BI and Tableau. Demonstrated problem-solver with a history of delivering innovative data solutions in dynamic environments.

Overview

7 years of professional experience
1 certification

Work History

Azure Data Engineer

Interwell Health
Boston, MA
06.2023 - Current
  • Spearheaded the development of a customer insights dashboard, integrating diverse data sources such as Azure SQL Database, Azure Blob Storage, Azure Data Lake Storage Gen2, and Azure HDInsight, handling formats including JSON, CSV, Parquet, and ORC.
  • Optimized data streaming into Azure Event Hubs, leveraging producers and consumers to enhance topic structures, leading to a 40% cost reduction and improved efficiency.
  • Utilized Azure Synapse Analytics for Spark transformations, employing Synapse catalogs for metadata storage and orchestrating over 100 Spark jobs to process data from Azure Blob Storage and transactional servers.
  • Enhanced ETL efficiency by optimizing Azure Data Factory pipelines, reducing data processing time by 40%, ensuring accurate and timely analytics and reporting.
  • Integrated Unity Catalog with Azure Databricks to strengthen security, ensure compliance, and improve data governance for enhanced regulatory adherence.
  • Designed and implemented complex workflows using Apache Airflow, ensuring seamless orchestration, efficient task scheduling, and reliable data pipelines.
  • Engineered file-level transformations with Azure Functions, scheduled via Azure Logic Apps, automating data processing tasks while minimizing manual intervention.
  • Implemented Unity Catalog for streamlined metadata management, ensuring data consistency and enhancing insights across multiple data sources.
  • Developed workflow DAGs in Apache Airflow, utilizing Python, SQL, and Bash operators to build Directed Acyclic Graphs (DAGs) and execute tasks based on CRON expressions (a minimal DAG sketch follows this role's environment list).
  • Architected data pipelines to transform and load data into Azure Synapse Analytics, reducing storage costs through optimized data management strategies.
  • Applied Azure RBAC for managing policies and permissions across Azure resources, ensuring secure access and governance.
  • Led the migration of on-premises data to Snowflake on Azure, improving query times by 40% and enhancing data accessibility.
  • Established a robust data governance framework on Snowflake, ensuring compliance with industry standards and improving data quality for better decision-making.
  • Integrated external data sources with Snowflake on Azure, enabling real-time analytics and boosting operational efficiency by 25%.
  • Developed Azure Functions and API Management for seamless financial data submission and processing.
  • Identified and implemented cost-saving measures in Azure Databricks, optimizing resource utilization and reducing cloud infrastructure costs by 15%.
  • Utilized Spark and PySpark for scalable data processing, accelerating analytics workflows for large datasets.
  • Leveraged PySpark's RDD and DataFrame APIs for distributed data processing, improving performance and scalability.
  • Applied Spark's machine learning libraries (MLlib), using PySpark for model development, training, and evaluation on various datasets (a model-training sketch follows this role's environment list).
  • Established a CI/CD pipeline using Azure DevOps and Azure Functions for streamlined financial data processing.
  • Applied DevOps principles to manage and optimize cloud infrastructure (AWS, Azure, GCP) for data storage, processing, and analytics, ensuring high availability, scalability, and cost-efficiency. Configured and maintained services such as S3, EC2, and Lambda to support large-scale data pipelines.
  • Deployed Snowflake on Azure, leveraging SnowSQL for scalable querying and data management, enhancing analytics with features such as automatic scaling and native support for semi-structured data.
  • Transformed Hive/SQL queries into Spark RDDs, DataFrames, and Datasets using Python and Scala, customizing them for financial data processing (a DataFrame rewrite sketch follows this role's environment list).
  • Developed microservices for financial data applications using Scala, optimizing data integration and processing.
  • Utilized Spark SQL to manage Hive queries within a Spark-based environment, improving efficiency by 75% and significantly enhancing query performance.
  • Created DataFrames and Datasets using Spark and Spark Streaming, executing transformations and actions tailored for financial data needs.
  • Worked with Azure Event Hubs for publish-subscribe messaging, ensuring effective management of financial data streams.
  • Leveraged Azure Logic Apps for debugging and monitoring scheduled jobs, streamlining workflow troubleshooting.
  • Expertly troubleshot and resolved issues within data pipelines, including data inconsistencies, performance bottlenecks, and integration failures, ensuring smooth data flow and minimal downtime by leveraging tools like Apache Airflow, SQL, and log analysis.
  • Actively participated in Agile ceremonies such as sprint planning, daily stand-ups, and retrospectives to ensure smooth project execution and iterative development of data pipelines, adapting to changes and delivering high-quality data solutions.
  • Managed and version-controlled code for data engineering projects using Git, ensuring proper branching, merging, and conflict resolution in collaborative environments, enabling seamless collaboration and code integrity across teams.
  • Fostered strong communication channels with cross-functional teams, including data scientists, analysts, and DevOps, ensuring alignment on project goals, timelines, and data requirements for successful project delivery and continuous improvement.
  • Integrated transformed data with Power BI, enabling stakeholders to extract actionable insights and drive informed decision-making, leading to significant cost savings.
  • Collaborated with cross-functional teams, including business analysts, data scientists, and data engineers, aligning solutions with business objectives and delivering high-impact data strategies.
  • Environment: Azure Blob Storage, Azure SQL Database, Azure Data Lake Storage, Azure HDInsight, Azure Databricks, Unity Catalog, Azure Logic Apps, Azure Synapse Analytics, Azure Event Hubs, Azure Functions, Azure DevOps, Snowflake on Azure, Python, Scala, Spark (PySpark, Spark SQL), Kafka, Power BI, Linux, Java, Airflow, PostgreSQL, Oracle PL/SQL.
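
For illustration, a minimal sketch of the CRON-scheduled Airflow DAGs described above; the DAG id, schedule, and task logic are hypothetical placeholders, not artifacts from this role.

```python
# Minimal Airflow 2.x DAG: a Python validation task followed by a Bash task,
# scheduled via a CRON expression. All names here are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def validate_batch(**context):
    # Placeholder validation; a real task would check the day's extracts.
    print(f"Validating batch for {context['ds']}")


with DAG(
    dag_id="daily_customer_insights",     # hypothetical DAG name
    start_date=datetime(2023, 6, 1),
    schedule_interval="0 2 * * *",        # CRON: run at 02:00 every day
    catchup=False,
) as dag:
    validate = PythonOperator(task_id="validate_batch", python_callable=validate_batch)
    archive = BashOperator(task_id="archive_logs", bash_command="echo 'archiving {{ ds }}'")
    validate >> archive                   # validation gates the downstream task
```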
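
Likewise, a sketch of PySpark model training with Spark's MLlib pipeline API, as referenced in the machine-learning bullet; the feature table and column names are assumptions.

```python
# Train and evaluate a simple classifier with PySpark's ML pipeline API.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("model-training-sketch").getOrCreate()

data = spark.table("customer_features")          # hypothetical feature table
train, test = data.randomSplit([0.8, 0.2], seed=42)

# Assemble raw columns into a feature vector, then fit a logistic regression.
assembler = VectorAssembler(inputCols=["tenure", "monthly_spend"], outputCol="features")
lr = LogisticRegression(labelCol="churned", featuresCol="features")
model = Pipeline(stages=[assembler, lr]).fit(train)

model.transform(test).select("churned", "prediction").show(5)
```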
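
And a sketch of the Hive-to-DataFrame rewrites mentioned above: the same aggregation expressed once as HiveQL (in the comment) and once with the DataFrame API. The table, columns, and output path are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder.appName("hive-to-dataframe-sketch")
         .enableHiveSupport().getOrCreate())

# DataFrame equivalent of a Hive query such as:
#   SELECT account_id, SUM(amount) AS total
#   FROM transactions
#   WHERE txn_date >= '2023-01-01'
#   GROUP BY account_id
totals = (
    spark.table("transactions")                  # hypothetical Hive table
    .filter(F.col("txn_date") >= "2023-01-01")
    .groupBy("account_id")
    .agg(F.sum("amount").alias("total"))
)
totals.write.mode("overwrite").parquet("/curated/account_totals/")  # placeholder path
```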

Azure Data Engineer

Anheuser-Busch
St. Louis, MO
02.2022 - 05.2023
  • Worked closely with data scientists and utilized Azure Data Factory, including data flows and data catalogs, to develop a Spending Classification model for corporate card transactions, improving data organization and analytical capabilities.
  • Conducted in-depth data exploration, gathering information from Azure Synapse Analytics, Azure SQL Database, and Azure Data Lake Storage Gen2 to derive valuable insights for data-driven decision-making.
  • Integrated Azure API Management for efficient API calls, leveraged Azure Functions to ingest data from the Concur Expense Management platform, and dynamically created Azure Data Lake Storage for optimized data management.
  • Managed global financial data with Azure Cosmos DB and Azure SQL Database, ensuring efficient and scalable data storage.
  • Leveraged Azure Data Factory data flows and Azure RBAC for structured metadata storage and effective data governance.
  • Implemented feature engineering on large datasets using Azure HDInsight Spark clusters, enhancing data preparation and achieving a 20% improvement in processing efficiency.
  • Successfully integrated machine learning models into Azure Databricks pipelines, enabling predictive analytics and increasing forecasting accuracy by 25%.
  • Automated SQL queries and designed Azure Logic Apps workflows to streamline data processing and reduce manual interventions.
  • Provided data access to data scientists through Azure Data Lake Storage and Azure Synapse Analytics, using Azure Functions for flexible and automated data retrieval.
  • Performed data modeling in Snowflake on Azure, implementing STAR and Snowflake schemas, while utilizing Azure SQL Database for structured representation and Azure RBAC for secure access (a star-schema DDL sketch follows this role's environment list).
  • Implemented Azure Data Factory ETL processes to integrate data from diverse sources into Azure Synapse Analytics, optimizing performance and ensuring seamless analytics.
  • Conducted data cleansing and transformation using PySpark on Azure Databricks, leveraging Spark’s parallel processing for greater efficiency.
  • Established collaborative data science workflows in Azure Databricks, fostering seamless teamwork between data scientists, analysts, and engineers, leading to a 20% increase in productivity.
  • Utilized Spark SQL to enhance ETL processes, improving extraction, transformation, and loading throughput by 30%.
  • Designed and deployed Azure Logic Apps to automate data workflows, optimizing complex data processing across Azure services.
  • Integrated Azure Synapse Analytics for ad-hoc querying on data stored in Azure Data Lake Storage, enabling quick insights and on-demand analysis.
  • Deployed Azure Event Hubs for real-time data streaming, processing high-velocity data for timely analytics in event-driven applications.
  • Led data migration and synchronization between on-premises databases and Azure using Azure Database Migration Service (DMS).
  • Led the migration of on-premises data warehouses and processing systems to cloud platforms (AWS, Azure, GCP), ensuring minimal disruption, optimized storage, and scalable data processing environments for improved performance and cost-efficiency.
  • Managed the migration of legacy ETL processes to modern data pipeline frameworks, improving data extraction, transformation, and loading efficiency, while ensuring data integrity and seamless integration with new systems and platforms.
  • Configured Azure Monitor for end-to-end monitoring of data pipelines and infrastructure, proactively identifying and resolving performance issues.
  • Utilized a range of ETL tools such as Apache NiFi, Talend, and Azure Data Factory to design and implement efficient data pipelines, ensuring seamless data extraction, transformation, and loading from multiple sources into data warehouses and cloud platforms.
  • Implemented Azure DevOps pipelines to automate the migration of data engineering workflows to Azure, streamlining the deployment of data pipelines, storage, and processing services, and ensuring consistent, error-free transitions to cloud-based environments.
  • Utilized Azure Resource Manager (ARM) templates and Terraform to define and provision infrastructure resources, enabling a scalable, repeatable process for deploying data engineering environments in Azure, while minimizing manual interventions and ensuring seamless migration of data systems.
  • Automated the ingestion of web server log data using Azure Stream Analytics, efficiently storing data in Azure Data Lake Storage.
  • Implemented advanced Hive techniques such as Partitioning, Dynamic Partitions, and Buckets, optimizing performance and improving logical data organization (a partitioning sketch follows this role's environment list).
  • Developed and enforced automated data quality checks in Azure Databricks, reducing data errors by 15% and maintaining high data integrity (a sample check follows this role's environment list).
  • Automated data workflows with Apache Airflow, reducing engineering overhead and allowing teams to focus on high-value tasks.
  • Enhanced data warehousing in Snowflake on Azure, ensuring scalability, multi-cloud flexibility, secure collaboration, time travel, versioning, and schema integration (STAR and Snowflake schemas) for optimized analytics.
  • Developed real-time data streaming solutions using Apache Kafka on Azure, Azure Stream Analytics, and Azure Databricks, providing instant access to continuously generated data (a Structured Streaming sketch follows this role's environment list).
  • Optimized ETL processes in Azure Databricks, reducing data processing time by 25%, improving data availability for business users.
  • Environment: Azure Blob Storage, Azure Data Factory, Azure Synapse Analytics, Azure SQL Database, Azure Data Lake Storage, Azure Databricks, Azure Logic Apps, Azure HDInsight, Azure Event Hubs, Azure Functions, Azure RBAC, Python, Scala, Spark (PySpark, Spark SQL), Kafka, Azure Cosmos DB, Linux, Java, Apache Airflow, PostgreSQL, Snowflake on Azure.
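
For illustration, a sketch of the star-schema modeling bullet above, issued through the Snowflake Python connector; the connection settings, table names, and columns are hypothetical.

```python
import snowflake.connector

# Placeholder credentials; a real deployment would pull these from a secrets store.
conn = snowflake.connector.connect(
    account="myaccount.azure",
    user="etl_user",
    password="***",
    warehouse="ANALYTICS_WH",
    database="FINANCE",
    schema="MODEL",
)
cur = conn.cursor()

# One dimension and one fact table of a simple star schema.
cur.execute("""
    CREATE TABLE IF NOT EXISTS dim_merchant (
        merchant_key INTEGER IDENTITY PRIMARY KEY,
        merchant_name STRING,
        category STRING
    )
""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS fact_card_spend (
        merchant_key INTEGER REFERENCES dim_merchant (merchant_key),
        card_id STRING,
        spend_amount NUMBER(12,2),
        txn_date DATE
    )
""")
conn.close()
```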
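
A sketch of the Hive partitioning and dynamic-partition pattern named above (bucketed DDL follows the same shape); the tables and columns are placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive-partitioning-sketch")
         .enableHiveSupport().getOrCreate())

# Partitioned Hive table; names and columns are illustrative.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_partitioned (
        order_id BIGINT,
        store_id INT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
""")

# Dynamic partitioning: Hive derives order_date values from the data itself.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_partitioned PARTITION (order_date)
    SELECT order_id, store_id, amount, order_date
    FROM sales_raw
""")
```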
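
A sample of the kind of automated data-quality check described above; the table, rules, and thresholds are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()
df = spark.table("silver.card_transactions")  # hypothetical table

# Simple rule-based checks; each counts rows that violate a rule.
checks = {
    "null_transaction_ids": df.filter(F.col("transaction_id").isNull()).count(),
    "negative_amounts": df.filter(F.col("amount") < 0).count(),
    "duplicate_keys": df.count() - df.dropDuplicates(["transaction_id"]).count(),
}

failed = {name: n for name, n in checks.items() if n > 0}
if failed:
    # Fail the pipeline run so bad data never reaches downstream layers.
    raise ValueError(f"Data quality checks failed: {failed}")
```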
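
And a Structured Streaming sketch for the real-time bullet; the broker address, topic, schema, and sink paths are placeholders, and the job assumes the spark-sql-kafka package is on the classpath (Event Hubs also exposes a Kafka-compatible endpoint, one common way to wire this up on Azure).

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("realtime-stream-sketch").getOrCreate()

# Expected shape of each JSON event; fields are illustrative.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder endpoint
    .option("subscribe", "events")                     # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Continuously append parsed events to a Parquet sink with checkpointing.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/streams/events/")                # placeholder sink
    .option("checkpointLocation", "/streams/_checkpoints/events/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```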

Big Data Developer

Kroger
Cincinnati, OH
07.2018 - 10.2020
  • Improved ETL processes using Python, SQL, and Java, enhancing efficiency by 30% in a large-scale data pipeline, resulting in faster data retrieval and analysis.
  • Deployed Hadoop and Hive for processing vast datasets, reducing processing time by 40% and enabling seamless healthcare data analysis.
  • Optimized MapReduce jobs with Pig and Java to efficiently process and transform raw data, improving overall big data pipeline performance.
  • Implemented HBase for real-time healthcare data processing, ensuring instant access to continuously generated information and providing timely insights for stakeholders.
  • Utilized YARN for scalable, resource-efficient data processing, seamlessly managing increasing volumes of healthcare data.
  • Integrated Spark and PySpark for real-time data streaming, enabling rapid processing and analysis of streaming data sources, enhancing responsiveness and insights.
  • Applied Spark SQL for structured data analysis within healthcare datasets, optimizing performance and resource utilization for analytical workflows.
  • Managed and optimized HDFS storage solutions, ensuring efficient accessibility and retrieval of healthcare data for diverse analytical needs.
  • Developed data processing solutions using Teradata and Oracle databases, enhancing data integrity and facilitating seamless system integration.
  • Designed and executed efficient PySpark scripts for processing healthcare datasets, incorporating Spark SQL for structured data analysis and creating DAGs for workflow orchestration (a PySpark/Spark SQL sketch follows this role's environment list).
  • Implemented Postgres for structured data analysis, enhancing analytical capabilities and providing healthcare professionals with advanced reporting and visualization tools.
  • Actively participated in all stages of the SDLC, from requirement analysis to production, ensuring successful implementation and maintenance of big data projects, strengthening healthcare data infrastructure.
  • Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, HBase, Spark (PySpark, Spark SQL), Teradata, Oracle, PostgreSQL, Java, SQL, Cloudera Manager, Linux, Cluster Management.
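
For illustration, a PySpark/Spark SQL sketch of the structured-analysis pattern in the bullets above; the file paths and columns are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims-analysis-sketch").getOrCreate()

# Hypothetical healthcare claims extract on HDFS.
claims = spark.read.option("header", True).csv("hdfs:///data/claims/*.csv")
claims.createOrReplaceTempView("claims")

# Spark SQL over the registered view: monthly paid totals per provider.
monthly = spark.sql("""
    SELECT provider_id,
           substr(service_date, 1, 7)       AS service_month,
           COUNT(*)                         AS claim_count,
           SUM(CAST(paid_amount AS DOUBLE)) AS total_paid
    FROM claims
    GROUP BY provider_id, substr(service_date, 1, 7)
""")
monthly.write.mode("overwrite").parquet("hdfs:///analytics/claims_monthly/")
```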

Education

Master of Science - Information Science

Trine University

Skills

  • Data pipeline development and ETL
  • Cloud architecture and governance
  • Data integration and quality assurance
  • Agile methodologies
  • Data warehousing and analysis
  • Database administration and tuning
  • Storage virtualization techniques
  • Scripting languages proficiency
  • Metadata management strategies
  • Spark and Hadoop expertise
  • Real-time analytics implementation
  • Data security measures
  • Azure Data Factory expertise
  • AWS Glue management
  • Amazon S3 expertise
  • AWS Redshift proficiency
  • Data modeling and visualization

Certification

  • Microsoft DP-203: Azure Data Engineer Associate
  • SnowPro Advanced - Data Engineer
  • AWS Certified Data Engineer - Associate
