Nova Guliyev

Washington, District of Columbia

Summary

Accomplished Lead Data Engineer with 9+ years of experience specializing in designing and implementing innovative data solutions across On-Premises, Azure Cloud, and Databricks. Expert in leading cross-functional teams, optimizing data processing with Apache Spark and Azure Databricks, and integrating disparate systems. A cloud migration expert known for aligning technical solutions with business objectives.

Overview

10 years of professional experience

Work History

Principal Data Engineer (Consultant)

THE COCA-COLA COMPANY
Atlanta, GA
03.2023 - Current
  • Led and managed a high-performing data engineering team, overseeing end-to-end data pipeline development and implementation
  • Utilized Databricks for advanced data processing, resulting in a 30% reduction in data preparation time and a 25% increase in model training efficiency
  • Designed and implemented data pipelines in Azure Data Factory, Azure Synapse achieving a 40% improvement in data ingestion speed and a 15% reduction in ETL process failures
  • Created a Lambda architecture for ingesting and pre-processing telemetry data with Azure Databricks, persisting results to data stores including Azure Data Lake and Cosmos DB
  • Developed interactive and insightful Power BI dashboards, enhancing data visualization and enabling stakeholders to make informed decisions.
  • Implemented Spark and Spark ML to build machine learning pipelines, resulting in a 25% improvement in predictive model accuracy and a 20% reduction in model training time
  • Automated business processes using Azure Logic Apps, ensuring timely data integration across systems
  • Created serverless Azure Functions for real-time data processing, including functions for change data capture (CDC) on Azure Cosmos DB and for refreshing Power BI models
  • Led the design and implementation of data warehousing solutions using Azure Synapse & Data Factory / Azure Databricks, optimizing data storage and retrieval processes to enhance data accessibility and reporting capabilities
  • Developed and maintained complex stored procedures and functions in Azure SQL Database to facilitate critical data transformations and streamline data processing workflows, enabling efficient and scalable data operations
  • Implemented Unity Catalog, Delta Live Tables, and Spark Structured Streaming to enable real-time data integration, tracking changes, and processing streaming data, ensuring that the organization had access to up-to-the-minute insights for agile decision-making
  • Ensured Spark code was fine-tuned for efficiency and maintained high-quality data warehousing solutions for seamless data management
  • Extracted data from diverse sources, including Salesforce, NetSuite, SQL databases, Zendesk, Cosmos DB, and other systems, using REST APIs to fuel Lakehouse development
  • Implemented real-time data processing solutions using Azure Stream Analytics, Azure Event Hubs enabling immediate insights and actions based on streaming data
  • Administered Azure Databricks clusters, implementing CI/CD pipelines for notebook deployment, optimizing Delta tables, documenting data lineage, ensuring security and access control, monitoring and troubleshooting clusters, integrating with Azure services, assuring data quality, and fostering collaborative development through training and documentation.
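The Lambda-style telemetry pipeline above can be illustrated with a minimal, dependency-free sketch (event shape and names are hypothetical, not the production implementation): a batch layer recomputes complete aggregates, a speed layer holds recent increments, and a serving layer merges the two views.

```python
from collections import Counter

def batch_view(events):
    """Batch layer: recompute complete per-device counts from all historical telemetry."""
    return Counter(e["device"] for e in events)

def speed_view(recent_events):
    """Speed layer: incremental counts for events not yet absorbed by the batch layer."""
    return Counter(e["device"] for e in recent_events)

def serving_layer(batch, speed):
    """Serving layer: merge batch and real-time views into one queryable result."""
    merged = Counter(batch)
    merged.update(speed)
    return dict(merged)

historical = [{"device": "a"}, {"device": "a"}, {"device": "b"}]
recent = [{"device": "a"}, {"device": "c"}]
view = serving_layer(batch_view(historical), speed_view(recent))
# view -> {"a": 3, "b": 1, "c": 1}
```

In the actual stack, the batch layer would be a Databricks job over Azure Data Lake and the speed layer a streaming query, with Cosmos DB serving merged results.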

Lead Data Engineer (Consultant)

Concord
Minneapolis, MN
03.2022 - 03.2023
  • Led the migration of on-premises pipelines to cloud-based Databricks, implementing a modern Medallion architecture for enhanced data processing efficiency and scalability
  • Led architecture design for scalable and robust data solutions, driving innovation and meeting business needs
  • Designed and implemented ETL processes using Python, SQL, and Apache Spark to collect, clean, and transform data from multiple sources, resulting in a 25% improvement in data accuracy
  • Implemented Spark and Delta optimization techniques including partitioning, caching, broadcast joins, Z-Ordering, auto-compaction, and versioning, and tuned Spark code to significantly improve processing efficiency and reduce job execution times on the Databricks platform
  • Developed and maintained scalable data pipelines in Databricks, allowing for efficient processing and analysis of large datasets
  • Implemented a robust Medallion architecture, comprising Bronze, Silver, and Gold layers, to effectively manage data lifecycle from raw ingestion to refined insights, while also establishing real-time processing capabilities for immediate data analysis and decision-making
  • Designed and implemented data pipelines using Azure Data Factory to extract data from various data sources, transform and load it into Azure SQL Database for analytical reporting
  • Created and maintained PySpark jobs in Azure Databricks for data transformations, aggregations, and cleansing, accessing Azure Data Lake Gen2 and Azure Blob Storage via Azure service principals
  • Implemented Kafka for real-time data streaming and efficient event processing in the data architecture.
  • Led and managed a team to successfully execute projects, fostering collaboration, guiding team members, and ensuring timely delivery of high-quality results.
  • Utilized Azure DevOps to establish and manage CI/CD pipelines, automating build, test, and deployment processes, thereby enhancing development efficiency and ensuring seamless delivery of applications
  • Collaborated with cross-functional teams, including data analysts and data scientists, to ensure data integrity and improve the quality of business insights
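The Bronze/Silver/Gold flow described above can be sketched in plain Python (no Spark dependency; field names are illustrative): Bronze keeps raw records as landed, Silver cleans, normalizes, and deduplicates, and Gold aggregates for reporting.

```python
def to_silver(bronze_rows):
    """Silver: drop malformed rows, normalize types, deduplicate on 'id'."""
    seen, silver = set(), []
    for row in bronze_rows:
        if row.get("id") is None or row.get("amount") is None:
            continue  # a real pipeline would quarantine malformed records
        if row["id"] in seen:
            continue  # deduplicate on business key
        seen.add(row["id"])
        silver.append({"id": row["id"],
                       "region": str(row["region"]).upper(),
                       "amount": float(row["amount"])})
    return silver

def to_gold(silver_rows):
    """Gold: aggregate refined rows into per-region totals for reporting."""
    totals = {}
    for row in silver_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

bronze = [
    {"id": 1, "region": "east", "amount": "10.5"},
    {"id": 1, "region": "east", "amount": "10.5"},  # duplicate
    {"id": 2, "region": "west", "amount": 4},
    {"id": 3, "region": "east", "amount": None},    # malformed
]
gold = to_gold(to_silver(bronze))
# gold -> {"EAST": 10.5, "WEST": 4.0}
```

On Databricks each layer would be a Delta table, with the Silver and Gold steps expressed as Spark transformations rather than Python loops.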

Lead Data Engineer

Hewlett-Packard Co
Houston, TX
12.2016 - 03.2022
  • Configured Azure Virtual Network (VNET) and established private connectivity, implementing secure network architecture to facilitate seamless communication between Azure resources while maintaining data privacy and compliance standards.
  • Led data architecture initiatives, designing robust data warehouse architectures based on Kimball and Inmon methodologies to optimize data storage, retrieval, and analytics.
  • Integrated Lakehouse architecture for unified structured and unstructured data processing, optimizing efficiency
  • Implemented real-time data streaming using Azure Event Hubs and Debezium for immediate processing of dynamic datasets.
  • Expertly migrated on-premises data to the cloud with Azure Databricks, Azure SQL DB, Azure Data Factory, and AWS Database Migration Service, ensuring integrity, security, and performance.
  • Designed and developed scalable data processing pipelines using Spark, Python, and Scala for a large-scale data warehousing project
  • Engineered advanced features with Python libraries such as Pandas, NumPy, Matplotlib, Seaborn, statsmodels, and SciPy for EDA and data transformations
  • Used Python for data engineering/analysis/science, C# for backend development (Dapper, EF Core), and SQL for database management, contributing to the development of versatile and comprehensive data solutions
  • Gained practical experience in Snowflake for cloud data warehousing, implementing scalable data engineering solutions and ensuring efficient data processing and analytics.
  • Designed, developed, and maintained data pipelines using Databricks and Spark for a large e-commerce organization with over 100TB of data
  • Built and maintained a real-time streaming data processing platform using Spark Streaming and Kafka to process clickstream data
  • Used Power BI's capabilities to create insightful data visualizations and reports
  • Leveraged Data Analysis Expressions (DAX) to develop complex calculations, enabling in-depth data analysis
  • Generated comprehensive reports that empowered stakeholders with actionable insights and data-driven decision-making
  • Managed Azure environments, including administration of virtual machines, networking, and security measures, ensuring reliable performance and optimal resource utilization for various projects and applications.
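At the core of the clickstream platform above is sessionization: a new session begins when the gap between a user's consecutive events exceeds a timeout. A minimal sketch of that rule, with no streaming dependency (timeout value and event shape are assumptions):

```python
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that closes a session (illustrative)

def sessionize(events):
    """Assign a session index per user: a new session starts when the gap
    between consecutive events exceeds SESSION_TIMEOUT."""
    state = {}  # user -> (last_ts, session_idx)
    out = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        user, ts = ev["user"], ev["ts"]
        if user not in state or ts - state[user][0] > SESSION_TIMEOUT:
            idx = state[user][1] + 1 if user in state else 0  # open new session
            state[user] = (ts, idx)
        else:
            state[user] = (ts, state[user][1])  # extend current session
        out.append({**ev, "session": state[user][1]})
    return out

clicks = [
    {"user": "u1", "ts": 0},
    {"user": "u1", "ts": 100},            # gap 100 s -> same session
    {"user": "u1", "ts": 100 + 31 * 60},  # gap > 30 min -> new session
]
tagged = sessionize(clicks)
# session indices -> [0, 0, 1]
```

Spark Streaming applies the same logic continuously over Kafka micro-batches, keeping the per-user state in managed streaming state rather than a dict.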

Data Engineer

Berkshire Hathaway
Boston, MA
02.2015 - 12.2016
  • Utilized Hadoop and associated technologies including Oozie, Sqoop, and Hive to manage and process large-scale data efficiently
  • Utilized Docker for containerizing Spark-based ETL processes, enhancing scalability and portability across diverse environments
  • Leveraged proficiency in Cloudera to strengthen data management capabilities, facilitating the development of scalable, high-performance data solutions
  • Implemented ETL processes in Python, showcasing proficiency in data extraction, transformation, and loading techniques
  • Orchestrated complex data workflows using Apache Airflow, ensuring the reliable execution and monitoring of data pipelines
  • Utilized Kafka for real-time data streaming, enhancing the scalability and efficiency of data ingestion
  • Employed Apache Spark for distributed data processing, optimizing large-scale analytics and machine learning tasks
  • Applied statistical methods and machine learning techniques to derive meaningful insights from data
  • Demonstrated cloud expertise, deploying and managing data solutions on AWS and Azure platforms
  • Implemented DevOps practices, including CI/CD pipelines, to ensure the reliability and scalability of data applications
  • Incorporated statistical methods to analyze and interpret data, providing valuable insights for informed decision-making
  • Executed end-to-end data lifecycle processes, from data engineering and processing to deploying machine learning models in cloud environments
  • Worked on designing and creating the Enterprise Integrated Data warehouse
  • Implemented Azure Load Balancers and Application Gateways in the insurance sector to optimize the performance of online portals and claims processing systems
  • Worked with Index Tuning Wizard, SQL Profiler, and SQL Trace for performance tuning; performed database refresh tasks from production to development and staging servers
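Workflow orchestration as in the Airflow bullet above reduces to running tasks in dependency order. A topological-sort sketch using the standard library (task names are hypothetical) illustrates the scheduling rule Airflow enforces:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: extract must precede transform and a quality check,
# and both must complete before load runs.
deps = {
    "transform": {"extract"},
    "quality_check": {"extract"},
    "load": {"transform", "quality_check"},
}

# static_order() yields tasks so that every task appears after its dependencies.
order = list(TopologicalSorter(deps).static_order())
```

Airflow expresses the same dependencies with operators and the `>>` operator; its scheduler guarantees exactly this ordering and retries failed tasks along the way.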

Education

Master of Science - Data Science

Northeastern University
Boston, MA
04-2022

Bachelor of Science - Computer Science

ADA University
Baku, Azerbaijan
05-2017

Skills

  • Expertise in multiple programming languages like Python, Scala, C#, SQL
  • As a Databricks Lead, orchestrated stream and batch processing pipelines using Databricks, Unity Catalog, and Delta tables for real-time data ingestion, processing, and deployment
  • Proficient in Azure Cloud technologies including Data Factory, Synapse, SQL Database, Functions, Logic Apps, and Cosmos DB, delivering end-to-end data solutions, augmented by expertise in CI/CD and Azure infrastructure administration for efficient and scalable operations
  • Skilled in complex ML algorithms (SVM, Bagging, Boosting, Decision Trees, Random Forests, PCA, LDA, Naive Bayes) and Deep Learning (ANN, RNN, CNN, LSTM, GRU)
  • Big Data Tools: Hadoop, Hive, Spark, Pig, Sqoop, HBase, MongoDB
  • Version Control: Git
  • Architecture: Relational DBMS, client-server architecture, OLTP, OLAP
  • Successfully led and managed teams of all sizes, fostering collaboration, guiding team members, and ensuring the timely delivery of high-quality results
  • Effectively communicated with stakeholders to understand project requirements, provide updates, and address concerns, ensuring alignment and successful project outcomes
  • Cloud migration specialist skilled in delivering end-to-end data solutions with expertise in CI/CD infrastructure

Timeline

Principal Data Engineer (Consultant)

THE COCA-COLA COMPANY
03.2023 - Current

Lead Data Engineer (Consultant)

Concord
03.2022 - 03.2023

Lead Data Engineer

Hewlett-Packard Co
12.2016 - 03.2022

Data Engineer

Berkshire Hathaway
02.2015 - 12.2016

Master of Science - Data Science

Northeastern University

Bachelor of Science - Computer Science

ADA University