Nova Guliyev

Washington, District of Columbia

Summary

Accomplished Lead Data Engineer with 9+ years of experience specializing in designing and implementing innovative data solutions across On-Premises, Azure Cloud, and Databricks. Expert in leading cross-functional teams, optimizing data processing with Apache Spark and Azure Databricks, and integrating disparate systems. A cloud migration expert known for aligning technical solutions with business objectives.

Overview

10 years of professional experience

Work History

Principal Data Engineer (Consultant)

THE COCA-COLA COMPANY
Atlanta, GA
03.2023 - Current
  • Led and managed a high-performing data engineering team, overseeing end-to-end data pipeline development and implementation
  • Utilized Databricks for advanced data processing, resulting in a 30% reduction in data preparation time and a 25% increase in model training efficiency
  • Designed and implemented data pipelines in Azure Data Factory, Azure Synapse achieving a 40% improvement in data ingestion speed and a 15% reduction in ETL process failures
  • Created a Lambda architecture for ingesting and pre-processing telemetry data with Azure Databricks, persisting results to data stores including Azure Data Lake and Cosmos DB
  • Developed interactive and insightful Power BI dashboards, enhancing data visualization and enabling stakeholders to make informed decisions.
  • Implemented Spark and Spark ML to build machine learning pipelines, resulting in a 25% improvement in predictive model accuracy and a 20% reduction in model training time
  • Automated business processes using Azure Logic Apps, ensuring timely data integration across systems
  • Created serverless Azure Functions for real-time data processing, including functions for change data capture (CDC) on Azure Cosmos DB and for refreshing Power BI models
  • Led the design and implementation of data warehousing solutions using Azure Synapse & Data Factory / Azure Databricks, optimizing data storage and retrieval processes to enhance data accessibility and reporting capabilities
  • Developed and maintained complex stored procedures and functions in Azure SQL Database to facilitate critical data transformations and streamline data processing workflows, enabling efficient and scalable data operations
  • Implemented Unity Catalog, Delta Live Tables, and Spark Structured Streaming to enable real-time data integration, tracking changes, and processing streaming data, ensuring that the organization had access to up-to-the-minute insights for agile decision-making
  • Ensured Spark code was fine-tuned for efficiency and maintained high-quality data warehousing solutions for seamless data management
  • Extracted data from diverse sources, including Salesforce, NetSuite, SQL databases, Zendesk, Cosmos DB, and other systems, using REST APIs to fuel Lakehouse development
  • Implemented real-time data processing solutions using Azure Stream Analytics, Azure Event Hubs enabling immediate insights and actions based on streaming data
  • Administered Azure Databricks clusters, implementing CI/CD pipelines for notebook deployment, optimizing Delta tables, documenting data lineage, ensuring security and access control, monitoring and troubleshooting clusters, integrating with Azure services, assuring data quality, and fostering collaborative development through training and documentation.
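The Lambda-style telemetry pipeline above can be illustrated with a minimal, dependency-free sketch (event shape and names are hypothetical, not the production implementation): a batch layer recomputes complete aggregates, a speed layer holds recent increments, and a serving layer merges the two views.

```python
from collections import Counter

def batch_view(events):
    """Batch layer: recompute complete per-device counts from all historical telemetry."""
    return Counter(e["device"] for e in events)

def speed_view(recent_events):
    """Speed layer: incremental counts for events not yet absorbed by the batch layer."""
    return Counter(e["device"] for e in recent_events)

def serving_layer(batch, speed):
    """Serving layer: merge batch and real-time views into one queryable result."""
    merged = Counter(batch)
    merged.update(speed)
    return dict(merged)

historical = [{"device": "a"}, {"device": "a"}, {"device": "b"}]
recent = [{"device": "a"}, {"device": "c"}]
view = serving_layer(batch_view(historical), speed_view(recent))
# view -> {"a": 3, "b": 1, "c": 1}
```

In the actual stack, the batch layer would be a Databricks job over Azure Data Lake and the speed layer a streaming query, with Cosmos DB serving merged results.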

Lead Data Engineer (Consultant)

Concord
Minneapolis, MN
03.2022 - 03.2023
  • Led the migration of on-premises pipelines to cloud-based Databricks, implementing a modern Medallion architecture for enhanced data processing efficiency and scalability
  • Led architecture design for scalable and robust data solutions, driving innovation and meeting business needs
  • Designed and implemented ETL processes using Python, SQL, and Apache Spark to collect, clean, and transform data from multiple sources, resulting in a 25% improvement in data accuracy
  • Implemented Spark and Delta optimization techniques including partitioning, caching, broadcast joins, Z-Ordering, auto-compaction, and versioning, and tuned Spark code to significantly improve processing efficiency and reduce job execution times on the Databricks platform
  • Developed and maintained scalable data pipelines in Databricks, allowing for efficient processing and analysis of large datasets
  • Implemented a robust Medallion architecture, comprising Bronze, Silver, and Gold layers, to effectively manage data lifecycle from raw ingestion to refined insights, while also establishing real-time processing capabilities for immediate data analysis and decision-making
  • Designed and implemented data pipelines using Azure Data Factory to extract data from various data sources, transform and load it into Azure SQL Database for analytical reporting
  • Created and maintained PySpark jobs in Azure Databricks for data transformations, aggregations, and cleansing, accessing Azure Data Lake Gen2 and Azure Blob Storage via Azure service principals
  • Implemented Kafka for real-time data streaming and efficient event processing in the data architecture.
  • Led and managed a team to successfully execute projects, fostering collaboration, guiding team members, and ensuring timely delivery of high-quality results.
  • Utilized Azure DevOps to establish and manage CI/CD pipelines, automating build, test, and deployment processes, thereby enhancing development efficiency and ensuring seamless delivery of applications
  • Collaborated with cross-functional teams, including data analysts and data scientists, to ensure data integrity and improve the quality of business insights
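The Bronze/Silver/Gold flow described above can be sketched in plain Python (no Spark dependency; field names are illustrative): Bronze keeps raw records as landed, Silver cleans, normalizes, and deduplicates, and Gold aggregates for reporting.

```python
def to_silver(bronze_rows):
    """Silver: drop malformed rows, normalize types, deduplicate on 'id'."""
    seen, silver = set(), []
    for row in bronze_rows:
        if row.get("id") is None or row.get("amount") is None:
            continue  # a real pipeline would quarantine malformed records
        if row["id"] in seen:
            continue  # deduplicate on business key
        seen.add(row["id"])
        silver.append({"id": row["id"],
                       "region": str(row["region"]).upper(),
                       "amount": float(row["amount"])})
    return silver

def to_gold(silver_rows):
    """Gold: aggregate refined rows into per-region totals for reporting."""
    totals = {}
    for row in silver_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

bronze = [
    {"id": 1, "region": "east", "amount": "10.5"},
    {"id": 1, "region": "east", "amount": "10.5"},  # duplicate
    {"id": 2, "region": "west", "amount": 4},
    {"id": 3, "region": "east", "amount": None},    # malformed
]
gold = to_gold(to_silver(bronze))
# gold -> {"EAST": 10.5, "WEST": 4.0}
```

On Databricks each layer would be a Delta table, with the Silver and Gold steps expressed as Spark transformations rather than Python loops.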

Lead Data Engineer

Hewlett-Packard Co
Houston, TX
12.2016 - 03.2022
  • Configured Azure Virtual Network (VNET) and established private connectivity, implementing secure network architecture to facilitate seamless communication between Azure resources while maintaining data privacy and compliance standards.
  • Led data architecture initiatives, designing robust data warehouse architectures based on Kimball and Inmon methodologies to optimize data storage, retrieval, and analytics.
  • Integrated Lakehouse architecture for unified structured and unstructured data processing, optimizing efficiency
  • Implemented real-time data streaming using Azure Event Hubs and Debezium for immediate processing of dynamic datasets.
  • Expertly migrated on-premises data to the cloud with Azure Databricks, Azure SQL DB, Azure Data Factory, and AWS Database Migration Service, ensuring integrity, security, and performance.
  • Designed and developed scalable data processing pipelines using Spark, Python, and Scala for a large-scale data warehousing project
  • Engineered advanced features with Python libraries such as Pandas, NumPy, Matplotlib, Seaborn, statsmodels, and SciPy for EDA and data transformations
  • Used Python for data engineering/analysis/science, C# for backend development (Dapper, EF Core), and SQL for database management, contributing to the development of versatile and comprehensive data solutions
  • Gained practical experience in Snowflake for cloud data warehousing, implementing scalable data engineering solutions and ensuring efficient data processing and analytics.
  • Designed, developed, and maintained data pipelines using Databricks and Spark for a large e-commerce organization with over 100TB of data
  • Built and maintained a real-time streaming data processing platform using Spark Streaming and Kafka to process clickstream data
  • Used Power BI's capabilities to create insightful data visualizations and reports
  • Leveraged Data Analysis Expressions (DAX) to develop complex calculations, enabling in-depth data analysis
  • Generated comprehensive reports that empowered stakeholders with actionable insights and data-driven decision-making
  • Managed Azure environments, including administration of virtual machines, networking, and security measures, ensuring reliable performance and optimal resource utilization for various projects and applications.
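At the core of the clickstream platform above is sessionization: a new session begins when the gap between a user's consecutive events exceeds a timeout. A minimal sketch of that rule, with no streaming dependency (timeout value and event shape are assumptions):

```python
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that closes a session (illustrative)

def sessionize(events):
    """Assign a session index per user: a new session starts when the gap
    between consecutive events exceeds SESSION_TIMEOUT."""
    state = {}  # user -> (last_ts, session_idx)
    out = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        user, ts = ev["user"], ev["ts"]
        if user not in state or ts - state[user][0] > SESSION_TIMEOUT:
            idx = state[user][1] + 1 if user in state else 0  # open new session
            state[user] = (ts, idx)
        else:
            state[user] = (ts, state[user][1])  # extend current session
        out.append({**ev, "session": state[user][1]})
    return out

clicks = [
    {"user": "u1", "ts": 0},
    {"user": "u1", "ts": 100},            # gap 100 s -> same session
    {"user": "u1", "ts": 100 + 31 * 60},  # gap > 30 min -> new session
]
tagged = sessionize(clicks)
# session indices -> [0, 0, 1]
```

Spark Streaming applies the same logic continuously over Kafka micro-batches, keeping the per-user state in managed streaming state rather than a dict.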

Data Engineer

Berkshire Hathaway
Boston, MA
02.2015 - 12.2016
  • Utilized Hadoop and associated technologies including Oozie, Sqoop, and Hive to manage and process large-scale data efficiently
  • Utilized Docker for containerizing Spark-based ETL processes, enhancing scalability and portability across diverse environments
  • Leveraged proficiency in Cloudera to strengthen data management capabilities, facilitating the development of scalable, high-performance data solutions
  • Implemented ETL processes in Python, showcasing proficiency in data extraction, transformation, and loading techniques
  • Orchestrated complex data workflows using Apache Airflow, ensuring the reliable execution and monitoring of data pipelines
  • Utilized Kafka for real-time data streaming, enhancing the scalability and efficiency of data ingestion
  • Employed Apache Spark for distributed data processing, optimizing large-scale analytics and machine learning tasks
  • Applied statistical methods and machine learning techniques to derive meaningful insights from data
  • Demonstrated cloud expertise, deploying and managing data solutions on AWS and Azure platforms
  • Implemented DevOps practices, including CI/CD pipelines, to ensure the reliability and scalability of data applications
  • Incorporated statistical methods to analyze and interpret data, providing valuable insights for informed decision-making
  • Executed end-to-end data lifecycle processes, from data engineering and processing to deploying machine learning models in cloud environments
  • Worked on designing and creating the Enterprise Integrated Data warehouse
  • Implemented Azure Load Balancers and Application Gateways in the insurance sector to optimize the performance of online portals and claims processing systems
  • Worked with Index Tuning Wizard, SQL Profiler, and SQL Trace for performance tuning; performed database refresh tasks from production to development and staging servers
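Workflow orchestration as in the Airflow bullet above reduces to running tasks in dependency order. A topological-sort sketch using the standard library (task names are hypothetical) illustrates the scheduling rule Airflow enforces:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: extract must precede transform and a quality check,
# and both must complete before load runs.
deps = {
    "transform": {"extract"},
    "quality_check": {"extract"},
    "load": {"transform", "quality_check"},
}

# static_order() yields tasks so that every task appears after its dependencies.
order = list(TopologicalSorter(deps).static_order())
```

Airflow expresses the same dependencies with operators and the `>>` operator; its scheduler guarantees exactly this ordering and retries failed tasks along the way.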

Education

Master of Science - Data Science

Northeastern University
Boston, MA
04-2022

Bachelor of Science - Computer Science

ADA University
Baku, Azerbaijan
05-2017

Skills

  • Expertise in multiple programming languages like Python, Scala, C#, SQL
  • As a Databricks Lead, orchestrated stream and batch processing pipelines using Databricks, Unity Catalog, and Delta tables for real-time data ingestion, processing, and deployment
  • Proficient in Azure Cloud technologies including Data Factory, Synapse, SQL Database, Functions, Logic Apps, and Cosmos DB, delivering end-to-end data solutions, augmented by expertise in CI/CD and Azure infrastructure administration for efficient and scalable operations
  • Skilled in complex ML algorithms (SVM, Bagging, Boosting, Decision Trees, Random Forests, PCA, LDA, Naive Bayes) and Deep Learning (ANN, RNN, CNN, LSTM, GRU)
  • Big Data Tools: Hadoop, Hive, Spark, Pig, Sqoop, HBase, MongoDB
  • Version Control: Git
  • Architecture: Relational DBMS, client-server architecture, OLTP, OLAP
  • Successfully led and managed teams of all sizes, fostering collaboration, guiding team members, and ensuring the timely delivery of high-quality results
  • Effectively communicated with stakeholders to understand project requirements, provide updates, and address concerns, ensuring alignment and successful project outcomes
  • Cloud migration specialist skilled in delivering end-to-end data solutions with expertise in CI/CD infrastructure

Timeline

Principal Data Engineer (Consultant)

THE COCA-COLA COMPANY
03.2023 - Current

Lead Data Engineer (Consultant)

Concord
03.2022 - 03.2023

Lead Data Engineer

Hewlett-Packard Co
12.2016 - 03.2022

Data Engineer

Berkshire Hathaway
02.2015 - 12.2016

Master of Science - Data Science

Northeastern University

Bachelor of Science - Computer Science

ADA University