Around 3.5 years of professional experience as an Azure Data Engineer, involved in developing, implementing, and configuring Hadoop ecosystem components in Linux environments, developing and maintaining applications in Python, and devising strategic methods for deploying big data technologies to efficiently meet big data processing requirements
Good knowledge of the Python programming language
Good experience with Azure Databricks, Azure Data Lake, Azure Data Factory, Azure Synapse Analytics, Azure storage services such as Blob Storage, and Azure Key Vault for credential management
Experience working closely with various departments, conducting requirements workshops, documenting requirement specifications, and developing complex ETL logic to load Management Information marts, build cubes for data analysis, and build business-critical reports with Power BI
Experience working with complex data sets using PySpark, carrying out data analysis to understand relationships, anomalies, and patterns, and providing robust, valuable insights as well as data integration and reporting solutions to various departments in the business
Knowledge of Master Data Management (MDM) and Data Quality tools and processes
Strong team collaboration and experience working with remote teams.
Knowledge of DevOps processes (including CI/CD) and Infrastructure as Code fundamentals
Working experience with Visual Studio, PowerShell Scripting, and ARM templates.
Proficient in writing complex SQL queries and building ETL pipelines with Azure Data Factory, Azure Databricks, and Python
Knowledge of the Hadoop big data ecosystem (HDFS, Hive), with good experience building data marts in Hive, loading data, and writing Hive queries
Identified dimension and fact tables and designed the data warehouse using a star schema
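The star-schema design above can be sketched as a simple query builder. This is a minimal, illustrative example only: the table and column names (fact_sales, dim_date, etc.) are hypothetical, not the actual warehouse schema.

```python
# Minimal sketch of a star-schema query builder. Table/column names
# ("fact_sales", "dim_date", ...) are illustrative assumptions.

def star_join_query(fact, dims):
    """Build a SQL query joining a fact table to its dimension tables.

    `dims` maps dimension table name -> (fact key column, dim key column).
    """
    joins = "\n".join(
        f"JOIN {dim} ON {fact}.{fk} = {dim}.{dk}"
        for dim, (fk, dk) in dims.items()
    )
    return f"SELECT * FROM {fact}\n{joins}"

query = star_join_query(
    "fact_sales",
    {"dim_date": ("date_key", "date_key"),
     "dim_product": ("product_key", "product_key")},
)
# In Databricks this string would be executed with spark.sql(query)
```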
Overview
3 years of professional experience
1 Certification
Work History
Azure Data Engineer
United Healthcare
12.2022 - 03.2024
Primarily involved in data migration using SQL, Azure SQL, Azure Data Lake, Azure Data Factory, and GCP
Supported the architect in designing, administering, and building analytic tools that use the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics
Worked with the source team to extract data and load it into ADLS
Created linked services for source and target connectivity based on the requirements
Once created, pipelines and datasets are triggered based on the load type (history/delta)
Depending on the source data volume (large or small), loaded files are processed in Azure Databricks by applying Spark SQL transformations, deployed through Azure Data Factory pipelines
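The history-vs-delta branch above can be sketched as follows. This is an assumption-laden sketch: the parameter names (load_type, watermark) are illustrative, not the actual pipeline parameters passed by Azure Data Factory.

```python
# Sketch of the history-vs-delta load decision. Parameter names
# ("load_type", "watermark_col") are illustrative assumptions.

def build_source_filter(load_type, watermark_col, watermark_value):
    """Return the WHERE clause for the extract query.

    History loads read everything; delta loads read only rows newer
    than the last recorded watermark.
    """
    if load_type.lower() == "history":
        return ""  # full load: no filter
    if load_type.lower() == "delta":
        return f"WHERE {watermark_col} > '{watermark_value}'"
    raise ValueError(f"unknown load type: {load_type}")

# In a Databricks notebook, ADF would pass these in (e.g. via
# dbutils.widgets) and the filter would be appended to the extract query.
```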
Involved in deploying solutions to DEV, QA, and PROD environments in Azure DevOps, connecting to Azure through PowerShell
Proficient in creating data warehouses, designing extraction and load functions, testing designs, data modeling, and ensuring the smooth running of applications
Responsible for extracting data from OLTP and OLAP systems into the data lake using Azure Data Factory and Databricks
Used Azure Databricks notebooks to extract data from the data lake and load it into Azure and on-prem SQL databases
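The data-lake-to-SQL load above might look like the following in a Databricks notebook. A hedged sketch: the server, database, table, and mount-path names are placeholders, and in practice the password would come from Azure Key Vault.

```python
# Sketch of loading a data-lake table into an Azure SQL database from a
# Databricks notebook. Server/database/table/path names are placeholders.

def jdbc_options(server, database, table, user, password):
    """Assemble the JDBC options a Spark writer needs for Azure SQL."""
    return {
        "url": f"jdbc:sqlserver://{server}:1433;databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,  # in practice, fetched from Azure Key Vault
    }

opts = jdbc_options("myserver.database.windows.net", "mydb",
                    "dbo.customers", "etl_user", "<secret-from-key-vault>")

# In Databricks:
#   df = spark.read.format("delta").load("/mnt/datalake/curated/customers")
#   df.write.format("jdbc").options(**opts).mode("overwrite").save()
```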
Worked on complete architectures and large data sets on a high-capacity big data processing platform, SQL and data warehouse projects, and Azure Synapse Analytics
Developed pipelines that extract data from various sources and merge it into single-source datasets in the data lake using Databricks
Big Data Engineer
Snug IT Sols
01.2021 - 11.2022
Developed PySpark pipelines that transform raw data from several formats into Parquet files for consumption by downstream systems
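The format-dispatch step of such a raw-to-Parquet pipeline can be sketched as below. The extension-to-reader mapping is an illustrative assumption, not the pipeline's actual configuration.

```python
# Sketch of the format-dispatch step in a raw-to-Parquet PySpark pipeline.
# The extension-to-reader mapping is an illustrative assumption.

READERS = {".csv": "csv", ".json": "json", ".txt": "csv", ".avro": "avro"}

def reader_format(path):
    """Pick the Spark reader format for a raw input file by its extension."""
    for ext, fmt in READERS.items():
        if path.endswith(ext):
            return fmt
    raise ValueError(f"unsupported raw format: {path}")

# In the pipeline:
#   df = spark.read.format(reader_format(path)).option("header", True).load(path)
#   df.write.mode("overwrite").parquet(target_path)
```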
Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Python
Collected data using Spark Streaming from the HDFS storage layer in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS
Built a POC of Spark jobs in Python to load data from HDFS to the target DB
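The per-record transformation behind the streaming bullets above can be sketched as a small parse function. The event field names (user_id, course, timestamp) are hypothetical assumptions about the common learner data model.

```python
import json

# Sketch of the per-record transformation in the streaming job: each Kafka
# message value (JSON bytes) is reshaped into the common learner data model
# before being persisted to HDFS. Field names are hypothetical.

def to_learner_record(raw_value: bytes) -> dict:
    """Parse one Kafka message and project it onto the learner model."""
    event = json.loads(raw_value)
    return {
        "learner_id": event["user_id"],
        "course": event.get("course", "unknown"),
        "event_time": event["timestamp"],
    }

sample = to_learner_record(b'{"user_id": "u1", "timestamp": "2022-01-01T00:00:00Z"}')

# In Spark Structured Streaming this logic would sit inside a UDF or a
# select() over a DataFrame read with format("kafka"), with the result
# written to HDFS via writeStream.
```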
Loaded data from different sources into a data warehouse and performed data aggregations for business intelligence using Python
Designed and built data structures in the data warehouse and data processing using PySpark with the Hortonworks platform to provide efficient reporting and analytics capability
Brought structure to large quantities of data to make analysis possible and extract meaning from the data
Designed and implemented Sqoop incremental jobs to read data from DB2 and load it into Hive tables, and connected Tableau through HiveServer2 to generate interactive reports
Used Sqoop to channel data between HDFS and various RDBMS sources
Developed Spark applications using Spark SQL for data extraction, transformation, and aggregation across multiple file formats
Used SSIS to build automated multi-dimensional cubes
Installed, configured, and maintained data pipelines
Transforming business problems into Big Data solutions and define Big Data strategy and Roadmap
Extracted files from Hadoop, transformed them using Hive, and dropped them into HDFS on a daily/hourly basis
Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks
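A cleaning-and-conforming UDF of the kind described above might look like this. The conforming rules shown (trim, uppercase, default unknowns) are illustrative assumptions, not the actual business rules.

```python
# Sketch of a row/column cleaning UDF. The conforming rules shown
# (trim, uppercase, default unknowns) are illustrative assumptions.

def conform_code(value):
    """Clean a free-text code column: trim, uppercase, default unknowns."""
    if value is None or not value.strip():
        return "UNKNOWN"
    return value.strip().upper()

# Registered in PySpark as:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   conform_code_udf = udf(conform_code, StringType())
#   df = df.withColumn("code", conform_code_udf("code"))
```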
Education
Master of Science - Computer And Information Sciences
University of North Texas
TX
Bachelor of Science - Computer Science And Engineering
Adisankara Institute of Technology
Skills
Apache Spark
Azure Data Factory
Databricks
Git
Power BI
KSQL
SQL
Python
Pandas
ARM Templates
Data Lakes
Azure Synapse Analytics
Certification
Microsoft Certified: Azure Data Engineer Associate (DP-203)
Microsoft Certified: Azure Administrator Associate (AZ-104)