Harani Arumalla

Irving, TX

Summary

  • Around 3.5 years of professional experience as an Azure Data Engineer, involved in
    developing, implementing, and configuring Hadoop ecosystem components on Linux,
    developing and maintaining applications in Python, and devising strategies for
    deploying Big Data technologies to efficiently meet large-scale processing requirements.
  • Good knowledge of the Python programming language.
  • Good experience with Azure Databricks, Azure Data Lake, Azure Data Factory,
    Azure Synapse Analytics, Azure storage services such as Blob Storage, and
    Azure Key Vault for credential management.
  • Experience working closely with various departments: conducting requirements workshops,
    documenting requirement specifications, developing complex ETL logic to load Management
    Information marts, building cubes for data analysis, and building business-critical
    reports with Power BI.
  • Experience working with complex data sets using PySpark, carrying out data analysis
    to understand relationships, anomalies, and patterns, and delivering robust insights
    as well as data integration and reporting solutions to departments across the business.
  • Knowledge of Master Data Management (MDM) and data quality tools and processes.
  • Strong team collaboration and experience working with remote teams.
  • Knowledge of DevOps processes (including CI/CD) and Infrastructure as Code fundamentals.
  • Working experience with Visual Studio, PowerShell scripting, and ARM templates.
  • Proficient in writing complex SQL queries and building ETL pipelines with
    Azure Data Factory, Azure Databricks, and Python.
  • Knowledge of the Hadoop Big Data ecosystem (HDFS, Hive); good experience building
    data marts with Hive, loading data, and writing Hive queries.
  • Identified dimension and fact tables and designed the data warehouse using a star schema.
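The star-schema design mentioned above can be sketched in a few lines of pandas (listed under Skills); the table and column names below are purely illustrative, not from any actual project:

```python
import pandas as pd

# Hypothetical dimension table: one row per product
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "product_name": ["Widget", "Gadget"],
    "category": ["Hardware", "Hardware"],
})

# Hypothetical fact table: one row per sale, keyed to the dimension
fact_sales = pd.DataFrame({
    "product_key": [1, 1, 2],
    "quantity": [3, 2, 5],
    "amount": [30.0, 20.0, 75.0],
})

# A typical star-schema query: join fact to dimension, then aggregate
report = (
    fact_sales.merge(dim_product, on="product_key")
              .groupby("product_name", as_index=False)[["quantity", "amount"]]
              .sum()
)
```

The same join-then-aggregate shape applies in Spark SQL or Synapse, with dimension tables kept small enough to broadcast.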

Overview

  • 3 years of professional experience
  • 1 certification

Work History

Azure Data Engineer

United Healthcare
12.2022 - 03.2024
  • Primarily involved in data migration using SQL, Azure SQL, Azure Data Lake,
    Azure Data Factory, and GCP.
  • Supported the architect in designing, administering, and building analytics tools
    that use the data pipeline to provide actionable insights into customer acquisition,
    operational efficiency, and other key business performance metrics.
  • Worked with source teams to extract data and load it into ADLS.
  • Created linked services for source and target connectivity based on requirements;
    once created, pipelines and datasets are triggered based on the load type (history/delta).
  • Processed loaded files in Azure Databricks, sized to the source data, applying
    Spark SQL transformations deployed through Azure Data Factory pipelines.
  • Deployed solutions to DEV, QA, and PROD environments in Azure DevOps, connecting
    to Azure through PowerShell.
  • Proficient in creating a data warehouse: designing extraction and load functions,
    testing designs, data modeling, and ensuring the smooth running of applications.
  • Responsible for extracting data from OLTP and OLAP systems into the data lake
    using Azure Data Factory and Databricks.
  • Used Azure Databricks notebooks to extract data from the data lake and load it
    into Azure and on-prem SQL databases.
  • Worked across the complete architecture with large data sets on a high-capacity
    Big Data processing platform, SQL and data warehouse projects, and Azure Synapse Analytics.
  • Developed pipelines that extract data from various sources and merge it into
    single datasets in the data lake using Databricks.
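A pipeline of the kind in the last bullet, merging extracts from several sources into one dataset, can be sketched with pandas; in Databricks the same shape would use PySpark DataFrames. All source and column names here are hypothetical:

```python
import pandas as pd

# Hypothetical extracts landed from two source systems
source_a = pd.DataFrame({"customer_id": [1, 2], "region": ["TX", "CA"]})
source_b = pd.DataFrame({"customer_id": [3], "region": ["NY"]})

# Tag each extract with its origin, then union into a single dataset
source_a["source"] = "system_a"
source_b["source"] = "system_b"
merged = pd.concat([source_a, source_b], ignore_index=True)
```

Tagging rows with their origin before the union keeps lineage visible in the merged dataset, which simplifies debugging delta loads later.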

Big Data Engineer

Snug IT Sols
01.2021 - 11.2022
  • Developed PySpark pipelines that transform raw data from several formats into
    Parquet files for consumption by downstream systems.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream
    data to HDFS using Python.
  • Collected data with Spark Streaming from an HDFS storage account in near real time,
    performing the necessary transformations and aggregations on the fly to build the
    common learner data model and persist the data in HDFS.
  • Built a POC of Spark jobs in Python to load data from HDFS to the target database.
  • Loaded data from different sources into a data warehouse and performed data
    aggregations for business intelligence using Python.
  • Designed and built data warehouse structures and data processing using PySpark on
    the Hortonworks platform to provide efficient reporting and analytics capability.
  • Brought structure to large quantities of data to make analysis possible and to
    extract meaning from the data.
  • Designed and implemented Sqoop incremental jobs to read data from DB2 and load
    Hive tables, connected to Tableau through HiveServer2 for interactive reports.
  • Used Sqoop to channel data between HDFS and RDBMS sources.
  • Developed Spark applications and Spark SQL for data extraction, transformation,
    and aggregation from multiple file formats.
  • Used SSIS to build automated multidimensional cubes.
  • Collected data with Spark Streaming from an Azure storage account in near real time,
    performing the necessary transformations and aggregations on the fly to build the
    common learner data model and persist the data in HDFS.
  • Installed, configured, and maintained data pipelines.
  • Transformed business problems into Big Data solutions and helped define the
    Big Data strategy and roadmap.
  • Extracted files from Hadoop, transformed them with Hive, and dropped them into
    HDFS on a daily/hourly basis.
  • Authored Python (PySpark) scripts with custom UDFs for row/column manipulations,
    merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
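The custom-UDF work in the last bullet comes down to plain Python functions; a minimal sketch of a hypothetical cleaning/conforming function of that kind (the function name and rules are illustrative, not from any actual job):

```python
def conform_name(raw):
    """Cleaning/conforming logic of the kind wrapped as a PySpark UDF.

    In a real job this function would be registered with
    pyspark.sql.functions.udf and applied column-wise; here it is
    shown standalone so the logic itself is clear.
    """
    if raw is None:
        # Spark passes through nulls; the UDF must tolerate them
        return None
    # Strip edges, collapse internal runs of whitespace, title-case
    return " ".join(raw.split()).title()
```

Keeping the logic in a plain function (and registering it separately) makes the cleaning rules unit-testable without a Spark session.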

Education

Master of Science - Computer And Information Sciences

University of North Texas, TX

Bachelor of Science - Computer Science And Engineering

Adisankara Institute of Technology

Skills

  • Apache Spark
  • Azure Data Factory
  • Databricks
  • Git
  • Power BI
  • KSQL
  • SQL
  • Python
  • Pandas
  • ARM Templates
  • Data Lakes
  • Azure Synapse Analytics

Certification

Microsoft Certified: Azure Data Engineer Associate (DP-203)

Microsoft Certified: Azure Administrator Associate (AZ-104)
