
PAVAN SRI HARSHA

Dallas, TX

Summary

Around 8 years of professional experience as an Azure Data Engineer, involved in developing, implementing, and configuring Hadoop ecosystem components in Linux environments, and in the development and maintenance of various applications using Python.

Developed strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.

Good knowledge of the Python programming language.

Good experience with Azure Databricks, Azure Data Lake, Azure Data Factory, GCP, Azure Synapse Analytics, Azure storage services such as Blob Storage, and Azure Key Vault; used Azure credential management and Identity and Access Management to protect access to applications and resources across the corporate data center and into the cloud.

Experience with Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, and Sqoop.

Good understanding of processing real-time data using Spark.

Hands-on experience importing and exporting data between databases such as MySQL, Oracle, and Teradata and HDFS using Sqoop.

Experience working closely with various departments, conducting requirements workshops, documenting requirements specifications, and developing complex ETL logic to load Management Information marts, build cubes for data analysis, and build business-critical reports with Power BI.

Experience working with complex data sets, carrying out data analysis to understand relationships, anomalies, patterns, etc., and providing robust, valuable insights as well as data integration and reporting solutions to various departments in the business.

Successfully designed and delivered multiple projects that involved working with large data volumes on a variety of platforms such as SQL Server, Hadoop, Teradata, etc.

Proficient in writing complex SQL queries, building ETL packages with SSIS, and building ETL pipelines with Azure Data Factory, Azure Databricks, and Python.

Experience working with semi-structured data such as XML and JSON. Worked in Agile and Waterfall delivery processes.

Overview

9 years of professional experience

Work History

Azure Data Engineer

United Healthcare Group
09.2023 - 01.2024

· Primarily involved in data migration using SQL, SQL Azure, Azure Data Lake, Azure Data Factory, and GCP.

· Supported the architect in designing, administering, and building analytic tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.

· Worked with the source team to extract data and load it into ADLS; created linked services for source and target connectivity based on requirements.

· Once created, pipelines and datasets are triggered based on LOAD (History/Delta) operations.

· Depending on source data volume (big or small), loaded files are processed in Azure Databricks by applying Spark SQL operations, deployed through Azure Data Factory pipelines.

· Involved in deploying solutions to DEV, QA, and PROD in an Azure DevOps environment, connecting to Azure through PowerShell.

· Proficient in creating data warehouses, designing extraction and data-loading functions, testing designs, data modeling, and ensuring the smooth running of applications.

· Responsible for extracting data from OLTP and OLAP systems into the Data Lake using Azure Data Factory and Databricks.

· Used Azure Databricks notebooks to extract data from the Data Lake and load it into Azure and on-prem SQL databases.

· Worked with the complete architecture of large data sets on a high-capacity big data processing platform, including SQL and data warehouse projects on Azure Synapse Analytics.

· Developed pipelines that extract data from various sources and merge it into single-source datasets in the Data Lake using Databricks.

· Generate and request certificates from trusted certificate authorities (CAs) or Azure Key Vault.

· Ensure that each certificate's purpose (e.g., SSL/TLS, authentication) and key type (RSA, ECDSA) align with data engineering needs.

· Store certificates securely to prevent unauthorized access; Azure Key Vault is a secure and centralized service for managing certificates and cryptographic keys (a minimal access sketch in Python appears after this list).

· Deploy certificates to relevant Azure resources, such as virtual machines, Azure Kubernetes Service (AKS) clusters, or Azure App Service instances.

· Utilize Azure Automation or infrastructure-as-code tools (like Azure Resource Manager templates) for consistent and repeatable deployments.

· Utilize Azure RBAC (Role-Based Access Control) to grant the minimum required permissions to users and applications.

· Utilize Azure Private Link and Private Endpoints to ensure that communication between services and resources within the Azure environment remains on the Azure network and is not exposed to the public internet.

· Implement multi-factor authentication for accessing the Azure Key Vault to enhance the security of certificate management.
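
A minimal sketch of the Key Vault access pattern referenced above, using the Azure SDK for Python; the vault URL and the secret and certificate names are hypothetical placeholders, and DefaultAzureCredential assumes a managed identity, service principal, or Azure CLI login is available.

  # Sketch: read a connection secret and inspect a certificate in Azure Key Vault.
  # Requires the azure-identity, azure-keyvault-secrets, and
  # azure-keyvault-certificates packages; names below are illustrative only.
  from azure.identity import DefaultAzureCredential
  from azure.keyvault.certificates import CertificateClient
  from azure.keyvault.secrets import SecretClient

  VAULT_URL = "https://my-data-vault.vault.azure.net"  # hypothetical vault

  credential = DefaultAzureCredential()

  # Pull a database connection string stored as a secret.
  secret_client = SecretClient(vault_url=VAULT_URL, credential=credential)
  sql_conn = secret_client.get_secret("sql-connection-string").value

  # Check an SSL/TLS certificate's name and expiry before deployment.
  cert_client = CertificateClient(vault_url=VAULT_URL, credential=credential)
  cert = cert_client.get_certificate("adf-ssl-cert")
  print(cert.name, cert.properties.expires_on)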

Azure Data Engineer

Microsoft Corporation
12.2021 - 06.2023


  • Depending on source data volume (big or small), loaded files are processed in Azure Databricks by applying Spark SQL operations, deployed through Azure Data Factory pipelines (a minimal Spark SQL sketch appears after this list)
  • Involved in deploying solutions to DEV, QA, and PROD in an Azure DevOps environment, connecting to Azure through PowerShell
  • Proficient in creating data warehouses, designing extraction and data-loading functions, testing designs, data modeling, and ensuring the smooth running of applications
  • Responsible for extracting data from OLTP and OLAP systems into the Data Lake using Azure Data Factory and Databricks
  • Used Azure Databricks notebooks to extract data from the Data Lake and load it into Azure and on-prem SQL databases
  • Worked with the complete architecture of large data sets on a high-capacity big data processing platform, including SQL and data warehouse projects on Azure Synapse Analytics
  • Developed pipelines that extract data from various sources and merge it into single-source datasets in the Data Lake using Databricks.
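
A minimal, hedged sketch of the kind of Spark SQL processing referenced in the first bullet above, as it might run in an Azure Databricks notebook invoked from an Azure Data Factory pipeline (for example through a Databricks Notebook activity); the ADLS storage account, containers, paths, and column names are hypothetical placeholders.

  # Databricks notebook sketch: load raw files from ADLS, apply a Spark SQL
  # transformation, and write curated Parquet output back to the Data Lake.
  # Storage account, containers, paths, and columns are illustrative only.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()  # already provided in Databricks

  raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/orders/"         # hypothetical
  curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/orders/" # hypothetical

  spark.read.option("header", "true").csv(raw_path).createOrReplaceTempView("orders_raw")

  curated_df = spark.sql("""
      SELECT order_id,
             customer_id,
             CAST(order_amount AS DOUBLE)      AS order_amount,
             to_date(order_date, 'yyyy-MM-dd') AS order_date
      FROM orders_raw
      WHERE order_id IS NOT NULL
  """)

  curated_df.write.mode("overwrite").parquet(curated_path)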

Big Data Engineer | Spark and Hadoop

Snug It Solutions
01.2017 - 10.2021
  • Developed PySpark pipelines that transform raw data from several formats into Parquet files for consumption by downstream systems
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Python (a minimal streaming sketch appears after this list)
  • Collected data with Spark Streaming from the HDFS storage account in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS
  • Built a POC of Spark jobs using Python to load data from HDFS into the target database
  • Loaded data from different sources into a data warehouse and performed data aggregations for business intelligence using Python
  • Designed and built data structures in the data warehouse and data processing using PySpark on the Hortonworks platform to provide efficient reporting and analytics capability
  • Brought structure to large quantities of data to make analysis possible and extract meaning from the data
  • Designed and implemented Sqoop incremental jobs to read data from DB2 and load it into Hive tables, and connected Tableau to HiveServer2 for generating interactive reports
  • Used Sqoop to channel data between HDFS and different RDBMS sources
  • Developed Spark applications and Spark SQL for data extraction, transformation, and aggregation from multiple file formats
  • Used SSIS to build automated multi-dimensional cubes.
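
A minimal sketch of the Kafka-to-HDFS flow described above, written here with Spark Structured Streaming as a stand-in for the original Spark Streaming job; the broker address, topic, schema, and HDFS paths are hypothetical assumptions, and the spark-sql-kafka connector package is assumed to be available on the cluster.

  # Sketch: stream events from Kafka and persist them to HDFS as Parquet.
  # Structured Streaming stand-in; broker, topic, schema, and paths are illustrative.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col, from_json
  from pyspark.sql.types import DoubleType, StringType, StructField, StructType

  spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

  event_schema = StructType([
      StructField("event_id", StringType()),
      StructField("user_id", StringType()),
      StructField("amount", DoubleType()),
  ])

  events = (
      spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
      .option("subscribe", "events")                       # hypothetical topic
      .load()
      .select(from_json(col("value").cast("string"), event_schema).alias("e"))
      .select("e.*")
  )

  query = (
      events.writeStream.format("parquet")
      .option("path", "hdfs:///data/events/")              # hypothetical output path
      .option("checkpointLocation", "hdfs:///checkpoints/events/")
      .start()
  )
  query.awaitTermination()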

Big Data Developer | Hadoop | SQL

Micromax
06.2015 - 12.2016
  • Developed database management systems for easy access, storage, and retrieval of data
  • Performed DB activities such as indexing, performance tuning, and backup and restore
  • Wrote new procedures using PL/SQL collections and exceptions
  • Wrote new functions and triggers per business requirements
  • Wrote complex SQL queries using joins, subqueries, and analytical functions to retrieve data from the database
  • Developed data mapping, transformation, and cleansing rules for data management involving OLTP and OLAP
  • Involved in creating UNIX shell scripts
  • Defragmented tables and maintained partitioning, compression, and indexing for improved performance and efficiency
  • Expertise in writing Hadoop jobs for analyzing data using Hive QL (queries), Pig Latin (data flow language), and custom MapReduce programs
  • Wrote a Pig script that picks data from one HDFS path, performs aggregation, and loads the result into another path, which is later pulled into another domain (a rough PySpark analogue appears after this list).
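
A rough sketch, in PySpark rather than the original Pig Latin, of the read-aggregate-write pattern described in the last bullet; the HDFS paths and column names are hypothetical placeholders.

  # PySpark analogue of the Pig flow above: read from one HDFS path,
  # aggregate, and write the result to another path. Names are illustrative.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import count, sum as sum_

  spark = SparkSession.builder.appName("hdfs-aggregation").getOrCreate()

  source = spark.read.option("header", "true").csv("hdfs:///landing/transactions/")

  daily_totals = (
      source.groupBy("txn_date", "store_id")
      .agg(count("*").alias("txn_count"),
           sum_("amount").alias("total_amount"))
  )

  daily_totals.write.mode("overwrite").parquet("hdfs:///staging/daily_totals/")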

Education

Master's in CS -

Gannon University
Erie, PA

Bachelor's in CS -

RAMCO College of Engineering

Skills

  • Azure Data Factory
  • Azure Databricks
  • Snowflake
  • Scala
  • Power BI
  • AWS
  • GCP
  • Data Lake
  • Blob Storage Systems
  • PySpark
  • Python
  • Hadoop
  • Hive
  • SSIS/SSAS
  • SQL
  • Teradata
  • Azure Machine Learning
  • MongoDB
  • PowerShell scripting

Timeline

Azure Data Engineer

United Healthcare Group
09.2023 - 01.2024

Azure Data Engineer

Microsoft Corporation
12.2021 - 06.2023

Big Data Engineer | Spark and Hadoop

Snug It Solutions
01.2017 - 10.2021

Big Data Developer | Hadoop | SQL

Micromax
06.2015 - 12.2016

Master's in CS -

Gannon University

Bachelor's in CS -

RAMCO College of Engineering