
PAVAN SRI HARSHA

Dallas, TX

Summary

Around 8 years of professional experience as an Azure Data Engineer, involved in developing, implementing, and configuring Hadoop ecosystem components in Linux environments, and in the development and maintenance of various applications using Python.

Developed strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.

Good knowledge of the Python programming language.

Good experience with Azure Databricks, Azure Data Lake, Azure Data Factory, GCP, Azure Synapse Analytics, Azure storage services such as Blob Storage, and Azure Key Vault; used Azure credential management and Identity and Access Management to protect access to applications and resources across the corporate data center and into the cloud.

Experience with Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, and Sqoop.

Good understanding of processing real-time data using Spark.

Hands-on experience importing and exporting data between databases such as MySQL, Oracle, and Teradata and HDFS using Sqoop.

Experience working closely with various departments, conducting requirements workshops, documenting requirements specifications, and developing complex ETL logic to load Management Information marts, build cubes for data analysis, and build business-critical reports with Power BI.

Experience working with complex data sets, carrying out data analysis to understand relationships, anomalies, patterns, etc., and providing robust, valuable insights as well as data integration and reporting solutions to various departments in the business.

Successfully designed and delivered multiple projects that involved working with large data volumes on a variety of platforms such as SQL Server, Hadoop, Teradata, etc.

Proficient in writing complex SQL queries, building ETL packages with SSIS, and building ETL pipelines with Azure Data Factory, Azure Databricks, and Python.

Experience working with semi-structured data such as XML and JSON. Worked in Agile and Waterfall delivery processes.

Overview

9 years of professional experience

Work History

Azure Data Engineer

United Healthcare Group
09.2023 - 01.2024

· Primarily involved in data migration using SQL, SQL Azure, Azure Data Lake, Azure Data Factory, and GCP.

· Supported the architect in designing, administering, and building analytic tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.

· Worked with the source team to extract data and load it into ADLS; created linked services for source and target connectivity based on requirements.

· Once created, pipelines and datasets are triggered based on LOAD (History/Delta) operations.

· Depending on source data volume (big or small), loaded files are processed in Azure Databricks by applying Spark SQL operations, deployed through Azure Data Factory pipelines.

· Involved in deploying solutions to DEV, QA, and PROD in an Azure DevOps environment, connecting to Azure through PowerShell.

· Proficient in creating data warehouses, designing extraction and data-loading functions, testing designs, data modeling, and ensuring the smooth running of applications.

· Responsible for extracting data from OLTP and OLAP systems into the Data Lake using Azure Data Factory and Databricks.

· Used Azure Databricks notebooks to extract data from the Data Lake and load it into Azure and on-prem SQL databases.

· Worked with the complete architecture of large data sets on a high-capacity big data processing platform, including SQL and data warehouse projects on Azure Synapse Analytics.

· Developed pipelines that extract data from various sources and merge it into single-source datasets in the Data Lake using Databricks.

· Generate and request certificates from trusted certificate authorities (CAs) or Azure Key Vault.

· Ensure that each certificate's purpose (e.g., SSL/TLS, authentication) and key type (RSA, ECDSA) align with data engineering needs.

· Store certificates securely to prevent unauthorized access; Azure Key Vault is a secure and centralized service for managing certificates and cryptographic keys (a minimal access sketch in Python appears after this list).

· Deploy certificates to relevant Azure resources, such as virtual machines, Azure Kubernetes Service (AKS) clusters, or Azure App Service instances.

· Utilize Azure Automation or infrastructure-as-code tools (like Azure Resource Manager templates) for consistent and repeatable deployments.

· Utilize Azure RBAC (Role-Based Access Control) to grant the minimum required permissions to users and applications.

· Utilize Azure Private Link and Private Endpoints to ensure that communication between services and resources within the Azure environment remains on the Azure network and is not exposed to the public internet.

· Implement multi-factor authentication for accessing the Azure Key Vault to enhance the security of certificate management.
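
A minimal sketch of the Key Vault access pattern referenced above, using the Azure SDK for Python; the vault URL and the secret and certificate names are hypothetical placeholders, and DefaultAzureCredential assumes a managed identity, service principal, or Azure CLI login is available.

  # Sketch: read a connection secret and inspect a certificate in Azure Key Vault.
  # Requires the azure-identity, azure-keyvault-secrets, and
  # azure-keyvault-certificates packages; names below are illustrative only.
  from azure.identity import DefaultAzureCredential
  from azure.keyvault.certificates import CertificateClient
  from azure.keyvault.secrets import SecretClient

  VAULT_URL = "https://my-data-vault.vault.azure.net"  # hypothetical vault

  credential = DefaultAzureCredential()

  # Pull a database connection string stored as a secret.
  secret_client = SecretClient(vault_url=VAULT_URL, credential=credential)
  sql_conn = secret_client.get_secret("sql-connection-string").value

  # Check an SSL/TLS certificate's name and expiry before deployment.
  cert_client = CertificateClient(vault_url=VAULT_URL, credential=credential)
  cert = cert_client.get_certificate("adf-ssl-cert")
  print(cert.name, cert.properties.expires_on)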

Azure Data Engineer

Microsoft Corporation
12.2021 - 06.2023


  • Depending on source data volume (big or small), loaded files are processed in Azure Databricks by applying Spark SQL operations, deployed through Azure Data Factory pipelines (a minimal Spark SQL sketch appears after this list)
  • Involved in deploying solutions to DEV, QA, and PROD in an Azure DevOps environment, connecting to Azure through PowerShell
  • Proficient in creating data warehouses, designing extraction and data-loading functions, testing designs, data modeling, and ensuring the smooth running of applications
  • Responsible for extracting data from OLTP and OLAP systems into the Data Lake using Azure Data Factory and Databricks
  • Used Azure Databricks notebooks to extract data from the Data Lake and load it into Azure and on-prem SQL databases
  • Worked with the complete architecture of large data sets on a high-capacity big data processing platform, including SQL and data warehouse projects on Azure Synapse Analytics
  • Developed pipelines that extract data from various sources and merge it into single-source datasets in the Data Lake using Databricks.
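
A minimal, hedged sketch of the kind of Spark SQL processing referenced in the first bullet above, as it might run in an Azure Databricks notebook invoked from an Azure Data Factory pipeline (for example through a Databricks Notebook activity); the ADLS storage account, containers, paths, and column names are hypothetical placeholders.

  # Databricks notebook sketch: load raw files from ADLS, apply a Spark SQL
  # transformation, and write curated Parquet output back to the Data Lake.
  # Storage account, containers, paths, and columns are illustrative only.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()  # already provided in Databricks

  raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/orders/"         # hypothetical
  curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/orders/" # hypothetical

  spark.read.option("header", "true").csv(raw_path).createOrReplaceTempView("orders_raw")

  curated_df = spark.sql("""
      SELECT order_id,
             customer_id,
             CAST(order_amount AS DOUBLE)      AS order_amount,
             to_date(order_date, 'yyyy-MM-dd') AS order_date
      FROM orders_raw
      WHERE order_id IS NOT NULL
  """)

  curated_df.write.mode("overwrite").parquet(curated_path)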

Big Data Engineer | Spark and Hadoop

Snug It Solutions
01.2017 - 10.2021
  • Developed PySpark pipelines that transform raw data from several formats into Parquet files for consumption by downstream systems
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Python (a minimal streaming sketch appears after this list)
  • Collected data with Spark Streaming from the HDFS storage account in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS
  • Built a POC of Spark jobs using Python to load data from HDFS into the target database
  • Loaded data from different sources into a data warehouse and performed data aggregations for business intelligence using Python
  • Designed and built data structures in the data warehouse and data processing using PySpark on the Hortonworks platform to provide efficient reporting and analytics capability
  • Brought structure to large quantities of data to make analysis possible and extract meaning from the data
  • Designed and implemented Sqoop incremental jobs to read data from DB2 and load it into Hive tables, and connected Tableau to HiveServer2 for generating interactive reports
  • Used Sqoop to channel data between HDFS and different RDBMS sources
  • Developed Spark applications and Spark SQL for data extraction, transformation, and aggregation from multiple file formats
  • Used SSIS to build automated multi-dimensional cubes.
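
A minimal sketch of the Kafka-to-HDFS flow described above, written here with Spark Structured Streaming as a stand-in for the original Spark Streaming job; the broker address, topic, schema, and HDFS paths are hypothetical assumptions, and the spark-sql-kafka connector package is assumed to be available on the cluster.

  # Sketch: stream events from Kafka and persist them to HDFS as Parquet.
  # Structured Streaming stand-in; broker, topic, schema, and paths are illustrative.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col, from_json
  from pyspark.sql.types import DoubleType, StringType, StructField, StructType

  spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

  event_schema = StructType([
      StructField("event_id", StringType()),
      StructField("user_id", StringType()),
      StructField("amount", DoubleType()),
  ])

  events = (
      spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
      .option("subscribe", "events")                       # hypothetical topic
      .load()
      .select(from_json(col("value").cast("string"), event_schema).alias("e"))
      .select("e.*")
  )

  query = (
      events.writeStream.format("parquet")
      .option("path", "hdfs:///data/events/")              # hypothetical output path
      .option("checkpointLocation", "hdfs:///checkpoints/events/")
      .start()
  )
  query.awaitTermination()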

Big Data Developer | Hadoop | SQL

Micromax
06.2015 - 12.2016
  • Developed database management systems for easy access, storage, and retrieval of data
  • Performed DB activities such as indexing, performance tuning, and backup and restore
  • Wrote new procedures using PL/SQL collections and exceptions
  • Wrote new functions and triggers per business requirements
  • Wrote complex SQL queries using joins, subqueries, and analytical functions to retrieve data from the database
  • Developed data mapping, transformation, and cleansing rules for data management involving OLTP and OLAP
  • Involved in creating UNIX shell scripts
  • Defragmented tables and maintained partitioning, compression, and indexing for improved performance and efficiency
  • Expertise in writing Hadoop jobs for analyzing data using Hive QL (queries), Pig Latin (data flow language), and custom MapReduce programs
  • Wrote a Pig script that picks data from one HDFS path, performs aggregation, and loads the result into another path, which is later pulled into another domain (a rough PySpark analogue appears after this list).
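
A rough sketch, in PySpark rather than the original Pig Latin, of the read-aggregate-write pattern described in the last bullet; the HDFS paths and column names are hypothetical placeholders.

  # PySpark analogue of the Pig flow above: read from one HDFS path,
  # aggregate, and write the result to another path. Names are illustrative.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import count, sum as sum_

  spark = SparkSession.builder.appName("hdfs-aggregation").getOrCreate()

  source = spark.read.option("header", "true").csv("hdfs:///landing/transactions/")

  daily_totals = (
      source.groupBy("txn_date", "store_id")
      .agg(count("*").alias("txn_count"),
           sum_("amount").alias("total_amount"))
  )

  daily_totals.write.mode("overwrite").parquet("hdfs:///staging/daily_totals/")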

Education

Master's in CS -

Gannon University
Erie, PA

Bachelor's in CS -

RAMCO College of Engineering

Skills

  • Azure Data Factory
  • Azure Databricks
  • Snowflake
  • Scala
  • Power BI
  • AWS
  • GCP
  • Data Lake
  • Blob Storage Systems
  • PySpark
  • Python
  • Hadoop
  • Hive
  • SSIS/SSAS
  • SQL
  • Teradata
  • Azure Machine Learning
  • MongoDB
  • PowerShell scripting

Timeline

Azure Data Engineer

United Healthcare Group
09.2023 - 01.2024

Azure Data Engineer

Microsoft Corporation
12.2021 - 06.2023

Big Data Engineer | Spark and Hadoop

Snug It Solutions
01.2017 - 10.2021

Big Data Developer | Hadoop | SQL

Micromax
06.2015 - 12.2016

Master's in CS -

Gannon University

Bachelor's in CS -

RAMCO College of Engineering