PAWAN NARASIMHA AVVARI

Jersey City

Summary

  • Data engineer with over 2 years of experience and expertise in AWS, Snowflake, Databricks, and various big data technologies. Throughout my career, I have gained extensive experience analyzing, designing, developing, testing, maintaining, and implementing complex data warehousing applications for clients in the banking and financial sectors. Proficiency with the ETL tool Informatica PowerCenter in both OLAP and OLTP environments has allowed me to deliver efficient and reliable solutions.
  • One of my key strengths lies in designing scalable cloud-based data solutions that prioritize data quality. Leveraging a deep understanding of AWS services such as S3, Glue, EMR, and Lambda, I build robust data pipelines that ensure the integrity of the data. I also have extensive experience using Snowflake for data warehousing and analytics, enabling me to deliver high-performance, scalable solutions tailored to each client's needs.
  • I am proficient in using Databricks for big data processing, which strengthens my ability to handle large volumes of data efficiently. With a strong problem-solving mindset and a passion for driving data insights and informed decision-making, I am committed to delivering exceptional results in data engineering.
  • Experience with AWS Cloud, Snowflake, Databricks, MySQL, SQL Server, S3 storage, AWS Redshift, big data technologies (Hadoop and Apache Spark), and AWS SageMaker.
  • Experience in developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using Talend Integration Suite.
  • Experience in developing complex mappings, reusable transformations, sessions, and workflows using the Informatica ETL tool to extract data from various sources and load it into targets.
  • Proficient with multiple databases, including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, uncovering insights into customer usage patterns.
  • Used various file formats, including Avro, Parquet, SequenceFile, JSON, ORC, CSV, and plain text, for loading, parsing, gathering, and transforming data.
  • Designed and created Hive external tables using a shared metastore with static and dynamic partitioning, bucketing, and indexing (a small sketch follows this list).
  • Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
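A minimal sketch of the Hive external-table DDL mentioned above, issued through Spark SQL; the table schema, bucket count, and S3 location are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # External table over existing S3 data, partitioned by load date
    # and bucketed by customer_id (hypothetical schema).
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS txns (
            txn_id STRING,
            customer_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS PARQUET
        LOCATION 's3://example-bucket/warehouse/txns/'
    """)

    # Register partitions that already exist on S3 in the shared metastore.
    spark.sql("MSCK REPAIR TABLE txns")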

Overview

2+ years of professional experience

Work History

Saven Technologies Limited, India - Data Engineer

06.2021 - 08.2022


• Involved in rewriting existing Golang logic as PySpark code.

• Developed data ingestion workflows using AWS S3 as data storage and Spark's built-in capabilities to efficiently process large-scale datasets.
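A minimal sketch of such an ingestion step in PySpark, assuming hypothetical bucket paths and a CSV source with an event_id column:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("s3-ingest").getOrCreate()

    # Read raw files from the landing bucket (path is hypothetical).
    raw = spark.read.option("header", "true").csv("s3://example-raw/events/")

    # Light cleanup before persisting to the curated zone.
    curated = (raw
               .dropDuplicates(["event_id"])
               .withColumn("ingested_at", F.current_timestamp()))

    curated.write.mode("append").parquet("s3://example-curated/events/")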

• Migrated data processing workflows from AWS EMR to Databricks, utilizing Databricks notebooks and clusters for interactive data exploration, prototyping, and job scheduling.

• Utilized Databricks Delta Lake, an optimized data lake storage layer, to efficiently store and manage large volumes of structured and semi-structured data, ensuring data integrity, reliability, and ACID compliance.
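A small sketch of this pattern, with hypothetical paths; on Databricks the delta format is available out of the box:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical curated source data.
    df = spark.read.parquet("s3://example-curated/customers/")

    # The Delta transaction log provides ACID guarantees over the
    # underlying Parquet files.
    df.write.format("delta").mode("overwrite").save("/mnt/lake/customers")

    # Reopen the table through the Delta API (supports updates/merges).
    tbl = DeltaTable.forPath(spark, "/mnt/lake/customers")
    print(tbl.toDF().count())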

• Collaborated with data scientists and analysts to provide them with reliable and curated datasets in Databricks for advanced analytics, machine learning, and AI model development.

• Implemented security measures in Databricks, such as data encryption, role-based access control, and network isolation, to ensure data privacy and compliance with regulatory requirements.

• Optimized Spark SQL queries in Databricks by analyzing query execution plans, identifying inefficient operations or unnecessary shuffling, and applying appropriate optimizations such as predicate pushdown or join reordering.
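A minimal illustration of verifying pushdown from the physical plan; the dataset path and columns are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet("s3://example-curated/events/")  # hypothetical path

    # Filtering and projecting early lets Spark push the predicate and
    # column pruning down into the Parquet scan instead of shuffling
    # the full dataset.
    slim = df.filter(df.event_date == "2022-01-01").select("event_id", "amount")

    # The physical plan should list the predicate under PushedFilters.
    slim.explain(True)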

• Created external tables in Athena over data stored in S3, using the AWS Glue Data Catalog for metadata management and enabling seamless query access to structured, semi-structured, and unstructured data.
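A sketch of how such a table might be registered; the database, bucket, and results-location names are hypothetical, and Athena persists the definition in the Glue Data Catalog:

    import boto3

    athena = boto3.client("athena")

    ddl = """
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.clicks (
        user_id string,
        url string,
        ts timestamp
    )
    PARTITIONED BY (dt string)
    STORED AS PARQUET
    LOCATION 's3://example-curated/clicks/'
    """

    # Run the DDL as an Athena query; the table definition lands in Glue.
    athena.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )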

• Developed an AWS Kinesis Firehose, Lambda, and S3 pipeline to fetch live data from an API and store it in S3.
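A minimal sketch of the Lambda side of such a pipeline; the API URL and delivery stream name are hypothetical:

    import json
    import urllib.request
    import boto3

    firehose = boto3.client("firehose")

    def handler(event, context):
        # Pull the latest records from the upstream API (hypothetical URL).
        with urllib.request.urlopen("https://api.example.com/live") as resp:
            records = json.loads(resp.read())

        # Firehose buffers the records and delivers them to the S3
        # destination configured on the delivery stream.
        for rec in records:
            firehose.put_record(
                DeliveryStreamName="live-events-to-s3",
                Record={"Data": (json.dumps(rec) + "\n").encode("utf-8")},
            )
        return {"delivered": len(records)}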

• Developed a Spark application to process data stored in S3 and write the output back to S3.

• Designed and executed data quality jobs in Databricks using SQL queries, Python, or Scala, leveraging Databricks' distributed computing capabilities to process large volumes of data efficiently.
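A minimal sketch of such a data quality job in PySpark, assuming a hypothetical curated table and simple null/duplicate rules:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.table("curated.orders")  # hypothetical table

    total = df.count()
    null_ids = df.filter(F.col("order_id").isNull()).count()
    dupes = total - df.dropDuplicates(["order_id"]).count()

    # Fail the job loudly so the scheduler surfaces the problem.
    assert null_ids == 0, f"{null_ids} rows with null order_id"
    assert dupes == 0, f"{dupes} duplicate order_id values"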

Saven Technologies Limited, India - Data Engineer

10.2020 - 06.2021


• Led successful migration project from AWS Redshift to Snowflake, ensuring seamless transition of data and analytics processes to new platform.

• Conducted thorough assessment of existing AWS Redshift infrastructure and identified opportunities for optimization and improvement in migration process.

• Designed and executed comprehensive migration plan, including data extraction from AWS Redshift, data transformation, and loading into Snowflake, while ensuring data integrity and minimal downtime.

• Developed and executed data migration scripts and processes, utilizing Snowflake's data loading capabilities, such as the COPY command and Snowpipe, to efficiently transfer data from AWS Redshift to Snowflake.
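A sketch of the bulk-load step using the Snowflake Python connector; the account, credentials, external stage, and table names are all hypothetical:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",
        user="loader",
        password="...",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )

    # Bulk-load Parquet files exported from Redshift into Snowflake
    # via an external S3 stage (@redshift_export is hypothetical).
    conn.cursor().execute("""
        COPY INTO customers
        FROM @redshift_export/customers/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)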

• Performed data validation and reconciliation to ensure accuracy and consistency of data after migration, identifying and resolving any discrepancies or anomalies.

• Advanced knowledge of the GCP ecosystem, with a focus on BigQuery.

• Designed and implemented complex data processing workflows in Snowflake, leveraging its powerful SQL capabilities and scalable architecture to handle large volumes of data.

• Analyzed user needs to determine whether new software should be built or existing software modified.

• Designed and coded BigQuery queries to analyze data collections.

• Used Apache Spark and Python libraries to perform advanced data processing tasks, including machine learning algorithms, natural language processing, and graph analysis.

• Implemented Snowflake's Time Travel and Fail-safe features to manage and recover from data processing errors, ensuring data integrity and maintaining a reliable and consistent data processing environment.
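A minimal illustration of Time Travel in practice; the connection parameters and table names are hypothetical:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account", user="loader", password="...",
        warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
    )
    cur = conn.cursor()

    # Inspect the table as it looked an hour ago to scope a bad load.
    cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
    print(cur.fetchone())

    # Restore the pre-error snapshot as a new table via zero-copy clone.
    cur.execute("CREATE TABLE orders_restored CLONE orders AT(OFFSET => -3600)")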

• Optimized query performance in Snowflake for complex data processing scenarios by analyzing query execution plans, leveraging query hints, and applying optimization techniques such as clustering and partitioning.

• Designed and implemented end-to-end data solutions using AWS S3 as a data lake for storing raw and processed data, EMR for big data processing, Snowflake as data warehouse, and Tableau for data visualization and reporting.

• Developed data ingestion processes using AWS S3 and EMR, leveraging technologies such as Apache Spark to extract, transform, and load data from various sources into Snowflake for further analysis.

• Collaborated with business stakeholders and Tableau developers to understand reporting and visualization requirements, translating them into meaningful visualizations and interactive dashboards that provide actionable insights.

• Implemented data quality checks and validation using AWS Lambda and Airflow to ensure integrity and accuracy of data in S3, EMR, and Snowflake.
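A sketch of wiring one such check into Airflow with a PythonOperator; the DAG id, bucket, and partition prefix are hypothetical:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    import boto3

    def check_s3_partition(**_):
        # Fail the task if the expected partition is missing from S3.
        s3 = boto3.client("s3")
        resp = s3.list_objects_v2(Bucket="example-curated",
                                  Prefix="events/dt=2021-01-01/")
        if resp.get("KeyCount", 0) == 0:
            raise ValueError("expected partition is empty")

    with DAG(dag_id="s3_quality_checks",
             start_date=datetime(2021, 1, 1),
             schedule_interval="@daily",
             catchup=False) as dag:
        PythonOperator(task_id="check_partition",
                       python_callable=check_s3_partition)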

• Designed and implemented data archiving and backup strategies using S3 Glacier.
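A minimal sketch of such a lifecycle rule via boto3; the bucket name and retention windows are hypothetical:

    import boto3

    s3 = boto3.client("s3")

    # Move objects to Glacier after 90 days and expire them after ~7 years.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-curated",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-to-glacier",
                "Status": "Enabled",
                "Filter": {"Prefix": "events/"},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 2555},
            }]
        },
    )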

• Experienced in installing and configuring Databricks on AWS and Azure.

Education

Master of Science - Computer and Information Sciences

Pace University
New York, NY

Skills

Apache Spark

Database Development

Unix Shell

AWS Big Data Stack, Azure (S3, EC2, EMR, Lambda, Glue, Athena, Redshift)

Data warehousing (Snowflake, Redshift)

ETL - Informatica PowerCenter 9.6.1, AWS Glue

RDBMS - Microsoft SQL Server, Oracle 11g

Databricks

Visualization (Tableau, Python libraries)

Hadoop ecosystem (MapReduce, Hive, Sqoop)

Programming languages (Scala, Python, Java, Golang)

Version Control (GitHub)

Qlik Sense

Amazon Web Services architecture, covering resources like S3, EC2, IAM, databases (DynamoDB, Redshift), VPC, Lambda, Glue, Athena, SQS, SNS, SES, API Gateway, Kinesis

Teamwork and Collaboration
Multitasking Abilities
Data analysis

Timeline

Saven Technologies Limited, India - Data Engineer

06.2021 - 08.2022

Saven Technologies Limited, India - Data Engineer

10.2020 - 06.2021

Master of Science - Computer and Information Sciences

Pace University