SIVA PARVATHI NABIGANI

Atlanta, GA

Summary

· 4+ years of development experience on the AWS and GCP cloud platforms. Solid experience in building ETL ingestion flows using AWS.

· Experienced in using distributed computing architectures such as AWS products (EC2, Redshift, EMR, Elasticsearch, Athena, and Lambda), Hadoop, Python, and Spark, with effective use of MapReduce, SQL, and Cassandra to solve big data problems.


· Hands-on experience in designing and implementing data engineering pipelines and analyzing data using AWS services such as EMR, Glue, EC2, Lambda, Athena, and Redshift, along with Sqoop and Hive.

· Experienced in working with structured data using Hive and optimizing Hive queries (a brief illustrative sketch follows this list).

· Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.

· Working experience in migrating databases from several platforms to Snowflake.

· Strong experience with architecting highly performant databases using MySQL and MongoDB.

· Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, and Sqoop).

· Developed and maintained AWS Glue ETL workflows to process and transform millions of rows of data daily.

· Hands-on experience in application development using Java, RDBMS, and Linux shell scripting, including Object-Oriented Programming (OOP), multithreading in Core Java, and JDBC.

· Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.

· Good experience working on analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
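
The Hive bullet above refers to query optimization; as a minimal illustrative sketch (not project code), the following PySpark snippet creates a date-partitioned Hive table and runs a query that benefits from partition pruning. The database, table, and column names are placeholders:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-optimization-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Partitioning by event_date lets queries scan only the partitions they need.
spark.sql("""
    CREATE TABLE IF NOT EXISTS example_db.events (
        user_id STRING,
        action  STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# The filter on the partition column triggers partition pruning instead of a full scan.
daily_counts = spark.sql("""
    SELECT action, COUNT(*) AS cnt
    FROM example_db.events
    WHERE event_date = '2022-01-01'
    GROUP BY action
""")
daily_counts.show()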

Overview

6 years of professional experience

Work History

AWS Data Engineer

Care Source
11.2022 - Current
  • Collaborated with cross-functional teams to ensure seamless integration of new data engineering solutions.
  • Optimized data processing by implementing AWS-based big data solutions.
  • Integrated machine learning models into existing workflows to enhance predictive analysis capabilities.
  • Streamlined data storage and retrieval with the design of efficient data models.
  • Created AWS Glue crawlers and transforms to automatically discover, catalog, and transform data in S3.
  • Developed AWS Glue UDFs to perform custom data transformations, such as data quality checks and feature engineering (a brief illustrative sketch follows this list).
  • Used AWS Glue Studio to build and visualize data pipelines, making it easier to collaborate with stakeholders and monitor pipeline performance.
  • Automated AWS Glue ETL jobs using AWS Step Functions to create complex data processing workflows.
  • Monitored and managed AWS Glue ETL jobs using AWS CloudWatch to ensure reliability and performance.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
  • Developed Power Pivot/SSRS (SQL Server Reporting Services) reports with logos, pie charts, and bar graphs per business needs.
  • Ran Apache Hadoop, CDH, and MapR distributions as Elastic MapReduce (EMR) on EC2.
  • Migrated data from on-premises databases to Amazon Redshift, RDS, and S3.
  • Performed end-to-end Architecture & implementation assessment of various AWS services like Amazon EMR, Redshift, and S3.
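
As a minimal illustrative sketch of the Glue work above (not the actual project code), the following Glue job reads a table cataloged by a crawler, applies a simple data-quality UDF, and writes partitioned Parquet back to S3. The database, table, bucket, and column names are placeholders:

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql.functions import udf, col
from pyspark.sql.types import BooleanType

# Standard Glue job bootstrap.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table previously discovered and cataloged by a Glue crawler (placeholder names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="example_table"
)
df = source.toDF()

# Simple data-quality check as a UDF: keep only rows with a non-empty id.
is_valid = udf(lambda v: v is not None and str(v).strip() != "", BooleanType())
clean_df = df.filter(is_valid(col("id")))

# Write the cleaned data back to S3 as date-partitioned Parquet (placeholder bucket and column).
clean_df.write.mode("overwrite").partitionBy("load_date").parquet(
    "s3://example-bucket/clean/example_table/"
)

job.commit()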

Data Engineer

Cadila Healthcare
08.2020 - 10.2021
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Used Amazon Redshift, S3, Redshift Spectrum, and Athena to query large amounts of data stored on S3 and create a virtual data lake without going through a full ETL process (a brief illustrative sketch follows this list).
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked on converting existing on-premises processes and databases to the AWS Cloud.
  • Analyzed the SQL scripts and designed the solution to implement using Spark.
  • Developed Python code to gather the data from HBase and designed the solution to implement using PySpark.
  • Worked on importing metadata into Hive using Python; migrated existing tables and the data pipeline from the legacy environment to AWS (S3) and wrote Lambda functions to run the data pipeline in the cloud.
  • Deployed code to multiple environments through the CI/CD process; worked on code defects during SIT and UAT testing, supported data loads for testing, and implemented reusable components to reduce manual intervention.
  • Developed Spark (Scala) notebooks to transform and partition data and organize files in ADLS.
  • Used Databricks widgets to pass parameters at runtime from ADF to Databricks.
  • Created Triggers, PowerShell scripts, and the parameter JSON files for the deployments.
  • Worked with VSTS for the CI/CD implementation.
  • Implemented End-to-end logging frameworks for Data factory pipelines.
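
As a minimal illustrative sketch of the virtual-data-lake bullet above (not the actual project code), the following snippet submits an Athena query over S3-backed data with boto3 and polls for completion. The region, database, table, and output location are placeholders:

import time
import boto3

# Placeholder region; the client reads credentials from the environment.
athena = boto3.client("athena", region_name="us-east-1")

def run_athena_query(sql, database, output_location):
    # Submit the query, then poll until Athena reports a terminal state.
    execution = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = execution["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)
        status = state["QueryExecution"]["Status"]["State"]
        if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, status
        time.sleep(2)

# Example call against a placeholder external table defined over S3 data.
query_id, status = run_athena_query(
    "SELECT COUNT(*) FROM example_table WHERE load_date = DATE '2021-01-01'",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)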


Data Engineer

L&T Technology Services
02.2018 - 07.2020
  • Worked on developing data ingestion pipelines using the Talend ETL tool and Bash scripting with big data technologies including Hive, Impala, Spark, and Kafka.
  • Experienced in developing scalable and secure data pipelines for large datasets.
  • Gathered requirements for ingestion of new data sources including life cycle, data quality check, transformations, and metadata enrichment.
  • Supported data quality management by implementing proper data quality checks in data pipelines.
  • Delivered data engineering services such as data exploration, ad-hoc ingestion, and subject-matter expertise to data scientists using big data technologies.
  • Built machine learning models to showcase big data capabilities using PySpark and MLlib.
  • Enhanced the data ingestion framework by creating more robust and secure data pipelines.
  • Implemented data streaming capability using Kafka and Talend for multiple data sources (a brief illustrative sketch follows this list).
  • Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Impala, Kudu).
  • Managed the S3 data lake; responsible for maintaining and handling inbound and outbound data requests through the big data platform.
  • Working knowledge of cluster security components like Kerberos, Sentry, SSL/TLS, etc.
  • Involved in the development of agile, iterative, and proven data modelling patterns that provide flexibility.
  • Knowledge of implementing JILs to automate jobs in the production cluster.
  • Troubleshot users' analysis bugs (JIRA and IRIS tickets).
  • Worked with the SCRUM team in delivering agreed user stories on time for every Sprint.
  • Worked on analyzing and resolving production job failures in several scenarios.
  • Implemented UNIX scripts to define the use case workflow and to process the data files and automate the jobs.
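
As a minimal illustrative sketch of the Kafka streaming work above (not the actual project code), the following Spark Structured Streaming job consumes a Kafka topic and lands the records as Parquet. The broker, topic, and S3 paths are placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

# Read the raw stream from Kafka (placeholder broker and topic).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "example_topic")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as binary; cast it to string for downstream parsing.
events = raw.select(col("value").cast("string").alias("payload"))

# Land micro-batches as Parquet, with a checkpoint for recoverable file output.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/landing/example_topic/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/example_topic/")
    .start()
)
query.awaitTermination()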

Education

Master of Science - Data Science

Lewis University
Romeoville, IL
05.2023

Bachelor of Science - Electrical, Electronics Engineering

Amrita University
Coimbatore, Tamil Nadu
07.2020

Skills

  • CloudFormation Templates
  • Data Modeling Techniques
  • Python Programming
  • Infrastructure as Code
  • Lambda Functions
  • Data Lake Management
  • Big Data Processing
  • Tableau Visualization
  • NoSQL Databases
  • DevOps principles
  • AWS Redshift Expertise
  • SQL Querying
  • Amazon S3 Proficiency
  • AWS Glue Knowledge
  • Real-time Data Streaming
  • PowerBI Reporting
  • ETL Design and Implementation
  • DynamoDB Experience

Timeline

AWS Data Engineer

Care Source
11.2022 - Current

Data Engineer

Cadila Healthcare
08.2020 - 10.2021

Data Engineer

L&T Technology Services
02.2018 - 07.2020

Master of Science - Data Science

Lewis University

Bachelor of Science - Electrical, Electronics Engineering

Amrita University