An aspiring Data/Cloud Engineer, ETL architect, and visualization expert with over 4 years of experience in the Banking, Retail, and Healthcare domains. A team player with effective communication skills, geared toward increasing collaboration and team spirit. Strong education and hands-on experience, backed by the technical understanding needed to support the team's progress. Experienced in working in distributed environments and on culturally diverse teams.
Overview
7 years of professional experience
Work History
Data/Cloud Engineer
United Health Group- Optum (Remote)
08.2022 - Current
Developed and maintained data pipelines on Google Cloud Platform (GCP) using services such as Google Cloud Storage, BigQuery, and Dataflow
Utilized SQL and NoSQL databases to extract, transform, and load (ETL) data from various sources into GCP for analysis
Collaborated with cross-functional teams to understand data requirements and design efficient data solutions
Implemented data processing tasks using Apache Spark and Scala, optimizing performance and scalability
Managed and monitored data workflows, ensuring reliability and data integrity throughout the pipeline
Conducted troubleshooting and performance tuning to improve data processing efficiency
Implemented data quality checks and validation processes to ensure accuracy, completeness, and consistency of data
Collaborated with cross-functional teams to establish data quality metrics and implement automated monitoring solutions
Created and maintained documentation for data pipelines, configurations, and best practices
Provided technical support and training to junior team members on GCP, SQL, PySpark, Hive, and Unix utilities
Developed a real-time data analysis dashboard using Google Cloud Platform services
Utilized code and modern cloud-native deployment techniques to design, plan, and integrate cloud computing and virtualization systems
Implemented data ingestion, processing, and visualization using Apache Spark, Scala, and Google Data Studio
Analyzed streaming data from multiple sources to provide insights and visualization in near real-time
Big Data Developer
Citi Bank
07.2021 - 07.2022
Developed and maintained Spark ETL pipelines to transform and clean data using Spark SQL, PySpark, and Hive, and analyzed the data to predict customer interest in home lending interest rates
Extensively used Apache Airflow to monitor and manage hundreds of ETL/ELT pipelines
Used PySpark to process large datasets and stored the resulting datasets/files in HDFS as Parquet, ORC, or CSV
Used Kafka as a message broker to transfer billions of events from the home lending application to Spark ELT pipelines
Transformed incoming XML/JSON files from Kafka and stored them in staging, dev, UAT, and prod environments
Used DBeaver to connect to data sources like Hive for analyzing data
Extensively used the Hive data warehouse to upsert and update data in its databases
Extensively used HBase to store and analyze NoSQL data coming from ELT pipelines
Used Apache Storm to analyze and transform home lending stream data and batch data on Hadoop
Analyzed large datasets to identify trends and patterns in customer behavior
Worked on the end-reporting architecture to build datasets for Tableau/SSRS
SQL Developer
IBing Software Solutions Private Limited
06.2017 - 10.2019
Migrated data from FS to Snowflake within the organization
Imported legacy data from SQL Server and Teradata into Amazon S3
Created consumption views on top of metrics to reduce the running time for complex queries
Exported data into Snowflake by creating staging tables to load data from different files in Amazon S3
Compared data at the leaf level across various databases whenever data transformation or data loading took place
Analyzed data quality after these loads to check for data loss or data corruption
As part of the data migration, wrote many SQL scripts to detect data mismatches and loaded historical data from Teradata into Snowflake
Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e., name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project
Retrieved data from FS to S3 using Spark commands
Implemented RESTful services in Spring
Serialized and deserialized objects using the Play JSON library
Developed traits, case classes, and related constructs in Scala
Developed quality code adhering to Scala coding standards and best practices
Wrote complex SQL queries
Education
Master of Science - Computer Science
Southeast Missouri State University
Cape Girardeau, MO
05.2021
Bachelors -
Tirumala Engineering College
India
06.2014
Skills
Data Migration
Google Cloud Platform
PySpark
Hive
Microsoft Azure Databricks
MS SQL Server
UNIX/Linux
Git