Responsive expert experienced in data engineering, ETL processes, database performance monitoring, troubleshooting, and optimizing database environments. Possesses a strong understanding of cloud services, database technologies, and data pipeline optimization to drive data-driven decision-making in a dynamic organization. Equally confident working independently or collaboratively, with excellent communication skills.
Cloud Platforms: Google Cloud Platform (GCP), AWS
Over 15 years of experience (5+ years in Data Engineering) in ETL processes, data warehousing, GCP services (IAM, Google Cloud Storage, Cloud Composer (Airflow), Dataproc clusters, Dataflow), AWS services (EC2, Glue, S3, Redshift, Athena, Lambda), Hive, Airflow, Sqoop, Hadoop, SQL-based technologies, Python, and PL/SQL.
Migrated on-premises data warehouses from SAS, Netezza, and Hadoop to GCP.
Strong hands-on experience with cloud technologies (GCP and AWS).
Created 100+ Airflow DAGs using a range of operators, including PythonOperator, PythonVirtualenvOperator, TriggerDagRunOperator, BigQueryInsertJobOperator, DataprocCreateClusterOperator, and DataprocSubmitJobOperator.
Worked extensively with Git, Jenkins, and CI/CD pipelines.
Developed a Python-based ETL engine to ingest data from AWS S3, AWS CLI, and SFTP into AWS Redshift using metadata stored in AWS RDS; it replaced the existing ETL tool and reduced license costs by around $70K/year.
Developed a Python script to check for and fetch the latest available data files from SFTP and store them in AWS S3 under date-based folders.
Managed Airflow DAG scheduling, monitoring, and error handling to ensure data pipeline reliability and data quality.
Developed Apache Airflow DAGs to combine multiple tasks into one complete end-to-end process/job.
Automated Tableau data sources and workbooks using the Python TSC module, significantly reducing manual effort.
Developed a highly optimized Hive-based fact/dimension data model using partitioning, bucketing, and fast-processing file formats (Parquet, ORC, etc.).
Developed an interactive, parameterized business-level metrics dashboard using Tableau, MSSQL, and Azure Databricks for a US healthcare RCM project.
Performed user segmentation and fraud analytics using business models such as RFM and cohort analysis.
Google Cloud Associate