Data Engineer with 5 years of experience building scalable ETL pipelines, data processing workflows, and cloud data solutions using AWS, Azure, Python, PySpark, SQL, and Databricks. Proficient in containerization with Docker and Kubernetes, CI/CD, data warehousing, and business intelligence with Tableau and Power BI.
ICare Application, 2022
Built a robust ETL pipeline using PySpark and SQL to process large volumes of semi-structured patient data (e.g., appointment records) collected from simulated healthcare sources.
Developed scalable ETL pipelines using PySpark and SQL to process large volumes of semi-structured healthcare data, including patient records, diagnostics, and vital stats, ensuring high data quality and consistency.
Ingested and validated raw CSV/JSON data from local storage and AWS S3, applying schema checks and automated data quality rules to cleanse and standardize healthcare information.
Created interactive Tableau dashboards for healthcare insights such as chronic condition tracking, readmission trends, and physician workload metrics, supporting data-driven clinical decisions.

Electronic Health Record, 2019, 2020
Designed and implemented an AI/ML-based prediction algorithm to assess heart disease risk, applying statistical analysis to patient datasets with a focus on individuals aged 45+ and identifying a 29.5% risk threshold.
Stored and managed sensitive health records securely in Azure Blob Storage, providing scalable, cloud-native storage for high-volume healthcare datasets.
Developed optimized SQL stored procedures for fast, reliable querying of patient data, maintaining data integrity and performance across the application.
Languages: Python, SQL, Shell scripting, JavaScript
Data Processing: PySpark, Pandas, Apache Kafka, Apache Airflow, Data Transformation (UDFs, Spark DataFrames), ETL processes
Cloud: AWS (S3, EC2, Glue, Lambda, EventBridge), Azure (Blob Storage, Data Factory, Synapse Analytics)
Databases and Storage: Snowflake, PostgreSQL, MySQL, Oracle DB, MongoDB, Cosmos DB, AWS EMR, Data Lake, Data Warehouse, Azure SQL Database
DevOps and CI/CD: Docker, Kubernetes (EKS), Helm, Terraform, Git, GitHub, Jenkins, GitHub Actions, CI/CD Pipelines
Reporting, Monitoring & Visualization: Tableau, Power BI, Prometheus, Grafana