
Accomplished Data Engineer with over 5 years of experience leveraging Python, R, SQL, AWS, and Tableau to drive data-driven decision-making across healthcare and financial services domains.

- Designed and implemented scalable data lake solutions using AWS Lake Formation, Azure Data Lake, and Google Cloud Storage, cutting data processing time by 40% and enabling real-time analytics for business decisions.
- Engineered distributed data pipelines with Apache Spark, Databricks, and Hadoop HDFS, processing terabyte-scale datasets with built-in fault tolerance.
- Designed and maintained ETL pipelines with Apache Spark, AWS Glue, and Airflow; developed and optimized serverless workflows in AWS Glue, Azure Data Factory, and GCP Dataflow to integrate structured and unstructured sources with high data quality.
- Proficient in database design, administration, and optimization across NoSQL (MongoDB, DynamoDB) and SQL platforms (SQL Server, Snowflake, PostgreSQL), ensuring efficient, scalable storage.
- Expert in T-SQL development, including complex stored procedures, triggers, views, and performance tuning of transaction-processing queries in SQL Server.
- Implemented event-driven architectures with Apache Kafka, AWS Kinesis, and Azure Event Hubs, powering real-time streaming and analytics pipelines for critical applications.
- Strengthened data governance and schema management with Delta Lake, Apache Iceberg, and Hive Metastore, ensuring reliable, versioned data pipelines.
- Adept at working with AI/ML modules, risk management frameworks, and backend SQL-based systems in banking environments.
- Skilled in creating detailed JIRA stories, Visio diagrams, and PowerPoint presentations to facilitate project execution; proficient in Python, SQL, and TypeScript, with Git-based version control.
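The event-driven streaming work above can be illustrated with the core logic of a tumbling-window aggregation, the pattern Kafka Streams and Flink apply to real-time event counts. This is a minimal plain-Python sketch for illustration only; the 60-second window and `(timestamp, key)` event shape are assumptions, not a production implementation:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count events per key within fixed, non-overlapping time windows.

    events: iterable of (epoch_seconds, key) pairs -- a hypothetical
    event shape standing in for Kafka/Kinesis records.
    Returns {(window_start, key): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align each event to the start of its window, e.g. ts=65 -> 60.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)
```

In a real streaming engine the same alignment happens continuously over an unbounded stream, with state checkpointed for fault tolerance rather than held in a dictionary.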
- Automated data ingestion pipelines using PySpark, Airflow DAGs, Terraform, and SSIS, reducing operational overhead and improving workflow efficiency.
- Optimized SQL query performance through indexing, partitioning, and query refactoring in SQL Server, Redshift, and BigQuery, minimizing query latency.
- Built dynamic data visualization dashboards with Tableau, Power BI, and Python libraries (Matplotlib, Plotly, Seaborn) to support actionable, informed decision-making.
- Applied Natural Language Processing (NLP) and Transformer-based models (BERT, GPT) to unstructured text, enabling sentiment analysis and document automation workflows.
- Designed and deployed cloud-native architectures using AWS S3, Google BigQuery, Azure Synapse Analytics, and Terraform, integrating data pipelines across hybrid cloud environments.
- Architected and optimized data warehouse solutions with Snowflake and Azure Synapse, implementing dimensional modeling, partitioning, and OLAP for large-scale analytical operations.
- Collaborated with DevOps teams to establish CI/CD pipelines for data engineering workflows using GitLab CI, Jenkins, Docker, Kubernetes, and Ansible, improving deployment reliability and scalability.
- Built real-time stream processing systems using Apache Flink, Spark Streaming, and Kafka Streams, delivering sub-second latency for time-sensitive analytics.
- Developed end-to-end machine learning pipelines with MLflow, TensorFlow Extended (TFX), and Kubeflow, integrating predictive capabilities into production environments.
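A core step in the ingestion and ETL work above is validating raw records and quarantining malformed rows rather than dropping them silently, the data-quality pattern behind the Glue and Airflow pipelines. The sketch below shows that pattern in plain Python; the field names (`patient_id`, `visit_date`, `charge_usd`) are hypothetical examples, not taken from any actual pipeline:

```python
def transform(rows):
    """Validate and normalize raw dict records before loading.

    Returns (clean, quarantine): rows that fail type coercion or lack
    required fields go to the quarantine list for later inspection,
    so bad data never reaches the warehouse unnoticed.
    """
    clean, quarantine = [], []
    for row in rows:
        try:
            clean.append({
                "patient_id": int(row["patient_id"]),       # must be numeric
                "visit_date": row["visit_date"].strip(),    # trim whitespace
                "charge_usd": round(float(row["charge_usd"]), 2),
            })
        except (KeyError, TypeError, ValueError):
            quarantine.append(row)
    return clean, quarantine
```

In an orchestrated pipeline this transform would sit between the extract task and the load task, with the quarantine output written to a dead-letter location and alerting wired to its row count.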