Accomplished Data Engineer with 4 years of experience building scalable data pipelines, ETL workflows, and real-time streaming systems on AWS and Azure. Proficient in Apache Spark, Kafka, Airflow, and cloud-native services (AWS Glue, Azure Data Factory). Skilled in data warehousing and business intelligence with Redshift, Synapse, and Power BI. Experienced with HIPAA/GDPR-compliant architectures; collaborates with cross-functional teams to deliver data-driven insights.
Programming Languages:
Python, SQL, Shell Scripting, R
Cloud Platforms:
AWS (Glue, Lambda, Redshift, S3, Kinesis, API Gateway)
Azure (Data Factory, Synapse Analytics, Databricks, Data Lake Storage, Event Hubs, Stream Analytics, Active Directory, Security Center)
Big Data & ETL Frameworks:
Apache Spark (PySpark), Apache Kafka, Apache Airflow, AWS Glue, Azure Data Factory, Change Data Capture (CDC), WTX
Data Warehousing:
Amazon Redshift, Azure Synapse, PostgreSQL, SQL Server, Oracle SQL
Streaming & Real-Time Processing:
Apache Kafka, Amazon Kinesis, Azure Event Hubs, Azure Stream Analytics
APIs & Microservices:
REST APIs, Python-based Microservices, AWS API Gateway
DevOps & Monitoring:
Git, Linux, Azure Monitor, Airflow Monitoring, Elastic Stack
Security & Compliance:
HIPAA, GDPR, RBAC, Data Encryption, Data Masking, Access Control, Azure Security Center
Business Intelligence & Visualization:
Power BI, Tableau, Excel
Machine Learning & Analytics:
Spark MLlib, Scikit-learn (Random Forest, Logistic Regression), Exploratory Data Analysis (EDA), Anomaly Detection