· Analytical and process-oriented Data Engineer with 3 years of experience in building scalable data solutions and real-time analytics pipelines, with expertise in Azure Data Services, Kafka, Spark, PySpark, SQL, and Power BI, leveraging Agile and Waterfall methodologies to deliver actionable insights.
· Built and orchestrated containerized data infrastructure using Docker Compose with services such as ZooKeeper, Kafka, Spark (Master/Workers), and Cassandra, enabling efficient development, testing, and streaming of event-driven data pipelines.
· Developed streaming data pipelines using Apache Kafka and Azure Event Hubs, integrating with Spark Structured Streaming and Azure Functions to process and transform high-throughput event data in real time for downstream analytics and storage.
· Engineered scalable ETL workflows on Azure Databricks using PySpark and SQL, performing large-scale batch and streaming transformations on structured and semi-structured data, and storing results in Azure Synapse and Data Lake Storage.
· Delivered real-time dashboards in Power BI by integrating with Microsoft Fabric and streaming sources like Kafka and Event Hubs, allowing business stakeholders to monitor operational metrics with minimal latency and drive timely decision-making.
· Worked through the entire data science lifecycle: performed exploratory data analysis (EDA), feature engineering, and dimensionality reduction using PCA, then trained and evaluated machine learning models using Apache Spark MLlib, bridging data engineering with model deployment in distributed environments.