Proactive Data Engineering Intern with experience optimizing data pipelines, automating workflows, and deploying real-time streaming platforms. Skilled in Python, Java, Azure, and big data technologies, with a track record of improving data accessibility and processing efficiency. Bringing strong problem-solving abilities and a drive for continuous improvement.
Increased data pipeline efficiency by 20% through optimized Hive queries for large-scale data warehousing, cutting processing time for daily sales reports.
Automated the daily data ingestion pipeline using Airflow, ensuring timely data availability for machine learning models and contributing to a 10% improvement in model accuracy.
Designed and deployed a Kafka streaming platform to capture website clickstream data, enabling real-time customer behavior analysis and personalization.
Constructed interactive Power BI dashboards for senior management, reducing report generation time and improving data accessibility.
Configured and managed Azure Data Lake Storage for scalable storage of structured, semi-structured, and unstructured data.
Facilitated cross-functional collaboration by integrating Azure DevOps with collaboration tools, resulting in a 30% improvement in communication and information sharing across development teams.
Migrated existing batch processing pipelines to Databricks, leveraging its cloud-based scalability to handle growing data volumes and reduce on-premises hardware costs.
Enhanced data pipeline efficiency by 15% through code optimization and PySpark's in-memory processing capabilities.