Big Data Engineering Bootcamp Internship
[iNeuron]
• Worked hands-on with core big data technologies, including Apache Spark, Apache Kafka, and MongoDB.
• Developed an end-to-end COVID-19 Data Analysis project (illustrative code sketches follow this list):
• Implemented Kafka producers that read records from multiple CSV files and publish them to Kafka topics.
• Configured the MongoDB sink connector in Confluent Kafka Connect to stream topic data into MongoDB.
• Conducted in-depth data analysis in Apache Spark, reading directly from MongoDB via the MongoDB Spark Connector.
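A minimal sketch of the CSV-to-Kafka producer step, assuming the confluent-kafka Python client; the broker address, topic name, and file paths are hypothetical placeholders, not taken from the project:

```python
import csv
import glob
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker

for path in glob.glob("data/covid_*.csv"):  # hypothetical file layout
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Publish each CSV row as a JSON-encoded message to the topic.
            producer.produce("covid-data", value=json.dumps(row).encode("utf-8"))

producer.flush()  # block until all buffered messages are delivered
```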
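The sink-connector step might be registered against the Kafka Connect REST API roughly as below; the connector name, connection URI, database/collection names, and topic are assumptions:

```python
import json
import urllib.request

config = {
    "name": "mongo-covid-sink",  # hypothetical connector name
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": "mongodb://localhost:27017",  # assumed MongoDB URI
        "database": "covid",
        "collection": "cases",
        "topics": "covid-data",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    },
}

# POST the connector config to Kafka Connect (default REST port 8083).
req = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=json.dumps(config).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```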
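And a sketch of the Spark analysis step, assuming the MongoDB Spark Connector v10+ ("mongodb" source); the connection URI, database/collection names, and column names are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("covid-analysis")
    .config("spark.mongodb.read.connection.uri", "mongodb://localhost:27017")
    .getOrCreate()
)

# Read the collection populated by the sink connector directly into a DataFrame.
df = (
    spark.read.format("mongodb")
    .option("database", "covid")
    .option("collection", "cases")
    .load()
)

# Example analysis: total confirmed cases per country (column names assumed).
df.groupBy("country").agg(F.sum("confirmed").alias("total_confirmed")).show()
```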
• Created an AWS Data Pipeline project (illustrative code sketches follow this list):
• Engineered a Lambda function to fetch data from external APIs and store it in AWS S3.
• Leveraged AWS Glue with Apache Spark to perform complex data transformations.
• Designed and implemented a DynamoDB table structure for storing transformed data.
• Implemented scheduling for the Lambda function to automate regular data downloads.
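A minimal sketch of the ingestion Lambda described above, using boto3; the API URL, bucket name, and key pattern are hypothetical:

```python
import datetime
import urllib.request

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Pull the raw payload from a hypothetical external API endpoint.
    with urllib.request.urlopen("https://example.com/api/data") as resp:
        payload = resp.read()

    # Write it to S3 under a date-partitioned key (bucket name assumed).
    key = f"raw/{datetime.date.today().isoformat()}.json"
    s3.put_object(Bucket="my-pipeline-bucket", Key=key, Body=payload)
    return {"statusCode": 200, "s3_key": key}
```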
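The Glue transformation step might look roughly like the sketch below: read raw JSON from S3 into a DynamicFrame, transform it with plain Spark, and write the result to DynamoDB. The S3 path, table name, and added column are assumptions:

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read the raw JSON files the Lambda dropped into S3 (path assumed).
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-pipeline-bucket/raw/"]},
    format="json",
)

# Do the transformation with the Spark DataFrame API, then convert back.
df = raw.toDF().withColumn("ingested_at", F.current_timestamp())
out = DynamicFrame.fromDF(df, glue_context, "out")

# Write the transformed records to the DynamoDB table (name assumed).
glue_context.write_dynamic_frame.from_options(
    frame=out,
    connection_type="dynamodb",
    connection_options={"dynamodb.output.tableName": "pipeline-data"},
)
```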
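A sketch of the DynamoDB table creation with boto3; the table name and key schema are illustrative assumptions about the project's data model:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="pipeline-data",
    KeySchema=[
        {"AttributeName": "record_id", "KeyType": "HASH"},    # partition key
        {"AttributeName": "fetched_at", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "record_id", "AttributeType": "S"},
        {"AttributeName": "fetched_at", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity, no provisioning
)
```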
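One way the Lambda scheduling could be wired up is an EventBridge rule, as sketched below; EventBridge itself is an assumption about the mechanism used, and the rule name, function name, and ARN are placeholders:

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

# Create a rule that fires once per day.
rule = events.put_rule(
    Name="daily-data-download",
    ScheduleExpression="rate(1 day)",
)

# Allow EventBridge to invoke the function (function name assumed).
lambda_client.add_permission(
    FunctionName="fetch-api-data",
    StatementId="allow-eventbridge",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

# Attach the Lambda as the rule's target (ARN is a placeholder).
events.put_targets(
    Rule="daily-data-download",
    Targets=[{"Id": "fetch-api-data",
              "Arn": "arn:aws:lambda:us-east-1:123456789012:function:fetch-api-data"}],
)
```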
• Demonstrated proficiency in cloud-based big data solutions, particularly within the AWS ecosystem.
• Gained hands-on experience in data ingestion, streaming, storage, and analysis using industry-standard tools and technologies.
• Project Repository: https://github.com/swaroop201/AWS-DATA-PIPELINE/tree/master/AWS-END-TO-END-PIPELINE-main