🎓 Web Scraping Multimedia Content

Name	Matric
ADAM WAFII BIN AZUAR	A20EC0003
AHMAD MUHAIMIN BIN AHMAD HAMBALI	A20EC0006
FARAH IRDINA BINTI AHMAD BAHARUDIN	A20EC0035
MUHAMMAD DINIE HAZIM BIN AZALI	A20EC0084
MIKHEL ADAM BIN MUHAMMAD EZRIN	A20EC0237

Contents📝

📑Code

1. Introduction

This assignment uses the Pixabay API to retrieve 100 images that match a keyword entered by the user and stores them in MongoDB along with their metadata. The workflow is as follows: obtain an API key from Pixabay; prompt the user for a keyword; send a request to the Pixabay API with the API key and the keyword; parse the returned JSON object to extract the relevant information; connect to MongoDB using PyMongo; create a database and a collection for the image metadata; and insert one document per image into the collection. Finally, the images can be displayed to the user with a library such as Pillow or OpenCV.

The assignment provides experience in working with APIs, parsing JSON data, and using MongoDB to store and retrieve data.

2. Web Scraping using Pixabay

Pixabay is a website that provides a platform for users to access a vast collection of free-to-use multimedia content, including images, videos, and illustrations. It allows users to search for multimedia content by keywords, filter by image or video type, and sort by popularity or date uploaded. The site also provides an easy-to-use editor that allows users to make minor adjustments to images before downloading them.

Pixabay's content is released under Pixabay's own content license (the site previously used the Creative Commons CC0 license), which means that users can use the multimedia content without paying licensing fees or giving attribution. However, there are some restrictions on the use of the content, and users are encouraged to read the site's terms of service before using any of it. Overall, Pixabay is a valuable resource for anyone who needs free-to-use multimedia content for personal or commercial projects.

Advantages of using Pixabay for multimedia content

Wide range of multimedia content: Pixabay offers a vast collection of over 2 million high-quality, royalty-free images, videos, and illustrations. This means that you can easily find multimedia content for almost any type of project, whether it's for personal or commercial use.
Easy to use: Pixabay has a simple and user-friendly interface that allows you to search for multimedia content quickly and easily. You can use keywords to search for specific content, filter results by image or video, and even sort by popular or latest uploads.
High-quality content: All multimedia content on Pixabay is hand-picked and reviewed to ensure that it meets their high-quality standards. This means that you can be confident that the content you find on Pixabay is of excellent quality and will enhance your projects.
Free to use: Pixabay offers all of its multimedia content for free, which is a huge advantage for anyone on a tight budget. You can download and use any multimedia content from the site for personal or commercial use without paying any licensing fees.

Process of web scraping

Install the necessary libraries.
Insert the API key, search keyword, and the number of images to be retrieved.

Pixabay API documentation

Write the code that requests the images from the API.
Connect to the MongoDB server.
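
The request step above can be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the function names `build_search_params` and `fetch_hits` are hypothetical, while the endpoint and the `key`, `q`, and `per_page` parameters are those described in the Pixabay API documentation.

```python
PIXABAY_ENDPOINT = "https://pixabay.com/api/"


def build_search_params(api_key, keyword, count=100):
    """Build the query parameters for one Pixabay image search."""
    # The Pixabay API accepts between 3 and 200 results per page,
    # so 100 images fit into a single request.
    if not 3 <= count <= 200:
        raise ValueError("count must be between 3 and 200")
    return {"key": api_key, "q": keyword, "per_page": count}


def fetch_hits(api_key, keyword, count=100):
    """Send the request and return the list of hits from the JSON response."""
    # Imported here so that build_search_params works without Requests installed.
    import requests

    response = requests.get(
        PIXABAY_ENDPOINT,
        params=build_search_params(api_key, keyword, count),
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["hits"]
```

With a valid API key, `fetch_hits(api_key, keyword)` should return a list of dictionaries, one per image, which the later steps turn into MongoDB documents.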

Description of metadata obtained

Field	Description
id	A unique identifier for the image.
imageWidth	Width of the original image in pixels.
imageHeight	Height of the original image in pixels.
previewWidth	Width of the preview image in pixels.
previewHeight	Height of the preview image in pixels.
webformatWidth	Width of the web-format image in pixels.
webformatHeight	Height of the web-format image in pixels.
imageSize	File size of the original image in bytes.
type	Type of image (for example, photo, illustration, or vector).
tags	Comma-separated keywords that describe the image.
views	Total number of views.
downloads	Total number of downloads.
likes	Total number of likes.
comments	Total number of comments.
user_id, user	User ID and name of the contributor.
pageURL	Source page on Pixabay, which provides a download link for the original image with the dimensions imageWidth x imageHeight and the file size imageSize.
previewURL	Low-resolution image with a maximum width or height of 150 px (previewWidth x previewHeight).
userImageURL	Profile picture URL (250 x 250 px).
webformatURL	Medium-sized image with a maximum width or height of 640 px (webformatWidth x webformatHeight).
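
As an illustration of how these fields could be extracted from one entry of the API response, the sketch below keeps only the fields listed in the table. The names `METADATA_FIELDS` and `hit_to_document` are hypothetical, and the sample hit is made up for the example.

```python
METADATA_FIELDS = (
    "id", "pageURL", "type", "tags",
    "previewURL", "previewWidth", "previewHeight",
    "webformatURL", "webformatWidth", "webformatHeight",
    "imageWidth", "imageHeight", "imageSize",
    "views", "downloads", "likes", "comments",
    "user_id", "user", "userImageURL",
)


def hit_to_document(hit):
    """Keep only the fields described in the table; skip any that are absent."""
    return {field: hit[field] for field in METADATA_FIELDS if field in hit}


# A shortened, made-up hit used only for illustration.
sample_hit = {
    "id": 1,
    "type": "photo",
    "tags": "flower, yellow, nature",
    "views": 10,
    "largeImageURL": "https://example.com/large.jpg",  # not in the table, dropped
}
document = hit_to_document(sample_hit)
print(sorted(document))  # ['id', 'tags', 'type', 'views']
```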

3. Choosing a Library for Web Scraping

We implement this project with three libraries: Requests, PyMongo, and Python's built-in os module. When selecting libraries for web scraping, we considered the following factors.

Ease of use: Some members of the team are relatively new to web scraping, so we need libraries with clear documentation and simple syntax that everyone can understand.
Performance: We are working with a large amount of data, so the libraries must handle large datasets quickly and efficiently to save us time.
Compatibility: The libraries must work seamlessly with the other libraries and tools we are using, so that we avoid compatibility issues.
Community support: We want to be part of a community of developers who can provide guidance when we encounter issues or need advice, so that we can learn from others' experiences.
Legal considerations: We must comply with any relevant terms of service or legal requirements and avoid legal issues that could arise from using certain libraries or tools for web scraping.

Requests, PyMongo, and os meet our needs on all five points, which makes them a good fit for this project.
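
The README does not say what the os module is used for. One plausible use, shown here purely as an assumption, is creating a local folder and building file paths for downloaded images. The function names `image_path` and `download_image` are hypothetical.

```python
import os
from urllib.parse import urlparse


def image_path(folder, hit):
    """Return a local file path such as images/<id>.jpg for one hit."""
    # Take the file extension from the path part of the image URL.
    extension = os.path.splitext(urlparse(hit["webformatURL"]).path)[1]
    return os.path.join(folder, f"{hit['id']}{extension}")


def download_image(folder, hit):
    """Download the web-format image of one hit and return where it was saved."""
    # Imported here so that image_path works without Requests installed.
    import requests

    os.makedirs(folder, exist_ok=True)
    response = requests.get(hit["webformatURL"], timeout=30)
    response.raise_for_status()
    path = image_path(folder, hit)
    with open(path, "wb") as file:
        file.write(response.content)
    return path
```

Naming each file after the Pixabay `id` keeps the file on disk easy to match with its metadata document.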

4. Storing Data in MongoDB

MongoDB offers several benefits for storing multimedia content data. A key advantage is its ability to handle unstructured data flexibly and at scale: documents in a collection do not need to share a fixed schema, which makes large volumes of multimedia content easier to manage and to update. In this project, MongoDB stores the image metadata retrieved from the Pixabay API.

A practical way to store multimedia content data in MongoDB is to organize it into collections based on type or purpose. Each document within a collection can contain metadata such as the filename, file type, file size, and other relevant information.

To query and analyze the data, MongoDB provides tools for filtering, sorting, and aggregation. For instance, the stored image metadata can be queried to identify the most popular tags or image dimensions across all the retrieved images. Overall, MongoDB is an effective solution for storing and managing multimedia content data, offering flexibility, scalability, and strong querying and analysis capabilities.
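
The storage and the "most popular tags" query could look roughly like the sketch below. The database name `pixabay`, the collection name `images`, the connection URI, and the function names are all assumptions, as is the `", "` separator used to split the `tags` string.

```python
def top_tags_pipeline(limit=10):
    """Aggregation pipeline that counts how often each tag occurs."""
    return [
        # Pixabay returns tags as one string; ", " as separator is an assumption.
        {"$project": {"tag": {"$split": ["$tags", ", "]}}},
        {"$unwind": "$tag"},
        {"$group": {"_id": "$tag", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": limit},
    ]


def store_and_rank(documents, uri="mongodb://localhost:27017"):
    """Upsert the documents and return the most frequent tags."""
    # Imported here so that top_tags_pipeline works without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    collection = client["pixabay"]["images"]  # hypothetical names
    for document in documents:
        # Upserting on the Pixabay id keeps repeated runs from adding duplicates.
        collection.replace_one({"id": document["id"]}, document, upsert=True)
    return list(collection.aggregate(top_tags_pipeline()))
```

Upserting instead of inserting is a design choice for this sketch: running the script twice with the same keyword then updates the existing documents rather than storing each image twice.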

5. Conclusion

In conclusion, this assignment gives us practical experience in retrieving web content through an API and in using MongoDB to store and retrieve data. By retrieving and storing images based on a user's keyword, we become familiar with making API requests, parsing JSON data, and connecting to a database using PyMongo. The project also gives us the chance to explore libraries for displaying images, such as Pillow or OpenCV. Overall, it improves our skills in web scraping and data management, which are valuable in many data-driven industries.
