- Introduction
- Instructions
- Option 1 - Use pre-collected dataset
- Option 2 - Run custom queries to collect data
- Preparing your local environment
- Customising a query
- Working with the Data
- Data structure
- Loading data from a text file
- Access information within the data
- Contributions
- Author
News API is an amazing source of information as it provides a single API that exposes breaking news headlines and older articles from over 3,000 news sources and blogs!
You can sign up here for a free developer plan, which allows you to make 1,000 requests per day (with a limit of 250 requests every 6 hours). Please be aware that it is possible to accidentally exceed the limit, e.g. by entering an infinite loop whilst making batch requests... 🙈
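One way to avoid burning through the quota by accident is to cap the number of requests a batch job may make. The sketch below is not part of this repository: `RequestBudget` and `fetch_pages` are hypothetical names, and `fetch_page` stands in for whatever function actually calls the API.

```python
class RequestBudget:
    """Hypothetical helper: stops a batch job before it exceeds a request quota."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.used = 0

    def spend(self):
        """Record one request, or raise if the budget is already used up."""
        if self.used >= self.max_requests:
            raise RuntimeError("request budget exhausted; stopping batch")
        self.used += 1


def fetch_pages(fetch_page, budget, max_pages):
    """Call fetch_page(page) for pages 1..max_pages while staying within the budget."""
    results = []
    for page in range(1, max_pages + 1):  # bounded loop instead of `while True`
        budget.spend()
        articles = fetch_page(page)
        if not articles:  # an empty page means there is nothing left to fetch
            break
        results.extend(articles)
    return results
```

With a budget of, say, 250, a bug in the paging logic raises an error locally instead of silently using up the 6-hour allowance.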
There are two options for you to work on this topic.
I've made some queries and collected a ready-to-use dataset in JSON format, which you can find in data.txt. It contains English news articles and blog posts that match the query term (women OR gender OR minotiry OR gap) AND tech, sorted by relevancy.
You're welcome to download the file (~1MB) and explore the data. For example, you could use the link provided in each article to do a bit of web scraping, analyse what's being said with natural language processing tools and visualise the results.
For a detailed explanation of the data structure, how to load the data from the file etc., please see the section Working with the Data.
You can also use the script news_api.py to make customised queries, should you wish. It uses Requests under the hood to make HTTP GET requests to the News API endpoints.
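To give a rough idea of what such a GET request looks like, here is a minimal sketch that only builds the URL and headers, without sending anything. It is not the code in news_api.py: `build_request` is a hypothetical name, and the base URL and the `X-Api-Key` header come from the News API documentation rather than from this README.

```python
from urllib.parse import urlencode

# Assumption: v2 base URL as given in the News API documentation.
BASE_URL = "https://newsapi.org/v2"
ENDPOINTS = ("top-headlines", "everything")


def build_request(endpoint, params, api_key):
    """Return (url, headers) for a GET request to the given endpoint."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    # Drop parameters that were left empty so the API falls back to its defaults.
    query = {key: value for key, value in params.items() if value not in (None, "")}
    url = f"{BASE_URL}/{endpoint}"
    if query:
        url += "?" + urlencode(query)
    headers = {"X-Api-Key": api_key}
    return url, headers


# With Requests installed, the result could then be sent like this:
#   response = requests.get(url, headers=headers)
#   articles = response.json()["articles"]
```

Passing the key in a header rather than in the query string keeps it out of logged URLs, which is a design choice of this sketch and not necessarily what the script does.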
In order to start making queries, you need a bit of preparation to get your local environment up and running.
- Clone the repository:
  $ git clone [email protected]:womendrivendev/news-api.git
- Move into the news-api directory:
  $ cd news-api
- Create a virtual environment to manage local dependencies:
  $ virtualenv venv
  (make sure you have virtualenv installed: documentation)
- Activate the virtual environment just created:
  $ source venv/bin/activate
- Install dependencies:
  $ pip install -r requirements.txt
- Obtain a unique API key from newsapi.org
- Create a .env file under the news-api directory. It won't work otherwise, unless you manually change the path to the .env file by modifying the env_path variable in news_api.py. The file structure should look something like the diagram below.
  news-api/
    .env  <-- Here!
    .gitignore
    README.md
    news_api.py
    load_data.py
    data.txt
    requirements.txt
    ...
  ❗IMPORTANT❗ The .env file is listed in the .gitignore file in the root of the project. This means that it will not get tracked by git, and hence will not be checked into your commits. This is important for security purposes, as you never want to expose your credentials in publicly available spaces 🙅
- Paste your API key into the .env file:
  # .env
  API_KEY="YOUR API KEY"
- Now you're ready to make a customised query using news_api.py 💥
  Usage: news_api.py <endpoint> <output_filename>
  e.g. $ python news_api.py everything test.txt
You have to provide both <endpoint> and <output_filename>. In the next section I explain how to use it in detail.
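As an aside on the .env file from the setup steps: this README doesn't show how news_api.py reads it, so the following is only a standard-library sketch of the idea, assuming the simple `KEY="value"` layout shown above. `read_env_file` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY="value" lines from a .env file into a dictionary."""
    values = {}
    for raw_line in Path(env_path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments such as "# .env", and anything without "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical usage, assuming the .env file sits next to the script:
#   api_key = read_env_file(Path(__file__).parent / ".env")["API_KEY"]
```

Because the key lives in a file that git ignores, it never needs to appear in the source code itself.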
The News API provides two endpoints for querying articles: top-headlines and everything. News API Endpoints explains what each endpoint is best used for. Once you specify the <endpoint>, the script will enter an interactive mode for you to set query parameters.
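If you are curious how the two required arguments could be checked, here is a small sketch using argparse. It is an illustration only and may differ from how news_api.py actually handles its arguments; `parse_args` is a hypothetical name.

```python
import argparse


def parse_args(argv):
    """Parse <endpoint> and <output_filename>, rejecting unknown endpoints."""
    parser = argparse.ArgumentParser(prog="news_api.py")
    parser.add_argument(
        "endpoint",
        choices=["top-headlines", "everything"],
        help="News API endpoint to query",
    )
    parser.add_argument("output_filename", help="file to save the results to")
    return parser.parse_args(argv)


# Example: parse_args(["everything", "test.txt"]) gives a namespace with
# endpoint="everything" and output_filename="test.txt".
```

With `choices`, a mistyped endpoint fails straight away with a usage message, before any request is made.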
Depending on the endpoint you are using, i.e. top-headlines or everything, there are different sets of parameters you can customise. You can see all the available parameters and their default values here: News API Everything, News API Top headlines.
Below is an example of using the script, where I set the endpoint to everything.
$ python news_api.py everything test.txt
Constructing query parameters for /everything endpoint...
Documentation (default/available values etc.): https://newsapi.org/docs/endpoints/everything
Press Enter to use the default value for each parameter
Enter a value for "q":
...
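The "press Enter to use the default" behaviour can be sketched roughly as follows. This is a guess at the idea rather than the script's real code: `prompt_params` is a hypothetical function, and the `ask` argument exists only so the prompt can be replaced in tests.

```python
def prompt_params(param_names, ask=input):
    """Ask for each parameter and keep only the ones the user actually filled in."""
    custom_params = {}
    for name in param_names:
        answer = ask(f'Enter a value for "{name}": ').strip()
        if answer:  # an empty answer (just Enter) keeps the API default
            custom_params[name] = answer
    return custom_params
```

Only the customised values are kept, so anything left blank is simply not sent and the API applies its own default.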
The script saves data in JSON format.
- custom_params: a dictionary of parameters you customised
- datetime: date & time the query was made
- total_articles: number of articles collected
- articles: a list of articles that matched your query
{
"custom_params": {
"q": "gender AND technology",
"language": "en",
"pageSize": 100
},
"datetime": "2018-09-18 21:23:29.426768",
"total_articles": 2,
"articles": [
{
"source": {
"id": null,
"name": "Insidehighered.com"
},
"author": "Colleen Flaherty",
"title": "New analysis suggests women's success in STEM Ph.D. programs has much to do with having female peers, especially in their first year in graduate school",
"description": "Having female peers -- even just a few of them -- can increase a woman\u2019s odds of making it through her Ph.D. program in the natural sciences, technology, engineering or math, says a new working paper from the National Bureau of Economic Research. Based on a s\u2026",
"url": "https://www.insidehighered.com/news/2018/09/18/new-analysis-suggests-womens-success-stem-phd-programs-has-much-do-having-female",
"urlToImage": "https://www.insidehighered.com/sites/default/server_files/media/723520409-170667a.jpg",
"publishedAt": "2018-09-18T07:00:00Z",
"content": "Having female peers -- even just a few of them -- can increase a woman\u2019s odds of making it through her Ph.D. program in the natural sciences, technology, engineering or math, says a new working paper from the National Bureau of Economic Research. Based on a s\u2026 [+7864 chars]"
},
{
"source": {
"id": null,
"name": "Qz.com"
},
"author": "Ananya Bhattacharya",
"title": "Overwhelming evidence why India Inc should hire and promote more women",
"description": "Women rarely make it to the top in Indian companies, but once they do, they stick. With an attrition rate of just 4%, women are\u2026",
"url": "https://qz.com/india/1392825/why-india-inc-should-hire-and-promote-more-women/",
"urlToImage": "https://cms.qz.com/wp-content/uploads/2018/09/RTR3E3E0-e1537216571463.jpg?quality=75&strip=all&w=1400",
"publishedAt": "2018-09-18T06:23:05Z",
"content": "Women rarely make it to the top in Indian companies, but once they do, they stick. With an attrition rate of just 4%, women are twice as likely to stay back in their organisations at the C-Suite level, according to a 2018 study by workplace diversity expert A\u2026 [+3163 chars]"
}
]
}
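For reference, a structure like the one above could be assembled as in this sketch. `build_output` is a hypothetical helper, and it assumes that total_articles is simply the length of the articles list, which matches the example but isn't confirmed by the script itself.

```python
import json
from datetime import datetime


def build_output(custom_params, articles, queried_at=None):
    """Bundle the query parameters and the collected articles into one dictionary."""
    queried_at = queried_at or datetime.now()
    return {
        "custom_params": custom_params,
        "datetime": str(queried_at),  # e.g. "2018-09-18 21:23:29.426768"
        "total_articles": len(articles),  # assumption: count of collected articles
        "articles": articles,
    }


def save_output(output, filename):
    """Write the dictionary to a file as indented JSON."""
    with open(filename, "w") as file:
        json.dump(output, file, indent=2)
```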
Here is an example of how you can load the file as a JSON object in Python.
import json

with open('data.txt', 'r') as file:
    data = json.load(file)
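Once loaded, `data` is an ordinary Python dictionary, so the fields described above can be reached with plain indexing. A short sketch (the helper names `count_by_source` and `article_urls` are made up for illustration):

```python
from collections import Counter


def count_by_source(data):
    """Count how many articles each news source contributed."""
    return Counter(article["source"]["name"] for article in data["articles"])


def article_urls(data):
    """Collect the link of every article, e.g. as a starting point for scraping."""
    return [article["url"] for article in data["articles"]]
```

For example, `count_by_source(data).most_common(5)` lists the five sources with the most articles in the dataset.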
I've created another script (load_data.py) to make a quick inspection of the data easier. It uses the python-dateutil package to parse the publishedAt attribute, which is in ISO 8601 format.
Usage: load_data.py <filename>
e.g. $ python load_data.py data.txt
$ python load_data.py data.txt
Source: Insidehighered.com
Title: New analysis suggests women's success in STEM Ph.D. programs has much to do with having female peers, especially in their first year in graduate school
Written by: Colleen Flaherty
Published at: 2018-09-18 07:00:00+00:00
URL: https://www.insidehighered.com/news/2018/09/18/new-analysis-suggests-womens-success-stem-phd-programs-has-much-do-having-female
Source: Qz.com
Title: Overwhelming evidence why India Inc should hire and promote more women
Written by: Ananya Bhattacharya
Published at: 2018-09-18 06:23:05+00:00
URL: https://qz.com/india/1392825/why-india-inc-should-hire-and-promote-more-women/
...
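If you'd rather not install python-dateutil, timestamps in the exact shape shown above (e.g. 2018-09-18T07:00:00Z) can also be parsed with the standard library. This is a swapped-in alternative, not what load_data.py does, and `parse_published_at` and `summarise` are hypothetical names.

```python
from datetime import datetime, timezone


def parse_published_at(value):
    """Parse a timestamp such as "2018-09-18T07:00:00Z" into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarise(article):
    """Format one article in a layout similar to the output above."""
    return "\n".join([
        f"Source: {article['source']['name']}",
        f"Title: {article['title']}",
        f"Written by: {article['author']}",
        f"Published at: {parse_published_at(article['publishedAt'])}",
        f"URL: {article['url']}",
    ])
```

Note that `strptime` with a fixed format is stricter than dateutil: it will raise a ValueError for timestamps with fractional seconds or a numeric UTC offset.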
Please feel free to raise issues or open pull requests if you see room for improvement 🙏
💻 Software Engineer @ BBC R&D
🌈 Co-founder @ Women Driven Development