Encyclopedic knowledge graphs, such as Wikidata, host an extensive repository of millions of knowledge statements. However, domain-specific knowledge from fields such as history, physics, or medicine is significantly underrepresented in those graphs. Although a few domain-specific knowledge graphs exist (e.g., PubMed for medicine), developing specialized retrieval applications for many domains still requires constructing knowledge graphs from scratch. To facilitate knowledge graph construction, we introduce WAKA: a Web application that allows domain experts to create knowledge graphs through the medium with which they are most familiar: natural language.
To use WAKA, you can either use the publicly available service or deploy WAKA locally on your machine.
The public service is available at https://waka.webis.de.
In addition to the knowledge graph authoring GUI, there is an API endpoint available to automatically construct knowledge graphs from text.
Endpoint: POST https://waka.webis.de/api/v1/kg
Request body:
{"content": "<your text>"}
Response body:
{
"text": "<your text>",
"triples": [
{"subject": "<ENTITY_OBJ>", "predicate": "<PROPERTY_OBJ>", "object": "<ENTITY_OBJ>"},
{"subject": "<ENTITY_OBJ>", "predicate": "<PROPERTY_OBJ>", "object": "<ENTITY_OBJ>"},
"..."
],
"entities": [
"<ENTITY_OBJ>",
"<ENTITY_OBJ>",
"..."
],
"entity_mentions": [
"<ENTITY_MENTION_OBJ>",
"<ENTITY_MENTION_OBJ>",
"..."
]
}
JSON object schemas:
{ // <ENTITY_OBJ>
"url": "http://www.wikidata.org/entity/...",
"label": "label in Wikidata",
"description": "description in Wikidata",
"score": 1.0,
"mentions": [
"<ENTITY_MENTION_OBJ>",
"..."
]
}
{ // <ENTITY_MENTION_OBJ>
"url": "http://www.wikidata.org/entity/...",
"label": "label in Wikidata",
"description": "description in Wikidata",
"start_idx": 0,
"end_idx": 20,
"text": "mention span content",
"score": 1.0,
"e_type": "NER Type"
}
{ // <PROPERTY_OBJ>
"url": "http://www.wikidata.org/prop/direct/...",
"label": "label in Wikidata",
"description": "description in Wikidata"
}
Example call with curl:
curl -X POST -H "Content-Type: application/json" -d "{\"content\": \"The Bauhaus-Universität Weimar is a university located in Weimar, Germany.\"}" https://waka.webis.de/api/v1/kg
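
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_request`, `construct_kg`, and `triple_labels` are illustrative and not part of WAKA, and the 60-second timeout is an arbitrary assumption. It assumes the response follows the schema described above, where each triple holds entity and property objects with a `label` field.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://waka.webis.de/api/v1/kg"


def build_request(text, url=API_URL):
    """Build the POST request with the JSON body {"content": text}."""
    payload = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def construct_kg(text, timeout=60.0):
    """Send the text to the service and return the decoded JSON response.

    The timeout value is an assumption, not a documented limit.
    """
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def triple_labels(response):
    """Reduce each triple to its (subject, predicate, object) labels."""
    return [
        (t["subject"]["label"], t["predicate"]["label"], t["object"]["label"])
        for t in response.get("triples", [])
    ]
```

Calling `triple_labels(construct_kg("<your text>"))` would then yield one label tuple per extracted triple, which is convenient for a quick inspection before working with the full entity and mention objects.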
The local deployment of WAKA requires an NVIDIA GPU with at least 10 GB of VRAM and a minimum of 20 GB of RAM.
Clone this repository and execute the following commands (requires build-essential):
make clean install
make run
After starting the server, WAKA will be available at http://localhost:8000/static/index.html
A prebuilt Docker image of WAKA is available. To spawn a container from this image, execute the following command (requires nvidia-container-toolkit for GPU support):
docker run --gpus all -p 8000:8000 registry.webis.de/code-lib/public-images/waka:latest
After the container is done setting up, WAKA is available at http://localhost:8000/static/index.html
Building the Docker image yourself is only necessary if you modify the code. To build a new image of WAKA, execute the following command from the project directory.
docker build -t <my-name>:<version> .
Performance is measured on the test set of the RED^FM dataset (446 texts).
Step | Task | Macro Precision | Macro Recall | Macro F1 | Micro Precision | Micro Recall | Micro F1 |
---|---|---|---|---|---|---|---|
1 | Entity Recognition | 0.0675 | 0.9162 | 0.1220 | 0.1544 | 0.9892 | 0.2671 |
2 | Entity Retrieval | 0.0021 | 0.8258 | 0.0042 | 0.0016 | 0.8340 | 0.0042 |
3 | Entity Reranking | 0.0110 | 0.7849 | 0.0212 | 0.0063 | 0.7907 | 0.0124 |
4 | Relation Extraction | 0.3033 | 0.7775 | 0.4069 | 0.5505 | 1.0000 | 0.7101 |
5 | Relation Linking | 0.3033 | 0.7775 | 0.4069 | 0.5505 | 1.0000 | 0.7101 |
6 | Knowledge Fusion | 0.1548 | 0.3028 | 0.1824 | 0.1425 | 0.3065 | 0.1946 |
7 | Natural Language Inference | 0.2057 | 0.3284 | 0.2270 | 0.1999 | 0.3325 | 0.2497 |
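
The table reports both macro- and micro-averaged scores. The sketch below illustrates how the two averages differ. It assumes that macro scores are the mean of per-text precision, recall, and F1, and that micro scores are computed from counts pooled over all texts; the exact evaluation protocol is not stated here, so treat this as one common convention and not as WAKA's evaluation code. The function names and the example counts are made up for illustration.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(counts):
    """Average the per-text scores (assumed convention): every text weighs the same."""
    if not counts:
        return 0.0, 0.0, 0.0
    scores = [prf(tp, fp, fn) for tp, fp, fn in counts]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


def micro_average(counts):
    """Pool the counts over all texts first: every prediction weighs the same."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return prf(tp, fp, fn)


# Hypothetical per-text counts as (tp, fp, fn); not taken from the evaluation.
example_counts = [(1, 3, 0), (4, 0, 4)]
macro = macro_average(example_counts)
micro = micro_average(example_counts)
```

Under this convention, macro F1 is the mean of per-text F1 values and need not equal the harmonic mean of macro precision and macro recall.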
If you make use of WAKA's authoring GUI or the knowledge graph creation algorithm, please cite the following work.
@InProceedings{gohsen:2024a,
author = {Marcel Gohsen and Benno Stein},
booktitle = {9th ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2024)},
doi = {10.1145/3627508.3638340},
isbn = {979-8-4007-0434-5/24/03},
month = mar,
publisher = {ACM},
site = {Sheffield, United Kingdom},
title = {{Assisted Knowledge Graph Authoring: Human-Supervised Knowledge Graph Construction from Natural Language}},
year = 2024
}