This repository shares the code and data of our work *Large Language Models are Limited in Out-of-Context Knowledge Reasoning*. In this work, we evaluate the Out-of-Context Knowledge Reasoning (OCKR) capabilities of Large Language Models (LLMs). OCKR refers to a model's ability to combine multiple pieces of knowledge and infer new knowledge, independent of the context provided in the prompt. We designed a synthetic dataset with seven representative OCKR tasks to systematically assess these capabilities. Our evaluation shows that LLMs exhibit limited OCKR ability regardless of the training setting. While training with explicit knowledge retrieval improves the retrieval of attribute knowledge, it does not significantly improve reasoning over relational knowledge. We also study cross-lingual knowledge transfer as a distinct form of OCKR and find that LLMs have limited success in this area as well.
This supplementary material contains five primary dataset folders, intended to help readers understand and verify the experimental results:
- Basic OCKR Dataset: This folder contains the training data used in Section 4.2, "Basic OCKR Results."
- Complete Reasoning Data Dataset: This folder contains the training data used in Section 4.3, "Assisting OCKR with Reasoning Training."
- Chain-of-Thought Training Dataset: This folder contains the training data used in Section 4.4, "Assisting OCKR with Retrieval Hints."
- Cross-Lingual Dataset: This folder contains the training and testing data used in Section 4.5, "Evaluation of Cross-Lingual OCKR."
- Test Dataset: This folder contains the test data used in Sections 4.2 to 4.4.
All datasets are in JSON format. Each test data response consists of three parts: the model-generated reference answer, the correct answer used for exact-match detection (label), and an identifier distinguishing the type of knowledge (type). For example, a knowledge triplet is appended after the model-generated answer in the format [[label:2010]][[type:(y, birth year, year)]]. If it is the target knowledge triplet, it is labeled [[type:targe]].
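For readers who want to process the test outputs themselves, below is a minimal sketch of how the trailing [[label:...]]/[[type:...]] tags could be split off from the generated answer. The function name and parsing strategy are our own illustration, not the repository's actual evaluation code.

```python
import re

# Matches one [[key:value]] annotation; values such as "(y, birth year, year)"
# contain no "]" characters, so a simple negated character class suffices.
PAIR = re.compile(r"\[\[(\w+):([^\]]*)\]\]")

def parse_response(response: str):
    """Split a response string into the generated answer and its annotation tags."""
    tags = dict(PAIR.findall(response))
    first = PAIR.search(response)
    answer = response[: first.start()].strip() if first else response.strip()
    return answer, tags

# Toy example mirroring the format described above (not taken from the dataset).
example = "He was born in 2010.[[label:2010]][[type:(y, birth year, year)]]"
answer, tags = parse_response(example)
print(answer)         # -> He was born in 2010.
print(tags["label"])  # -> 2010
print(tags["type"])   # -> (y, birth year, year)
```

Parsing all tags into a dictionary also covers responses that carry only a type tag, such as the [[type:targe]] marker for target knowledge triplets.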
Our scripts for full-parameter and LoRA training are trainFull.sh and trainLora.sh, respectively. The training library we use is LLaMA-Factory: https://github.com/hiyouga/LLaMA-Factory
The data-generation code is in the genDataCode directory. Because we generated many types of data across multiple settings (including other data consistent with the conclusions in the paper), this code became quite messy. We have deleted and partially revised it as best we could, but it is still hard to read and is provided for reference only.
If you find this repository helpful, feel free to cite our paper.
@article{hu2024limited,
  title={Limited Out-of-Context Knowledge Reasoning in Large Language Models},
  author={Hu, Peng and Gao, Changjiang and Gao, Ruiqi and Chen, Jiajun and Huang, Shujian},
  journal={arXiv preprint arXiv:2406.07393},
  year={2024}
}