
LLaNA: Large Language and NeRF Assistant (NeurIPS 2024)

Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
Computer Vision Lab, University of Bologna, Italy

Teaser GIF

📋 Contents

🔧 Installation

The code provided in this repository has been tested in the following environment:

  • Ubuntu 20.04
  • CUDA 12
  • Python 3.10.0

To start:

  1. Clone this repository:

git clone [email protected]:CVLAB-Unibo/LLaNA.git
cd LLaNA

  2. Install the required packages:

conda create -n llana python=3.10 -y
conda activate llana
pip install --upgrade pip
pip install -r requirements.txt

# required only for training
pip install ninja
pip install flash-attn
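As a quick sanity check (a minimal sketch, not part of the repository), you can confirm the active interpreter matches the tested Python version before installing the requirements:

```python
import sys

def matches_tested_python(required=(3, 10)):
    # The repository was tested with Python 3.10.0; compare major.minor only.
    return sys.version_info[:2] == required

print(matches_tested_python())
```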

📦 Data Preparation

ShapeNeRF-Text provides paired NeRFs and language annotations for ShapeNet objects, covering all 40K NeRFs available in the nf2vec dataset. The data can be downloaded and prepared from the Huggingface Hub:

python download_shapenerf_text.py

After the download, the folder structure will be the following:

LLaNA
├── data
│   ├── shapenerf_text
│   │   ├── train
│   │   │   ├── texts
│   │   │   │   ├── conversations_brief.json
│   │   │   │   └── conversations_complex.json
│   │   │   └── vecs
│   │   │       ├── <model_id>.npy
│   │   │       ├── ...
│   │   │       └── <model_id>.npy
│   │   ├── val
│   │   │   ├── texts
│   │   │   │   ├── conversations_brief.json
│   │   │   │   └── conversations_complex.json
│   │   │   └── vecs
│   │   │       ├── <model_id>.npy
│   │   │       ├── ...
│   │   │       └── <model_id>.npy
│   │   ├── test
│   │   │   ├── texts
│   │   │   │   ├── conversations_brief.json
│   │   │   │   └── conversations_complex.json
│   │   │   └── vecs
│   │   │       ├── <model_id>.npy
│   │   │       ├── ...
│   │   │       └── <model_id>.npy
│   │   └── hst_dataset_filtered.json

where:

  1. texts/ contains the language annotations
  2. vecs/ contains the embeddings from nf2vec
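For example, the brief annotations of a split can be read with plain json, and the embedding IDs enumerated from the vecs/ folder. The helpers below are a sketch: the internal field layout of the JSON files is not documented in this README, so treat it as an assumption.

```python
import json
from pathlib import Path

def load_brief_annotations(split_dir):
    # Load texts/conversations_brief.json for one split (train/val/test).
    # The file is assumed to be a single JSON document; its internal
    # field names are not specified in this README.
    path = Path(split_dir) / "texts" / "conversations_brief.json"
    with open(path) as f:
        return json.load(f)

def list_embedding_ids(split_dir):
    # Each nf2vec embedding is stored as vecs/<model_id>.npy;
    # the file stem is the model_id.
    return sorted(p.stem for p in (Path(split_dir) / "vecs").glob("*.npy"))
```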

👨‍🎓 Training

Model architecture

Training Stage 1

cd LLaNA
bash scripts/LLaNA_train_stage1.sh

Training Stage 2

cd LLaNA
bash scripts/LLaNA_train_stage2.sh

Computational Resources for Training

LLaNA has been trained on 4 NVIDIA A100 GPUs with 64GB of VRAM each. Completing both stages requires less than one day of training. The weights of the trained models are saved inside the outputs directory.

Checkpoints of trained LLaNA

The trained LLaNA-7b model is hosted on Huggingface Hub here.

🧑‍🏫 Evaluation

The evaluation metrics reported in the research paper are computed on the test set of ShapeNeRF-Text, which can be downloaded following the instructions in the Data Preparation section.

NeRF captioning

The NeRF captioning task can be evaluated on three different data sources:

  1. Brief textual descriptions, from the ShapeNeRF-Text dataset
  2. Detailed textual descriptions, from the ShapeNeRF-Text dataset
  3. GPT2Shape HST, from "Looking at words and points with attention"

python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data brief_description
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data detailed_description
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --hst_dataset

model_name is the path to the model weights, which must be stored inside the outputs directory. These scripts generate LLaNA's textual predictions for the captioning task and save them as JSON files in the evaluation_results directory.

Once these predictions are available, the evaluation metrics reported in the research paper (SentenceBERT, SimCSE, BLEU-1, ROUGE-L, METEOR) can be computed with the following code:

python llana/eval/traditional_evaluator_shapenet.py --results_path PATH_TO_RESULTS

where results_path is the path to the JSON file with the predictions from LLaNA.
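As a rough illustration of one of these metrics: BLEU-1 reduces to unigram overlap precision with a brevity penalty. The sketch below is a simplification for intuition, not the evaluator used by traditional_evaluator_shapenet.py.

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    # Simplified BLEU-1: clipped unigram precision times brevity penalty.
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```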

NeRF QA

The NeRF QA task can be evaluated using the single-round questions and answers from the test set of the ShapeNeRF-Text dataset.

python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data single_round

As for the captioning task described above, the quantitative metrics for NeRF QA can be computed in the following way:

python llana/eval/traditional_evaluator_shapenet.py --results_path PATH_TO_RESULTS

where results_path is the path to the JSON file with the predictions from LLaNA.

Computational Resources for Evaluation

By default, the evaluation is performed using the torch float16 data type. This makes it possible to evaluate LLaNA on a single NVIDIA GeForce RTX 3090 with 24GB of VRAM.
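The arithmetic behind this is simple: float16 stores 2 bytes per weight, so a 7B-parameter model needs roughly 13 GiB for weights alone, leaving headroom on a 24 GB card (activations and KV cache, not counted here, come on top).

```python
def fp16_weight_gib(n_params):
    # float16 uses 2 bytes per weight; convert total bytes to GiB.
    return n_params * 2 / 1024**3

print(round(fp16_weight_gib(7e9), 1))  # → 13.0
```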

🗣️ Chatting

You can chat with LLaNA about any NeRF from our dataset by running the following code:

python llana/eval/LLaNA_chat.py --model_name andreamaduzzi/LLaNA-7B --torch_dtype float16

Computational Resources for Chatting

As for the NeRF captioning and QA tasks, with torch.float16 the model can run inference on a single NVIDIA GeForce RTX 3090 with 24GB of VRAM.

🔗 Citation

If you find our work helpful, please consider starring this repo 🌟 and citing:

@InProceedings{NeurIPS24,
  author       = "Amaduzzi, Andrea and Zama Ramirez, Pierluigi and Lisanti, Giuseppe and Salti, Samuele and Di Stefano, Luigi",
  title        = "{LLaNA}: Large Language and {NeRF} Assistant",
  booktitle    = "Advances in Neural Information Processing Systems (NeurIPS)",
  year         = "2024",
  month        = "Dec."
} 

📄 License

This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

📚 Related Work

👏 Acknowledgements

CINECA: We acknowledge the CINECA award under the ISCRA initiative, for the availability of high-performance computing resources and support.

Terms of usage

By using this service, users are required to agree to the following terms: The service is a research preview intended for non-commercial use only. It only provides limited safety measures and may generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The service may collect user dialogue data for future research.