
LLaNA: Large Language and NeRF Assistant (NeurIPS 2024)

Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
Computer Vision Lab, University of Bologna, Italy

Teaser GIF

📋 Contents

🔧 Installation

The code provided in this repository has been tested in the following environment:

  • Ubuntu 20.04
  • CUDA 12
  • Python 3.10.0

To start:

  1. Clone this repository:

git clone [email protected]:CVLAB-Unibo/LLaNA.git
cd LLaNA

  2. Install the required packages:

conda create -n llana python=3.10 -y
conda activate llana
pip install --upgrade pip
pip install -r requirements.txt

# required only for training
pip install ninja
pip install flash-attn
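As a quick sanity check (a minimal sketch, not part of the repository), you can confirm the active interpreter matches the tested Python version before installing the requirements:

```python
import sys

def matches_tested_python(required=(3, 10)):
    # The repository was tested with Python 3.10.0; compare major.minor only.
    return sys.version_info[:2] == required

print(matches_tested_python())
```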

📦 Data Preparation

ShapeNeRF-Text provides paired NeRFs and language annotations for ShapeNet objects, covering all 40K NeRFs available in the nf2vec dataset. The data can be downloaded and prepared from the Huggingface Hub:

python download_shapenerf_text.py

After the download, the folder structure will be the following:

LLaNA
├── data
│   ├── shapenerf_text
│   │   ├── train
│   │   │   ├── texts
│   │   │   │   ├── conversations_brief.json
│   │   │   │   └── conversations_complex.json
│   │   │   └── vecs
│   │   │       ├── <model_id>.npy
│   │   │       ├── ...
│   │   │       └── <model_id>.npy
│   │   ├── val
│   │   │   ├── texts
│   │   │   │   ├── conversations_brief.json
│   │   │   │   └── conversations_complex.json
│   │   │   └── vecs
│   │   │       ├── <model_id>.npy
│   │   │       ├── ...
│   │   │       └── <model_id>.npy
│   │   ├── test
│   │   │   ├── texts
│   │   │   │   ├── conversations_brief.json
│   │   │   │   └── conversations_complex.json
│   │   │   └── vecs
│   │   │       ├── <model_id>.npy
│   │   │       ├── ...
│   │   │       └── <model_id>.npy
│   │   └── hst_dataset_filtered.json

where:

  1. texts/ contains the language annotations
  2. vecs/ contains the embeddings from nf2vec
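For example, the brief annotations of a split can be read with plain json, and the embedding IDs enumerated from the vecs/ folder. The helpers below are a sketch: the internal field layout of the JSON files is not documented in this README, so treat it as an assumption.

```python
import json
from pathlib import Path

def load_brief_annotations(split_dir):
    # Load texts/conversations_brief.json for one split (train/val/test).
    # The file is assumed to be a single JSON document; its internal
    # field names are not specified in this README.
    path = Path(split_dir) / "texts" / "conversations_brief.json"
    with open(path) as f:
        return json.load(f)

def list_embedding_ids(split_dir):
    # Each nf2vec embedding is stored as vecs/<model_id>.npy;
    # the file stem is the model_id.
    return sorted(p.stem for p in (Path(split_dir) / "vecs").glob("*.npy"))
```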

👨‍🎓 Training

Model architecture

Training Stage 1

cd LLaNA
bash scripts/LLaNA_train_stage1.sh

Training Stage 2

cd LLaNA
bash scripts/LLaNA_train_stage2.sh

Computational Resources for Training

LLaNA has been trained on 4 NVIDIA A100 GPUs with 64GB of VRAM each. Completing both stages requires less than one day of training. The weights of the trained models are saved inside the outputs directory.

Checkpoints of trained LLaNA

The trained LLaNA-7b model is hosted on Huggingface Hub here.

🧑‍🏫 Evaluation

The evaluation metrics reported in the research paper are computed on the test set of ShapeNeRF-Text, which can be downloaded following the instructions in the Data Preparation section.

NeRF captioning

The NeRF captioning task can be evaluated on three different data sources:

  1. Brief textual descriptions, from the ShapeNeRF-Text dataset
  2. Detailed textual descriptions, from the ShapeNeRF-Text dataset
  3. GPT2Shape HST, from "Looking at words and points with attention"

python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data brief_description
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data detailed_description
python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --hst_dataset

model_name is the path to the model weights, which must be stored inside the outputs directory. These scripts generate LLaNA's textual predictions for the captioning task and save them as JSON files in the evaluation_results directory.

Once these predictions are available, the evaluation metrics reported in the research paper (SentenceBERT, SimCSE, BLEU-1, ROUGE-L, METEOR) can be computed with the following code:

python llana/eval/traditional_evaluator_shapenet.py --results_path PATH_TO_RESULTS

where results_path is the path to the JSON file with the predictions from LLaNA.
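As a rough illustration of one of these metrics: BLEU-1 reduces to unigram overlap precision with a brevity penalty. The sketch below is a simplification for intuition, not the evaluator used by traditional_evaluator_shapenet.py.

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    # Simplified BLEU-1: clipped unigram precision times brevity penalty.
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```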

NeRF QA

The NeRF QA task can be evaluated using the single-round questions and answers from the test set of the ShapeNeRF-Text dataset.

python llana/eval/eval_llana.py --model_name andreamaduzzi/LLaNA-7B --text_data single_round

As for the captioning task described above, the quantitative metrics for NeRF QA can be computed in the following way:

python llana/eval/traditional_evaluator_shapenet.py --results_path PATH_TO_RESULTS

where results_path is the path to the JSON file with the predictions from LLaNA.

Computational Resources for Evaluation

By default, the evaluation is performed using the torch float16 data type. This makes it possible to evaluate LLaNA on a single NVIDIA GeForce RTX 3090 with 24GB of VRAM.
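The arithmetic behind this is simple: float16 stores 2 bytes per weight, so a 7B-parameter model needs roughly 13 GiB for weights alone, leaving headroom on a 24 GB card (activations and KV cache, not counted here, come on top).

```python
def fp16_weight_gib(n_params):
    # float16 uses 2 bytes per weight; convert total bytes to GiB.
    return n_params * 2 / 1024**3

print(round(fp16_weight_gib(7e9), 1))  # → 13.0
```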

🗣️ Chatting

You can chat with LLaNA about any NeRF from our dataset by running the following code:

python llana/eval/LLaNA_chat.py --model_name andreamaduzzi/LLaNA-7B --torch_dtype float16

Computational Resources for Chatting

As for the NeRF captioning and QA tasks, with torch.float16 the model can run inference on a single NVIDIA GeForce RTX 3090 with 24GB of VRAM.

🔗 Citation

If you find our work helpful, please consider starring this repo 🌟 and citing:

@InProceedings{NeurIPS24,
  author       = "Amaduzzi, Andrea and Zama Ramirez, Pierluigi and Lisanti, Giuseppe and Salti, Samuele and Di Stefano, Luigi",
  title        = "{LLaNA}: Large Language and {NeRF} Assistant",
  booktitle    = "Advances in Neural Information Processing Systems (NeurIPS)",
  year         = "2024",
  month        = "Dec."
} 

📄 License

This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

📚 Related Work

👏 Acknowledgements

CINECA: We acknowledge the CINECA award under the ISCRA initiative, for the availability of high-performance computing resources and support.

Terms of usage

By using this service, users are required to agree to the following terms: The service is a research preview intended for non-commercial use only. It only provides limited safety measures and may generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The service may collect user dialogue data for future research.