
HSP-LAND

This repository contains the source code needed to carry out a preliminary set of experiments within the HSP-LAND collaboration.

🔧 Installation & Requirements

The framework relies on code from a set of repositories available online.

These codebases can be used in one of two ways, either:

  • installing every repository locally on your machine, or
  • relying on an ad-hoc Singularity container.

Repositories to install

To make use of all the functionalities available, the user should clone and install the repositories listed below.

Make sure that all the repositories are cloned and installed within the same root folder, e.g. /home/user/source.

Once all the codebases have been set up, clone the hsp-land-annotation-tool repository within the same /home/user/source folder.

NOTE: in the rest of this README, we assume that all the codebases are stored within the same /home/user/source directory.

Finally, download and install the Anaconda package manager. Once the installation is done, move to this repository's folder and create the Anaconda environment from the environment.yml file provided.

cd /home/user/source/hsp-land-annotation-tool
conda env create -f environment.yml

Model Checkpoints

The user should create a weights folder within the hsp-land-annotation-tool repository. Here, they should gather the checkpoints needed by some of the techniques employed within this framework. At the time of writing (December 2023), the files contained in the weights folder are:

  • gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth (UniMatch)
  • mf2_model_final_94dc52.pkl (Mask2Former)
  • rd16-uni.pth (CLIPSeg)
  • rd64-uni.pth (CLIPSeg)
  • rd64-uni-refined.pth (CLIPSeg)
  • seem_focall_v1.pt (Segment Everything Everywhere All At Once)

These files should be retrieved from the corresponding repositories and copied into the weights directory just created.
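
As a quick sanity check, the following minimal sketch (file names copied from the list above; the weights path assumes the repository root as working directory) verifies that the expected checkpoints are present:

from pathlib import Path

# weights folder inside the hsp-land-annotation-tool repository
WEIGHTS_DIR = Path("weights")

# checkpoint files listed above (December 2023)
EXPECTED_CHECKPOINTS = [
    "gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth",  # UniMatch
    "mf2_model_final_94dc52.pkl",                                  # Mask2Former
    "rd16-uni.pth",                                                # CLIPSeg
    "rd64-uni.pth",                                                # CLIPSeg
    "rd64-uni-refined.pth",                                        # CLIPSeg
    "seem_focall_v1.pt",                                           # SEEM
]

for ckpt in EXPECTED_CHECKPOINTS:
    status = "ok" if (WEIGHTS_DIR / ckpt).is_file() else "MISSING"
    print(f"{status:>7}  {ckpt}")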

At this point, everything should be set up correctly.

If the user follows the installation steps above, they can skip the following subsection (Singularity Image) and jump directly to General.

Singularity Image (work in progress)

First, download the singularity image available here.

This section is a draft, and will be completed once the Singularity/Docker image is available.

General

This codebase has been built with a few objectives in mind:

  • flexibility w.r.t. the set of videos employed
  • ease of use for external, non-technical users
  • ease of extension with novel methods

Flexibility w.r.t. the Set of Videos

This feature is enabled by the data structure adopted in this repository, explained in the following.

Data Structure

The data is organized so that any video file can be included in the dataset, while maintaining a proper organization of the subfolders.

Irrespective of its original filename, each video is identified within the framework by an incremental "ID", starting from 1.

This codebase provides a transparent interface to include a novel video, or to retrieve an existing one: the VideoDatabase object, which is used mainly by the src/utils/create_dataset.py script.

This script handles the creation of a new dataset of videos, starting from the ones contained in a folder specified by the user. From a terminal, with the working directory set to the root folder of the repository, the script should be launched in the following way:

python src/utils/create_dataset.py --dataset hsp-land --folder data/videos

Assuming that the data/videos subfolder of the repository contains a set of video sequences, this will create the data/databases/hsp-land.json file, which acts as an auxiliary file to identify each video within the database:

{
    "1": {
        "name": "data/videos/DESPICABLEME_eng.mp4",
        "fps": 24
    },
    "2": {
        "name": "data/videos/Fun_with_fractals.mp4",
        "fps": 30
    },
    "3": {
        "name": "data/videos/PRESENT.mp4",
        "fps": 24
    },
    "4": {
        "name": "data/videos/TOTORO_trees_eng.mp4",
        "fps": 30
    },
    "5": {
        "name": "data/videos/MICHELE.mp4",
        "fps": 30
    },
    "6": {
        "name": "data/videos/train_neutral.mp4",
        "fps": 30
    }
}

NOTE: the script above will also extract the frames of all the videos, storing them within data/<dataset_name>/<video_id>/images.
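
For illustration, the following is a minimal sketch of how this file maps IDs to videos, using plain JSON access (the VideoDatabase object is the intended interface, but its exact API is not documented here):

import json

# load the database file created by src/utils/create_dataset.py
with open("data/databases/hsp-land.json") as f:
    database = json.load(f)

# look up a video by its incremental ID (keys are stored as strings)
entry = database["4"]
print(entry["name"], entry["fps"])  # data/videos/TOTORO_trees_eng.mp4 30

# the frames extracted for this video live in data/<dataset_name>/<video_id>/images,
# e.g. data/hsp-land/4/images for this entry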

After a few experiments or operations, the final data structure might look something like this:

data/
|-- databases/
|   `-- hsp-land.json
|-- hsp-land/
|   |-- 1/
|   |-- 2/
|   |-- 3/
|   |-- 4/
|   |-- 5/
|   `-- 6/
|       |-- images/
|       |   |-- frame-0000001.jpg
|       |   |-- frame-0000002.jpg
|       |   `-- ...
|       `-- results/
|           |-- unimatch/
|           |-- mf2/
|           `-- clipseg/
|               |-- frame-0000001.png
|               |-- frame-0000002.png
|               `-- ...
`-- videos/
    |-- DESPICABLEME_eng.mp4
    |-- Fun_with_fractals.mp4
    |-- PRESENT.mp4
    |-- TOTORO_trees_eng.mp4
    |-- MICHELE.mp4
    `-- train_neutral.mp4

In the rest of this README, the folder <video_id>/ (as in 1/, 2/, ...) for a given sequence will be referred to as <video_folder>.
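
Building on this layout, here is a minimal sketch (folder names as in the tree above) that pairs each extracted frame of a <video_folder> with the corresponding CLIPSeg result, when available:

from pathlib import Path

# <video_folder> for one of the sequences, e.g. video ID 6 of the hsp-land dataset
video_folder = Path("data/hsp-land/6")

images_dir = video_folder / "images"
clipseg_dir = video_folder / "results" / "clipseg"

for frame in sorted(images_dir.glob("frame-*.jpg")):
    # results keep the frame naming, with a .png extension
    result = clipseg_dir / (frame.stem + ".png")
    if result.exists():
        print(f"{frame.name} -> {result}")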

Ease of Use

The framework is structured so as to separate the pipeline into three macro-steps:

  • dataset creation: needed to convert a set of sequences from video files to frames, via the src/utils/create_dataset.py script
  • feature extraction: driven by the scripts in the src/sample and src/utils directories
  • annotation and refinement: carried out via the lib/ui/interface.py GUI-based program

Dataset Creation

The creation of a dataset is handled by the src/utils/create_dataset.py script, as described in the Data Structure section above.

Feature Extraction

The repository ships with a sample script (src/sample/sample_process.py) which runs all the available algorithms on the six initial videos considered for this project.

As can be seen by inspecting its content, this file specifies the processing of a set of sequences within specific time intervals.

import os

# target folders where to store the qualitative outputs, the multi-visualizations, and the result files
output_folder_qualitative = "results"
output_folder_multi_visualization = "multivis"
output_folder_files = "results_files"

# list of (sub)sequences to process
sequences = [
             {"name": "data/videos/DESPICABLEME_eng.mp4",
              "prompts": "'girl with a blue pajama' 'girl with a green sweater' 'girl with glasses' 'man' 'little girl'",
              "limits": ("1:46","3s"),
              "gaze": ""},
             
             {"name": "data/videos/Fun_with_fractals.mp4",
              "prompts": "'broccoli'",
              "limits": ("1:09","3s"),
              "gaze": ""},
             
             {"name": "data/videos/PRESENT.mp4",
              "prompts": "'a boy' 'a dog' 'a door'",
              "limits": ("3:03","3s"),
              "gaze": ""},
             
             {"name": "data/videos/TOTORO_trees_eng.mp4",
              "prompts": "'a child' 'a bunny like monster' 'a tree'",
              "limits": ("2:15","3s"),
              "gaze": ""},

             {"name": "data/videos/MICHELE.mp4",
              "prompts": "'a man' 'a chair' 'a tv screen'",
              "limits": ("0:48","3s"),
              "gaze": ""},

             {"name": "data/videos/train_neutral.mp4",
              "prompts": "'a child' 'a woman' 'a toy car'",
              "limits": ("0:03","3s"),
              "gaze": ""}

             ]

# main processing loop
for seq in sequences:

    cmd = f"./src/utils/process_sequence.sh {seq['name']} {seq['limits'][0]} {seq['limits'][1]} \"{seq['prompts']}\" {output_folder_qualitative} {output_folder_multi_visualization} {output_folder_files} {seq['gaze']}"
    
    print(f"[EXP DEBUG PICKLE] Running: {cmd}")
    os.system(cmd)

NOTE: The "limits" entries should be specified in the format (starting_time,duration_in_seconds).

To prepare and run the processing script, the user should now run the following commands in a terminal (note, this is a temporary solution):

conda activate hsp-land

export PYTHONPATH=:/home/user/source/clipseg:/home/user/source/Mask2Former:/home/user/source/Mask2Former/demo:/home/user/source/Segment-Everything-Everywhere-All-At-Once/demo_code:/home/user/source/unimatch:/home/user/source/tapnet

Finally, the user can start the actual processing step:

python src/sample/sample_process.py

The outcome of the processing will be saved in the results/ and results_files/ subfolders within the dataset directory tree, together with a multi-visualization involving the CLIPSeg, SPIGA, Mask2Former, MMPose, Segment Everything Everywhere All at Once, and UniMatch methods.


In more detail: the process_sequence.sh script

This script carries out the processing with all the methods available and creates the "multi-visualization" from the results.

For now, all the methods included in the framework are run sequentially; a further customization of this script will be to allow the user to select which techniques to actually run.

...

python3 src/smart_process_video.py --method clipseg ...

python3 src/smart_process_video.py --method spiga ...

python3 src/smart_process_video.py --method mf2 ...

python3 src/smart_process_video.py --method mmpose ...

python3 src/smart_process_video.py --method seem ...

python3 src/smart_process_video.py --method unimatch ...

python3 src/viewers/multiview.py --format 1:base:images 1:spiga:$output_folder/spiga 1:mmpose:$output_folder/mmpose 2:clipseg:$output_folder/clipseg 2:mf2:$output_folder/mf2 2:unimatch:$output_folder/unimatch 3:seem:$output_folder/seem --output $output_folder_multivis --video $video_title --target_height 600 --padding 8 $start_text

Annotation Refinement

Once all the processing has been executed, the framework has extracted all the features required for the videos within the dataset.

The GUI provided within this framework exploits the information contained in the features extracted in the previous step, and allows the user to annotate the sequence in a fine-grained manner, creating custom entities (e.g. associated with the characters acting in the scenes) and the corresponding per-frame annotations.


The GUI can be launched with the following command:

python lib/ui/interface.py

After the interface has loaded, the user should click on the "File..." menu, then on the "Open" entry. The user should then navigate to the subfolder within the data/ directory corresponding to one of the videos that has been processed. Once there, they should select the two folders that were specified as output_folder_qualitative and output_folder_files within the sample_process.py script.

Once the data is loaded, the user can proceed using the GUI to carry out the annotation, as explained in the video tutorial available at this link.


Utilities

Included in this repository is a set of utilities meant to ease the user experience when carrying out specific tasks.

Produce Dummy Gaze Data for Visualization

For now, for every image named frame-XXXXXX.jpg, the gaze visualization code expects a corresponding frame-XXXXXX.txt file to be available, containing the data about the gaze to be rendered.

The frame-XXXXXX.txt file contains entries (rows) in the format x,y,label; the following example shows the content of a sample gaze file:

0.7974334584121431,0.5391578576660154,3
0.5469894291992186,0.437883429175395,5

NOTE: the gaze is reported with values normalized to the 0-1 range, so that the visualization can work with results at any image resolution.
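
For illustration (this is not the repository's loading code), a minimal sketch that reads such a gaze file and converts the normalized coordinates to pixel positions for a frame of known size:

def load_gaze(path: str, width: int, height: int):
    """Read rows in the x,y,label format and return pixel coordinates."""
    points = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            x, y, label = line.split(",")
            # coordinates are normalized to [0, 1], so scale them by the frame size
            points.append((float(x) * width, float(y) * height, int(label)))
    return points

# hypothetical path: a gaze file inside a <video_folder>'s gaze/ directory, for a 1280x720 frame
print(load_gaze("data/hsp-land/4/gaze/frame-0000001.txt", 1280, 720))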

Since this gaze information might be unavailable, the produce_gaze_data.py script creates a set of files containing dummy gaze data entries for all the frames of a video.

python src/utils/produce_gaze_data.py --video data/videos/TOTORO_trees_eng.mp4

This will create and populate a gaze/ folder within the dataset for the video specified. At the moment, the folder is populated with entries representing two dummy gaze sequences with IDs 3 and 5, moving respectively in a circular and an infinity-like pattern.
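
For reference, a minimal sketch (not the script's actual implementation) of how such dummy trajectories can be generated in normalized coordinates, one circular and one infinity-like pattern:

import math

def dummy_gaze(frame_idx: int, n_frames: int):
    """Return two dummy gaze points (x, y, label) for a given frame index."""
    t = 2 * math.pi * frame_idx / n_frames
    # label 3: circular pattern around the image center
    circle = (0.5 + 0.25 * math.cos(t), 0.5 + 0.25 * math.sin(t), 3)
    # label 5: infinity-like (lemniscate) pattern
    infinity = (0.5 + 0.35 * math.sin(t), 0.5 + 0.2 * math.sin(t) * math.cos(t), 5)
    return circle, infinity

for x, y, label in dummy_gaze(frame_idx=10, n_frames=100):
    print(f"{x},{y},{label}")  # same x,y,label format as the gaze files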


A Note on the Multi-Visualization Script

The multi-visualization script is a utility (often invoked automatically during the processing) that produces custom renderings of the results of an experiment.

In particular, it composes mosaic-like visualizations, so that the outcome of the processing with the several methods available can be assessed qualitatively.

The following is an example of invocation of the multi-visualization script:

python3 src/viewers/multiview.py --format 1:base:images 1:spiga:results/spiga 2:clipseg:results/clipseg 3:mf2:results/mf2 3:mmpose:results/mmpose --output multivis --video "data/videos/DESPICABLEME_eng.mp4" --target_height 600 --padding 8 --gaze_data gaze 

In this specific call, the arguments are:

  • format: a custom string specifying how the different outputs are composed into rows (see the NOTE below)
  • output: the subfolder of <video_folder> in which to store the result of the visualization
  • target_height: the target height to use for every frame, in the final mosaic image
  • padding: the white padding to insert between the frames
  • gaze_data: the directory within the <video_folder> containing the gaze data for the sequence (as briefly explained in the section Produce Dummy Gaze Data for Visualization)

NOTE: the format parameter is the most crucial one to achieve the intended composed visualization of the multiple outputs computed. The string is formed by several <row>:<visualization>:<folder> entries, each one specifying:

  • in which row to draw the entry
  • which visualization method to use (drives the way the output files are retrieved and rendered)
  • from which folder to gather the results

In case several entries share the same row number, they are stacked horizontally one after another.

Thus, the string --format 1:base:images 1:spiga:results/spiga 2:clipseg:results/clipseg 3:mf2:results/mf2 3:mmpose:results/mmpose will drive the framework to produce output frames composed in this way:

[image: MULTIVIS-EXPLAINED (layout of the composed multi-visualization)]

The argument --gaze_data gaze used in the call above also loads the "dummy" gaze data produced via another script available within this framework. The GIF below shows an example of the final results, displaying one frame at a time (recall that the output of the multi-visualization script still consists of single frames).

[GIF: MULTIVIS-EXPLAINED (example output, shown one frame at a time)]
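
Since the --format string can become long, the following sketch (a convenience pattern, not part of the repository) builds it from a row layout before printing the corresponding multiview.py invocation:

# row layout: row number -> list of (visualization, folder) entries
layout = {
    1: [("base", "images"), ("spiga", "results/spiga")],
    2: [("clipseg", "results/clipseg")],
    3: [("mf2", "results/mf2"), ("mmpose", "results/mmpose")],
}

# compose the <row>:<visualization>:<folder> entries expected by --format
format_entries = [
    f"{row}:{vis}:{folder}"
    for row, entries in sorted(layout.items())
    for vis, folder in entries
]

cmd = (
    "python3 src/viewers/multiview.py --format " + " ".join(format_entries)
    + ' --output multivis --video "data/videos/DESPICABLEME_eng.mp4"'
    + " --target_height 600 --padding 8 --gaze_data gaze"
)
print(cmd)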


Recovering Results from an Experiment

The src/utils/gather_results.py script allows the user to easily recover a set of folders resulting from a given experiment, by creating a single .zip file containing the requested material.

This script makes use of the same convention introduced above, i.e. the results for a given video are retrieved by means of its filename.

python3 src/utils/gather_results.py --video data/videos/DESPICABLEME_eng.mp4 --output results.zip --folders multivis

This call will collect the multi-visualization produced by running the code described previously in this README, zipping all the frames into a results.zip file, which can then be easily retrieved remotely with any utility such as scp.
