# Stage 5: Tagging and Captioning

Tag, prune tags, and caption.
- If you start from this stage, set `--src_dir` to the training folder containing the images to tag and caption (this stage can also be used independently as a tagger).
- The operation is performed in place.
- After this stage, you can edit the tags and characters yourself using suitable tools, especially if you pass `--save_aux processed_tags characters`. More details follow.
## Tagging

In this phase, we use publicly available taggers to tag the images.
- `--tagging_method`: Choose a tagger available in waifuc. Choices are 'deepdanbooru', 'wd14_vit', 'wd14_convnext', 'wd14_convnextv2', 'wd14_swinv2', and 'mldanbooru'. Default is 'wd14_convnextv2'.
  Example usage: `--tagging_method mldanbooru`
- `--overwrite_tags`: Overwrite existing tags, even if tags already exist in the metadata.
  Example usage: `--overwrite_tags`
- `--tag_threshold`: Score threshold for the tagger. Default is 0.35. (A sketch of how such a threshold is applied follows this list.)
  Example usage: `--tag_threshold 0.3`
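To illustrate what the threshold does, here is a minimal sketch, assuming a tagger that returns a tag-to-score mapping; `filter_tags` and the sample scores are hypothetical, not part of the pipeline.

```python
# Minimal sketch: keep only tags whose confidence reaches the threshold.
# The tag-to-score mapping stands in for whichever tagger is configured.
def filter_tags(scores: dict, tag_threshold: float = 0.35) -> list:
    return [tag for tag, score in scores.items() if score >= tag_threshold]

scores = {"1girl": 0.98, "solo": 0.95, "holding umbrella": 0.21}
print(filter_tags(scores, tag_threshold=0.35))  # ['1girl', 'solo']
```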
## Tag pruning

During fine-tuning, each component present in the images becomes more easily associated with the text that best represents it. Therefore, to make sure that the trigger words / embeddings are correctly associated with the key characteristics of the concept, we should remove the tags that are inherent to the concept. This can be regarded as a trade-off between "convenience" and "flexibility". Without pruning, we can still pretty much get the concept by including all the relevant tags. However, there is also a risk that these tags get "polluted" and become permanently bound to the target concept.

The pruning process goes through a few steps, listed below (a code sketch follows the list). You can deactivate tag pruning by setting `--prune_mode none`.
- Prune blacklisted tags. Removes the tags listed in `--blacklist_tags_file` (one tag per line). Uses blacklist_tags.txt by default.
- Prune overlapping tags. This includes tags that are substrings of other tags, as well as the overlapping tags specified in `--overlap_tags_file`. Uses overlap_tags.json by default.
- Prune character-related tags. This step uses `--character_tags_file` (character_tags.json by default) to prune character-related tags. All character-related tags below difficulty `--drop_difficulty` are dropped if `--prune_mode` is set to `character`; otherwise, for `--prune_mode` set to `character_core` (the default), only the "core tags" of the characters that appear in the images are dropped (see Core tags for details).

The remaining tags are saved to the `processed_tags` field of the metadata.

**Character tag difficulty.** We may want to associate a difficulty level with each character tag depending on how hard the corresponding characteristic is to learn. This is done in character_tags.json, where we consider two difficulty levels: 0 for human-related tags and 1 for furry, mecha, demon, etc.
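The following is a minimal sketch of the three pruning steps just described; the function and argument names are hypothetical, not the pipeline's actual internals.

```python
# Hypothetical sketch of the three pruning steps described above.
def prune_tags(tags, blacklist, character_difficulty, core_tags,
               prune_mode="character_core", drop_difficulty=2):
    # Step 1: drop blacklisted tags.
    tags = [t for t in tags if t not in blacklist]
    # Step 2: drop tags that are substrings of other tags
    # (one of the kinds of overlap handled by the pipeline).
    tags = [t for t in tags
            if not any(t != other and t in other for other in tags)]
    # Step 3: drop character-related tags according to the prune mode.
    if prune_mode == "character":
        # character_difficulty maps character tags to 0 or 1;
        # non-character tags are absent and therefore kept.
        tags = [t for t in tags
                if character_difficulty.get(t, drop_difficulty) >= drop_difficulty]
    elif prune_mode == "character_core":
        tags = [t for t in tags if t not in core_tags]
    return tags
```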
- `--prune_mode`: Defines the strategy for tag pruning. The available options are `character`, `character_core`, `minimal`, and `none`, with the default being `character_core`.
  Example usage: `--prune_mode character`
  Each mode corresponds to a different level of tag pruning:
  - `none`: No pruning is performed; all tags are retained.
  - `minimal`: Only the first two steps from the overview above are performed.
  - `character_core`: Prunes only the core tags of the relevant characters.
  - `character`: Prunes all character-related tags up to `--drop_difficulty`.
- `--blacklist_tags_file`, `--overlap_tags_file`, and `--character_tags_file`: Paths to the files containing the different tag-filtering information.
  Example usage: `--blacklist_tags_file path/to/blacklist_tags.txt`
- `--process_from_original_tags`: When enabled, tag processing starts from the original tags instead of the previously processed tags.
  Example usage: `--process_from_original_tags`
- `--drop_difficulty`: The difficulty level below which character tags are dropped. Tags with a difficulty lower than this value are added to the drop list; tags at or above this level are kept. The default is 2.
  Example usage: `--drop_difficulty 1`
## Core tags

A tag is considered a "core tag" of a character if it frequently appears in images containing that character. Identifying these core tags is useful both for deciding which tags should be dropped due to their inherent association with the concept, and for determining which tags are suitable for initializing character embeddings in pivotal tuning.
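As an illustration of this frequency-based definition, here is a minimal sketch; the data layout (a list of per-image records) is an assumption made for the example, not the pipeline's internal format.

```python
from collections import Counter

# Hypothetical sketch of core-tag computation: a tag is "core" for a
# character when it appears in at least core_frequency_thresh of the
# images that contain that character.
def compute_core_tags(images, core_frequency_thresh=0.4):
    counts, totals = {}, Counter()
    for img in images:  # img: {"characters": [...], "tags": [...]}
        for char in img["characters"]:
            totals[char] += 1
            counts.setdefault(char, Counter()).update(img["tags"])
    return {char: [t for t, n in c.items()
                   if n / totals[char] >= core_frequency_thresh]
            for char, c in counts.items()}
```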
- `--compute_core_tag_up_levels`: Number of directory levels to ascend from the tagged directory when computing core tags. The default is 1, meaning the computation covers all image types.
  Example usage: `--compute_core_tag_up_levels 0`
- `--core_frequency_thresh`: Minimum frequency for a tag to be considered a core tag. The default is 0.4.
  Example usage: `--core_frequency_thresh 0.5`
- `--use_existing_core_tag_file`: When enabled, uses the existing core tag file instead of recomputing the core tags.
  Example usage: `--use_existing_core_tag_file`
- `--drop_all_core`: If enabled, all core tags are dropped, overriding the `--drop_difficulty` setting.
  Example usage: `--drop_all_core`
- `--emb_min_difficulty`: Minimum difficulty level for tags used in embedding initialization. The default is 1.
  Example usage: `--emb_min_difficulty 0`
- `--emb_max_difficulty`: Maximum difficulty level for tags used in embedding initialization. The default is 2. (A sketch of how the two bounds interact follows this list.)
  Example usage: `--emb_max_difficulty 1`
- `--emb_init_all_core`: If enabled, all core tags are used for embedding initialization, overriding the `--emb_min_difficulty` and `--emb_max_difficulty` settings.
  Example usage: `--emb_init_all_core`
- `--append_dropped_character_tags_wildcard`: Appends the dropped character tags to the wildcard.
  Example usage: `--append_dropped_character_tags_wildcard`
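Here is a hypothetical sketch of how the difficulty bounds could select embedding-initialization tags; treating the upper bound as exclusive is an assumption made for the example (it mirrors `--drop_difficulty`'s "below this value" convention).

```python
# Hypothetical sketch: select which core tags seed a character embedding.
# The exclusive upper bound is an assumption, mirroring --drop_difficulty.
def embedding_init_tags(core_tags, difficulty,
                        emb_min_difficulty=1, emb_max_difficulty=2,
                        emb_init_all_core=False):
    if emb_init_all_core:
        return list(core_tags)
    return [t for t in core_tags
            if emb_min_difficulty <= difficulty.get(t, 0) < emb_max_difficulty]
```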
Note that core tags are always computed, and `core_tags.json` and `wildcard.txt` are always saved. However, they are computed at the end of tag processing when `--prune_mode` is not `character_core`. You can also use get_core_tags.py to recompute them.
## Tag sorting and truncation

This step consists of tag sorting, optional appending of dropped character tags, and truncating the tag list to the maximum number of tags specified by `--max_tag_number`. Certain tags such as 'solo', '1girl', '1boy', 'Xgirls', and 'Xboys' are first placed at the beginning; the remaining tags are then ordered according to `--sort_mode` (a sketch of the whole step follows the argument list below).
- `--sort_mode`: The method used for sorting tags. Defaults to `score`.
  Example usage: `--sort_mode shuffle`
  The available modes offer different approaches to tag ordering:
  - `original`: Maintains the original sequence of the tags as they appear in the data, without any alteration.
  - `shuffle`: Randomizes the order of the tags, with a different ordering for each image.
  - `score`: Sorts tags by their scores, applicable when using a tagger; higher-scoring tags are placed first, prioritizing the most significant tags.
- `--append_dropped_character_tags`: Adds previously dropped character tags back into the tag set, placing them at the end of the tag list.
  Example usage: `--append_dropped_character_tags`
- `--max_tag_number`: Limits the total number of tags included in each image's caption. The default limit is 30.
  Example usage: `--max_tag_number 50`
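The following minimal sketch puts these pieces together under the assumptions above; the function and the fixed front-tag list are hypothetical.

```python
import random

# Hypothetical sketch of the sorting-and-truncation step.
FRONT_TAGS = ["solo", "1girl", "1boy"]  # plus 'Xgirls'/'Xboys' patterns

def arrange_tags(tags, scores=None, sort_mode="score", max_tag_number=30):
    front = [t for t in FRONT_TAGS if t in tags]
    rest = [t for t in tags if t not in front]
    if sort_mode == "shuffle":
        random.shuffle(rest)  # a different order for each image
    elif sort_mode == "score" and scores is not None:
        rest.sort(key=lambda t: scores.get(t, 0.0), reverse=True)
    # sort_mode == "original": keep the incoming order unchanged.
    return (front + rest)[:max_tag_number]
```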
## Captioning

This step uses tags and other metadata fields to produce captions, which are saved both in the `caption` field of the metadata and as separate `.txt` files.
- `--caption_ordering`: The order in which the different information types appear in the caption. The default order is `['npeople', 'character', 'copyright', 'image_type', 'artist', 'rating', 'crop_info', 'tags']`.
  Example usage: `--caption_ordering character copyright tags`
- `--caption_inner_sep`, `--caption_outer_sep`, `--character_sep`, `--character_inner_sep`, `--character_outer_sep`: Separators for the different elements and levels within the captions, such as separating items within a single field or separating different fields.
  Example usage: `--caption_outer_sep "; "`
- `--use_[XXX]_prob` arguments: Control the probability of including specific types of information in the captions, such as character info, copyright info, and others. (A sketch of caption assembly follows below.)
  Example usage: `--use_character_prob 0.8`
For a complete list of all available captioning arguments, please refer to the configuration file at `configs/pipelines/base.toml`.
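To make the interaction of ordering, separators, and inclusion probabilities concrete, here is a hypothetical sketch; it is not the pipeline's actual implementation.

```python
import random

# Hypothetical sketch of caption assembly: each field is included with a
# given probability, fields follow caption_ordering, items within a field
# are joined by an inner separator and fields by an outer separator.
def build_caption(fields, ordering, use_probs,
                  caption_outer_sep=", ", caption_inner_sep=", "):
    parts = []
    for key in ordering:
        values = fields.get(key)
        if values and random.random() < use_probs.get(key, 1.0):
            parts.append(caption_inner_sep.join(values))
    return caption_outer_sep.join(parts)

fields = {"character": ["alice"], "tags": ["1girl", "solo", "smile"]}
print(build_caption(fields, ["character", "tags"], {"character": 0.8}))
```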
- `--keep_tokens_sep`: Separator for the keep tokens used by the Kohya trainer. By default, it takes the value of `--character_outer_sep`.
  Example usage: `--keep_tokens_sep "|| "`
- `--keep_tokens_before`: The caption field before which `keep_tokens_sep` is placed. The default is 'tags'.
  Example usage: `--keep_tokens_before crop_info`
## Editing tags and characters manually

You can use `--save_aux` to save certain metadata fields to separate files and `--load_aux` to load them back.

Typically, you can run with `--save_aux processed_tags characters`. You then get files with names like `XXX.processed_tags` and `XXX.characters`. These can be batch edited with tools such as dataset-tag-editor or BatchPrompter (or with a small script, as sketched below), and the changes can then be loaded with `--load_aux processed_tags characters`. Remember to run again from this stage to update the captions.
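For instance, a simple scripted edit might look like the following; the assumption that each `.processed_tags` file holds one comma-separated tag list is mine, so check your files' actual format first.

```python
from pathlib import Path

# Hypothetical sketch: rename a tag across all XXX.processed_tags files.
# Assumes each file contains a single comma-separated list of tags.
def rename_tag(folder, old, new):
    for path in Path(folder).rglob("*.processed_tags"):
        tags = [t.strip() for t in path.read_text().split(",")]
        path.write_text(", ".join(new if t == old else t for t in tags))

rename_tag("/path/to/training_folder", "red hair", "crimson hair")
```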
- Since tags are not overwritten by default, you don't need to worry about the tagger overwriting the edited tags.
- It is better to correct the detected characters after stage 3. Here you can edit `XXX.characters` to add characters that were not detected.
- ⚠️ If you edit the captions directly, running this stage again will overwrite your changes.