
🌟 Introduction

  • We propose a One-shot Diffusion Mimicker (One-DM) for stylized handwritten text generation, which only requires a single reference sample as style input, and imitates its writing style to generate handwritten text with arbitrary content.
  • Previous state-of-the-art methods struggle to accurately extract a user's handwriting style from a single sample due to their limited ability to learn styles. To address this issue, we introduce the high-frequency components of the reference sample to enhance the extraction of handwriting style. The proposed style-enhanced module can effectively capture the writing style patterns and suppress the interference of background noise.
  • Extensive experiments on handwriting datasets in English, Chinese, and Japanese demonstrate that our approach with a single style reference even outperforms previous methods that use 15× more references.

Overview of the proposed One-DM

🌠 Release

  • [2024/10/24] We have provided a well-trained One-DM checkpoint on Google Drive and Baidu Drive :)
  • [2024/09/16] This work is reported by Synced (机器之心).
  • [2024/09/07]🔥🔥🔥 We open-source the first version of One-DM, which generates handwritten English words. (Later versions supporting Chinese and Japanese will be released soon.)

🔨 Requirements

```shell
conda create -n One-DM python=3.8 -y
conda activate One-DM
# install all dependencies into the One-DM environment
conda env update -n One-DM -f environment.yml
```

☀️ Datasets

We provide English datasets in Google Drive | Baidu Netdisk | ShiZhi AI. Please download these datasets, unzip them, and move the extracted files to `/data`.
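A minimal extraction sketch; the archive name `IAM64.zip` is a placeholder, so substitute whichever file you actually downloaded from the links above:

```shell
# Hypothetical archive name -- replace with the file you downloaded.
ARCHIVE=IAM64.zip
# create the target data directory expected by the training scripts
mkdir -p data
# extract only if the archive is present in the current directory
if [ -f "$ARCHIVE" ]; then
    unzip -o "$ARCHIVE" -d data/
fi
```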

🐳 Model Zoo

| Model | Google Drive | Baidu Netdisk | ShiZhi AI |
| --- | --- | --- | --- |
| Pretrained One-DM | Google Drive | Baidu Netdisk | ShiZhi AI |
| Pretrained OCR model | Google Drive | Baidu Netdisk | ShiZhi AI |
| Pretrained ResNet-18 | Google Drive | Baidu Netdisk | ShiZhi AI |

Note: Please download these weights, and move them to /model_zoo. (If you cannot access the pre-trained VAE model available on Hugging Face, please refer to the pinned issue for guidance.)
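Assuming the steps above, the repository root should roughly look like this (a sketch; the dataset folder name depends on the archive contents, and the weight filenames are taken from the commands in the next section):

```
One-DM/
├── data/                      # extracted English dataset
└── model_zoo/
    ├── RN18_class_10400.pth   # pretrained ResNet-18 feature extractor
    └── vae_HTR138.pth         # pretrained OCR model
```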

🏋️ Training & Test

  • training on English dataset

```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train.py \
    --feat_model model_zoo/RN18_class_10400.pth \
    --log English
```

  • finetune on English dataset

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train_finetune.py \
    --one_dm ./Saved/IAM64_scratch/English-timestamp/model/epoch-ckpt.pt \
    --ocr_model ./model_zoo/vae_HTR138.pth --log English
```

Note: Please modify timestamp and epoch according to your own path.

  • test on English dataset

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 test.py \
   --one_dm ./Saved/IAM64_finetune/English-timestamp/model/epoch-ckpt.pt \
   --generate_type oov_u --dir ./Generated/English
```

Note: Please modify timestamp and epoch according to your own path.
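Generated images are written under `./Generated/English` (per the `--dir` flag above). A quick sanity check to count the output files (the `mkdir -p` is a no-op if the directory already exists, and keeps the snippet safe to run before testing):

```shell
# count generated files under the output directory
OUT=./Generated/English
mkdir -p "$OUT"
find "$OUT" -type f | wc -l
```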

📺 Exhibition

  • Comparisons with industrial image generation methods on handwritten text generation

  • Comparisons with industrial image generation methods on Chinese handwriting generation

  • English handwritten text generation

  • Chinese and Japanese handwriting generation

❤️ Citation

If you find our work inspiring or use our codebase in your research, please cite our work:

```
@inproceedings{one-dm2024,
  title={One-Shot Diffusion Mimicker for Handwritten Text Generation},
  author={Dai, Gang and Zhang, Yifan and Ke, Quhui and Guo, Qiangya and Huang, Shuangping},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
```

⭐ StarGraph

Star History Chart