
Commit

Merge pull request #36 from CreamyLong/patch-1
fix typo
LinB203 authored Mar 5, 2024
2 parents 1bfc48d + c0771cd commit 26dd677
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions index.html
Expand Up @@ -230,15 +230,15 @@ <h3 id="1-Variable Aspect Ratios">(1) Variable Aspect Ratios</h3>
</p>
<!-- <ul>
<li><strong>High-quality User Instruct Data</strong>. We implement a dynamic masking stategy for batch training in parallel while maintaining flexible aspect ratios. Specifically, we resize high-resolution videos to make their longest side 256 pixels, maintaining aspect ratios, and then pad them with zeros on the right and bottom to achieve a consistent 256x256 resolution. This facilitates videovae to encode videos in batches and diffusion model to denoise batches of latents with their own attention masks.</li>
- <li><strong>Multimodal Document/Chart Data</strong>. During inferencing, we use Position Interpolation[xx] to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noist latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.
+ <li><strong>Multimodal Document/Chart Data</strong>. During inferencing, we use Position Interpolation[xx] to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noisy latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.
</li>
</ul> -->
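The commented-out passage above describes the dynamic masking strategy: resize each video so its longest side is 256 pixels while keeping the aspect ratio, then zero-pad on the right and bottom to a uniform 256x256. A minimal sketch of that geometry (function name is hypothetical, not from the repository):

```python
def resize_and_pad_dims(height, width, target=256):
    """Scale a frame so its longest side equals `target` (keeping the
    aspect ratio), then zero-pad on the right and bottom to reach a
    square target x target canvas. Returns the resized dimensions and
    the (bottom, right) padding amounts."""
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    return (new_h, new_w), (target - new_h, target - new_w)
```

For example, a 720x1280 frame is scaled to 144x256 and padded with 112 zero rows at the bottom, so a whole batch shares one shape while per-sample attention masks cover only the valid region.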

<h3 id="2-Variable Resolutions">(2) Variable Resolutions</h3>
- <p>During inferencing, we use <a href="https://arxiv.org/pdf/2306.15595.pdf">Position Interpolation</a> to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noist latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.</p>
+ <p>During inferencing, we use <a href="https://arxiv.org/pdf/2306.15595.pdf">Position Interpolation</a> to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noisy latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.</p>
<!-- <ul>
<li><strong>High-quality User Instruct Data</strong>. We implement a dynamic masking stategy for batch training in parallel while maintaining flexible aspect ratios. Specifically, we resize high-resolution videos to make their longest side 256 pixels, maintaining aspect ratios, and then pad them with zeros on the right and bottom to achieve a consistent 256x256 resolution. This facilitates videovae to encode videos in batches and diffusion model to denoise batches of latents with their own attention masks.</li>
- <li><strong>Multimodal Document/Chart Data</strong>. During inferencing, we use Position Interpolation[xx] to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noist latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.
+ <li><strong>Multimodal Document/Chart Data</strong>. During inferencing, we use Position Interpolation[xx] to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noisy latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.
</li>
</ul> -->
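The diffed paragraphs describe Position Interpolation: position indices of the noisy latents are linearly downscaled from [0, seq_length-1] into [0, 255] so that inference-time sequences longer than the 256x256 training resolution still fall inside the pretrained positional range. A minimal sketch of that rescaling, assuming a simple linear map over integer positions (function name is hypothetical):

```python
import numpy as np

def interpolate_positions(seq_length, pretrained_range=256):
    """Map position indices [0, seq_length-1] linearly onto
    [0, pretrained_range-1], so an attention model trained on a
    fixed positional range can handle longer sequences."""
    positions = np.arange(seq_length, dtype=np.float64)
    return positions * (pretrained_range - 1) / (seq_length - 1)
```

With seq_length=1024, the indices 0..1023 are compressed onto 0..255 in fractional steps, so every position the model sees at inference lies within the range it was pretrained on.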

