
Commit

Merge pull request #36 from CreamyLong/patch-1
fix typo
LinB203 authored Mar 5, 2024
2 parents 1bfc48d + c0771cd commit 26dd677
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions index.html
Expand Up @@ -230,15 +230,15 @@ <h3 id="1-Variable Aspect Ratios">(1) Variable Aspect Ratios</h3>
</p>
<!-- <ul>
<li><strong>High-quality User Instruct Data</strong>. We implement a dynamic masking stategy for batch training in parallel while maintaining flexible aspect ratios. Specifically, we resize high-resolution videos to make their longest side 256 pixels, maintaining aspect ratios, and then pad them with zeros on the right and bottom to achieve a consistent 256x256 resolution. This facilitates videovae to encode videos in batches and diffusion model to denoise batches of latents with their own attention masks.</li>
- <li><strong>Multimodal Document/Chart Data</strong>. During inferencing, we use Position Interpolation[xx] to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noist latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.
+ <li><strong>Multimodal Document/Chart Data</strong>. During inferencing, we use Position Interpolation[xx] to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noisy latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.
</li>
</ul> -->
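The commented-out passage above describes the dynamic masking strategy: resize each video so its longest side is 256 pixels while keeping the aspect ratio, then zero-pad on the right and bottom to a uniform 256x256. A minimal sketch of that geometry (function name is hypothetical, not from the repository):

```python
def resize_and_pad_dims(height, width, target=256):
    """Scale a frame so its longest side equals `target` (keeping the
    aspect ratio), then zero-pad on the right and bottom to reach a
    square target x target canvas. Returns the resized dimensions and
    the (bottom, right) padding amounts."""
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    return (new_h, new_w), (target - new_h, target - new_w)
```

For example, a 720x1280 frame is scaled to 144x256 and padded with 112 zero rows at the bottom, so a whole batch shares one shape while per-sample attention masks cover only the valid region.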

<h3 id="2-Variable Resolutions">(2) Variable Resolutions</h3>
- <p>During inferencing, we use <a href="https://arxiv.org/pdf/2306.15595.pdf">Position Interpolation</a> to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noist latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.</p>
+ <p>During inferencing, we use <a href="https://arxiv.org/pdf/2306.15595.pdf">Position Interpolation</a> to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noisy latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.</p>
<!-- <ul>
<li><strong>High-quality User Instruct Data</strong>. We implement a dynamic masking stategy for batch training in parallel while maintaining flexible aspect ratios. Specifically, we resize high-resolution videos to make their longest side 256 pixels, maintaining aspect ratios, and then pad them with zeros on the right and bottom to achieve a consistent 256x256 resolution. This facilitates videovae to encode videos in batches and diffusion model to denoise batches of latents with their own attention masks.</li>
- <li><strong>Multimodal Document/Chart Data</strong>. During inferencing, we use Position Interpolation[xx] to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noist latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.
+ <li><strong>Multimodal Document/Chart Data</strong>. During inferencing, we use Position Interpolation[xx] to enable variable resolution sampling, despite training on a fixed 256x256 resolution. We downscale the position indices of the variable-resolution noisy latents from [0, seq_length-1] to [0, 255] to aligning them with the pretrained range. This adjustment enables the attention-based diffusion model to handle sequences of higher resolutions.
</li>
</ul> -->
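The diffed paragraphs describe Position Interpolation: position indices of the noisy latents are linearly downscaled from [0, seq_length-1] into [0, 255] so that inference-time sequences longer than the 256x256 training resolution still fall inside the pretrained positional range. A minimal sketch of that rescaling, assuming a simple linear map over integer positions (function name is hypothetical):

```python
import numpy as np

def interpolate_positions(seq_length, pretrained_range=256):
    """Map position indices [0, seq_length-1] linearly onto
    [0, pretrained_range-1], so an attention model trained on a
    fixed positional range can handle longer sequences."""
    positions = np.arange(seq_length, dtype=np.float64)
    return positions * (pretrained_range - 1) / (seq_length - 1)
```

With seq_length=1024, the indices 0..1023 are compressed onto 0..255 in fractional steps, so every position the model sees at inference lies within the range it was pretrained on.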

