Jina.ai supports a limit of 8192 tokens for generating embeddings. For late chunking, if my context is longer than 8192 tokens, what are the best strategies to implement late chunking?
I think if you have very long documents, not all of the context might be necessary. So if you can split the text into chapters or longer sections, there might be enough context for the embedding model to interpret all of the tokens correctly. Otherwise, you can also pass a bit more text before and after the chunk you are interested in, as in the sketch below. Adding summaries before the text chunks might further improve it, but I haven't tried something like this.
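A minimal sketch of that second idea, assuming a Hugging Face embedding model with a long context window; the model name, function name, and splitting of the text into `before`/`chunk`/`after` are illustrative and not part of this repository:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: any long-context embedding model usable via AutoModel works here.
MODEL_NAME = "jinaai/jina-embeddings-v2-base-en"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)

def embed_chunk_with_context(before: str, chunk: str, after: str) -> torch.Tensor:
    """Encode before + chunk + after in one forward pass, then pool only over
    the token positions of `chunk` (late chunking with extra surrounding context)."""
    # Tokenize the parts separately so the chunk's token span is known.
    ids_before = tokenizer(before, add_special_tokens=False)["input_ids"]
    ids_chunk = tokenizer(chunk, add_special_tokens=False)["input_ids"]
    ids_after = tokenizer(after, add_special_tokens=False)["input_ids"]

    input_ids = torch.tensor([ids_before + ids_chunk + ids_after])
    with torch.no_grad():
        token_embs = model(input_ids).last_hidden_state[0]  # (seq_len, dim)

    # Mean-pool only the positions that belong to the chunk of interest.
    start, end = len(ids_before), len(ids_before) + len(ids_chunk)
    return token_embs[start:end].mean(dim=0)
```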
We have now implemented a strategy that uses overlapping macro chunks to solve this problem. Just set --long-late-chunking-embed-size to the maximum context length of the model you are using, and it will apply this strategy automatically.
help='Token length of the embeddings that come before/after soft boundaries (i.e. overlapping embeddings). Above zero, overlap is used between neighbouring embeddings.',
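For illustration, here is a rough sketch of how such overlapping macro chunks can be laid out; this is not the repository's actual implementation, and the function name and default values are made up. Each macro chunk is embedded with late chunking on its own, and for tokens that fall into an overlap region you would keep the embedding from the macro chunk where they sit farther away from the boundary.

```python
from typing import Iterator, Tuple

def macro_chunk_spans(n_tokens: int, max_len: int = 8192, overlap: int = 256) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) token spans for overlapping macro chunks.

    Each span is at most `max_len` tokens long and shares `overlap` tokens
    with its neighbours, so tokens near a span boundary still see some of
    the text on the other side of that boundary.
    """
    assert 0 <= overlap < max_len, "overlap must be smaller than the macro chunk size"
    step = max_len - overlap
    start = 0
    while start < n_tokens:
        end = min(start + max_len, n_tokens)
        yield start, end
        if end == n_tokens:
            break
        start += step
```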