Why does Chronos-Bolt achieve significantly better results and performance compared to Chronos-T5 #231
-
Why does Chronos-Bolt achieve significantly better results and performance compared to Chronos-T5? What are the main contributing factors?
-
Regarding performance: while both model families rely on the T5 architecture under the hood, chronos-bolt models embed the context observations in non-overlapping windows of multiple observations. This is the usual "patch-based" embedding used by other models, most notably PatchTST. In the models we released, the patch length is 16, which effectively "compresses" the context length by a factor of 16 in the embedding space and enables much of the speedup. On the decoding side, instead of autoregressive generation, chronos-bolt models perform direct multi-step prediction of 9 quantiles, which is also faster.
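Here is a minimal PyTorch sketch of those two ideas; the layer names, dimensions, and the `patch_size` variable are illustrative assumptions, not the actual chronos-bolt code:

```python
import torch
import torch.nn as nn

# Sketch of patch-based embedding + direct multi-step quantile decoding.
# All names and sizes are illustrative, not the real implementation.
batch, context_len, patch_size = 32, 512, 16
d_model, horizon, num_quantiles = 256, 64, 9

context = torch.randn(batch, context_len)  # raw observations

# Patch embedding: group the context into non-overlapping windows of 16
# observations and project each window to a single d_model-sized token.
patches = context.reshape(batch, context_len // patch_size, patch_size)
patch_embed = nn.Linear(patch_size, d_model)
tokens = patch_embed(patches)  # (32, 32, 256): 512 steps become 32 tokens

# The transformer now attends over 32 tokens instead of 512 time steps,
# which is where much of the speedup comes from.

# Direct multi-step decoding: one forward pass outputs every horizon step
# for all 9 quantile levels, instead of sampling tokens autoregressively.
encoded = tokens.mean(dim=1)  # stand-in for the actual transformer output
output_head = nn.Linear(d_model, horizon * num_quantiles)
forecast = output_head(encoded).reshape(batch, horizon, num_quantiles)
```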
Regarding accuracy, it is hard to say which specific aspect of the model leads to most of the improvement. My intuition is as follows: because of their architecture, chronos-bolt models are trained for quantile regression, using a quantile loss. This is directly the task on which they are evaluated in the WQL experiments.
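For reference, this is what a quantile (pinball) loss looks like; a generic sketch rather than the repository's exact training loss:

```python
import torch

def quantile_loss(pred, target, quantile_levels):
    """Pinball loss averaged over quantile levels (generic sketch).

    pred:   (batch, horizon, num_quantiles) predicted quantiles
    target: (batch, horizon) observed future values
    """
    q = torch.tensor(quantile_levels, dtype=pred.dtype)  # e.g. 0.1 ... 0.9
    error = target.unsqueeze(-1) - pred  # broadcast over quantile levels
    # Under-prediction is penalized by q, over-prediction by (1 - q).
    return torch.maximum(q * error, (q - 1) * error).mean()
```

WQL is essentially an aggregated, scale-normalized version of this same quantity, so the training objective and the evaluation metric line up almost exactly.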
On the other hand, chronos-t5 models output 20 samples by default (not that many, if you think about it), which already adds sampling noise. Quantile regression may even be more token-efficient at training time, in the sense that the model does not need to learn to output probabilities over 4096 classes, as the chronos-t5 models do, but instead learns a task much closer to the downstream evaluation.
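To get a feel for why 20 samples is "not so many", here is a toy illustration (standard-normal data, nothing to do with the actual models) of how noisy empirical quantiles estimated from 20 draws can be:

```python
import torch

torch.manual_seed(0)
levels = torch.linspace(0.1, 0.9, 9)  # the 9 quantile levels

# Exact quantiles of a standard normal vs. empirical quantiles computed
# from only 20 samples, mimicking how sample-based forecasts are reduced
# to quantile predictions.
true_q = torch.distributions.Normal(0.0, 1.0).icdf(levels)
empirical_q = torch.quantile(torch.randn(20), levels)

print(torch.stack([true_q, empirical_q], dim=1))
# Individual levels can be off by a sizeable fraction of a standard
# deviation; that estimation noise flows directly into the WQL score.
```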