Cannot reproduce Virchow2 segmentation results #733

Open
afilt opened this issue Dec 10, 2024 · 5 comments


afilt commented Dec 10, 2024

Hello!
I'm currently benchmarking some in-house models with eva, which is working very smoothly 🥇
However, I get quite low Dice scores on the segmentation tasks (even below Lunit) on Consep and MoNuSAC.
To check whether my models are actually under-performing, I tried to reproduce the Virchow2 results from the screenshot shown here on your website.

For Virchow2 on Consep I get a G-Dice of 0.693 (0.001) instead of the 0.723 in the screenshot.
For Virchow2 on MoNuSAC I get a G-Dice of 0.594 (0.006) instead of the 0.713 in the screenshot.
Which eva version / main commit were the results generated with?

I'm using the offline/segmentation configurations; should I switch to online/segmentation? FYI, I didn't change the configurations at all and forked the repo last Saturday (commit d0f5a03). Thanks for your answer!

cc @ioangatop @roman807 Could you maybe run it on your side?

To reproduce (directly taken from here):

```sh
TASK="consep"
# Model, normalization and feature-dimension settings are passed
# to eva as environment variables:
MODEL_NAME="pathology/paige_virchow2" \
NORMALIZE_MEAN="[0.485,0.456,0.406]" \
NORMALIZE_STD="[0.229,0.224,0.225]" \
IN_FEATURES=1280 \
eva predict_fit --config configs/vision/pathology/offline/segmentation/${TASK}.yaml
```
@nkaenzig (Collaborator)

Hi @afilt,

There are two reasons for the different results, both linked to recent changes:

  1. In a recent PR we updated the dice metric, which leads to lower metric values in general because the metric implementation/definition is slightly different; see Replace GeneralizedDiceScore by DiceScore & fix class-wise metrics #719. There is a pending PR to update the leaderboard in the docs, which hasn't been merged yet.

  2. For the segmentation leaderboard we used the online configs, i.e. eva fit --config configs/vision/pathology/online/segmentation/${TASK}.yaml. For segmentation tasks we decided to also feed the original image, in addition to the last ViT feature map, into the decoder, to make the evaluation less sensitive to the patch size of the chosen ViT architecture.

In the meantime, you can either:
a. Continue using the version from main, run eva fit with the online config as mentioned in 2. above, and compare against the board in this PR: #734
b. Install version 0.1.6 of eva and use it in conjunction with the .yaml configs from before this PR was merged.
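
For example (assuming you install eva from PyPI, where the package is published as kaiko-eva; treat the name as an assumption if your setup differs):

```sh
# Option (a): stay on current main and evaluate with the online config
# (set MODEL_NAME / NORMALIZE_* / IN_FEATURES as in your reproduce command above)
TASK="consep"
eva fit --config configs/vision/pathology/online/segmentation/${TASK}.yaml

# Option (b): install the 0.1.6 release, which predates the metric change (#719)
# PyPI package name assumed to be kaiko-eva
pip install kaiko-eva==0.1.6
```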

I'll open an issue to update the documentation & instructions to reproduce the segmentation metrics in the leaderboard; sorry for the confusion.


afilt commented Dec 23, 2024

Hello @nkaenzig,

Thank you for taking the time to provide those details; it's very clear.
Considering your PR #734 will be merged in the coming days or weeks and will update the general main leaderboard, I will run the segmentation task with the current main and use the online configurations (option "a"). Does that mean the command to reproduce the leaderboard results from PR #734 is simply (for instance for Virchow2):

```sh
TASK="consep"
MODEL_NAME="pathology/paige_virchow2" \
NORMALIZE_MEAN="[0.485,0.456,0.406]" \
NORMALIZE_STD="[0.229,0.224,0.225]" \
IN_FEATURES=1280 \
eva predict_fit --config configs/vision/pathology/online/segmentation/${TASK}.yaml
```

? Thanks a lot!

Last question: how long do you expect the online configuration to run (e.g. for a ViT-Base)?

@nkaenzig (Collaborator)

Hi @afilt,

> Does that mean the command to reproduce the leaderboard results from PR #734 is simply (for instance for Virchow2):

Yes, almost: you just need to replace eva predict_fit with eva fit. For the online configs the predict step isn't necessary, because the embeddings are generated on the fly ("online") during fit.
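
Concretely, that is your command from above with predict_fit swapped for fit:

```sh
TASK="consep"
MODEL_NAME="pathology/paige_virchow2" \
NORMALIZE_MEAN="[0.485,0.456,0.406]" \
NORMALIZE_STD="[0.229,0.224,0.225]" \
IN_FEATURES=1280 \
eva fit --config configs/vision/pathology/online/segmentation/${TASK}.yaml
```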

The runtimes depend a lot on the size of the ViT architecture and the hardware being used. I'd guess that an online evaluation on consep, for instance with a ViT-B/16, shouldn't take more than 30 min on an A100.


afilt commented Dec 23, 2024

Thank you @nkaenzig!
Regarding slide/tile classification, were there any major changes since main commit d0f5a03 (it seems not)? I just want to know whether I should run the benchmarks again. Thanks!

@nkaenzig (Collaborator)

No major changes for slide/tile classification, only segmentation :)
