
Integrate ChemTEB #1585

Open
Muennighoff opened this issue Dec 13, 2024 · 5 comments

Comments

@Muennighoff
Contributor

https://arxiv.org/abs/2412.00532v1

@Muennighoff
Contributor Author

Since it is a fork (https://github.com/basf/chemteb), it should be relatively easy to integrate; cc @HSILA in case you are interested in opening a PR :)

@HSILA
Contributor

HSILA commented Dec 24, 2024

Thank you for recognizing our work on ChemTEB, and apologies for the delayed response. I can complete the tasks' metadata and open a pull request. ChemTEB currently has over 35 tasks; is it okay to integrate all of them? Performance on the bitext mining tasks is around zero, so I think I should exclude them so they don't drag down the models' average scores. What do you think?

Also, a quick question: in PairClassification tasks, we can have a task with multiple subsets (for example, in LegalBenchPC). Is it possible to do so for classification tasks? (I want to merge some of them.)
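
(What I mean by multiple subsets is the pattern below, where one task class exposes several subsets through its `eval_langs` mapping, the way multi-subset PairClassification tasks such as LegalBenchPC do. The import paths and field names are from memory, the class name, dataset path, and subset keys are placeholders, and most required metadata fields are omitted, so treat it as a rough sketch rather than a working definition:)

```python
from mteb.abstasks import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata


class ChemTEBMergedClassification(AbsTaskClassification):
    # Hypothetical merged task: each key of eval_langs would become its own
    # subset, mirroring how multi-subset PairClassification tasks are defined.
    metadata = TaskMetadata(
        name="ChemTEBMergedClassification",  # placeholder name
        description="Merged chemistry classification subsets (illustrative).",
        dataset={"path": "BASF-AI/some-chemteb-dataset", "revision": "main"},  # placeholder
        type="Classification",
        category="s2s",
        eval_splits=["test"],
        eval_langs={                  # subset name -> list of languages
            "subset_a": ["eng-Latn"],
            "subset_b": ["eng-Latn"],
        },
        main_score="accuracy",
        # ... remaining required metadata fields omitted for brevity
    )
```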

@Muennighoff
Contributor Author

Thanks for getting back! That would be amazing! I think all of them are fine as long as the near-zero Bitext Mining performance is due to the models being bad rather than the task being unsolvable/random. (cc @KennethEnevoldsen in case of thoughts)

Is it possible to do so for classification tasks?

Sounds possible to me, but I'm not sure about the details 🤔

@HSILA
Contributor

HSILA commented Dec 24, 2024

Thank you for your encouraging words. Regarding the Bitext Mining tasks (and some PairClassification tasks), the performance around zero is likely because they involve matching chemical compound names, descriptions, or formulas with their corresponding SMILES codes. These are highly domain-specific challenges that general-purpose embedding models don’t seem to be trained to handle. While they are not entirely random, they appear unsolvable by generic models.
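
(To make the kind of pair concrete: the task asks a model to align a compound name with its SMILES string. A minimal sketch with sentence-transformers, using an arbitrary general-purpose model just for illustration; for such pairs one would expect the off-diagonal similarities to be about as large as the diagonal ones, i.e. near-random matching:)

```python
from sentence_transformers import SentenceTransformer, util

# Any general-purpose encoder behaves similarly; this model is just an example.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

names = ["aspirin", "caffeine"]
smiles = [
    "CC(=O)OC1=CC=CC=C1C(=O)O",      # aspirin
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",  # caffeine
]

# For a solvable bitext mining task the diagonal of this matrix should dominate
# each row; for name <-> SMILES matching with a generic model it usually does not.
print(util.cos_sim(model.encode(names), model.encode(smiles)))
```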

@Muennighoff
Contributor Author

I see; I think these are fine to have then! Probably of high interest for people training chemistry-specific embedding models!
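
(And anyone who only cares about general-purpose averages can still deselect them at evaluation time. A rough sketch, assuming `mteb.get_tasks` supports filtering by domain and task type and that the ChemTEB tasks will carry a `Chemistry` domain tag; not verified against the current API:)

```python
import mteb

# Sketch only: pick up the chemistry tasks but leave out bitext mining,
# so the near-zero bitext scores do not enter a model's average.
chem_tasks = mteb.get_tasks(
    domains=["Chemistry"],   # assumed domain tag on the ChemTEB tasks
    task_types=[             # everything except "BitextMining"
        "Classification",
        "PairClassification",
        "Clustering",
        "Retrieval",
    ],
)

evaluation = mteb.MTEB(tasks=chem_tasks)
```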
