Paper Writing: An overview issue #896

Closed
8 of 25 tasks
KennethEnevoldsen opened this issue Jun 10, 2024 · 25 comments
Comments

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Jun 10, 2024

This is an overview issue for paper writing. For the full discussion of what needs to be done, check out #784. The intention is to make it easier for contributors to find sections to contribute to, as well as for us to guide them in the right direction and keep an overview.

How to discuss these segments:

  • Keep discussion about specific segments on the Overleaf (but feel free to ping us on GitHub)
  • If you want to discuss a section related to one of the points, open an issue and link it here

Writing Sections:

Other concerns

  • Ensure that MMTEB is named consistently throughout the paper
  • Overview figure: @x-tabdeveloping will upload the latest version to GitHub and Overleaf
  • Do we need something on cross-lingual tasks?
KennethEnevoldsen mentioned this issue on Jun 10, 2024 (4 tasks)
KennethEnevoldsen changed the title from "[WIP] Paper Writing: An overview issue" to "Paper Writing: An overview issue" on Jun 10, 2024
@gentaiscool
Contributor

gentaiscool commented Jun 14, 2024

Hi @KennethEnevoldsen, thanks for the effort in organizing the paper overview. I'd like to assist in completing the related work section by incorporating recent papers to enhance its relevance. I agree that we need to paraphrase the initial segment and add more distinct aspects to set our work apart from existing research. Additionally, I am aware of several large-scale collaborative projects that could be referenced in the paper to make the related work section more comprehensive. I was also wondering how we determine contribution points for paper writing. In general, I am happy to help write any section if needed.

@KennethEnevoldsen
Contributor Author

Sounds wonderful. I would be very happy if you had the time to go over those sections. Feel free to ping me once you have done so.

Generally, we add points based on relative effort. Since most contributors have added datasets before, they have a rough sense of the points-to-effort ratio. We have the writer suggest points, and then, of course, we can discuss afterward whether it makes sense.

This is, of course, not a perfect system (but it is always hard to quantify contributions).

@gentaiscool
Contributor

Thank you, @KennethEnevoldsen, for the explanation. I will review the entire paper and focus on the sections where I can contribute, particularly those that don't require waiting for experimental results.

@isaac-chung
Collaborator

Not sure if we have discussed this before: would any of the language family groupings, e.g. in #366, have a place in the paper? Or would that require #837 to be completed first?

@MariyaTikhonova
Contributor

Hi @KennethEnevoldsen, thanks for the effort in organizing the paper overview.

My colleagues and I would like to help with the paper writing, if our help is welcome.

  1. We'd like to assist in completing the limitations and ethical considerations sections, if that is still relevant.

  2. We could also add basic information about the Russian-language datasets we contributed to MTEB, if needed, as well as the model evaluations we carried out recently.

  3. In the final stages, we could also contribute to general proofreading (small typos, uniform model naming, etc.).

@KennethEnevoldsen
Contributor Author

@MariyaTikhonova

  1. Sounds great

  2. Can you go over section B? If you have created datasets for the benchmarks, then please add them to B3. You might create a new appendix section on benchmark creation and describe the curation rationale for the Russian benchmark. For now, results are not needed, but they might be added in the future.

  3. Sounds lovely as well. I would go for 1 and 2 to start with.

@gowitheflow-1998
Contributor

Hi @KennethEnevoldsen, let me know if you need me to add information about the RAR-b tasks to the paper, and whether there is anything else I can help with in the paper writing in general!

@KennethEnevoldsen
Contributor Author

@gowitheflow-1998 can I ask you to add a section in appendix B4?

@gowitheflow-1998
Contributor

@KennethEnevoldsen Sure. Will do today!

@mariyahendriksen
Contributor

Hi everyone, I am done with the introduction part of the paper. I will start going over the remaining parts sequentially. Please let me know if there is any section/aspect I should pay additional attention to!

@mariyahendriksen
Contributor

Hi all,

(cc @KennethEnevoldsen, @isaac-chung, @imenelydiaker)

Now that the paper has been submitted, should we consider posting it on arXiv? ICLR’s double-blind submission policy, similar to other major ML conferences, allows for preprints to be shared on arXiv.

Publishing the paper on arXiv could help with wider dissemination and potentially save us more than four months, which is especially important given how fast-paced the ML field is. Additionally, if reviewers suggest changes during the rebuttal phase, we can always update the arXiv version.

Let me know your thoughts! I’d be happy to assist with the process if we decide to move forward.

@isaac-chung
Collaborator

I'm on board with what Mariya suggested. For those who are curious, it's under the "dual submission policy": https://iclr.cc/Conferences/2025/CallForPapers. In the double-blind reviewing section: "Having papers on arxiv is allowed per the dual submission policy outlined below."

@KennethEnevoldsen
Contributor Author

I completely agree. The hope is to have the leaderboard up and running before we publish the arXiv paper, to have the highest possible impact on release. Let me know what you think about this.

@imenelydiaker
Contributor

imenelydiaker commented Oct 9, 2024

> I completely agree. The hope is to have the leaderboard up and running before we publish the arXiv paper, to have the highest possible impact on release. Let me know what you think about this.

I think you can push it to arXiv before the leaderboard is up. I'm not sure we'll integrate screenshots of the leaderboard in the paper anyway, right? Once the LB is ready, we can push Twitter threads and LinkedIn posts about the paper.

@mariyahendriksen
Contributor

> I completely agree. The hope is to have the leaderboard up and running before we publish the arXiv paper, to have the highest possible impact on release. Let me know what you think about this.

> I think you can push it to arXiv before the leaderboard is up. I'm not sure we'll integrate screenshots of the leaderboard in the paper anyway, right? Once the LB is ready, we can push Twitter threads and LinkedIn posts about the paper.

Makes sense to me as well.

Posting the paper on arXiv could take up to a week, given the high submission volume. I’m happy to handle the process of getting the paper arXiv-ready and, once we have everyone’s approval, I can submit it. I recently went through the same process for another paper under review, so it’s still fresh in my mind. That said, if someone else prefers to manage this, I’m equally happy to pass it on!

Let me know what you think!

@KennethEnevoldsen
Contributor Author

Thanks @mariyahendriksen. I think most of what needs to be done is on my end (e.g. the final author list). I agree that it would be nice to have it available online as soon as possible.

@Muennighoff wdyt? Should we also include some additional models?

@Muennighoff
Contributor

Great points; I think having the leaderboard ready first and also adding a few more models and then doing one social media push upon release would maximize impact. (I think there's a very low risk of getting "scooped" here in case people are worried about that)

@KennethEnevoldsen which models from the ones we discussed should I still run? I think some APIs, e.g. Voyage, OpenAI, etc., would be great - I will ask them for credits.

@KennethEnevoldsen
Contributor Author

I def. think the commercial APIs: Voyage, Cohere, OpenAI.

I was also thinking about moving this up to the main paper:
[Screenshot (2024-10-01) of the proposed figure showing the moving-average model ranking across languages]

Potentially with some edits (e.g. add individual points)

@Muennighoff
Contributor

Muennighoff commented Oct 9, 2024

Okay will look into running them!

I think the plot is great, though maybe it would benefit from the following (a rough sketch of these tweaks is included after the list):

  • Ordering the legend models in the same way as the lines (not fully the case in the right one, I think, e.g. light blue is in a different spot)
  • It looks a bit like only the 4 languages are depicted; maybe indicate the total number of languages plotted in the caption or add it to the plot (e.g. a line marking top 10 or top 100; I guess adding individual points would also solve this)
  • Maybe make the vertical language lines dashed instead? (Since it is a moving average, whatever appears before the vertical lines still impacts the Borda score afaik, but the solid lines make it look a bit like a hard reset, I think)
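Here is a rough, hypothetical matplotlib sketch of those tweaks. The model names, scores, and milestone positions are made up for illustration; this is not the actual figure code.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_langs = np.arange(1, 101)
# fake per-model scores over an increasing number of languages, for illustration only
models = {
    name: np.cumsum(rng.normal(0.5, 0.2, n_langs.size))
    for name in ["model-a", "model-b", "model-c"]
}

fig, ax = plt.subplots()
for name, scores in models.items():
    (line,) = ax.plot(n_langs, scores, label=name)
    # individual points make it clear that more than a handful of languages are plotted
    ax.scatter(n_langs[::10], scores[::10], s=12, color=line.get_color())

# dashed (rather than solid) vertical lines for the language milestones,
# so they do not read as a hard reset of the moving average
for x in (10, 50, 100):
    ax.axvline(x, linestyle="--", color="grey", linewidth=0.8)

# order the legend entries to match the final ordering of the lines
handles, labels = ax.get_legend_handles_labels()
order = np.argsort([-models[label][-1] for label in labels])
ax.legend([handles[i] for i in order], [labels[i] for i in order])

ax.set_xlabel("Number of languages (100 plotted in total)")
ax.set_ylabel("Moving-average rank")
fig.savefig("rank_plot.png")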

@Muennighoff
Contributor

If someone has bandwidth to estimate the amount of credits from OpenAI we'd need, that'd be super useful. I think they're willing to sponsor, we just need to provide an estimate!

@KennethEnevoldsen
Contributor Author

@Muennighoff something like this might work:

import mteb

benchmarks = mteb.get_benchmarks()

total_characters = 0

for benchmark in benchmarks:
    n_characters = 0
    for task in benchmark.tasks:
        try:
            desc_stats = task.metadata.descriptive_stats

            for split in desc_stats["n_samples"]:
                n_samples = desc_stats["n_samples"][split]
                avg_char_leng = desc_stats["avg_character_length"][split]

                if task.metadata.type == "Retrieval":
                    # retrieval stats report documents and queries separately
                    n_characters += (
                        avg_char_leng["average_document_length"]
                        * avg_char_leng["num_documents"]
                        + avg_char_leng["average_query_length"]
                        * avg_char_leng["num_queries"]
                    )
                else:
                    n_characters += n_samples * avg_char_leng
        except Exception as e:
            print(f"Missing/incomplete descriptive stats for {task.metadata.name}: {e}")

    print(f"{benchmark.name}: {n_characters:,} characters")

    total_characters += n_characters


print(f"Total characters: {total_characters:,}")

Sadly, we have a lot of incomplete descriptive_stats, so currently the numbers are probably quite far off.

@Muennighoff
Contributor

Great, I got 3701778834.0939293 characters from that! That should correspond to ~925444708.5234823 tokens (divided by 4), so around 1B tokens (though maybe more like 10B, as some are missing). It might be useful to put the final character/token count or other inference stats in the paper 🤔
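For a rough sense of how this could translate into API credits, a hypothetical back-of-the-envelope sketch: the ~4 characters-per-token heuristic is the one used above, and the per-million-token price is an assumed placeholder, not an official rate.

total_characters = 3_701_778_834
approx_tokens = total_characters / 4  # rough ~4 characters-per-token heuristic from above
price_per_million_tokens = 0.02  # assumed placeholder price in USD; check the provider's pricing page
estimated_cost_usd = approx_tokens / 1_000_000 * price_per_million_tokens
print(f"~{approx_tokens:,.0f} tokens -> ~${estimated_cost_usd:,.2f}")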

@Muennighoff
Contributor

I added text-embedding-3-small results here: embeddings-benchmark/results#40
I will run text-embedding-3-large now, but it would be interesting to already check whether the results make sense and how it ranks vs the other models on MMTEB.
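For anyone wanting to reproduce such a run, a minimal sketch using the mteb evaluation API might look roughly like this; the registered model name string and the task selection are assumptions, and running API-based models also requires the provider's API key to be set in the environment.

import mteb

# model name in the mteb registry is assumed here; adjust to the actual registered name
model = mteb.get_model("openai/text-embedding-3-large")
tasks = mteb.get_tasks(tasks=["STS12"])  # illustrative task selection
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")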

@KennethEnevoldsen
Contributor Author

Will look at getting it merged in; then we can look at it on the new leaderboard.

@KennethEnevoldsen
Contributor Author

Closing this in favor of #1405
