Paper Writing: An overview issue #896
Comments
Hi @KennethEnevoldsen, thanks for the effort in organizing the paper overview. I'd like to help complete the related work section by incorporating recent papers to make it more relevant. I agree that we need to paraphrase the initial segment and add more distinct aspects to set our work apart from existing research. I am also aware of several large-scale collaborative projects that we could reference to make the related work section more comprehensive. I was also wondering how we determine contribution points for paper writing. In general, I am happy to help write any section if needed. |
Sounds wonderful. I would be very happy if you had the time to go over those sections. Feel free to ping me once you have done so. Generally, we add points based on relative effort. Since most contributors have added datasets before, they have an approximate sense of the points-to-effort ratio. We have the writer suggest points, and then, of course, we can discuss afterward whether it makes sense. This is, of course, not a perfect system (but it is always hard to quantify contributions). |
Thank you, @KennethEnevoldsen, for the explanation. I will review the entire paper and focus on the sections where I can contribute, particularly those that don't require waiting for experimental results. |
Hi @KennethEnevoldsen, thanks for the effort in organizing the paper overview. My colleagues and I would like to help with the paper writing, if our help would be welcome. |
Hi @KennethEnevoldsen, let me know if you need me to add information on the RAR-b tasks to the paper, or if there is anything else I can help with in the paper writing in general! |
@gowitheflow-1998 can I ask you to add a section in appendix B4? |
@KennethEnevoldsen Sure. Will do today! |
Hi everyone, I am done with the introduction part of the paper. I will start going over the remaining parts sequentially. Please let me know if there is any section or aspect I should pay additional attention to! |
Hi all, (cc @KennethEnevoldsen, @isaac-chung, @imenelydiaker) Now that the paper has been submitted, should we consider posting it on arXiv? ICLR’s double-blind submission policy, similar to other major ML conferences, allows for preprints to be shared on arXiv. Publishing the paper on arXiv could help with wider dissemination and potentially save us more than four months, which is especially important given how fast-paced the ML field is. Additionally, if reviewers suggest changes during the rebuttal phase, we can always update the arXiv version. Let me know your thoughts! I’d be happy to assist with the process if we decide to move forward. |
I'm on board with what Mariya suggested. For those who are curious, it's covered by the "dual submission policy": https://iclr.cc/Conferences/2025/CallForPapers . The double blind reviewing section says: "Having papers on arxiv is allowed per the dual submission policy outlined below." |
I completely agree. The hope is to have the leaderboard up and running before we publish the arXiv paper, to have the highest possible impact on release. Let me know what you think about this. |
I think you can push it to arXiv before the leaderboard is up. I'm not sure we'll integrate screenshots of the leaderboard in the paper anyway, right? Once the LB is ready, we can push Twitter threads and LinkedIn posts about the paper. |
Makes sense to me as well. Posting the paper on arXiv could take up to a week, given the high submission volume. I’m happy to handle the process of getting the paper arXiv-ready and, once we have everyone’s approval, I can submit it. I recently went through the same process for another paper under review, so it’s still fresh in my mind. That said, if someone else prefers to manage this, I’m equally happy to pass it on! Let me know what you think! |
Thanks @mariyahendriksen. I think most of the stuff that needs to be done is on my end (e.g. the final author list). I agree that it would be nice to have it available online as soon as possible. @Muennighoff wdyt? Should we also include some additional models? |
Great points; I think having the leaderboard ready first, adding a few more models, and then doing one social media push upon release would maximize impact. (I think there's a very low risk of getting "scooped" here, in case people are worried about that.) @KennethEnevoldsen which models from the ones we discussed should I still run? I think some APIs, e.g. Voyage, OpenAI etc., would be great. I will ask them for credits. |
Okay will look into running them! I think the plot is great though maybe it would benefit from
|
If someone has bandwidth to estimate the amount of credits from OpenAI we'd need, that'd be super useful. I think they're willing to sponsor, we just need to provide an estimate! |
@Muennighoff something like this might work:
Sadly we have a lot of incomplete descriptive_stats so currently the numbers are probably quite far off |
Great, I got 3701778834.0939293 characters from that! That should correspond to ~925444708.5234823 tokens (dividing by 4), so around 1B tokens (though maybe more like 10B, as some are missing). It may be useful to put the final character/token count or other inference stats in the paper 🤔 |
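For anyone who wants to redo this back-of-the-envelope estimate, here is a minimal sketch of the arithmetic. It assumes the ~4 characters per token heuristic used in the comment above; the `coverage` parameter (the fraction of the data that the character count actually covers) is a hypothetical knob added only to illustrate the "maybe more like 10B" scaling, not something measured in this thread.

```python
# Rough token estimate from a total character count.
# Assumption: ~4 characters per token (the heuristic used in the thread).
CHARS_PER_TOKEN = 4.0


def estimate_tokens(
    n_chars: float,
    chars_per_token: float = CHARS_PER_TOKEN,
    coverage: float = 1.0,
) -> float:
    """Estimate the number of tokens for `n_chars` characters.

    `coverage` is a hypothetical fraction in (0, 1] describing how much of
    the data the character count covers; 1.0 means the count is complete.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return n_chars / chars_per_token / coverage


# Character count reported in the thread.
total_chars = 3701778834.0939293

tokens_if_complete = estimate_tokens(total_chars)
# Illustrative only: if the stats covered just 10% of the data.
tokens_if_tenth = estimate_tokens(total_chars, coverage=0.1)

print(f"complete stats: {tokens_if_complete / 1e9:.2f}B tokens")
print(f"10% coverage:   {tokens_if_tenth / 1e9:.2f}B tokens")
```

An exact count would need the actual tokenizer of each API model, so this is only useful for sizing a credit request.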
I added text-embedding-3-small results here: embeddings-benchmark/results#40 |
Will look at getting it merged in then we can look at it on the new leaderboard |
Closing this in favor of #1405 |
This is an overview issue for paper writing. For the full discussion of what needs to be done, check out #784. The intention is to make it easier for contributors to find sections to write, and for us to guide them in the right direction and keep an overview.
How to discuss these segments:
Writing Sections:
Other concerns