Effect of FIM on StarCoder pre-training #138

Open
gojkoc54 opened this issue Sep 6, 2023 · 2 comments

gojkoc54 commented Sep 6, 2023

Hi!

Curious to know some more details about FIM and its effect on the pre-trained model.
Here's a paragraph from the SantaCoder paper:

FIM for cheap
We observe a minor drop in performance of the FIM model compared to the No-FIM model. Specifically, we see that the pass@100 performance of the FIM model is 2-4% lower on HumanEval and 1% lower on MBPP. While Bavarian et al. (2022) presented evidence for the existence of a FIM-for-free property (i.e., arguing that autoregressive models can be trained with FIM without harming left-to-right capabilities), we do find a small but consistent drop of FIM models on left-to-right text2code benchmarks.

  1. Was a similar analysis carried out on StarCoder?
  2. Was StarCoder pre-trained on a 50-50 split between FIM and next-token data? (as indicated in this Megatron script)
loubnabnl (Contributor) commented Nov 15, 2023

Hello, we didn't perform this ablation for StarCoder given the amount of compute required for training, but you can check the Code Llama paper, where the authors observed similar behavior at different scales.

Regarding FIM percentage, we used 50%.
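
For context, the 50% figure is the fraction of training documents that get rearranged into fill-in-the-middle order; the rest remain ordinary left-to-right samples. Below is a minimal sketch of character-level FIM in PSM (prefix-suffix-middle) order, assuming StarCoder-style sentinel token names; the splitting logic is illustrative only and not the actual Megatron-LM implementation.

```python
import random

# Sentinel token names as used by the StarCoder tokenizer (assumed here for illustration).
FIM_PREFIX = "<fim_prefix>"
FIM_MIDDLE = "<fim_middle>"
FIM_SUFFIX = "<fim_suffix>"

def maybe_apply_fim(document: str, fim_rate: float = 0.5, rng=random) -> str:
    """With probability `fim_rate`, rearrange a document into PSM order;
    otherwise keep it as a plain left-to-right (next-token) sample."""
    if rng.random() >= fim_rate or len(document) < 2:
        return document  # roughly half the samples stay ordinary autoregressive text

    # Pick two random cut points, splitting the document into prefix / middle / suffix.
    lo, hi = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:lo], document[lo:hi], document[hi:]

    # PSM order: the model conditions on prefix and suffix, then learns to generate the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

# Example: about half of the calls return the document unchanged,
# the rest return a FIM-rearranged sample.
random.seed(0)
print(maybe_apply_fim("def add(a, b):\n    return a + b\n"))
```

With fim_rate=0.5, the remaining ~50% of documents pass through untouched, which is why the pre-training data is effectively a 50-50 split between FIM and next-token samples.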

@yiyepiaoling0715

I have a question: since, as is known, many eval scores drop because of FIM during the pre-training stage, why did you still use FIM at a 50% rate?
