
Question of training on conversational datasets #284

Answered by rasbt
qibin0506 asked this question in Q&A

@qibin0506: It is the training loss, and I also suspect overfitting. But when I train on a large dataset, the loss likewise becomes small while the generation quality is still poor.

@rasbt: Do you have a portion of the data held out for validation? What do the training/validation loss curves look like?
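Holding out a validation portion, as suggested above, can be done with a simple shuffle-and-split. A minimal standard-library sketch (the function and variable names are illustrative, not from the discussion):

```python
import random

def train_val_split(examples, val_fraction=0.1, seed=42):
    """Shuffle the examples and hold out a fraction for validation."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = examples[:]             # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# e.g. for a 10k-example conversational dataset:
data = [f"example_{i}" for i in range(10_000)]
train, val = train_val_split(data)
print(len(train), len(val))  # 9000 1000
```

Evaluating the loss on the held-out portion after each epoch then gives the validation curve to compare against the training curve.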

@qibin0506: Yes, I am using BERT's tokenizer, and I pretrained on this dataset from scratch.

@rasbt: I think in this case a dataset of 10k conversational examples may be too small for pretraining. Unless you used a different dataset for pretraining?
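One way to read the two loss curves for the overfitting pattern described above: the training loss keeps dropping while the validation loss bottoms out and rises again. A minimal sketch with made-up numbers purely for illustration:

```python
def overfitting_gap(train_losses, val_losses):
    """Return the final train/val loss gap and whether the validation
    loss has risen back above its minimum (a classic overfitting sign)."""
    gap = val_losses[-1] - train_losses[-1]
    val_rising = val_losses[-1] > min(val_losses)
    return gap, val_rising

# Hypothetical per-epoch losses: training keeps improving,
# validation turns around after epoch 3.
train_curve = [2.5, 1.8, 1.2, 0.7, 0.4]
val_curve   = [2.6, 2.0, 1.7, 1.8, 2.1]
gap, rising = overfitting_gap(train_curve, val_curve)
print(round(gap, 2), rising)  # 1.7 True
```

A large and growing gap on a small corpus is consistent with the 10k-example dataset simply being too small for pretraining from scratch.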


Answer selected by qibin0506
Category: Q&A
Labels: question (Further information is requested)
2 participants
This discussion was converted from issue #283 on July 24, 2024 11:49.