[BUG] 501 (Not Implemented) Response when trying to load HF dataset #684
Comments
Hi there @jphme, sorry for the inconvenience! I believe this may be because your dataset is private, and Hugging Face now restricts the Datasets Server on private repositories to Pro users only, I'm afraid :/
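To check whether the Datasets Server is what rejects a given repo, one could call it directly with the same token. This is a minimal sketch that assumes the public `https://datasets-server.huggingface.co/splits?dataset=...` endpoint and a token in the `HF_TOKEN` environment variable. The thread doesn't show which endpoint `LoadHubDataset` actually calls in 1.1.1, and `build_splits_request`/`probe_datasets_server` are hypothetical helpers.

```python
import os
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the public Datasets Server base URL and its /splits endpoint.
DATASETS_SERVER = "https://datasets-server.huggingface.co"


def build_splits_request(repo_id: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a GET request asking the Datasets Server for a repo's splits."""
    query = urllib.parse.urlencode({"dataset": repo_id})
    request = urllib.request.Request(f"{DATASETS_SERVER}/splits?{query}")
    if token:
        # Bearer auth, the same scheme that shows up in the reported error.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def probe_datasets_server(repo_id: str) -> int:
    """Send the request with the HF_TOKEN from the environment and return the HTTP status."""
    request = build_splits_request(repo_id, os.environ.get("HF_TOKEN"))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers land here, e.g. the 501 discussed in this thread.
        return err.code
```

If the maintainers' explanation applies, `probe_datasets_server("your-org/your-private-dataset")` would be expected to return 501 for a private repo without Datasets Server access, even when the token itself is valid.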
Hi @jphme, we will work on something to avoid using the API.
Ah, that might be it, thanks 👍. We are even Corporate Pro users. But don't you think the current implementation is flawed if it breaks like this, all for a nice-to-have feature (not having to load the dataset in full) that doesn't affect ~95+% (for us so far, 100%) of use cases? ;-) Sorry, I don't want to sound too negative; I just spent the better part of an hour trying to figure out why this wasn't working.
Fair. Indeed, the message has already been updated, since we noticed this a couple of days ago, and it should roll out in the next
One solution could be to load the dataset in streaming mode and fetch a single row. Maybe you can even get the features without fetching a row.
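The suggestion above could be sketched as follows, assuming the Hugging Face `datasets` library (version 2.14 or later for the `token` argument) and a token in `HF_TOKEN`. `peek_hub_dataset`, `first_row` and `column_types` are hypothetical helpers, not distilabel code, and the per-column type guess from one row is only a rough stand-in for real `Features`.

```python
import os
from typing import Iterable, Optional


def first_row(rows: Iterable[dict]) -> Optional[dict]:
    """Return the first row of any iterable of dicts, or None if it is empty.

    On a streaming dataset this reads only as much data as one example needs.
    """
    return next(iter(rows), None)


def column_types(row: dict) -> dict:
    """Rough schema guess from a single example: column name -> Python type name."""
    return {name: type(value).__name__ for name, value in row.items()}


def peek_hub_dataset(repo_id: str, split: str = "train"):
    """Stream the dataset and return its features, reading at most one row."""
    from datasets import load_dataset  # lazy import so the helpers above work without `datasets`

    stream = load_dataset(
        repo_id, split=split, streaming=True, token=os.environ.get("HF_TOKEN")
    )
    # The features may already be known from the repo metadata, without fetching a row;
    # for some repos this is None.
    if stream.features is not None:
        return stream.features
    # Otherwise fall back to inspecting a single streamed example.
    row = first_row(stream)
    return column_types(row) if row is not None else None
```

Because `first_row` only calls `next` once, it never pulls more than one example, which is the whole point of streaming here.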
Sounds great, many thanks! I think the tradeoff between optimizing performance and optimizing usability should sometimes lean more toward usability, but this is often very hard to know in advance...
Closing this with #691; the docs still need some more examples.
Distilabel Version 1.1.1.
Trying to load a private HF dataset with `LoadHubDataset`, I get a 501 (Not Implemented) response.
I have `HF_TOKEN` in my env, the Bearer token shown is correct, and I can load the dataset just fine with `load_dataset`. I REALLY think the hacky custom loading implementation should be disabled by default; it has already caused us so many headaches... Why not add opt-in streaming HF dataset loading and just use `load_dataset` for everything else?
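A minimal sketch of the proposal above, with plain `load_dataset` by default and streaming only on request. `load_hub_dataset`, its `streaming` flag and `iter_batches` are hypothetical names, not distilabel's `LoadHubDataset` API; it assumes `datasets` 2.14 or later and a token in `HF_TOKEN`.

```python
import os
from typing import Iterable, Iterator, List


def load_hub_dataset(repo_id: str, split: str = "train", streaming: bool = False):
    """Hypothetical loader: a full `load_dataset` call by default, streaming only when asked."""
    from datasets import load_dataset  # lazy import; needs `datasets` >= 2.14 for `token`

    # The same call the reporter says works for their private dataset.
    return load_dataset(
        repo_id,
        split=split,
        streaming=streaming,  # opt-in: only then is an IterableDataset returned
        token=os.environ.get("HF_TOKEN"),
    )


def iter_batches(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of up to `batch_size` rows; works for a Dataset or an IterableDataset."""
    batch: List = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch
```

Keeping streaming as an explicit flag means the common case never depends on an extra service, while large datasets can still be consumed lazily through the same batching code.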