wikitext-2 is not available anymore #2247

huangjia2019 · 2024-03-26T14:33:23Z

🐛 Bug

Describe the bug

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
This exception is thrown by iter of HTTPReaderIterDataPipe(skip_on_error=False, source_datapipe=OnDiskCacheHolderIterDataPipe, timeout=None)

To Reproduce

Steps to reproduce the behavior:

from torchtext.datasets import WikiText2
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
from torch.utils.data import DataLoader, Dataset

tokenizer = get_tokenizer("basic_english")

train_iter = WikiText2(split='train')
valid_iter = WikiText2(split='valid')

def yield_tokens(data_iter):
    for item in data_iter:
        yield tokenizer(item)

vocab = build_vocab_from_iterator(yield_tokens(train_iter),
                                  specials=["", "", ""])
vocab.set_default_index(vocab[""])


Environment

Please copy and paste the output from our
environment collection script (or
fill out the checklist below manually).

You can get the script and run it with:

wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip

PyTorch Version (e.g., 1.0):
OS (e.g., Linux):
How you installed PyTorch (conda, pip, source):
Build command you used (if compiling from source):
Python version:
CUDA/cuDNN version:
GPU models and configuration:
Any other relevant information:



leedrake5 · 2024-05-30T20:02:53Z

Is there an alternate link we can get? The documentation here says:

import os
from functools import partial
from typing import Union, Tuple

from torchtext._internal.module_utils import is_module_available
from torchtext.data.datasets_utils import (
    _wrap_split_argument,
    _create_dataset_directory,
)

URL = "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip"

MD5 = "542ccefacc6c27f945fb54453812b3cd"

... can we just find an alternate URL and change the function?
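As a rough illustration of that idea, the sketch below swaps the module-level `URL` (and optionally `MD5`) constants at runtime instead of editing the installed file. It assumes the dataset function reads these constants when it is called, as the quoted source suggests, and that the module path is `torchtext.datasets.wikitext2`; the helper name `override_dataset_url` is hypothetical, and no working mirror URL is known from this thread.

```python
import importlib


def override_dataset_url(new_url, new_md5=None,
                         module_name="torchtext.datasets.wikitext2"):
    """Point a dataset module's URL (and optionally MD5) constant elsewhere.

    Assumption: the module exposes `URL` and `MD5` globals, as in the
    source quoted above. Returns the previous (URL, MD5) pair so the
    change can be reverted.
    """
    module = importlib.import_module(module_name)
    previous = (module.URL, module.MD5)
    module.URL = new_url
    if new_md5 is not None:
        # Only needed if the mirror serves a different build of the archive.
        module.MD5 = new_md5
    return previous
```

Usage would be a call such as `override_dataset_url(mirror_url)` before `WikiText2(split='train')`, where `mirror_url` is a placeholder for whatever alternate location turns out to host the same archive. If the mirror serves a byte-identical file, the MD5 can stay unchanged.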

WenqiangZhang003 · 2024-07-05T01:22:02Z

Hi team, is there a solution for this error yet? We have run into the same problem.

WangX0111 · 2024-09-10T08:41:42Z

How can we change the URL?
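One way to sidestep the download entirely, assuming a copy of `wikitext-2-v1.zip` has been obtained from some other source, is to read the splits straight from the local archive. The sketch below uses only the standard library. The member path `wikitext-2/wiki.<split>.tokens` is an assumption about the archive layout, the function names are hypothetical, and plain lowercase whitespace splitting stands in for the `basic_english` tokenizer.

```python
import hashlib
import zipfile
from collections import Counter

# Checksum quoted earlier in this thread for wikitext-2-v1.zip.
EXPECTED_MD5 = "542ccefacc6c27f945fb54453812b3cd"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_split_lines(zip_path, split="train",
                     member_template="wikitext-2/wiki.{split}.tokens"):
    """Yield non-empty, stripped lines of one split from a local archive.

    Assumption: the archive stores each split at the path given by
    `member_template`; adjust it if the local copy is laid out differently.
    """
    member = member_template.format(split=split)
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as fh:
            for raw in fh:
                line = raw.decode("utf-8").strip()
                if line:
                    yield line


def count_tokens(lines):
    """Count lowercase whitespace-separated tokens across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts
```

Comparing `md5_of_file(path)` against `EXPECTED_MD5` before use checks that the local copy matches the file the library expected. The lines from `iter_split_lines(path, "train")` can then be passed to a tokenizer and to `build_vocab_from_iterator` in place of `train_iter`.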
