You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want to download Common Voice 17.0 hy-AM but it returns an error.
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_name='hfds_config', config_path=None)
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for mozilla-foundation/common_voice_17_0 contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/mozilla-foundation/common_voice_17_0
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
Reading metadata...: 6180it [00:00, 133224.37it/s]les/s]
Generating train split: 0 examples [00:00, ? examples/s]
HuggingFace datasets failed due to some reason (stack trace below).
For certain datasets (eg: MCV), it may be necessary to login to the huggingface-cli (via `huggingface-cli login`).
Once logged in, you need to set `use_auth_token=True` when calling this script.
Traceback error for reference :
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1743, in _prepare_split_single
example = self.info.features.encode_example(record) if self.info.features is not None else record
File "/usr/local/lib/python3.10/dist-packages/datasets/features/features.py", line 1878, in encode_example
return encode_nested_example(self, example)
File "/usr/local/lib/python3.10/dist-packages/datasets/features/features.py", line 1243, in encode_nested_example
{
File "/usr/local/lib/python3.10/dist-packages/datasets/features/features.py", line 1243, in <dictcomp>
{
File "/usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py", line 326, in zip_dict
yield key, tuple(d[key] for d in dicts)
File "/usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py", line 326, in <genexpr>
yield key, tuple(d[key] for d in dicts)
KeyError: 'sentence_id'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/nemo/scripts/speech_recognition/convert_hf_dataset_to_nemo.py", line 358, in main
dataset = load_dataset(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2549, in load_dataset
builder_instance.download_and_prepare(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1005, in download_and_prepare
self._download_and_prepare(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1767, in _download_and_prepare
super()._download_and_prepare(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1100, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1605, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1762, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
Steps to reproduce the bug
from datasets import load_dataset
cv_17 = load_dataset("mozilla-foundation/common_voice_17_0", "hy-AM")
Describe the bug
I want to download Common Voice 17.0 hy-AM but it returns an error.
Steps to reproduce the bug
Expected behavior
It works fine with common_voice_16_1
Environment info
datasets
version: 2.18.0huggingface_hub
version: 0.22.2fsspec
version: 2024.2.0The text was updated successfully, but these errors were encountered: