NonMatchingSplitsSizesError
Describe the bug

While trying to load the dataset https://huggingface.co/datasets/pysentimiento/spanish-tweets-small, I get this error:
---------------------------------------------------------------------------
NonMatchingSplitsSizesError               Traceback (most recent call last)
<ipython-input-1-d6a3c721d3b8> in <cell line: 3>()
      1 from datasets import load_dataset
      2 
----> 3 ds = load_dataset("pysentimiento/spanish-tweets-small")

3 frames
/usr/local/lib/python3.10/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   2150 
   2151     # Download and prepare data
-> 2152     builder_instance.download_and_prepare(
   2153         download_config=download_config,
   2154         download_mode=download_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in download_and_prepare(self, output_dir, download_config, download_mode, verification_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, file_format, max_shard_size, num_proc, storage_options, **download_and_prepare_kwargs)
    946                 if num_proc is not None:
    947                     prepare_split_kwargs["num_proc"] = num_proc
--> 948                 self._download_and_prepare(
    949                     dl_manager=dl_manager,
    950                     verification_mode=verification_mode,

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verification_mode, **prepare_split_kwargs)
   1059 
   1060         if verification_mode == VerificationMode.BASIC_CHECKS or verification_mode == VerificationMode.ALL_CHECKS:
-> 1061             verify_splits(self.info.splits, split_dict)
   1062 
   1063         # Update the info object with the splits.

/usr/local/lib/python3.10/dist-packages/datasets/utils/info_utils.py in verify_splits(expected_splits, recorded_splits)
     98     ]
     99     if len(bad_splits) > 0:
--> 100         raise NonMatchingSplitsSizesError(str(bad_splits))
    101     logger.info("All the splits matched successfully.")
    102 

NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=82649695458, num_examples=597433111, shard_lengths=None, dataset_name=None), 'recorded': SplitInfo(name='train', num_bytes=3358310095, num_examples=24898932, shard_lengths=[3626991, 3716991, 4036990, 3506990, 3676990, 3716990, 2616990], dataset_name='spanish-tweets-small')}]
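The error message reports an expected `train` split of 597,433,111 examples against a recorded one of 24,898,932. Below is a minimal sketch of the kind of comparison that could produce this error, assuming the check compares `num_examples` per split. That matches the traceback but is simplified, and it is not the library's actual code. `SplitSummary` and `find_mismatched_splits` are hypothetical names, not part of `datasets`.

```python
from dataclasses import dataclass


@dataclass
class SplitSummary:
    """Hypothetical stand-in for the SplitInfo objects shown in the error."""
    name: str
    num_bytes: int
    num_examples: int


def find_mismatched_splits(expected, recorded):
    """Return splits whose recorded example count differs from the expected one."""
    return [
        {"expected": expected[name], "recorded": recorded[name]}
        for name in expected
        if name in recorded
        and expected[name].num_examples != recorded[name].num_examples
    ]


# Values copied from the error message.
expected = {"train": SplitSummary("train", 82649695458, 597433111)}
recorded = {"train": SplitSummary("train", 3358310095, 24898932)}

bad_splits = find_mismatched_splits(expected, recorded)
for entry in bad_splits:
    ratio = entry["recorded"].num_examples / entry["expected"].num_examples
    print(f"{entry['expected'].name}: recorded {ratio:.1%} of the expected examples")
```

The recorded split is a small fraction of the expected one, which would be consistent with the expected sizes describing a different (larger) version of the data.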
I think I updated this dataset at some point, so this might be related to #6271.
It works fine as late as datasets 2.10.0, but not from 2.13.0 onwards.
Steps to reproduce the bug

from datasets import load_dataset

ds = load_dataset("pysentimiento/spanish-tweets-small")
You can run it in this notebook
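A possible stopgap, offered as an assumption rather than a confirmed fix: the traceback shows that `verify_splits` only runs when `verification_mode` is `BASIC_CHECKS` or `ALL_CHECKS`, so passing `verification_mode="no_checks"` to `load_dataset` should skip the size comparison. This only bypasses the check and does not explain why the expected and recorded sizes differ. `load_without_split_checks` is a hypothetical helper name.

```python
# Keyword arguments for load_dataset. "no_checks" is the string value of
# datasets.VerificationMode.NO_CHECKS.
LOAD_KWARGS = {
    "path": "pysentimiento/spanish-tweets-small",
    "verification_mode": "no_checks",
}


def load_without_split_checks():
    """Load the dataset while skipping split-size verification (hypothetical helper)."""
    # Imported lazily so this sketch can be read without the package installed.
    from datasets import load_dataset

    return load_dataset(**LOAD_KWARGS)


# Usage (downloads the data, so it is not executed here):
# ds = load_without_split_checks()
```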
Expected behavior

The dataset loads without any error.
Environment info

datasets version: 2.13.0