When a dataset contains large images (e.g. 16384x16384 pixels), the metadata for those images is not populated with the image info (width, height, channels, etc.) when running functions such as add_coco_labels or dataset.compute_metadata.
See slack thread: https://fiftyone-users.slack.com/archives/C0189A476EQ/p1714051750838069 @swheaton
Code to reproduce issue
import fiftyone as fo

dataset = fo.Dataset.from_dir(
    dataset_dir="path/to/image/folder",  # folder must contain at least one very large image
    dataset_type=fo.types.ImageDirectory,
    name="name",
)
dataset.compute_metadata(skip_failures=False)
Computing metadata...
INFO:fiftyone.core.metadata:Computing metadata...
100% |█████████████████████| 6/6 [4.8ms elapsed, 0s remaining, 1.2K samples/s]
INFO:eta.core.utils: 100% |█████████████████████| 6/6 [4.8ms elapsed, 0s remaining, 1.2K samples/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[51], line 1
----> 1 dataset.compute_metadata(skip_failures=False)
File c:\Users\anaconda3\envs\openvino\Lib\site-packages\fiftyone\core\collections.py:2888, in SampleCollection.compute_metadata(self, overwrite, num_workers, skip_failures, warn_failures, progress)
2864 def compute_metadata(
2865 self,
2866 overwrite=False,
(...)
2870 progress=None,
2871 ):
2872 """Populates the ``metadata`` field of all samples in the collection.
2873
2874 Any samples with existing metadata are skipped, unless
(...)
2886 (None), or a progress callback function to invoke instead
2887 """
-> 2888 fomt.compute_metadata(
2889 self,
2890 overwrite=overwrite,
2891 num_workers=num_workers,
2892 skip_failures=skip_failures,
2893 warn_failures=warn_failures,
2894 progress=progress,
2895 )
File c:\Users\anaconda3\envs\openvino\Lib\site-packages\fiftyone\core\metadata.py:290, in compute_metadata(sample_collection, overwrite, num_workers, skip_failures, warn_failures, progress)
288 logger.warning(msg)
289 else:
--> 290 raise ValueError(msg)
ValueError: Failed to populate metadata on 6 samples. Use `dataset.exists("metadata", False)` to retrieve them
This is probably because fo.core.metadata.get_image_info uses PIL to open the image.
Calling this function directly on one of the large images raises the following error:
DecompressionBombError Traceback (most recent call last)
Cell In[53], line 1
----> 1 fo.core.metadata.get_image_info("path/to/large/image.jpg")
File c:\Users\anaconda3\envs\openvino\Lib\site-packages\fiftyone\core\metadata.py:332, in get_image_info(f)
322 def get_image_info(f):
323 """Retrieves the dimensions and number of channels of the given image from
324 a file-like object that is streaming its contents.
325
(...)
330 ``(width, height, num_channels)``
331 """
--> 332 img = Image.open(f)
334 # Flip the dimensions if image metadata requires us to. PIL.Image doesn't
335 # handle by default.
336 if _image_has_flipped_dimensions(img):
File c:\Users\anaconda3\envs\openvino\Lib\site-packages\PIL\Image.py:3288, in open(fp, mode, formats)
3285 raise
3286 return None
-> 3288 im = _open_core(fp, filename, prefix, formats)
3290 if im is None and formats is ID:
3291 checked_formats = formats.copy()
File c:\Users\anaconda3\envs\openvino\Lib\site-packages\PIL\Image.py:3275, in open.<locals>._open_core(fp, filename, prefix, formats)
3273 fp.seek(0)
3274 im = factory(fp, filename)
-> 3275 _decompression_bomb_check(im.size)
3276 return im
3277 except (SyntaxError, IndexError, TypeError, struct.error):
3278 # Leave disabled by default, spams the logs with image
3279 # opening failures that are entirely expected.
3280 # logger.debug("", exc_info=True)
File c:\Users\anaconda3\envs\openvino\Lib\site-packages\PIL\Image.py:3183, in _decompression_bomb_check(size)
3178 if pixels > 2 * MAX_IMAGE_PIXELS:
3179 msg = (
3180 f"Image size ({pixels} pixels) exceeds limit of {2 * MAX_IMAGE_PIXELS} "
3181 "pixels, could be decompression bomb DOS attack."
3182 )
-> 3183 raise DecompressionBombError(msg)
3185 if pixels > MAX_IMAGE_PIXELS:
3186 warnings.warn(
3187 f"Image size ({pixels} pixels) exceeds limit of {MAX_IMAGE_PIXELS} pixels, "
3188 "could be decompression bomb DOS attack.",
3189 DecompressionBombWarning,
3190 )
DecompressionBombError: Image size (268435456 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
As a workaround, this can be fixed by raising PIL.Image.MAX_IMAGE_PIXELS from its default to a larger value (e.g. 1000000000).
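To make the numbers in the error message concrete, here is a small sketch that mirrors the two thresholds visible in the `_decompression_bomb_check` excerpt above. The function name `bomb_check` and the constant `DEFAULT_MAX_IMAGE_PIXELS` are hypothetical names used only for illustration, and the default value is inferred from the error message (178956970 is twice the limit), not read from Pillow itself.

```python
# Default limit inferred from the error message: the error threshold
# (178956970) is 2 * MAX_IMAGE_PIXELS, so the default is 89478485.
DEFAULT_MAX_IMAGE_PIXELS = 178956970 // 2


def bomb_check(pixels, max_pixels):
    """Classify a pixel count using the thresholds shown in the traceback.

    Hypothetical helper for illustration only; it returns a label instead
    of raising or warning as Pillow does.
    """
    if pixels > 2 * max_pixels:
        return "error"    # Pillow raises DecompressionBombError here
    if pixels > max_pixels:
        return "warning"  # Pillow emits DecompressionBombWarning here
    return "ok"


pixels = 16384 * 16384  # 268435456, as reported in the error message

default_result = bomb_check(pixels, DEFAULT_MAX_IMAGE_PIXELS)
raised_result = bomb_check(pixels, 1_000_000_000)

print(default_result)  # "error": 268435456 > 178956970
print(raised_result)   # "ok": 268435456 < 1000000000
```

With Pillow itself, the corresponding change would be a single assignment to `PIL.Image.MAX_IMAGE_PIXELS` before calling `compute_metadata`. Note that this changes the limit for the whole process.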
System information
OS Platform and Distribution (e.g., Linux Ubuntu 22.04): Windows 11
Python version (python --version): 3.11.7
FiftyOne version (fiftyone --version): v0.23.8
FiftyOne installed from (pip or source): pip
Willingness to contribute
The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?
Yes. I can contribute a fix for this bug independently
Yes. I would be willing to contribute a fix for this bug with guidance
from the FiftyOne community
No. I cannot contribute a bug fix at this time
The DecompressionBombError is a safety feature to prevent Denial of Service attacks on a web server. If you are loading an image in fiftyone, it is probably safe to assume you are loading from a trusted source.
Additionally, in metadata computation we do not read the full image, just the header.
I propose that this limit be temporarily patched to None (set, then reset to its default) within the metadata computation. We wouldn't want to change this value broadly, since it appears to be a global setting.
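A minimal sketch of the "set then reset" idea, assuming a context manager is an acceptable shape for the patch. The name `pixel_limit` is hypothetical and this is not FiftyOne's actual implementation. To keep the example runnable without Pillow, it operates on a stand-in object; in practice the `image_module` argument would be `PIL.Image`.

```python
import contextlib
import types


@contextlib.contextmanager
def pixel_limit(image_module, limit=None):
    """Temporarily set ``image_module.MAX_IMAGE_PIXELS``, restoring it on exit.

    Hypothetical helper: ``image_module`` is expected to be ``PIL.Image``,
    where a value of ``None`` disables the decompression bomb check.
    """
    previous = image_module.MAX_IMAGE_PIXELS
    image_module.MAX_IMAGE_PIXELS = limit
    try:
        yield
    finally:
        # Restore the previous value even if reading the image raised.
        image_module.MAX_IMAGE_PIXELS = previous


# Stand-in for PIL.Image so the sketch runs without Pillow installed.
fake_image_module = types.SimpleNamespace(MAX_IMAGE_PIXELS=89478485)

with pixel_limit(fake_image_module, None):
    inside = fake_image_module.MAX_IMAGE_PIXELS  # check disabled in here

after = fake_image_module.MAX_IMAGE_PIXELS  # original limit is back
```

Because the setting is global, the patched value is visible to every thread in the process while the block is active, so the block should wrap only the header read.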