[BUG] Fiftyone fails to create the metadata for large images #4313

Open
1 of 3 tasks
fgraffitti-cyberhawk opened this issue Apr 26, 2024 · 1 comment
Labels
bug Bug fixes

fgraffitti-cyberhawk commented Apr 26, 2024

Describe the problem

When a dataset contains large images (e.g. 16384x16384 pixels), the metadata for those images is not populated with the image info (width, height, channels, etc.) when running functions such as add_coco_labels or dataset.compute_metadata.
See slack thread: https://fiftyone-users.slack.com/archives/C0189A476EQ/p1714051750838069
@swheaton

Code to reproduce issue

import fiftyone as fo

dataset = fo.Dataset.from_dir(
    dataset_dir="path/to/image/folder",  # folder needs to contain at least one very large image
    dataset_type=fo.types.ImageDirectory,
    name="name",
)

dataset.compute_metadata(skip_failures=False)
Computing metadata...
INFO:fiftyone.core.metadata:Computing metadata...
 100% |█████████████████████| 6/6 [4.8ms elapsed, 0s remaining, 1.2K samples/s] 
INFO:eta.core.utils: 100% |█████████████████████| 6/6 [4.8ms elapsed, 0s remaining, 1.2K samples/s] 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[51], line 1
----> 1 dataset.compute_metadata(skip_failures=False)

File c:\Users\anaconda3\envs\openvino\Lib\site-packages\fiftyone\core\collections.py:2888, in SampleCollection.compute_metadata(self, overwrite, num_workers, skip_failures, warn_failures, progress)
   2864 def compute_metadata(
   2865     self,
   2866     overwrite=False,
   (...)
   2870     progress=None,
   2871 ):
   2872     """Populates the ``metadata`` field of all samples in the collection.
   2873 
   2874     Any samples with existing metadata are skipped, unless
   (...)
   2886             (None), or a progress callback function to invoke instead
   2887     """
-> 2888     fomt.compute_metadata(
   2889         self,
   2890         overwrite=overwrite,
   2891         num_workers=num_workers,
   2892         skip_failures=skip_failures,
   2893         warn_failures=warn_failures,
   2894         progress=progress,
   2895     )

File c:\Users\anaconda3\envs\openvino\Lib\site-packages\fiftyone\core\metadata.py:290, in compute_metadata(sample_collection, overwrite, num_workers, skip_failures, warn_failures, progress)
    288     logger.warning(msg)
    289 else:
--> 290     raise ValueError(msg)

ValueError: Failed to populate metadata on 6 samples. Use `dataset.exists("metadata", False)` to retrieve them

This is probably caused by fo.core.metadata.get_image_info, which uses PIL to open the image.
Indeed, calling this function on one of the large images raises the following error:

fo.core.metadata.get_image_info("path/to/large/image.jpg")
DecompressionBombError                    Traceback (most recent call last)
Cell In[53], line 1
----> 1 fo.core.metadata.get_image_info("path/to/large/image.jpg")

File c:\Users\anaconda3\envs\openvino\Lib\site-packages\fiftyone\core\metadata.py:332, in get_image_info(f)
    322 def get_image_info(f):
    323     """Retrieves the dimensions and number of channels of the given image from
    324     a file-like object that is streaming its contents.
    325 
   (...)
    330         ``(width, height, num_channels)``
    331     """
--> 332     img = Image.open(f)
    334     # Flip the dimensions if image metadata requires us to. PIL.Image doesn't
    335     #   handle by default.
    336     if _image_has_flipped_dimensions(img):

File c:\Users\anaconda3\envs\openvino\Lib\site-packages\PIL\Image.py:3288, in open(fp, mode, formats)
   3285             raise
   3286     return None
-> 3288 im = _open_core(fp, filename, prefix, formats)
   3290 if im is None and formats is ID:
   3291     checked_formats = formats.copy()

File c:\Users\anaconda3\envs\openvino\Lib\site-packages\PIL\Image.py:3275, in open.<locals>._open_core(fp, filename, prefix, formats)
   3273         fp.seek(0)
   3274         im = factory(fp, filename)
-> 3275         _decompression_bomb_check(im.size)
   3276         return im
   3277 except (SyntaxError, IndexError, TypeError, struct.error):
   3278     # Leave disabled by default, spams the logs with image
   3279     # opening failures that are entirely expected.
   3280     # logger.debug("", exc_info=True)

File c:\Users\anaconda3\envs\openvino\Lib\site-packages\PIL\Image.py:3183, in _decompression_bomb_check(size)
   3178 if pixels > 2 * MAX_IMAGE_PIXELS:
   3179     msg = (
   3180         f"Image size ({pixels} pixels) exceeds limit of {2 * MAX_IMAGE_PIXELS} "
   3181         "pixels, could be decompression bomb DOS attack."
   3182     )
-> 3183     raise DecompressionBombError(msg)
   3185 if pixels > MAX_IMAGE_PIXELS:
   3186     warnings.warn(
   3187         f"Image size ({pixels} pixels) exceeds limit of {MAX_IMAGE_PIXELS} pixels, "
   3188         "could be decompression bomb DOS attack.",
   3189         DecompressionBombWarning,
   3190     )

DecompressionBombError: Image size (268435456 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
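
The numbers in this message can be checked by hand. The following is a minimal sketch, assuming Pillow's default `MAX_IMAGE_PIXELS` is `int(1024 * 1024 * 1024 // 4 // 3)`, which is consistent with the limit printed above. The helper `classify` is hypothetical and only mirrors the two comparisons visible in the traceback; it is not Pillow's code.

```python
# Assumed Pillow default; twice this value equals the limit printed in the error.
MAX_IMAGE_PIXELS = int(1024 * 1024 * 1024 // 4 // 3)


def classify(width, height, limit=MAX_IMAGE_PIXELS):
    """Hypothetical helper mirroring the comparisons in _decompression_bomb_check."""
    if limit is None:
        # Pillow skips the check entirely when the limit is None
        return "ok"
    pixels = width * height
    if pixels > 2 * limit:
        return "error"    # corresponds to DecompressionBombError
    if pixels > limit:
        return "warning"  # corresponds to DecompressionBombWarning
    return "ok"


print(MAX_IMAGE_PIXELS)        # default limit
print(2 * MAX_IMAGE_PIXELS)    # hard limit reported in the error message
print(16384 * 16384)           # pixel count of the image from the report
print(classify(16384, 16384))  # outcome for that image under the default limit
```

A 16384x16384 image has 268435456 pixels, which is above twice the default limit, so the open call raises instead of only warning.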

This can be fixed by raising PIL.Image.MAX_IMAGE_PIXELS from its default to a larger value (e.g. 1000000000).
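
As a stopgap on the user side, the limit could be raised before computing metadata. This is a minimal sketch, assuming Pillow is installed and that FiftyOne opens the images in the same Python process (not confirmed here). `raise_pixel_limit` is a hypothetical helper, and the value is the one suggested above.

```python
try:
    from PIL import Image  # Pillow; guarded so the snippet still runs without it
except ImportError:
    Image = None


def raise_pixel_limit(limit=1_000_000_000):
    """Raise Pillow's global pixel limit and return the previous value.

    Returns None if Pillow is not installed (hypothetical helper).
    """
    if Image is None:
        return None
    previous = Image.MAX_IMAGE_PIXELS
    Image.MAX_IMAGE_PIXELS = limit
    return previous


previous_limit = raise_pixel_limit()
# Then re-run on the dataset from the report:
# dataset.compute_metadata(skip_failures=False)
```

The setting is process-wide, so it also relaxes the check for any other code in the process that opens images with Pillow.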

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 22.04): Windows 11
  • Python version (python --version): 3.11.7
  • FiftyOne version (fiftyone --version): v0.23.8
  • FiftyOne installed from (pip or source): pip

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?

  • Yes. I can contribute a fix for this bug independently
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community
  • No. I cannot contribute a bug fix at this time

fgraffitti-cyberhawk added the bug label Apr 26, 2024

swheaton (Contributor) commented

@fgraffitti-cyberhawk thanks!

The DecompressionBombError is a safety feature meant to prevent denial-of-service attacks on a web server. If you are loading an image in FiftyOne, it is probably safe to assume it comes from a trusted source.
Additionally, metadata computation does not read the full image, just the header.

I propose patching this limit to None within the metadata computation only (set it, then reset it to the default). We wouldn't want to change this value broadly, since it seems to be a global setting.
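
The set-then-reset idea could be sketched as a small context manager. This is only an illustration under stated assumptions, not FiftyOne's implementation: `override_attr` is a hypothetical helper, and a `SimpleNamespace` stands in for the `PIL.Image` module so the snippet runs without Pillow. In Pillow, setting `MAX_IMAGE_PIXELS` to `None` disables the check.

```python
import contextlib
import types


@contextlib.contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.<name> to value, restoring the old value on exit."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore the original value even if the body raises
        setattr(obj, name, previous)


# Stand-in for the PIL.Image module (assumption, for illustration only)
fake_image_module = types.SimpleNamespace(MAX_IMAGE_PIXELS=89478485)

with override_attr(fake_image_module, "MAX_IMAGE_PIXELS", None):
    inside = fake_image_module.MAX_IMAGE_PIXELS  # None while the block is active

after = fake_image_module.MAX_IMAGE_PIXELS  # original value again
```

With Pillow, the same pattern would wrap the `Image.open` call as `with override_attr(Image, "MAX_IMAGE_PIXELS", None): ...`. Because the attribute is module-global, other threads in the process would also see the patched value while the block is active.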
