Clean up `reduce_labels` for image processors #30799

qubvel · 2024-05-14T12:04:23Z

What does this PR do?

Deprecate reduce_labels for Mask2FormerImageProcessor (it is already deprecated for all other image processors)
Remove deprecated reduce_labels for other segmentation models
Update examples and docs with do_reduce_labels

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2024-05-14T12:52:14Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

amyeroberts

Thanks for working on this - great to see more alignment between the image processors.

My only comment is to not raise an exception if reduce_labels is passed in.

amyeroberts · 2024-05-15T15:30:40Z

src/transformers/models/beit/image_processing_beit.py

-            warnings.warn(
-                "The `reduce_labels` parameter is deprecated and will be removed in a future version. Please use"
-                " `do_reduce_labels` instead.",
-                FutureWarning,
-            )
-            do_reduce_labels = kwargs.pop("reduce_labels")
+            raise ValueError("The `reduce_labels` parameter has been deprecated. Use `do_reduce_labels` instead.")


Sadly we can't do this directly for two reasons:

A deprecation version wasn't specified in the message (my bad)

This is going to break a lot of things for users

We absolutely want to push users to use the correct argument, and we should update this value in all of the official checkpoints. However, this value comes from the config, which users might not control or know how to change, and there might be hundreds of configs with the old value on and off the Hub, so we're at risk of breaking code and losing users.

What I'd suggest here is:

We update the warning message to name a specific removal version (typically two versions after the release this commit will be included in).

After this version, we remove the warning but keep silently handling the old flag to maintain backwards compatibility, i.e. popping the value, and make sure all official references to the flag are removed; we don't guarantee any future maintenance for this value.

Yes, those are breaking changes...

The idea was to support reading old configs by overriding the from_dict method:

@classmethod
def from_dict(cls, image_processor_dict: Dict[str, Any], **kwargs):
    ...
    if "reduce_labels" in image_processor_dict:
        image_processor_dict["do_reduce_labels"] = image_processor_dict.pop("reduce_labels")

Here we silently replace reduce_labels with do_reduce_labels.

By raising an exception in __init__, we prevent users from creating a new image processor with deprecated parameters. That might break some fine-tuning scripts, but those are code the user controls.
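A minimal, self-contained sketch of the two paths described above, assuming a toy `ToyImageProcessor` class (hypothetical, not the real transformers base class): `from_dict` silently renames the legacy key found in a saved config, while `__init__` rejects the old name when it is passed explicitly.

```python
from typing import Any, Dict


class ToyImageProcessor:
    """Hypothetical stand-in for a segmentation image processor."""

    def __init__(self, do_reduce_labels: bool = False, **kwargs):
        # Explicitly passing the old name is treated as something the user can fix.
        if "reduce_labels" in kwargs:
            raise ValueError(
                "The `reduce_labels` parameter has been deprecated. Use `do_reduce_labels` instead."
            )
        self.do_reduce_labels = do_reduce_labels

    @classmethod
    def from_dict(cls, image_processor_dict: Dict[str, Any], **kwargs):
        # Copy so the caller's (possibly cached) config dict is not mutated.
        image_processor_dict = dict(image_processor_dict)
        # Old configs saved with `reduce_labels` are upgraded silently.
        if "reduce_labels" in image_processor_dict:
            image_processor_dict["do_reduce_labels"] = image_processor_dict.pop("reduce_labels")
        # Explicit kwargs are forwarded as-is, so an explicit `reduce_labels` still raises.
        image_processor_dict.update(kwargs)
        return cls(**image_processor_dict)


# Loading an old config works...
processor = ToyImageProcessor.from_dict({"reduce_labels": True})
print(processor.do_reduce_labels)  # True

# ...but passing the old name explicitly raises.
try:
    ToyImageProcessor(reduce_labels=True)
except ValueError as err:
    print(err)
```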

Does it make sense, or am I missing some use cases?
Are such kinds of breaking changes possible only over major versions like v4->v5?

Here we silently replace reduce_labels with do_reduce_labels.

This is great - we definitely want to do this! It means any old config, if saved out, will be fully compatible.

Are such kinds of breaking changes possible only over major versions like v4->v5?

Kind of, it's more to do with the type and scale of breaking change, and the controls users have over it.

We do deprecate methods, flags, etc. within the codebase semi-regularly, and we do tolerate a certain amount of breaking changes. For example, training was recently enabled for musicgen. This technically created a breaking change, as one of the models previously returned a loss. The update meant that anyone using the model would see the loss suddenly change. However, as the previous loss was incorrect and the model's usage was relatively low, this was deemed OK.

There are two main reasons why we wouldn't want to raise an exception here at the moment:

Ability to change the code behaviour. Users don't always have the option of updating the config: they might not have permissions to modify on the hub.

Awareness of configs. Many users are not familiar with the configs. It's easier to deprecate calls to e.g. object.method(foo, bar) as we emit a warning, and the user just removes that line of code. Changing the config is less obvious.

I say this because when we first tried to deprecate flags for the image processors, we got quite a few messages from users who were either complaining or confused about why the warnings were appearing. I have yet to see a single config updated because of them!

It is something we could do in v5, although we may not want to.

Thanks a lot for the clarification! I'll summarize the PR just to make sure we are on the same page.

There are three common ways to create an image processor with deprecated parameters (sorted by popularity, high -> low, IMHO):

| Method | Is BC in current PR |
|---|---|
| `image_processor = ImageProcessor.from_pretrained("repo/model_with_deprecated_params")` | ✅ Yes, because the user does not control the saved config (handled by overriding `from_dict()`) |
| `image_processor = ImageProcessor.from_pretrained("repo/model_with_deprecated_params", reduce_labels=True)` | 🟥 No, raises an error, because the parameter is set explicitly |
| `image_processor = ImageProcessor(..., reduce_labels=True)` | 🟥 No, raises an error, because the parameter is set explicitly |

I hope this approach:

Pushes users to update their codebase, which they control, while still allowing them to use previously created models.

Will allow us to gradually move backward-compatibility handling for deprecated params into a single method, from_dict (ideally), making the code easier to maintain.

We could probably also get rid of **kwargs, have a strict set of arguments, and replace the checks with a decorator for renamed/removed arguments:

@deprecated("reduce_labels", replace_with="do_reduce_labels", from="4.42.0", to="4.45.0")
def preprocess(...):

The decorator would rename the argument and raise a warning between the two versions; after that, we can either remove it with automated checks or keep it for a while to raise an explicit error.
At the moment, because of **kwargs, anyone might easily make a typo in an argument name and it will be silently passed, leading to unexpected behavior that is hard to debug.
As a simple example:

from transformers import DetrImageProcessor

image_processor = DetrImageProcessor.from_pretrained(
    "facebook/detr-resnet-50",
    do_nomalize=False,  # <----- here set as False
)
print(image_processor)
# DetrImageProcessor {
#   ...
#   "do_convert_annotations": true,
#   "do_normalize": true,  <----- but we get True
#   "do_pad": true,
#   "do_rescale": true,
#   "do_resize": true,
#   ...
# }

We could test this strategy on a few not-so-popular models first, just to see how it works.

I might be overestimating the problem or underestimating risks, user behavior, or edge cases. You have much more experience maintaining the library, so let me know if any of these thoughts are valuable. And thank you once again for your clarifications with examples; they make it much easier to follow 🤗

100% aligned with your comment above, thanks for taking the time to write this up so clearly.

I really like the idea of a decorator. Let's ask @molbap for his opinion on this, as he's been handling a lot of these considerations across processors and image processors.

I'm OOO for a few days, but I checked and it's an interesting take. I like the decorator idea too. I wholeheartedly agree with:

At the moment, because of **kwargs, anyone might easily make a typo in an argument name and it will be silently passed, leading to unexpected behavior that is hard to debug.

That is almost always true. However, getting rid of **kwargs entirely is impossible, as we need to be able to handle arbitrary inputs without breaking at runtime in situations where people have non-canonical configs.

Hence the point of having a list of valid arguments: that way, we can at least raise a warning and list the arguments passed by a user that are not doing anything. In your case, if do_normalize is part of the valid image processor arguments but do_nomalize (typo) is not, then we can raise a clear warning saying that do_nomalize isn't used anywhere.
We could even push this further and do a quick Levenshtein check that triggers a "perhaps you made a typo?" message. That might be too much, though.
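A rough sketch of the "list of valid arguments" idea combined with a typo hint. It swaps in the standard library's `difflib.get_close_matches`, which ranks candidates by a similarity ratio rather than a true Levenshtein distance; `VALID_KWARGS` and `warn_unused_kwargs` are hypothetical names used only for illustration.

```python
import difflib
import warnings
from typing import Any, Dict, Iterable, List

# Hypothetical allow-list of arguments an image processor understands.
VALID_KWARGS = ["do_normalize", "do_rescale", "do_resize", "do_reduce_labels"]


def warn_unused_kwargs(kwargs: Dict[str, Any], valid: Iterable[str]) -> List[str]:
    """Warn about kwargs not in `valid`, suggesting a close match when one exists."""
    valid = list(valid)
    unused = [name for name in kwargs if name not in valid]
    for name in unused:
        # get_close_matches returns up to n candidates whose similarity ratio >= cutoff.
        suggestion = difflib.get_close_matches(name, valid, n=1, cutoff=0.8)
        hint = f" Perhaps you meant `{suggestion[0]}`?" if suggestion else ""
        warnings.warn(f"`{name}` is not a valid argument and will be ignored.{hint}")
    return unused


# Emits a UserWarning that suggests `do_normalize` for the misspelled `do_nomalize`.
warn_unused_kwargs({"do_nomalize": False, "do_resize": True}, VALID_KWARGS)
```

A ratio cutoff keeps the hint quiet for arguments that are simply unknown rather than misspelled.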

For the decorator option to mark args as deprecated, LGTM too! Fewer lines of code, and it forces us to write a version number. IMO the "from" is not necessarily useful, as we want to encourage using the latest version.
Thanks for looking into this @qubvel! Kind of related is #30511, which I want to merge ASAP; I haven't had time to get back to it, but it's closely connected to this problem.

Thanks for looking into this @molbap

We could probably use the function signature in the decorator to filter out unused kwargs instead of passing them through.
What do you think about the following?

import functools
import inspect
import warnings

import numpy as np


def filter_not_used_arguments(func):
    """Filter out named arguments that are not in the function signature."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        sig = inspect.signature(func)
        function_named_args = sig.parameters.keys()
        valid_kwargs = {k: v for k, v in kwargs.items() if k in function_named_args}
        invalid_kwargs = {k: v for k, v in kwargs.items() if k not in function_named_args}
        if invalid_kwargs:
            warnings.warn(f"Unused named arguments: {', '.join(invalid_kwargs.keys())}")
        return func(*args, **valid_kwargs)

    return wrapper


class ImageProcessor:
    @filter_not_used_arguments
    def preprocess(self, image, do_normalize=False, do_reduce_labels=False):
        pass


image_processor = ImageProcessor()
image = np.ones((100, 100, 3))

# passing invalid `do_nomalize` instead of `do_normalize`
image_processor.preprocess(image, do_nomalize=True, do_reduce_labels=True)
# UserWarning: Unused named arguments: do_nomalize

This will simplify validation: we will not need to store a list of available arguments.

It makes validation method-specific instead of class-specific.

While this is backward compatible and does not raise an error, IDEs will help (and push) users to update their parameters if no **kwargs is specified for the function.

That looks like a great solution :) looks very good to me

src/transformers/models/mask2former/image_processing_mask2former.py

qubvel · 2024-05-24T19:47:05Z

src/transformers/models/maskformer/image_processing_maskformer.py

@@ -406,38 +412,18 @@ def __init__(
        image_std: Union[float, List[float]] = None,
        ignore_index: Optional[int] = None,
        do_reduce_labels: bool = False,
+        num_labels: Optional[int] = None,


Here and in Mask2Former and OneFormer, num_labels is not used anywhere in the image processor code; however, it is widely used in tests. I added it explicitly for backward compatibility, in case some pipelines rely on it.

Alternatively, it could be passed in **kwargs and excluded from the filter:

@filter_out_non_signature_kwargs(extra=["max_size", "num_labels"])

qubvel · 2024-05-28T11:00:18Z

@molbap @amyeroberts I've made a new iteration here and added two decorators:

@deprecate_kwarg("reduce_labels", new_name="do_reduce_labels", version="4.41.0")

renames the argument if new_name is provided
raises a warning while the current version is < the specified version; after that, renames the argument silently
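A simplified sketch of the behaviour described above, not the PR's actual implementation. Assumptions: the installed library version is a hard-coded `CURRENT_VERSION` string (a real implementation would read the package version), versions are plain dotted integers, and `deprecate_kwarg_sketch` is a hypothetical name.

```python
import functools
import warnings
from typing import Optional, Tuple

# Assumption: stand-in for the installed library version.
CURRENT_VERSION = "4.41.0"


def _parse(version: str) -> Tuple[int, ...]:
    # Assumes plain "X.Y.Z" versions without suffixes such as ".dev0".
    return tuple(int(part) for part in version.split("."))


def deprecate_kwarg_sketch(old_name: str, version: str, new_name: Optional[str] = None):
    """Rename `old_name` to `new_name`, warning only while CURRENT_VERSION < version."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                value = kwargs.pop(old_name)
                # If both names are passed, the new one wins and the old value is dropped.
                if new_name is not None and new_name not in kwargs:
                    kwargs[new_name] = value
                # Warn before the deprecation version; afterwards, rename silently.
                if _parse(CURRENT_VERSION) < _parse(version):
                    target = f" Use `{new_name}` instead." if new_name else ""
                    warnings.warn(
                        f"`{old_name}` is deprecated and will be removed in v{version}.{target}",
                        FutureWarning,
                    )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecate_kwarg_sketch("reduce_labels", new_name="do_reduce_labels", version="4.44.0")
def preprocess(images, do_reduce_labels: bool = False):
    return do_reduce_labels


print(preprocess([], reduce_labels=True))  # True (plus a FutureWarning)
```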

filter_out_non_signature_kwargs(extra=["max_size"])

filters out all named args that are not in the function signature and not listed in extra. As mentioned by @molbap, it is not always possible to get rid of all **kwargs (for example, they might involve complex logic of parameter interactions).
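One way the `extra` allow-list could work, sketched with `inspect.signature`: names listed in `extra` are forwarded through `**kwargs`, and everything else outside the signature is dropped with a warning. This is an illustrative sketch with a hypothetical name (`filter_kwargs_sketch`), not the code merged in the PR.

```python
import functools
import inspect
import warnings
from typing import List, Optional


def filter_kwargs_sketch(extra: Optional[List[str]] = None):
    """Drop kwargs that are neither named in the signature nor listed in `extra`."""
    extra = set(extra or [])

    def decorator(func):
        params = inspect.signature(func).parameters
        # Only real parameter names count; skip the *args / **kwargs placeholders themselves.
        variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        allowed = {name for name, p in params.items() if p.kind not in variadic} | extra

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            unexpected = [name for name in kwargs if name not in allowed]
            if unexpected:
                warnings.warn(f"Ignoring unexpected named arguments: {', '.join(unexpected)}")
            valid = {k: v for k, v in kwargs.items() if k in allowed}
            return func(*args, **valid)

        return wrapper

    return decorator


class ToyProcessor:
    @filter_kwargs_sketch(extra=["max_size"])
    def preprocess(self, images, do_resize: bool = True, **kwargs):
        return do_resize, kwargs


out = ToyProcessor().preprocess([], do_resize=False, max_size=1333, do_nomalize=True)
print(out)  # (False, {'max_size': 1333}), plus a UserWarning about do_nomalize
```

Computing the allowed set once at decoration time avoids calling `inspect.signature` on every invocation.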

Please, have a look, feedback is appreciated 🤗

amyeroberts

Really beautiful work - this could be very useful across the library and will help kill so much repeated code ❤️

My only major comment is about the new files src/transformers/utils/kwargs_validation.py and src/transformers/utils/deprecation.py, and whether we want to add these whole new modules to utils.

Personally, I think src/transformers/utils/deprecation.py makes sense. For filter_out_non_signature_kwargs, I'd place it under (the admittedly badly named) utils/generic.py for now

Would be great to get a second opinion on this from @molbap!

amyeroberts · 2024-05-28T14:55:09Z

src/transformers/utils/deprecation.py

nit - missing copyright header

amyeroberts · 2024-05-28T14:55:19Z

src/transformers/utils/kwargs_validation.py

nit - missing copyright header

amyeroberts · 2024-05-28T14:57:41Z

src/transformers/utils/deprecation.py

+        old_name (str): name of the deprecated keyword argument
+        version (str): version when the keyword argument was (will be) deprecated
+        new_name (Optional[str], optional): new name of the keyword argument. Defaults to None.
+        raise_if_ge_version (bool, optional): raise ValueError if deprecated version is greater or equal to current version. Defaults to False.
+        raise_if_both_names (bool, optional): raise ValueError if both deprecated and new keyword arguments are set. Defaults to False.


nit - the convention in the library is not to mention the default value if it is None. If the argument controls specific behaviour, we normally refer to it being set or unset.

Suggested change

-        old_name (str): name of the deprecated keyword argument
-        version (str): version when the keyword argument was (will be) deprecated
-        new_name (Optional[str], optional): new name of the keyword argument. Defaults to None.
-        raise_if_ge_version (bool, optional): raise ValueError if deprecated version is greater or equal to current version. Defaults to False.
-        raise_if_both_names (bool, optional): raise ValueError if both deprecated and new keyword arguments are set. Defaults to False.
+        old_name (`str`): name of the deprecated keyword argument
+        version (`str`): version when the keyword argument was (will be) deprecated
+        new_name (`Optional[str]`, *optional*): new name of the keyword argument.
+        raise_if_ge_version (`bool`, *optional*, defaults to `False`): raise ValueError if deprecated version is greater or equal to current version.
+        raise_if_both_names (`bool`, *optional*, defaults to `False`): raise ValueError if both deprecated and new keyword arguments are set.

amyeroberts · 2024-05-28T16:30:21Z

examples/pytorch/semantic-segmentation/run_semantic_segmentation.py

@@ -108,7 +108,7 @@ class DataTrainingArguments:
            )
        },
    )
-    reduce_labels: Optional[bool] = field(
+    do_reduce_labels: Optional[bool] = field(


For the script, we'll need to deprecate reduce_labels e.g. like here
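A sketch of one way to keep the old field working in the script's argument dataclass while pointing users to the new name. It uses a plain `dataclasses.dataclass` with a `__post_init__` hook and is trimmed to the two relevant fields; the help texts and keeping `reduce_labels` as a defaulted-to-`None` alias are assumptions, not necessarily what the linked example does.

```python
import warnings
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataTrainingArguments:
    # Trimmed to the fields relevant to this sketch; help texts are illustrative.
    do_reduce_labels: Optional[bool] = field(
        default=False,
        metadata={"help": "Reduce all labels by 1 and map the background (0) to 255."},
    )
    reduce_labels: Optional[bool] = field(
        default=None,
        metadata={"help": "Deprecated: use `do_reduce_labels` instead."},
    )

    def __post_init__(self):
        # Map the deprecated flag onto the new one (the old flag wins if both are set).
        if self.reduce_labels is not None:
            warnings.warn(
                "`reduce_labels` is deprecated, use `do_reduce_labels` instead.",
                FutureWarning,
            )
            self.do_reduce_labels = self.reduce_labels
```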

amyeroberts · 2024-05-28T16:30:46Z

examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py

@@ -86,7 +86,7 @@ def parse_args():
        default="segments/sidewalk-semantic",
    )
    parser.add_argument(
-        "--reduce_labels",
+        "--do_reduce_labels",


Same here re deprecation
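For the `argparse`-based script, a similar sketch: keep `--reduce_labels` as a hidden alias that sets `do_reduce_labels` and emits a warning. Assumptions: both flags are boolean `store_true` switches (the original `add_argument` call is truncated above), and this `parse_args` accepts an explicit `argv` list only to make it testable.

```python
import argparse
import warnings
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Semantic segmentation (sketch)")
    parser.add_argument(
        "--do_reduce_labels",
        action="store_true",
        help="Reduce all labels by 1 and map the background (0) to 255.",
    )
    # Deprecated alias kept for backward compatibility; hidden from --help.
    parser.add_argument("--reduce_labels", action="store_true", help=argparse.SUPPRESS)
    args = parser.parse_args(argv)

    if args.reduce_labels:
        warnings.warn(
            "`--reduce_labels` is deprecated, use `--do_reduce_labels` instead.",
            FutureWarning,
        )
        args.do_reduce_labels = True
    return args
```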

amyeroberts · 2024-05-28T16:35:15Z

src/transformers/utils/deprecation.py

+    new_name: Optional[str] = None,
+    raise_if_ge_version: bool = False,
+    raise_if_both_names: bool = False,
+    add_message: Optional[str] = None,


nit - could we call this additional_message? add_message reads as a bool to me (don't feel strongly about this, so happy if you decide to stick with the original)

amyeroberts · 2024-05-28T16:41:38Z

src/transformers/utils/deprecation.py

+
+            # raise error or notify user
+            if minimum_action == Action.RAISE:
+                raise ValueError(message)
+            elif minimum_action == Action.NOTIFY and raise_if_ge_version and is_already_deprecated:
+                raise ValueError(message)


nit - I'd separate this out between setting the minimum action and the raise/notify logic, just to make it a bit clearer

Suggested change

-            # raise error or notify user
-            if minimum_action == Action.RAISE:
-                raise ValueError(message)
-            elif minimum_action == Action.NOTIFY and raise_if_ge_version and is_already_deprecated:
-                raise ValueError(message)
+            if minimum_action == Action.NOTIFY and raise_if_ge_version and is_already_deprecated:
+                minimum_action = Action.RAISE
+
+            # raise error or notify user
+            if minimum_action == Action.RAISE:
+                raise ValueError(message)
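The suggested refactor separates deciding the action from acting on it. Below is a standalone sketch of that pattern; the `Action` members and the `resolve_action` / `apply_action` helpers are hypothetical simplifications, not the PR's actual enum or code.

```python
import warnings
from enum import Enum


class Action(Enum):
    NONE = "none"
    NOTIFY = "notify"
    RAISE = "raise"


def resolve_action(minimum_action: Action, raise_if_ge_version: bool, is_already_deprecated: bool) -> Action:
    # Step 1: escalate a notification to an error once the deprecation version is reached.
    if minimum_action == Action.NOTIFY and raise_if_ge_version and is_already_deprecated:
        return Action.RAISE
    return minimum_action


def apply_action(action: Action, message: str) -> None:
    # Step 2: act on the (possibly escalated) action.
    if action == Action.RAISE:
        raise ValueError(message)
    if action == Action.NOTIFY:
        warnings.warn(message, FutureWarning)


action = resolve_action(Action.NOTIFY, raise_if_ge_version=True, is_already_deprecated=False)
apply_action(action, "`reduce_labels` is deprecated")  # warns, does not raise
```

Returning the escalated action, instead of reassigning a local in place, keeps the escalation rule testable on its own.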

amyeroberts · 2024-05-28T16:46:57Z

src/transformers/utils/kwargs_validation.py

+from typing import Optional
+
+
+def filter_out_non_signature_kwargs(extra: Optional[list] = None):


qubvel changed the title ~~Clean up do_reduce_labels for image processors~~ Clean up reduce_labels for image processors May 14, 2024

qubvel requested a review from amyeroberts May 14, 2024 14:09

amyeroberts reviewed May 15, 2024

qubvel marked this pull request as draft May 16, 2024 13:02

qubvel added 16 commits May 23, 2024 22:35

Fix do_reduce_labels for maskformer image processor

724b324

Deprecate reduce_labels in favor to do_reduce_labels

c4c1489

Deprecate reduce_labels in favor to do_reduce_labels (segformer)

59bef2d

Deprecate reduce_labels in favor to do_reduce_labels (oneformer)

51f204e

Deprecate reduce_labels in favor to do_reduce_labels (maskformer)

ad9d97d

Deprecate reduce_labels in favor to do_reduce_labels (mask2former)

79bc165

Fix typo

c785245

Update mask2former test

fbedce3

fixup

df88179

Update segmentation examples

e905cd6

Update docs

07ac6aa

Fixup

d747c88

Imports fixup

5861b86

Add deprecation decorator draft

9e2c114

Add deprecation decorator

9a7da87

Fixup

b0e26f5

qubvel force-pushed the clean-up-do-reduce-labels branch from 3f256a1 to b0e26f5 May 24, 2024 00:09

qubvel added 8 commits May 24, 2024 10:25

Add deprecate_kwarg decorator

2698fa1

Validate kwargs decorator

a54bbec

Kwargs validation (beit)

130c972

fixup

25d6367

Kwargs validation (mask2former)

8f4ad5d

Kwargs validation (maskformer)

af830a5

Kwargs validation (oneformer)

5eb3d43

Kwargs validation (segformer)

67af1a5

qubvel added 2 commits May 24, 2024 18:59

Better message

5f23472

Fix oneformer processor save-load test

0feee61

qubvel commented May 24, 2024

qubvel mentioned this pull request May 28, 2024

Instance segmentation examples #31084

amyeroberts reviewed May 28, 2024
