-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reset stage states in cloned saved views #3758
base: develop
Are you sure you want to change the base?
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## develop #3758 +/- ##
===========================================
- Coverage 16.16% 16.16% -0.01%
===========================================
Files 641 641
Lines 74140 74141 +1
Branches 982 982
===========================================
Hits 11988 11988
- Misses 62152 62153 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this example:
import fiftyone as fo
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset("quickstart")
patches = dataset.to_patches("ground_truth")
dataset.save_view("test", patches)
dataset2 = dataset.clone()
patches2 = dataset2.load_saved_view("test")
assert patches._serialize() != patches2._serialize()
When patches2
is loaded, the cached _state
from dataset
that has been carried over to dataset2
is invalidated because dataset2.name != _state["dataset"]
:
fiftyone/fiftyone/core/stages.py
Line 7441 in ccf77ff
"dataset": sample_collection.dataset_name, |
So, although the cached _state
is not valid, there's not any direct risk of accidentally editing the wrong source dataset as you are suggesting.
One thing I wish we did was use We could make that change, but it would cause any saved views with |
I don't see value in going further down this rabbit hole for now. Unless there is a specific reproducible error report. We've already decided that once |
2a2f41d
to
6a8d9cb
Compare
Also, I don't see any harm this change would do. As a data-centric company, we should value persisting accurate/valid data. Cloning shouldn't introduce inconsistencies in the first place that may or may not be resolved depending on if a user calls a certain sdk method, especially since there are app only users |
6a8d9cb
to
4e6b181
Compare
What changes are proposed in this pull request?
There's a problem with saved_views of cloned datasets, specifically ones with stages involving generated/temp datasets. The name of the generated view in the saved view of the cloned dataset is the same as the original dataset. However, after a dataset is cloned, it can be modified independently. This means that the generated temp dataset also must be different. Otherwise changes on the generated view will be persisted to both the clone and the original, which may lead to errors and undesirable effects.
When viewing the saved view of the clone, 4 temporary datasets are created per interaction with the view and expanding the sidebar throws an error
Example: