Reset stage states in cloned saved views #3758

kaixi-wang · 2023-11-01T21:26:58Z

What changes are proposed in this pull request?

There's a problem with saved_views of cloned datasets, specifically ones with stages involving generated/temp datasets. The name of the generated view in the saved view of the cloned dataset is the same as the original dataset. However, after a dataset is cloned, it can be modified independently. This means that the generated temp dataset also must be different. Otherwise changes on the generated view will be persisted to both the clone and the original, which may lead to errors and undesirable effects.

When viewing the saved view of the clone, 4 temporary datasets are created per interaction with the view and expanding the sidebar throws an error

Example:

# Load dataset with saved views that have stages with _state:
dataset=fo.load_dataset('quickstart-sim-sort-tests')
cloned = dataset.clone()
orig_views= dataset._doc.saved_views
cloned_views = cloned._doc.saved_views
for (o,c) in zip(orig_views, cloned_views):
    print("original:{}\ncloned:{}\n".format(o.view_stages, c.view_stages))

original:['{"_cls": "fiftyone.core.stages.ToPatches", "kwargs": [["field", "predictions"], ["config", null], ["_state", {"dataset": "quickstart", "stages": [], "field": "predictions", "config": null, "name": "2023.09.29.14.22.14"}]]}']

cloned:['{"_cls": "fiftyone.core.stages.ToPatches", "kwargs": [["field", "predictions"], ["config", null], ["_state", {"dataset": "quickstart", "stages": [], "field": "predictions", "config": null, "name": "2023.09.29.14.22.14"}]]}']

codecov · 2023-11-01T21:29:58Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (223cfab) 16.16% compared to head (4e6b181) 16.16%.
Report is 5 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3758      +/-   ##
===========================================
- Coverage    16.16%   16.16%   -0.01%     
===========================================
  Files          641      641              
  Lines        74140    74141       +1     
  Branches       982      982              
===========================================
  Hits         11988    11988              
- Misses       62152    62153       +1

Flag	Coverage Δ
app	`16.16% <0.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
...ackages/core/src/components/Actions/ActionsRow.tsx	`0.00% <0.00%> (ø)`
...s/core/src/components/ColorModal/ShareStyledDiv.ts	`0.00% <0.00%> (ø)`
app/packages/core/src/components/Grid/Grid.tsx	`0.00% <0.00%> (ø)`
app/packages/core/src/components/Modal/Modal.tsx	`0.00% <0.00%> (ø)`

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

brimoor

Consider this example:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

patches = dataset.to_patches("ground_truth")
dataset.save_view("test", patches)

dataset2 = dataset.clone()
patches2 = dataset2.load_saved_view("test")

assert patches._serialize() != patches2._serialize()

When patches2 is loaded, the cached _state from dataset that has been carried over to dataset2 is invalidated because dataset2.name != _state["dataset"]:

fiftyone/fiftyone/core/stages.py

Line 7441 in ccf77ff

"dataset": sample_collection.dataset_name,

So, although the cached _state is not valid, there's not any direct risk of accidentally editing the wrong source dataset as you are suggesting.

brimoor · 2023-11-02T03:24:38Z

One thing I wish we did was use dataset.id rather than dataset.name in the cached state: changing a dataset's name should not invalidate the cache.

We could make that change, but it would cause any saved views with _state to be forever cache-invalid because their saved _state will contain names, not IDs. However, that's not really a big deal because if temporary datasets are cleaned up reasonably often, the saved view will also be cache-invalid because it's temporary dataset will no longer exist.

brimoor · 2023-11-02T03:26:07Z

I don't see value in going further down this rabbit hole for now. Unless there is a specific reproducible error report.

We've already decided that once Dataset.last_updated_at is updated we'll make improvements here.

kaixi-wang · 2023-11-02T19:49:09Z

When patches2 is loaded, the cached _state from dataset that has been carried over to dataset2 is invalidated because dataset2.name != _state["dataset"]:

I didn't see this check. Where is it?

After adding samples to original dataset and deleting non-persistent datasets, viewing the clone's save_view still referenced the _state of the original dataset:

Also, after reloading the cloned dataset, the internal _state is still wr
<SavedViewDocument: {
'id': '6543ff2f0a6fdd8d2be53220',
'dataset_id': '6543fecc0a6fdd8d2be5321b',
'name': 'detection patches',
'description': '',
'slug': 'detection-patches',
'color': '#FAC515',
'view_stages': [
'{"_cls": "fiftyone.core.stages.ToPatches", "kwargs": [["field", "detections"], ["config", null], ["_state", {"dataset": "openv7-saved-views-testing", "stages": [], "field": "detections", "config": null, "name": "2023.10.31.18.04.15"}]]}',
],
'created_at': datetime.datetime(2023, 10, 31, 18, 7, 58, 28000),
'last_modified_at': datetime.datetime(2023, 10, 31, 18, 7, 58, 28000),
'last_loaded_at': datetime.datetime(2023, 11, 2, 21, 46, 37, 619000),
}>,

and has issues loading:

Screen.Recording.2023-11-02.at.2.46.56.PM.mov

kaixi-wang · 2023-11-02T22:39:27Z

Also, I don't see any harm this change would do. As a data-centric company, we should value persisting accurate/valid data. Cloning shouldn't introduce inconsistencies in the first place that may or may not be resolved depending on if a user calls a certain sdk method, especially since there are app only users

kaixi-wang self-assigned this Nov 1, 2023

brimoor requested changes Nov 2, 2023

View reviewed changes

kaixi-wang marked this pull request as draft November 2, 2023 19:51

kaixi-wang changed the base branch from release/v0.22.3 to develop November 2, 2023 19:52

kaixi-wang force-pushed the kacey/v0.22.3/fix-cloned-saved-views branch 2 times, most recently from 2a2f41d to 6a8d9cb Compare November 2, 2023 22:21

kaixi-wang marked this pull request as ready for review November 2, 2023 22:21

kaixi-wang requested a review from brimoor November 2, 2023 22:39

rm state from saved views when cloning

4e6b181

kaixi-wang force-pushed the kacey/v0.22.3/fix-cloned-saved-views branch from 6a8d9cb to 4e6b181 Compare November 2, 2023 22:43

kaixi-wang requested a review from a team November 2, 2023 22:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reset stage states in cloned saved views #3758

Reset stage states in cloned saved views #3758

kaixi-wang commented Nov 1, 2023 •

edited

codecov bot commented Nov 1, 2023 •

edited

brimoor left a comment •

edited

brimoor commented Nov 2, 2023

brimoor commented Nov 2, 2023

kaixi-wang commented Nov 2, 2023 •

edited

kaixi-wang commented Nov 2, 2023

Reset stage states in cloned saved views #3758

Are you sure you want to change the base?

Reset stage states in cloned saved views #3758

Conversation

kaixi-wang commented Nov 1, 2023 • edited

What changes are proposed in this pull request?

codecov bot commented Nov 1, 2023 • edited

Codecov Report

brimoor left a comment • edited

Choose a reason for hiding this comment

brimoor commented Nov 2, 2023

brimoor commented Nov 2, 2023

kaixi-wang commented Nov 2, 2023 • edited

kaixi-wang commented Nov 2, 2023

kaixi-wang commented Nov 1, 2023 •

edited

codecov bot commented Nov 1, 2023 •

edited

brimoor left a comment •

edited

kaixi-wang commented Nov 2, 2023 •

edited