Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset stage states in cloned saved views #3758

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

kaixi-wang
Copy link
Contributor

@kaixi-wang kaixi-wang commented Nov 1, 2023

What changes are proposed in this pull request?

There's a problem with saved_views of cloned datasets, specifically ones with stages involving generated/temp datasets. The name of the generated view in the saved view of the cloned dataset is the same as the original dataset. However, after a dataset is cloned, it can be modified independently. This means that the generated temp dataset also must be different. Otherwise changes on the generated view will be persisted to both the clone and the original, which may lead to errors and undesirable effects.

When viewing the saved view of the clone, 4 temporary datasets are created per interaction with the view and expanding the sidebar throws an error
image

Example:

# Load dataset with saved views that have stages with _state:
dataset=fo.load_dataset('quickstart-sim-sort-tests')
cloned = dataset.clone()
orig_views= dataset._doc.saved_views
cloned_views = cloned._doc.saved_views
for (o,c) in zip(orig_views, cloned_views):
    print("original:{}\ncloned:{}\n".format(o.view_stages, c.view_stages))
original:['{"_cls": "fiftyone.core.stages.ToPatches", "kwargs": [["field", "predictions"], ["config", null], ["_state", {"dataset": "quickstart", "stages": [], "field": "predictions", "config": null, "name": "2023.09.29.14.22.14"}]]}']

cloned:['{"_cls": "fiftyone.core.stages.ToPatches", "kwargs": [["field", "predictions"], ["config", null], ["_state", {"dataset": "quickstart", "stages": [], "field": "predictions", "config": null, "name": "2023.09.29.14.22.14"}]]}']

@kaixi-wang kaixi-wang self-assigned this Nov 1, 2023
Copy link

codecov bot commented Nov 1, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (223cfab) 16.16% compared to head (4e6b181) 16.16%.
Report is 5 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3758      +/-   ##
===========================================
- Coverage    16.16%   16.16%   -0.01%     
===========================================
  Files          641      641              
  Lines        74140    74141       +1     
  Branches       982      982              
===========================================
  Hits         11988    11988              
- Misses       62152    62153       +1     
Flag Coverage Δ
app 16.16% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...ackages/core/src/components/Actions/ActionsRow.tsx 0.00% <0.00%> (ø)
...s/core/src/components/ColorModal/ShareStyledDiv.ts 0.00% <0.00%> (ø)
app/packages/core/src/components/Grid/Grid.tsx 0.00% <0.00%> (ø)
app/packages/core/src/components/Modal/Modal.tsx 0.00% <0.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@brimoor brimoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this example:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

patches = dataset.to_patches("ground_truth")
dataset.save_view("test", patches)

dataset2 = dataset.clone()
patches2 = dataset2.load_saved_view("test")

assert patches._serialize() != patches2._serialize()

When patches2 is loaded, the cached _state from dataset that has been carried over to dataset2 is invalidated because dataset2.name != _state["dataset"]:

"dataset": sample_collection.dataset_name,

So, although the cached _state is not valid, there's not any direct risk of accidentally editing the wrong source dataset as you are suggesting.

@brimoor
Copy link
Contributor

brimoor commented Nov 2, 2023

One thing I wish we did was use dataset.id rather than dataset.name in the cached state: changing a dataset's name should not invalidate the cache.

We could make that change, but it would cause any saved views with _state to be forever cache-invalid because their saved _state will contain names, not IDs. However, that's not really a big deal because if temporary datasets are cleaned up reasonably often, the saved view will also be cache-invalid because it's temporary dataset will no longer exist.

@brimoor
Copy link
Contributor

brimoor commented Nov 2, 2023

I don't see value in going further down this rabbit hole for now. Unless there is a specific reproducible error report.

We've already decided that once Dataset.last_updated_at is updated we'll make improvements here.

@kaixi-wang
Copy link
Contributor Author

kaixi-wang commented Nov 2, 2023

When patches2 is loaded, the cached _state from dataset that has been carried over to dataset2 is invalidated because dataset2.name != _state["dataset"]:

I didn't see this check. Where is it?

After adding samples to original dataset and deleting non-persistent datasets, viewing the clone's save_view still referenced the _state of the original dataset:

image

Also, after reloading the cloned dataset, the internal _state is still wr
<SavedViewDocument: {
'id': '6543ff2f0a6fdd8d2be53220',
'dataset_id': '6543fecc0a6fdd8d2be5321b',
'name': 'detection patches',
'description': '',
'slug': 'detection-patches',
'color': '#FAC515',
'view_stages': [
'{"_cls": "fiftyone.core.stages.ToPatches", "kwargs": [["field", "detections"], ["config", null], ["_state", {"dataset": "openv7-saved-views-testing", "stages": [], "field": "detections", "config": null, "name": "2023.10.31.18.04.15"}]]}',
],
'created_at': datetime.datetime(2023, 10, 31, 18, 7, 58, 28000),
'last_modified_at': datetime.datetime(2023, 10, 31, 18, 7, 58, 28000),
'last_loaded_at': datetime.datetime(2023, 11, 2, 21, 46, 37, 619000),
}>,

and has issues loading:

Screen.Recording.2023-11-02.at.2.46.56.PM.mov

@kaixi-wang kaixi-wang marked this pull request as draft November 2, 2023 19:51
@kaixi-wang kaixi-wang changed the base branch from release/v0.22.3 to develop November 2, 2023 19:52
@kaixi-wang kaixi-wang force-pushed the kacey/v0.22.3/fix-cloned-saved-views branch 2 times, most recently from 2a2f41d to 6a8d9cb Compare November 2, 2023 22:21
@kaixi-wang kaixi-wang marked this pull request as ready for review November 2, 2023 22:21
@kaixi-wang
Copy link
Contributor Author

Also, I don't see any harm this change would do. As a data-centric company, we should value persisting accurate/valid data. Cloning shouldn't introduce inconsistencies in the first place that may or may not be resolved depending on if a user calls a certain sdk method, especially since there are app only users

@kaixi-wang kaixi-wang force-pushed the kacey/v0.22.3/fix-cloned-saved-views branch from 6a8d9cb to 4e6b181 Compare November 2, 2023 22:43
@kaixi-wang kaixi-wang requested a review from a team November 2, 2023 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants