fix/issue 2090 : Add a repo_files
command, with recursive deletion.
#2280
+436
−3
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Implement a new
repo-files
command to manipulate files (deletion...)Fixes #2235
Opiniated choice made regarding globs matching
Crux
I've encoutered an issue when implementing the folder matching as per this issue specification :
To implement the pattern lexer I reused
huggingface_hub.hf_api._prepare_folder_deletions()
, which is already used in theupload
command to lexe/tokenize allow and disallow lists.The function is a wrapper around
fnmatch.fnmatch()
(viahuggingface_hub.utils._paths.filter_repo_objects()
). It matches files against unix shell-style file matching patterns.Unix shell-style automatons respond to this case in an rather unconvenient way :
Solution implemented
It adds "*" at the end of the patterns that end with "/". Given git untracks any folder that is empty, deleting all files in the folder effectively deletes the folder. Did it in the new
huggingface_hub.hf_api.delete_files_r()
function, because updatinghuggingface_hub.utils._paths.filter_repo_objects()
at a lower-level would alter the behavior of various functions like theupload
command and the commit scheduler in a non-BC compatible way.What do you think about it ?
All this has been documented in the function doc.
What could have been done instead
Remove support for this
huggingface-cli repo-files <repo_id> delete folder/
Natively support this through already existing logic, and let Git untrack the folder.
huggingface-cli repo-files <repo_id> delete folder/*
Tests added (duration that might disrupt DevX and project output)
Remarks
Feel free to tell me if you'd like something different here :)