Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix/issue 2090 : Add a repo_files command, with recursive deletion. #2280

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

OlivierKessler01
Copy link

@OlivierKessler01 OlivierKessler01 commented May 13, 2024

Implement a new repo-files command to manipulate files (deletion...)

Fixes #2235

Opiniated choice made regarding globs matching

Crux

I've encoutered an issue when implementing the folder matching as per this issue specification :

huggingface-cli repo-files <repo_id> delete folder/

To implement the pattern lexer I reused huggingface_hub.hf_api._prepare_folder_deletions(), which is already used in the upload command to lexe/tokenize allow and disallow lists.

The function is a wrapper around fnmatch.fnmatch() (via huggingface_hub.utils._paths.filter_repo_objects()). It matches files against unix shell-style file matching patterns.

Unix shell-style automatons respond to this case in an rather unconvenient way :

>>> fnmatch.fnmatch("nested/file.bin", "nested/")
False

Solution implemented

It adds "*" at the end of the patterns that end with "/". Given git untracks any folder that is empty, deleting all files in the folder effectively deletes the folder. Did it in the new huggingface_hub.hf_api.delete_files_r() function, because updating huggingface_hub.utils._paths.filter_repo_objects() at a lower-level would alter the behavior of various functions like the upload command and the commit scheduler in a non-BC compatible way.

What do you think about it ?

huggingface-cli upload --delete="nested/"
#the line above still doesn't work, but it's consistent (logic unchanged, interface unchanged). 

huggingface-cli repo-files delete nested/ #interpreted as `huggingface-cli repo-files delete nested/*`
#the line above works as per https://github.com/huggingface/huggingface_hub/issues/2235 specs.

All this has been documented in the function doc.

What could have been done instead

Remove support for this
huggingface-cli repo-files <repo_id> delete folder/

Natively support this through already existing logic, and let Git untrack the folder.
huggingface-cli repo-files <repo_id> delete folder/*

Tests added (duration that might disrupt DevX and project output)

20.46s call     tests/test_hf_api.py::CommitApiTest::test_delete_files_r

Remarks

Feel free to tell me if you'd like something different here :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CLI] Command line to delete files on a repo
1 participant