added selective filtering to kit import #1053
TheCoder2010-create wants to merge 1 commit into kitops-ml:main
Allows users to filter files during import from Hugging Face and Git using glob patterns. Signed-off-by: Manav Sutar <sutarmanav557@gmail.com>
I'm not sure this is a good approach for what this PR is trying to do. Is there a use-case covered here that isn't supported by the currently available options? The current `kit import` command supports:
- Manually editing the generated Kitfile before proceeding with the import, which allows you to select which files to include in the ModelKit
- Providing an existing Kitfile, via the `--file` flag, to do the same as above
Both options already deliver the same benefits (e.g. reduced bandwidth/storage): Kit will only download the files necessary (in the Hugging Face case, at least). The `--filter` flag seems like a less ergonomic and more error-prone way to achieve the same goal, since you have to match paths from a remote server.
Perhaps an issue explaining the problem you are trying to solve would be helpful here.
Edit to add: Additionally, the filter approach pushes users towards more error-prone usage, as it becomes easy to e.g. omit licenses and readme files, or necessary configuration metadata, without realizing it.
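To make the concern above concrete, here is a minimal Go sketch. It assumes the filter behaves like plain `path.Match` globbing, as the PR description states; `splitByFilters` and the file names are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"path"
)

// splitByFilters is a hypothetical helper (not from the PR). It returns the
// paths matched by at least one glob pattern and the paths matched by none.
func splitByFilters(paths, filters []string) (kept, dropped []string, err error) {
	for _, p := range paths {
		matched := false
		for _, f := range filters {
			ok, matchErr := path.Match(f, p)
			if matchErr != nil {
				return nil, nil, fmt.Errorf("bad pattern %q: %w", f, matchErr)
			}
			if ok {
				matched = true
				break
			}
		}
		if matched {
			kept = append(kept, p)
		} else {
			dropped = append(dropped, p)
		}
	}
	return kept, dropped, nil
}

func main() {
	// Illustrative repository listing; the file names are made up.
	listing := []string{"LICENSE", "README.md", "config.json", "model.gguf", "model.safetensors"}

	kept, dropped, err := splitByFilters(listing, []string{"*.gguf"})
	if err != nil {
		panic(err)
	}
	fmt.Println("kept:", kept)
	fmt.Println("dropped:", dropped)
}
```

With the single pattern `*.gguf`, the license, readme, and configuration file all land in the dropped list, and nothing warns the user about it.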
A PR that updates the CLI should not incidentally update pnpm-lock.yaml
Description
This PR introduces selective file filtering for the `kit import` command via a new `--filter` flag. This feature allows users to provide glob patterns to include only specific files or directories when importing from Hugging Face or Git repositories.

Key Benefits:
- Bandwidth Efficiency: For Hugging Face imports, filtering happens at the API level. `kit` only downloads files that match the user's patterns, avoiding the need to download large, unnecessary model formats (e.g., skipping `.safetensors` when only `.gguf` is needed).
- Reduced Storage Footprint: For Git imports, unmatched files are pruned from the temporary workspace before the ModelKit is packed.
- Improved Workflow: Users can now import specific branches or sub-folders of large repositories with surgical precision.

Technical Details:
- New Flag: Added `--filter` (StringSlice) to `importOptions`.
- Hugging Face Implementation: Integrated the filter into the `importUsingHF` flow. It modifies the `DirectoryListing` before the download phase begins.
- Git Implementation: Integrated the filter into the `importUsingGit` flow. It utilizes a new `removeUnmatchedFiles` helper to clean up the cloned repository post-download but pre-pack.
- Unified Filtering: Created a reusable `filterDirectoryListing` utility in `pkg/cmd/kitimport/util.go` that uses `path.Match` for cross-platform globbing support.
- Unit Testing: Added comprehensive tests in `pkg/cmd/kitimport/util_test.go` covering extension-based filtering, path-based filtering, and multiple filter combinations.

Linked issues
Fixes # (Manual contribution)
Example Usage: