feat(wanda): implement artifact extraction using docker cp #411
Code Review
This pull request introduces a mechanism to copy artifacts from built Docker images using docker cp. The implementation is straightforward, adding new docker command wrappers and integrating artifact extraction into the build process. The changes are well-tested, covering file and directory copying, optional artifacts, and behavior on cache hits.
My review focuses on improving code style and correctness in the tests and documentation. Specifically:
- Using `filepath.Join` for path construction in tests to improve portability and adhere to Go idioms.
- Correcting a misleading comment in `wanda/forge.go`.
I also noticed some code duplication in the new test functions in wanda/docker_cmd_test.go. While I haven't added a specific comment, you might consider refactoring the common setup (pulling image, creating container) into a helper function to make the tests more concise.
Overall, this is a solid contribution that adds valuable functionality.
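As a small illustration of the `filepath.Join` suggestion, here is a minimal sketch. The helper name, directory and file name are made up for illustration and are not taken from the PR's tests.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// artifactPath builds a destination path under an artifacts directory.
// Hypothetical helper: filepath.Join inserts the OS-specific separator and
// cleans the result, which manual string concatenation does not do.
func artifactPath(artifactsDir, name string) string {
	return filepath.Join(artifactsDir, name)
}

func main() {
	// The trailing slash on the directory does not produce a double separator.
	fmt.Println(artifactPath("out/", "mypackage-1.0.0.whl"))
}
```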
aslonnie
left a comment
- why are we not using crane this time?
- what will happen when wanda cache hits?
I didn't see any capability in crane for copying out individual files/directories other than exporting the whole file system. I can rework this if there is an entrypoint I am missing. https://github.com/google/go-containerregistry/blob/main/cmd/crane/doc/crane.md
Extraction runs for the root spec only, even on a cache hit. This is so that CI doesn't have missing artifacts in the case of a cache hit.
sorry, what does "root spec" mean?
Root spec (I'm realizing 'base target' is more appropriate) is the spec that
yeah, it is just that docker cp is faster if the image is already locally available, but slower on a wanda remote cache hit. when wanda remote cache hits, the image build is just an image retag on the remote CR; it does not pull down the image. with that said.. wheel building never cache hits... on the other hand, the wheel output image is also small enough to export (it only contains the wheels). maybe we can 1. use docker cp always, and 2. just drop the artifact exporting when it is a cache hit..
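The policy floated here (always use `docker cp`, and skip artifact export on a cache hit) could be sketched as a small predicate. The function and parameter names are hypothetical and not taken from wanda; combining the cache-hit rule with the "base target only" rule in one check is an assumption.

```go
package main

import "fmt"

// shouldExtract reports whether artifacts should be copied out after a build.
// Hypothetical helper: extraction is limited to the base target and skipped
// on a cache hit, since a remote cache hit only retags the image on the
// registry and does not pull it locally.
func shouldExtract(isBaseTarget, cacheHit bool, numArtifacts int) bool {
	return isBaseTarget && !cacheHit && numArtifacts > 0
}

func main() {
	fmt.Println("fresh build of base target:", shouldExtract(true, false, 1))
	fmt.Println("cache hit on base target:", shouldExtract(true, true, 1))
	fmt.Println("dependency spec:", shouldExtract(false, false, 1))
}
```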
wanda/testdata/Dockerfile.artifact
Outdated
RUN mkdir -p /build/dist /build/docs /app/bin \
    && echo "wheel-content" > /build/dist/mypackage-1.0.0.whl \
    && echo "wheel-content2" > /build/dist/mypackage-1.0.1.whl \
    && echo "docs-content" > /build/docs/README.md \
    && echo "binary-content" > /app/bin/myapp \
    && chmod +x /app/bin/myapp
nit: maybe use heredoc / multi-line?
wanda/docker_cmd.go
Outdated
// CMD/ENTRYPOINT. The command doesn't need to exist since the container is
// never started.
func (c *dockerCmd) createContainer(image string) (string, error) {
	cmd := c.cmd("create", image, "unused")
maybe just drop the unused ? the "unused" is treated as a "command" which is optional on create.
When testing on the wheel container, not having any input command causes it to fail out on creation--
❯ docker create cr.ray.io/rayproject/ray-wheel-py3.13-aarch64:latest
Error response from daemon: no command specified
$ docker create ubuntu:focal
508194d4ed04d9af40683f40dc8596693e7136bbef01b2a778d75d06461d4795
I think it is because the image does not have a CMD, which then demands docker create to specify an explicit CMD.
nit: instead of "unused", maybe use true? with a comment explaining what it is and why?
Done, and added comment--
// createContainer creates a container from an image without starting it.
// Returns the container ID.
func (c *dockerCmd) createContainer(image string) (string, error) {
	// "true" is a no-op command required for images without CMD/ENTRYPOINT.
	// The container is never started, so the command doesn't actually run.
	out, err := c.output("create", image, "true")
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}
wanda/docker_cmd.go
Outdated
// removeContainer removes a container quietly (no stdout).
func (c *dockerCmd) removeContainer(containerID string) error {
	cmd := exec.Command(c.bin, "rm", containerID)
why not use c.run here?
wanda/docker_cmd.go
Outdated
// CMD/ENTRYPOINT. The command doesn't need to exist since the container is
// never started.
func (c *dockerCmd) createContainer(image string) (string, error) {
	cmd := c.cmd("create", image, "unused")
maybe create a helper as c.output() for capturing the output of a docker subcommand?
wanda/docker_cmd_test.go
Outdated
	}
	defer cmd.removeContainer(containerID)

	files, err := cmd.listContainerFiles(containerID)
this can be quite expensive.. and it is for supporting glob I guess. if we are doing this anyways, I would rather we do crane export so that we do not require having a docker daemon.
which gets to my point:
- maybe just drop globbing support?
- and only support copying out a directory (which is supported by `docker cp`)
- and we can change the docker build files to copy files into an `/opt/artifacts` directory or something so that we get the required result.
Makes sense. I'll go ahead and drop glob support here to avoid any forms of this type of iteration
Pushed an update to remove glob support, and simplify these flows
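With glob support dropped, one way to keep specs honest is to reject glob metacharacters up front and hand plain file or directory paths straight to `docker cp`. The sketch below is illustrative only: `validateArtifactSrc` and `cpSource` are hypothetical names, and the set of rejected characters (`*`, `?`, `[`) is an assumption rather than what the PR implements.

```go
package main

import (
	"fmt"
	"strings"
)

// validateArtifactSrc rejects sources that look like glob patterns.
// Hypothetical helper; the rejected character set is an assumption.
func validateArtifactSrc(src string) error {
	if strings.ContainsAny(src, "*?[") {
		return fmt.Errorf("artifact source %q: glob patterns are not supported", src)
	}
	return nil
}

// cpSource formats the CONTAINER:SRC_PATH argument that docker cp expects.
func cpSource(containerID, src string) string {
	return containerID + ":" + src
}

func main() {
	for _, src := range []string{"/opt/artifacts", "/build/dist/*.whl"} {
		if err := validateArtifactSrc(src); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("docker cp", cpSource("example-container-id", src), "out/")
	}
}
```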
wanda/docker_cmd.go
Outdated
	if err := cmd.Run(); err != nil {
		return "", err
	}
	return strings.TrimSpace(buf.String()), nil
maybe return []byte, and convert to string and apply TrimSpace by the caller? []byte is the more generic format, where string and TrimSpace are only for particular outputs.
Excellent to know, thank you! Made the change
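A rough sketch of the shape discussed in this thread: the generic helper returns raw `[]byte`, and the caller decides how to interpret it. The `runner` type and `parseContainerID` are simplified, hypothetical stand-ins for wanda's `dockerCmd`; `echo` is used as the binary only so the sketch runs without a Docker daemon.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runner wraps a CLI binary (hypothetical stand-in for dockerCmd).
type runner struct {
	bin string
}

// output runs a subcommand and returns its raw stdout, leaving any string
// conversion or trimming to the caller.
func (r *runner) output(args ...string) ([]byte, error) {
	cmd := exec.Command(r.bin, args...)
	var buf bytes.Buffer
	cmd.Stdout = &buf
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// parseContainerID is caller-side interpretation: "docker create" prints the
// container ID followed by a newline, so the caller trims whitespace.
func parseContainerID(out []byte) string {
	return strings.TrimSpace(string(out))
}

func main() {
	// Assumption: "echo" replaces the docker binary for demonstration.
	r := &runner{bin: "echo"}
	out, err := r.output("create", "example-image", "true")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("%q\n", parseContainerID(out))
}
```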
Extract artifacts from built images using `docker cp` for cross-platform reliability

- Create container from image without starting it
- Copy each artifact using `docker cp`
- Clean up container when done
- Extraction runs for base target only. Skips on cache hit.

Example:

```shell
❯ PYTHON_VERSION=3.13 MANYLINUX_VERSION=260128.221a193 HOSTTYPE=aarch64 ARCH_SUFFIX=-aarch64 BUILDKITE_COMMIT=b5737cefc0 IS_LOCAL_BUILD=true /Users/andrew/devel/rayci-wanda-artifacts/_release/wanda-darwin-arm64 --artifacts_dir .whl/ ci/docker/ray-wheel.wanda.yaml
...
2026/02/04 09:10:39 extracting 1 artifact(s) from localhost:5000/rayci-work:ray-wheel-py3.13-aarch64
Successfully copied 72MB to /Users/andrew/devel/ray-local-wheel-build/.whl/ray-3.0.0.dev0-cp313-cp313-manylinux2014_aarch64.whl
2026/02/04 09:10:40 extracted 1 artifact(s) in 722ms:
2026/02/04 09:10:40 /Users/andrew/devel/ray-local-wheel-build/.whl/ray-3.0.0.dev0-cp313-cp313-manylinux2014_aarch64.whl
```

Topic: wanda-artifact-copy
Signed-off-by: andrew <andrew@anyscale.com>
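The create, copy and clean-up steps listed in the commit message could be wired together roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: the `docker` interface, the `artifact` type and the `recorder` fake are hypothetical, and the fake records commands instead of invoking Docker so the flow can be run anywhere.

```go
package main

import (
	"fmt"
	"strings"
)

// docker abstracts running docker subcommands (hypothetical interface).
type docker interface {
	output(args ...string) ([]byte, error) // run and capture stdout
	run(args ...string) error              // run without capturing stdout
}

// artifact pairs a path inside the image with a host destination
// (hypothetical type).
type artifact struct {
	src, dst string
}

// extractArtifacts creates a container without starting it, copies each
// artifact out with "docker cp", and removes the container when done.
func extractArtifacts(d docker, image string, artifacts []artifact) (err error) {
	// "true" is a no-op command for images without CMD/ENTRYPOINT.
	out, err := d.output("create", image, "true")
	if err != nil {
		return fmt.Errorf("create container: %w", err)
	}
	id := strings.TrimSpace(string(out))
	defer func() {
		// Clean up even if a copy failed; keep the first error.
		if rmErr := d.run("rm", id); rmErr != nil && err == nil {
			err = fmt.Errorf("remove container: %w", rmErr)
		}
	}()
	for _, a := range artifacts {
		if cpErr := d.run("cp", id+":"+a.src, a.dst); cpErr != nil {
			return fmt.Errorf("copy %s: %w", a.src, cpErr)
		}
	}
	return nil
}

// recorder is a fake docker that records commands instead of running them.
type recorder struct {
	calls []string
}

func (r *recorder) output(args ...string) ([]byte, error) {
	r.calls = append(r.calls, strings.Join(args, " "))
	return []byte("fake-container-id\n"), nil
}

func (r *recorder) run(args ...string) error {
	r.calls = append(r.calls, strings.Join(args, " "))
	return nil
}

func main() {
	r := &recorder{}
	arts := []artifact{{src: "/opt/artifacts", dst: "out/"}}
	if err := extractArtifacts(r, "example-image:latest", arts); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range r.calls {
		fmt.Println("docker", c)
	}
}
```

Putting the removal in a `defer` means the container is cleaned up even when one of the copies fails, while the named return value lets a clean-up failure surface only if nothing else went wrong first.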