Update Execution Contexts to recommend explicit dependencies#112
Update Execution Contexts to recommend explicit dependencies#112
Conversation
jmhsieh
left a comment
There was a problem hiding this comment.
did you test the conda approach?
| .pip(deps) # list of pip dependencies, like "numpy" or "numpy==2.3.5" | ||
| .conda(deps) # list of conda dependencies, like "numpy" or "numpy=2.3.5" |
There was a problem hiding this comment.
This needs to be tested. I'm pretty sure conda and pip fight and aren't compatible with each other.
There was a problem hiding this comment.
Right - we'll explicitly raise an exception if you try to specify both (which is warned right below this code block)
There was a problem hiding this comment.
As for the actual code that uses conda - that's in https://github.com/lancedb/geneva/pull/515. Tests that we pass along the conda (or requirements, or conda environment) on to Ray so Ray can do what it does with it. Debated adding an integration test too that runs a whole workflow with conda deps, didn't do so, but could if you think it's worthwhile.
docs/geneva/jobs/contexts.mdx
Outdated
|
|
||
| ### Auto-upload local dependencies | ||
|
|
||
| Geneva packages your local environment and sends it to Ray workers. This includes the current workspace root (if you're in a python repo) or the current working directory (if you're not). However, if you set `.skip_site_packages(False)`, your Python site-packages (defined by `site.getsitepackages()`) will be uploaded to workers as well. This is not recommended for production use, as it is prone to issues like architecture mismatches of built dependencies, but it can be a good way to iterate quickly during development. |
There was a problem hiding this comment.
Geneva packages ... => Geneva can package ...
There was a problem hiding this comment.
If you load these and then get the name, it can be used in production. Actually that's sort of the point -- I have my env in a manifest and when someone else runs my job they use my manifest as well and it "should just work". That also needs to be tested actually.
There was a problem hiding this comment.
I'm not sure I understand. Reusing manifests by name is great, but we can do that no matter how you specify the manifest (auto-uploading, pip, conda, whatever).
The upside of specifying it manually is that you know what you're sending to it - it's not just "open_clip because I was using that in the same venv, and this old version of numpy because that's what I happened to have installed, and this version of pyarrow that'll break on amd64"
docs/geneva/jobs/contexts.mdx
Outdated
| manifest = ( | ||
| GenevaManifestBuilder() | ||
| .name(manifest_name) | ||
| .skip_site_packages(False) |
There was a problem hiding this comment.
nit: For the builder we might want to change this to use_site_packages(True)
There was a problem hiding this comment.
Oh, I'd love to - I hate passing around double negatives. It would be an even more breaking change though - I guess we would:
- add
use_site_packages - have it override
skip_site_packages - throw a warning if both are present (maybe just throw a deprecation warning if you're using skip_site_packages at all)
- Are you up for that much churn?
- If so, naming:
use_site_packages?upload_site_packages?
There was a problem hiding this comment.
Builders are relatively new so I'm not concerned about churn here yet.
There was a problem hiding this comment.
Changing it one level down would cause some issues.
1b1f08b to
6c3d366
Compare
Fixes https://linear.app/lancedb/issue/GEN-284/docs-make-pipcondaimage-the-preferred-way-to-send-deps-in-manifest
We're now recommending that users use .pip, .requirements_path, .conda, .conda_environment_path, or just bake their dependencies into their image, instead of setting skip_site_packages(False) and uploading all your local dependencies. This updates the docs to show that.