Skip to content

Update Execution Contexts to recommend explicit dependencies#112

Open
dantasse wants to merge 1 commit intomainfrom
dantasse/update_contexts_to_prioritize_explicit_deps
Open

Update Execution Contexts to recommend explicit dependencies#112
dantasse wants to merge 1 commit intomainfrom
dantasse/update_contexts_to_prioritize_explicit_deps

Conversation

@dantasse
Copy link
Contributor

Fixes https://linear.app/lancedb/issue/GEN-284/docs-make-pipcondaimage-the-preferred-way-to-send-deps-in-manifest

We're now recommending that users use .pip, .requirements_path, .conda, .conda_environment_path, or just bake their dependencies into their image, instead of setting skip_site_packages(False) and uploading all your local dependencies. This updates the docs to show that.

Copy link
Contributor

@jmhsieh jmhsieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you test the conda approach?

Comment on lines +155 to +161
.pip(deps) # list of pip dependencies, like "numpy" or "numpy==2.3.5"
.conda(deps) # list of conda dependencies, like "numpy" or "numpy=2.3.5"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be tested. I'm pretty sure conda and pip fight and aren't compatible with each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - we'll explicitly raise an exception if you try to specify both (which is warned right below this code block)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the actual code that uses conda - that's in https://github.com/lancedb/geneva/pull/515. Tests that we pass along the conda (or requirements, or conda environment) on to Ray so Ray can do what it does with it. Debated adding an integration test too that runs a whole workflow with conda deps, didn't do so, but could if you think it's worthwhile.


### Auto-upload local dependencies

Geneva packages your local environment and sends it to Ray workers. This includes the current workspace root (if you're in a python repo) or the current working directory (if you're not). However, if you set `.skip_site_packages(False)`, your Python site-packages (defined by `site.getsitepackages()`) will be uploaded to workers as well. This is not recommended for production use, as it is prone to issues like architecture mismatches of built dependencies, but it can be a good way to iterate quickly during development.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Geneva packages ... => Geneva can package ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you load these and then get the name, it can be used in production. Actually that's sort of the point -- I have my env in a manifest and when someone else runs my job they use my manifest as well and it "should just work". That also needs to be tested actually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. Reusing manifests by name is great, but we can do that no matter how you specify the manifest (auto-uploading, pip, conda, whatever).

The upside of specifying it manually is that you know what you're sending to it - it's not just "open_clip because I was using that in the same venv, and this old version of numpy because that's what I happened to have installed, and this version of pyarrow that'll break on amd64"

manifest = (
GenevaManifestBuilder()
.name(manifest_name)
.skip_site_packages(False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: For the builder we might want to change this to use_site_packages(True)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'd love to - I hate passing around double negatives. It would be an even more breaking change though - I guess we would:

  • add use_site_packages
  • have it override skip_site_packages
  • throw a warning if both are present (maybe just throw a deprecation warning if you're using skip_site_packages at all)
  1. Are you up for that much churn?
  2. If so, naming: use_site_packages? upload_site_packages?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builders are relatively new so I'm not concerned about churn here yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it one level down would cause some issues.

@dantasse dantasse force-pushed the dantasse/update_contexts_to_prioritize_explicit_deps branch from 1b1f08b to 6c3d366 Compare February 5, 2026 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants