cleanup

dantasse · dantasse · commit 1b1f08bf6dbd · 2026-02-04T17:40:15.000-05:00
diff --git a/docs/geneva/jobs/contexts.mdx b/docs/geneva/jobs/contexts.mdx
@@ -17,20 +17,25 @@ We currently support one processing backend: **Ray**. There are 3 ways to connec
 
 ### Local Ray
 
-To execute jobs without an external Ray cluster, you can just trigger the `Table.backfill` method. This will auto-create a Ray cluster on your machine. Because it's on your laptop/desktop, this is only suitable for prototyping on small datasets. But it is the easiest way to get started. Simply define the UDF, add a column, and trigger the job:
+To execute jobs without an external Ray cluster, use `LocalRayContext`, which auto-creates a Ray cluster on your machine. Because it runs on your laptop or desktop, it is only suitable for prototyping on small datasets, but it is the easiest way to get started. Define the UDF, add a column, enter `Connection.local_ray_context()`, and trigger the job:
 
 <CodeGroup>
 ```python Python icon="python"
+from geneva import udf
+from geneva.db import Connection
+
 @udf
 def filename_len(filename: str) -> int:
     return len(filename)
 
 tbl.add_columns({"filename_len": filename_len})
-tbl.backfill("filename_len")
+
+with Connection.local_ray_context():
+    tbl.backfill("filename_len")
 ```
 </CodeGroup>
 
-Geneva will package up your local environment and send it to each worker node, so they'll have access to all the same dependencies as if you ran a simple Python script yourself.
+Geneva will package up your local environment and send it to each worker process, so every worker has access to the same dependencies as if you had run a plain Python script yourself.
 
 ### KubeRay
 
@@ -50,7 +55,7 @@ db = geneva.connect("s3://my-bucket/my-db")
 ray_version = ray.__version__
 python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
 cluster_name = "my-geneva-cluster" # lowercase, numbers, hyphens only
-service_account = "my_k8s_service_account" # k8s service account bound geneva runs as
+service_account = "my_k8s_service_account" # k8s service account that Geneva runs as
 k8s_namespace = "geneva"  # k8s namespace
 
 cluster = (
@@ -158,7 +163,7 @@ After workers start up, this will run `pip install lancedb numpy` on them. Genev
 .conda_environment_path(path) # path to local conda environment.yml file, like "./environment.yml"
 # Note that file paths are relative to the execution directory.
 ```
-Note that you can only use one of these methods; trying to define more than one will raise an exception.
+Note that attempting to use both `pip` and `requirements_path` will raise an exception. Similarly, you can't use both `conda` and `conda_environment_path`.
 
 ### Bake dependencies into an image
 
@@ -199,7 +204,7 @@ However, if an image is defined in both a Cluster and a Manifest, the definition
 
 ### Auto-upload local dependencies
 
-Geneva packages your local environment and sends it to Ray workers. This includes the current workspace root (if you're in a python repo) or the current working directory (if you're not). However, if you set `.skip_site_packages(False)`, your Python site-packages (defined by `site.getsitepackages()`) will be uploaded to workers as well. This is not recommended for production use, as it is prone to issues like architecture mismatches of built dependencies, but it can be a good way to iterate quickly during development.
+Geneva can package your local environment and send it to Ray workers. This includes the current workspace root (if you're in a Python repo) or the current working directory (if you're not). If you also set `.upload_site_packages(True)`, your Python site-packages (as returned by `site.getsitepackages()`) will be uploaded to workers as well. This is not recommended for production use, because it is prone to issues such as architecture mismatches in built dependencies, but it can be a good way to iterate quickly during development.
 
 To upload site packages:
 
@@ -210,7 +215,7 @@ manifest_name = "dev-manifest"
 manifest = (
     GenevaManifestBuilder()
         .name(manifest_name)
-        .skip_site_packages(False)
+        .upload_site_packages(True)
     ).build()
 
 db.define_manifest(manifest_name, manifest)
@@ -222,9 +227,9 @@ Here's a summary of what's in a manifest and how you can define it. (methods are
 |Contents|How you can define it|
 |---|---|
 |Local working directory (or workspace root, if in a python repo)|Will be uploaded automatically.|
-|Local python packages|Will be uploaded automatically if you set `.skip_site_packages(False)`.|
-|Python packages to be installed|Use `.pip(packages: list[str])` or `.conda(packages: list[str])`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
-|Python dependency lists|Use `.requirements_path(path: str)` or `conda_environment_path(path: str)`|
+|Local python packages|Will be uploaded if you set `.upload_site_packages(True)`.|
+|Python packages to be installed|Use `.pip(packages: list[str])` or `.conda(packages: dict[str, Any])`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
+|Python dependency lists|Use `.requirements_path(path: str)` or `.conda_environment_path(path: str)`|
 |Local python packages outside of `site_packages`|Use `.py_modules(modules: list[str])` or `.add_py_module(module: str)`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
 |Container image for head node|Use `.head_image(head_image: str)` or `default_head_image()` to use the default. Note that, if the image is also defined in the GenevaCluster, the image set here in the Manifest will take priority.|
 |Container image for worker nodes|Use `.worker_image(worker_image: str)` or `default_worker_image()` to use the default for the current platform. As with the head image, this takes priority over any images set in the Cluster.|
@@ -242,7 +247,6 @@ Calling `context` will enter a context manager that will provision an execution
 db = geneva.connect(my_db_uri)
 tbl = db.get_table("my_table")
 
-# Providing a manifest is optional; if not provided, it will work as described in "Use defaults" above.
 with db.context(cluster=cluster_name, manifest=manifest_name):
     tbl.backfill("embedding")
 ```
@@ -253,7 +257,7 @@ In a notebook environment, you can manually enter and exit the context manager i
 <CodeGroup>
 ```python Python icon="python"
 ctx = db.context(cluster=cluster_name, manifest=manifest_name)
-ctx.__enter()__
+ctx.__enter__()
 
 # ... do stuff