Commit 1b1f08b

cleanup
1 parent a71d6bc commit 1b1f08b

1 file changed: +16 -12 lines changed

docs/geneva/jobs/contexts.mdx

Lines changed: 16 additions & 12 deletions
@@ -17,20 +17,25 @@ We currently support one processing backend: **Ray**. There are 3 ways to connec
 
 ### Local Ray
 
-To execute jobs without an external Ray cluster, you can just trigger the `Table.backfill` method. This will auto-create a Ray cluster on your machine. Because it's on your laptop/desktop, this is only suitable for prototyping on small datasets. But it is the easiest way to get started. Simply define the UDF, add a column, and trigger the job:
+To execute jobs without an external Ray cluster, you can use `LocalRayContext`. This will auto-create a Ray cluster on your machine. Because it's on your laptop/desktop, this is only suitable for prototyping on small datasets. But it is the easiest way to get started. Simply define the UDF, add a column, call `Connection.local_ray_context()`, and trigger the job:
 
 <CodeGroup>
 ```python Python icon="python"
+from geneva import udf
+from geneva.db import Connection
+
 @udf
 def filename_len(filename: str) -> int:
     return len(filename)
 
 tbl.add_columns({"filename_len": filename_len})
-tbl.backfill("filename_len")
+
+with Connection.local_ray_context():
+    tbl.backfill("filename_len")
 ```
 </CodeGroup>
 
-Geneva will package up your local environment and send it to each worker node, so they'll have access to all the same dependencies as if you ran a simple Python script yourself.
+Geneva will package up your local environment and send it to each worker process, so they'll have access to all the same dependencies as if you ran a simple Python script yourself.
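
As a rough illustration of what "your local environment" covers in the paragraph above, the stdlib-only sketch below prints the interpreter version, CPU architecture, and site-packages directories of the current machine. It does not call Geneva and makes no claim about how Geneva packages files; `describe_local_environment` is a hypothetical helper name used only for this example.

```python
import platform
import site
import sys


def describe_local_environment() -> dict:
    """Collect basic facts about the interpreter running this script.

    Hypothetical helper for illustration only; not part of Geneva.
    """
    # Some older virtualenv setups do not expose site.getsitepackages(),
    # so fall back to an empty list instead of raising AttributeError.
    get_site_packages = getattr(site, "getsitepackages", lambda: [])
    return {
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "machine": platform.machine(),
        "site_packages": list(get_site_packages()),
    }


env = describe_local_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

Running the same snippet on a laptop and on a worker may help spot a Python version or architecture difference between the two.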
 
 ### KubeRay
 
@@ -50,7 +55,7 @@ db = geneva.connect("s3://my-bucket/my-db")
 ray_version = ray.__version__
 python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
 cluster_name = "my-geneva-cluster" # lowercase, numbers, hyphens only
-service_account = "my_k8s_service_account" # k8s service account bound geneva runs as
+service_account = "my_k8s_service_account" # k8s service account that Geneva runs as
 k8s_namespace = "geneva" # k8s namespace
 
 cluster = (
@@ -158,7 +163,7 @@ After workers start up, this will run `pip install lancedb numpy` on them. Genev
 .conda_environment_path(path) # path to local conda environment.yml file, like "./environment.yml"
 # Note that file paths are relative to the execution directory.
 ```
-Note that you can only use one of these methods; trying to define more than one will raise an exception.
+Note that attempting to use both `pip` and `requirements_path` will raise an exception. Similarly, you can't use both `conda` and `conda_environment_path`.
 
 ### Bake dependencies into an image
 
@@ -199,7 +204,7 @@ However, if an image is defined in both a Cluster and a Manifest, the definition
 
 ### Auto-upload local dependencies
 
-Geneva packages your local environment and sends it to Ray workers. This includes the current workspace root (if you're in a python repo) or the current working directory (if you're not). However, if you set `.skip_site_packages(False)`, your Python site-packages (defined by `site.getsitepackages()`) will be uploaded to workers as well. This is not recommended for production use, as it is prone to issues like architecture mismatches of built dependencies, but it can be a good way to iterate quickly during development.
+Geneva can package your local environment and send it to Ray workers. This includes the current workspace root (if you're in a python repo) or the current working directory (if you're not). However, if you set `.upload_site_packages(True)`, your Python site-packages (defined by `site.getsitepackages()`) will be uploaded to workers as well. This is not recommended for production use, as it is prone to issues like architecture mismatches of built dependencies, but it can be a good way to iterate quickly during development.
 
 To upload site packages:
 
@@ -210,7 +215,7 @@ manifest_name = "dev-manifest"
 manifest = (
     GenevaManifestBuilder()
     .name(manifest_name)
-    .skip_site_packages(False)
+    .upload_site_packages(True)
 ).build()
 
 db.define_manifest(manifest_name, manifest)
@@ -222,9 +227,9 @@ Here's a summary of what's in a manifest and how you can define it. (methods are
 |Contents|How you can define it|
 |---|---|
 |Local working directory (or workspace root, if in a python repo)|Will be uploaded automatically.|
-|Local python packages|Will be uploaded automatically if you set `.skip_site_packages(False)`.|
-|Python packages to be installed|Use `.pip(packages: list[str])` or `.conda(packages: list[str])`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
-|Python dependency lists|Use `.requirements_path(path: str)` or `conda_environment_path(path: str)`|
+|Local python packages|Will be uploaded if you set `.upload_site_packages(True)`.|
+|Python packages to be installed|Use `.pip(packages: list[str])` or `.conda(packages: dict[str, Any])`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
+|Python dependency lists|Use `.requirements_path(path: str)` or `.conda_environment_path(path: str)`|
 |Local python packages outside of `site_packages`|Use `.py_modules(modules: list[str])` or `.add_py_module(module: str)`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
 |Container image for head node|Use `.head_image(head_image: str)` or `default_head_image()` to use the default. Note that, if the image is also defined in the GenevaCluster, the image set here in the Manifest will take priority.|
 |Container image for worker nodes|Use `.worker_image(worker_image: str)` or `default_worker_image()` to use the default for the current platform. As with the head image, this takes priority over any images set in the Cluster.|
@@ -242,7 +247,6 @@ Calling `context` will enter a context manager that will provision an execution
 db = geneva.connect(my_db_uri)
 tbl = db.get_table("my_table")
 
-# Providing a manifest is optional; if not provided, it will work as described in "Use defaults" above.
 with db.context(cluster=cluster_name, manifest=manifest_name):
     tbl.backfill("embedding")
 ```
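
The `with` block above relies on Python's context manager protocol. As a sketch of that protocol only, the toy class below (hypothetical, standing in for whatever `db.context(...)` returns) records when it is entered and exited, first through `with` and then through explicit `__enter__`/`__exit__` calls wrapped in `try`/`finally`, as one might do across notebook cells. Nothing here provisions a cluster or calls Geneva.

```python
class ToyContext:
    """Hypothetical stand-in for a context object; not Geneva's class."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not suppress exceptions raised inside the block


# Form 1: the with statement calls __enter__ and __exit__ for you.
with ToyContext() as managed:
    managed.events.append("work")

# Form 2: manual calls; try/finally ensures __exit__ runs even on errors.
manual = ToyContext()
manual.__enter__()
try:
    manual.events.append("work")
finally:
    manual.__exit__(None, None, None)

print(managed.events)
print(manual.events)
```

Both forms record the same sequence of events. The manual form trades the automatic cleanup of `with` for the ability to spread the work over several cells, so the exit call should not be forgotten.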
@@ -253,7 +257,7 @@ In a notebook environment, you can manually enter and exit the context manager i
 <CodeGroup>
 ```python Python icon="python"
 ctx = db.context(cluster=cluster_name, manifest=manifest_name)
-ctx.__enter()__
+ctx.__enter__()
 
 # ... do stuff
 