docs/geneva/jobs/contexts.mdx
Lines changed: 16 additions & 12 deletions
@@ -17,20 +17,25 @@ We currently support one processing backend: **Ray**. There are 3 ways to connec
17
17
18
18
### Local Ray
19
19
20
-
To execute jobs without an external Ray cluster, you can just trigger the `Table.backfill` method. This will auto-create a Ray cluster on your machine. Because it's on your laptop/desktop, this is only suitable for prototyping on small datasets. But it is the easiest way to get started. Simply define the UDF, add a column, and trigger the job:
20
+
To execute jobs without an external Ray cluster, use `LocalRayContext`, which auto-creates a Ray cluster on your machine. Because the cluster runs on your laptop or desktop, it is only suitable for prototyping on small datasets, but it is the easiest way to get started. Define the UDF, add a column, enter `Connection.local_ray_context()`, and trigger the job:
21
21
22
22
<CodeGroup>
23
23
```python Python icon="python"
24
+
from geneva import udf
25
+
from geneva.db import Connection
26
+
24
27
@udf
25
28
def filename_len(filename: str) -> int:
26
29
    return len(filename)
27
30
28
31
tbl.add_columns({"filename_len": filename_len})
29
-
tbl.backfill("filename_len")
32
+
33
+
with Connection.local_ray_context():
34
+
    tbl.backfill("filename_len")
30
35
```
31
36
</CodeGroup>
32
37
33
-
Geneva will package up your local environment and send it to each worker node, so they'll have access to all the same dependencies as if you ran a simple Python script yourself.
38
+
Geneva will package up your local environment and send it to each worker process, so they'll have access to all the same dependencies as if you ran a simple Python script yourself.
34
39
35
40
### KubeRay
36
41
@@ -50,7 +55,7 @@ db = geneva.connect("s3://my-bucket/my-db")
cluster_name = "my-geneva-cluster"  # lowercase, numbers, hyphens only
53
-
service_account = "my_k8s_service_account"  # k8s service account bound geneva runs as
58
+
service_account = "my_k8s_service_account"  # k8s service account that Geneva runs as
54
59
k8s_namespace = "geneva"  # k8s namespace
55
60
56
61
cluster = (
@@ -158,7 +163,7 @@ After workers start up, this will run `pip install lancedb numpy` on them. Genev
158
163
.conda_environment_path(path) # path to local conda environment.yml file, like "./environment.yml"
159
164
# Note that file paths are relative to the execution directory.
160
165
```
161
-
Note that you can only use one of these methods; trying to define more than one will raise an exception.
166
+
Note that attempting to use both `pip` and `requirements_path` will raise an exception. Similarly, you can't use both `conda` and `conda_environment_path`.
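
The exclusivity rule in the added line can be pictured with a small stand-alone check. This is only a sketch of the rule as stated: `check_exclusive` and `EXCLUSIVE_PAIRS` are hypothetical names, not part of Geneva, and the choice of `ValueError` is an assumption (the docs only say "an exception"). It does not model any other restrictions Geneva or Ray may impose.

```python
from typing import Any, Optional

# Hypothetical illustration only; this is NOT Geneva's implementation.
# Each pair lists two manifest options that the docs say cannot be combined.
EXCLUSIVE_PAIRS = [
    ("pip", "requirements_path"),
    ("conda", "conda_environment_path"),
]


def check_exclusive(options: dict[str, Optional[Any]]) -> None:
    """Raise if both members of any exclusive pair are set (assumed ValueError)."""
    for first, second in EXCLUSIVE_PAIRS:
        if options.get(first) is not None and options.get(second) is not None:
            raise ValueError(f"use either {first!r} or {second!r}, not both")


# Setting only one member of a pair passes silently.
check_exclusive({"pip": ["lancedb", "numpy"]})

# Setting both members of a pair is rejected.
try:
    check_exclusive({"pip": ["lancedb"], "requirements_path": "./requirements.txt"})
except ValueError as err:
    print(f"rejected: {err}")
```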
162
167
163
168
### Bake dependencies into an image
164
169
@@ -199,7 +204,7 @@ However, if an image is defined in both a Cluster and a Manifest, the definition
199
204
200
205
### Auto-upload local dependencies
201
206
202
-
Geneva packages your local environment and sends it to Ray workers. This includes the current workspace root (if you're in a python repo) or the current working directory (if you're not). However, if you set `.skip_site_packages(False)`, your Python site-packages (defined by `site.getsitepackages()`) will be uploaded to workers as well. This is not recommended for production use, as it is prone to issues like architecture mismatches of built dependencies, but it can be a good way to iterate quickly during development.
207
+
Geneva can package your local environment and send it to Ray workers. This includes the current workspace root (if you're in a python repo) or the current working directory (if you're not). However, if you set `.upload_site_packages(True)`, your Python site-packages (defined by `site.getsitepackages()`) will be uploaded to workers as well. This is not recommended for production use, as it is prone to issues like architecture mismatches of built dependencies, but it can be a good way to iterate quickly during development.
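
To see which directories `site.getsitepackages()` reports on your machine, and therefore what the paragraph above refers to as your Python site-packages, a quick check with the standard library is enough. This sketch only inspects the local interpreter; `site_package_dirs` is a hypothetical helper name, and nothing here calls Geneva or uploads anything.

```python
import site
from pathlib import Path


def site_package_dirs() -> list[Path]:
    """Return the directories reported by site.getsitepackages()."""
    return [Path(p) for p in site.getsitepackages()]


if __name__ == "__main__":
    # Print each directory and whether it exists on this machine.
    for directory in site_package_dirs():
        print(directory, "(exists)" if directory.is_dir() else "(missing)")
```

Compiled packages in these directories are built for the local platform, which is why uploading them can cause the architecture mismatches mentioned above.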
@@ -222,9 +227,9 @@ Here's a summary of what's in a manifest and how you can define it. (methods are
222
227
|Contents|How you can define it|
223
228
|---|---|
224
229
|Local working directory (or workspace root, if in a python repo)|Will be uploaded automatically.|
225
-
|Local python packages|Will be uploaded automatically if you set `.skip_site_packages(False)`.|
226
-
|Python packages to be installed|Use `.pip(packages: list[str])` or `.conda(packages: list[str])`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
227
-
|Python dependency lists|Use `.requirements_path(path: str)` or `conda_environment_path(path: str)`|
230
+
|Local python packages|Will be uploaded if you set `.upload_site_packages(True)`.|
231
+
|Python packages to be installed|Use `.pip(packages: list[str])` or `.conda(packages: dict[str, Any])`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
232
+
|Python dependency lists|Use `.requirements_path(path: str)` or `.conda_environment_path(path: str)`|
228
233
|Local python packages outside of `site_packages`|Use `.py_modules(modules: list[str])` or `.add_py_module(module: str)`. See [Ray's RuntimeEnv docs](https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html) for details.|
229
234
|Container image for head node|Use `.head_image(head_image: str)` or `default_head_image()` to use the default. Note that, if the image is also defined in the GenevaCluster, the image set here in the Manifest will take priority.|
230
235
|Container image for worker nodes|Use `.worker_image(worker_image: str)` or `default_worker_image()` to use the default for the current platform. As with the head image, this takes priority over any images set in the Cluster.|
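
The `.conda(packages: dict[str, Any])` signature in the table suggests the argument is a conda environment specification given as a dictionary. The sketch below assumes the dict mirrors the layout of a conda `environment.yml`, which is the form Ray's `runtime_env` accepts for its `conda` field; whether Geneva forwards it unchanged is an assumption, and `conda_env` is a hypothetical variable name. The package names are the ones used elsewhere on this page.

```python
from typing import Any

# Assumed shape: a conda environment.yml expressed as a Python dict.
# Not confirmed against Geneva's API; check the linked Ray RuntimeEnv docs.
conda_env: dict[str, Any] = {
    "dependencies": [
        "pip",
        # Packages installed with pip inside the conda environment.
        {"pip": ["lancedb", "numpy"]},
    ],
}

print(conda_env["dependencies"])
```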
@@ -242,7 +247,6 @@ Calling `context` will enter a context manager that will provision an execution
242
247
db = geneva.connect(my_db_uri)
243
248
tbl = db.get_table("my_table")
244
249
245
-
# Providing a manifest is optional; if not provided, it will work as described in "Use defaults" above.
246
250
with db.context(cluster=cluster_name, manifest=manifest_name):
247
251
tbl.backfill("embedding")
248
252
```
@@ -253,7 +257,7 @@ In a notebook environment, you can manually enter and exit the context manager i