Description
After creating an esm_datastore, I'd like to use regular expressions (regex) to search for datasets that contain a given variable. However, regex matching does not appear to be supported on columns listed in columns_with_iterables. For example:
> import intake
> datastore = intake.open_esm_datastore(
      'cryo_input_datastore.json',
      columns_with_iterables=['variable'])
This successfully loads the datastore, which contains multiple records with the following unique variables:
> datastore.unique().variable
['bed', 'dataid', 'errbed', 'firn', 'geoid', 'mask', 'source', 'surface', 'thickness', 'x', 'y', 'bed_topography', 'bed_uncertainty', 'ice_thickness', 'surface_topography', 'thickness_survey_count', 'thickness_uncertainty', 'CNT', 'ERRX', 'ERRY', 'STDX', 'STDY', 'VX', 'VY', 'lat', 'lon']
To find the datasets that contain bed topography, I search with the regex .*bed.*, but this returns no entries:
> ds = datastore.search(variable='.*bed.*')
> print(ds)
<cryo_input_datastore catalog with 0 dataset(s) from 0 asset(s)>
I would expect it to return an entry for every dataset that has a variable whose name contains bed. If I search on variable_standard_name instead, it returns entries as expected, because variable_standard_name is not included in columns_with_iterables when the datastore is loaded.
Version information: output of intake_esm.show_versions()
INSTALLED VERSIONS
------------------
cftime: 1.6.4
dask: 2025.5.1
fastprogress: 1.0.3
fsspec: 2025.7.0
gcsfs: 2025.7.0
intake: 2.0.8
intake_esm: 2025.7.9
netCDF4: 1.7.2
pandas: 2.3.1
requests: 2.32.4
s3fs: 2025.7.0
xarray: 2025.6.1
zarr: 2.18.7
Cause
This behaviour appears to be intended and is implemented here.
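The difference between the observed and the expected result can be sketched in plain Python. This is only an illustration under an assumption: that values in an iterable column are compared literally rather than as patterns. It is not intake-esm's actual code. The records, file names, and the helpers literal_search and regex_search are hypothetical.

```python
import re

# Hypothetical stand-in for catalog rows. The 'variable' column holds tuples,
# as it would for a column listed in columns_with_iterables.
records = [
    {"path": "a.nc", "variable": ("bed", "errbed", "firn", "x", "y")},
    {"path": "b.nc", "variable": ("bed_topography", "ice_thickness")},
    {"path": "c.nc", "variable": ("VX", "VY", "lat", "lon")},
]


def literal_search(rows, column, value):
    """Keep rows whose iterable contains `value` exactly (no regex)."""
    return [row for row in rows if value in row[column]]


def regex_search(rows, column, pattern):
    """Keep rows where any element of the iterable matches `pattern`."""
    compiled = re.compile(pattern)
    return [row for row in rows if any(compiled.search(v) for v in row[column])]


# Assumed current behaviour: the pattern is treated as a literal string.
literal_hits = literal_search(records, "variable", ".*bed.*")
# Behaviour requested in this issue: the pattern is applied to each element.
regex_hits = regex_search(records, "variable", ".*bed.*")

print(len(literal_hits))
print([row["path"] for row in regex_hits])
```

With these made-up rows, the literal comparison finds nothing because no variable is named '.*bed.*', while the per-element regex match keeps the first two rows.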