Cannot search with regex on columns with iterables #736

@lawrenceabird

Description

Having created an esm_datastore, I'd like to use regular expressions (regex) to search for datasets that contain a given variable. However, the use of regex does not appear to be supported on columns with iterables. For example:

> import intake
> datastore = intake.open_esm_datastore(
      'cryo_input_datastore.json',
      columns_with_iterables=['variable'])

This successfully loads the datastore which contains multiple records with the following unique variables:

> datastore.unique().variable

['bed', 'dataid', 'errbed', 'firn', 'geoid', 'mask', 'source', 'surface', 'thickness', 'x', 'y', 'bed_topography', 'bed_uncertainty', 'ice_thickness', 'surface_topography', 'thickness_survey_count', 'thickness_uncertainty', 'CNT', 'ERRX', 'ERRY', 'STDX', 'STDY', 'VX', 'VY', 'lat', 'lon']

To filter the datastore to show datasets that contain bed topography, I use the .*bed.* regex to search, but this returns no entries:

> ds = datastore.search(variable = '.*bed.*')
> print(ds)

<cryo_input_datastore catalog with 0 dataset(s) from 0 asset(s)>

I would expect the search to return an entry for every dataset whose variable field contains a name matching bed. If I search on variable_standard_name instead, the same kind of pattern returns entries as expected. This is because variable_standard_name is not included in columns_with_iterables when the datastore is loaded.
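A possible workaround, sketched here under assumptions rather than taken from the intake-esm documentation: apply the regex to the list of unique variable names first, then search with the exact names that matched. The list below is copied from the datastore.unique().variable output above; the final search call is left as a comment because it needs the real datastore, and it assumes that search accepts a list of exact values for a column with iterables.

```python
import re

# Unique variable names, copied from the datastore.unique().variable output.
variable_names = [
    'bed', 'dataid', 'errbed', 'firn', 'geoid', 'mask', 'source', 'surface',
    'thickness', 'x', 'y', 'bed_topography', 'bed_uncertainty',
    'ice_thickness', 'surface_topography', 'thickness_survey_count',
    'thickness_uncertainty', 'CNT', 'ERRX', 'ERRY', 'STDX', 'STDY', 'VX',
    'VY', 'lat', 'lon',
]

# Resolve the regex in plain Python instead of inside datastore.search().
pattern = re.compile(r'.*bed.*')
matching = [name for name in variable_names if pattern.fullmatch(name)]
print(matching)

# With the real datastore (not run here; assumes exact-name lists are
# accepted for columns with iterables):
# ds = datastore.search(variable=matching)
```

This keeps the regex logic outside the catalog search, so it does not depend on how columns with iterables are matched internally.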

Version information: output of intake_esm.show_versions()

INSTALLED VERSIONS
------------------

cftime: 1.6.4
dask: 2025.5.1
fastprogress: 1.0.3
fsspec: 2025.7.0
gcsfs: 2025.7.0
intake: 2.0.8
intake_esm: 2025.7.9
netCDF4: 1.7.2
pandas: 2.3.1
requests: 2.32.4
s3fs: 2025.7.0
xarray: 2025.6.1
zarr: 2.18.7

Cause

This behaviour appears to be intended and is implemented here.
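To picture the difference, here is a minimal sketch that does not use intake-esm at all. It assumes, as a simplification, that values in an iterable column are compared literally against the query string, and contrasts that with a regex applied to each element. The rows, paths, and helper names (literal_match, regex_match) are hypothetical and are not part of the library.

```python
import re

# Hypothetical catalog rows; 'variable' holds an iterable of names per asset.
rows = [
    {'path': 'a.nc', 'variable': ('bed', 'errbed', 'firn')},
    {'path': 'b.nc', 'variable': ('VX', 'VY')},
    {'path': 'c.nc', 'variable': ('bed_topography', 'ice_thickness')},
]


def literal_match(row, value):
    """True if the query string is exactly one of the row's variable names."""
    return value in row['variable']


def regex_match(row, pattern):
    """True if any of the row's variable names fully matches the regex."""
    return any(pattern.fullmatch(name) for name in row['variable'])


query = '.*bed.*'

# Literal comparison: no variable is literally named '.*bed.*'.
literal_hits = [r['path'] for r in rows if literal_match(r, query)]

# Element-wise regex: rows holding 'bed', 'errbed' or 'bed_topography' match.
regex_hits = [r['path'] for r in rows if regex_match(r, re.compile(query))]

print(literal_hits, regex_hits)
```

Under that assumption, the empty result reported above is what literal comparison would produce, while the element-wise regex gives the result the issue asks for.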
