feat: parse wfs for meta and layers infos #385
Conversation
maudetes
left a comment
Nice! It works nicely 🎉
My main comment is on the scope of the hydra detection 😛
```python
# Extract service metadata
try:
    metadata = {
```
I'm not sure we want to send the entire WFS capabilities for every layer in the resource extras.
For example, this resource would have 600+ layers described in its extras, with 115 CRS for each.
If we end up with one WFS URL per layer, we would be bloating the dataset quite a bit.
Shouldn't we resolve the layer in hydra and ignore the others?
Interesting! I was wary of putting too much business logic in hydra, but we could move the resource title / URL layer-name detection here.
But we also need the full layers list to show the user, or for the QGIS export when no valid layer name is detected (or even to let the user switch layers easily, on a front-end map for example)...
So:
- do the layer matching and keep the full list of layers only when no valid match is found?
- handle a MAX_LAYERS threshold
- both
- stay as is and let the consumers do the layer dance
WDYT?
Claude says 600+ layers with 115 CRS each would be a ~1 MB payload, so that's definitely an issue...
And obviously, the biggest issue is the CRS multiplication. We could:
- store only the default CRS
- store only the CRS defined in an allow list (the most common ones)
I'd say go with option 1 and keep all the layers for now.
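To put numbers on option 1, here is a quick sketch of trimming each layer down to its default CRS, with a back-of-the-envelope size check for the 600-layers / 115-CRS scenario. The metadata shape is a guess at what `checks.ogc_metadata` could look like, not the actual structure from the PR, and the assumption that the default CRS comes first is also hypothetical.

```python
import json


def trim_to_default_crs(metadata: dict) -> dict:
    """Keep only each layer's default CRS (option 1 above). Sketch only."""
    return {
        **metadata,
        "layers": [
            {**layer, "crs": layer["crs"][:1]}  # first CRS assumed to be the default
            for layer in metadata.get("layers", [])
        ],
    }


# Rough payload-size check for the worst case mentioned in the thread.
full = {
    "service": "WFS",
    "layers": [
        {"name": f"ns:layer_{i}", "crs": [f"EPSG:{4000 + j}" for j in range(115)]}
        for i in range(600)
    ],
}
print(len(json.dumps(full)))                       # on the order of 1 MB
print(len(json.dumps(trim_to_default_crs(full))))  # a few tens of kB
```

So keeping all 600 layers while dropping the extra CRS already shrinks the extras by roughly 30x.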
I would agree with storing only the default CRS for now.
For the layers, I think the layer deduction logic is the most important part here. Listing the features is quite easy, but detecting the right layer is the more complex part, and would thus help the different frontend usages the most, WDYT?
```python
payload["document"]["analysis:parsing:geojson_size"] = check.get("geojson_size")
if config.OGC_ANALYSIS_ENABLED and check.get("ogc_metadata"):
    ogc_metadata = check.get("ogc_metadata")
    if isinstance(ogc_metadata, str):
```
These days, it's always a string :p See here:
hydra/udata_hydra/db/__init__.py
Line 8 in 4d24f24
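The `isinstance` guard in the diff above boils down to normalizing a JSONB value that may come back from the driver as a JSON string. A minimal sketch (the helper name is made up; the real handling in hydra may differ):

```python
import json


def ensure_dict(ogc_metadata):
    """Return ogc_metadata as a dict, decoding it first if the DB driver
    handed it back as a JSON string. Sketch only."""
    if isinstance(ogc_metadata, str):
        return json.loads(ogc_metadata)
    return ogc_metadata
```

Either way, the caller can then read `ensure_dict(check.get("ogc_metadata"))["layers"]` without caring how the JSONB column was decoded.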
This adds support for WFS services to hydra 🐙.
If a WFS is URL-detected, it will fetch general info (WFS version, output formats) and layers info (names and supported projections). The general workflow is respected (this can be challenged): the WFS is handled like any other resource type, and then augmented with the scraped info.
The scraped info is stored in `checks.ogc_metadata` as JSONB. It is sent to udata in `analysis:parsing:ogc_metadata` for every resource. I went with a full object since the layer structure has to be complex anyway (i.e. more than a list of strings).
There's a bit of future-proofing here: `ogc_metadata` is ready for other formats if needed (e.g. WMS). `owslib` is used to parse the WFS XML: the lib seems mature and it's always nice to avoid parsing XML manually 😬.
I have tested end-to-end with a local udata instance.
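For reference, what `owslib` abstracts away is the WFS `GetCapabilities` document. A minimal stdlib sketch of the same extraction, on a heavily trimmed capabilities example (real documents use the `http://www.opengis.net/wfs/2.0` namespace, omitted here for brevity; the output shape is illustrative, not the PR's actual `ogc_metadata` structure):

```python
import xml.etree.ElementTree as ET

# Heavily trimmed WFS 2.0 GetCapabilities document (namespaces omitted).
CAPABILITIES = """
<WFS_Capabilities version="2.0.0">
  <FeatureTypeList>
    <FeatureType>
      <Name>ns:roads</Name>
      <DefaultCRS>urn:ogc:def:crs:EPSG::4326</DefaultCRS>
      <OtherCRS>urn:ogc:def:crs:EPSG::2154</OtherCRS>
    </FeatureType>
  </FeatureTypeList>
</WFS_Capabilities>
"""


def parse_capabilities(xml_text: str) -> dict:
    """Extract the kind of info stored in ogc_metadata: version + layers."""
    root = ET.fromstring(xml_text)
    layers = []
    for ft in root.iter("FeatureType"):
        layers.append({
            "name": ft.findtext("Name"),
            "crs": [ft.findtext("DefaultCRS")]
            + [e.text for e in ft.findall("OtherCRS")],
        })
    return {"version": root.get("version"), "layers": layers}
```

With `owslib`, the equivalent information comes from `WebFeatureService(url)` and its `contents` mapping, which is what makes the manual parsing above unnecessary in the PR.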
Bonus: a bit of refactoring in `cli.py` with a `_find_check` helper.
Fix ecolabdata/ecospheres#892
Related ecolabdata/ecospheres#846