Skip to content

Allow provider-specific data types#529

Merged
merkys merged 19 commits intoMaterials-Consortia:developfrom
merkys:custom-data-types
Sep 18, 2025
Merged

Allow provider-specific data types#529
merkys merged 19 commits intoMaterials-Consortia:developfrom
merkys:custom-data-types

Conversation

@merkys
Copy link
Member

@merkys merkys commented Jun 14, 2024

This PR attempts to generalize #436 by allowing to define provider-specific data types. The following conditions for such data types apply:

  • Their values have to be expressed as string literals both in responses and in filters;
  • They have to be named using provider-specific prefixes;
  • Properties can only be compared with values of the same type (as per specification up til now).

The latter condition might be too strict for some uses, but since casting is unclear at the moment, let us leave such extensions for future.

Merging this PR will allow to reassign #392 and #436 (SMILES) to _cheminfo_ namespace where it can become a property with special set of rules for comparison.

@merkys merkys requested review from blokhin, ml-evs, rartino and vaitkus June 14, 2024 14:11
merkys and others added 2 commits June 19, 2024 15:45
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
merkys and others added 3 commits June 21, 2024 12:21
@merkys merkys requested a review from vaitkus February 4, 2025 08:47
@blokhin
Copy link
Member

blokhin commented Feb 4, 2025

I have a general question: how to understand, if a prefix denotes a provider-specific datatype, not a general-purpose common datatype. In other words, compare _cheminfo_ and _mpds_, where is what?

@blokhin
Copy link
Member

blokhin commented Feb 4, 2025

apparently, by underscore as the first symbol, correct?

blokhin
blokhin previously approved these changes Feb 4, 2025
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
@merkys
Copy link
Member Author

merkys commented Feb 4, 2025

I have a general question: how to understand, if a prefix denotes a provider-specific datatype, not a general-purpose common datatype. In other words, compare _cheminfo_ and _mpds_, where is what?

Provider-specific data type names are only going to appear in x-optimade-type in property definitions. There they will be identifiable by an initial underscore and namespacing prefix. General purpose data types should be elevated to the main specification, thus without _<prefix>_.

@merkys merkys requested a review from vaitkus February 4, 2025 14:04
vaitkus
vaitkus previously approved these changes Feb 4, 2025
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
@merkys merkys requested a review from blokhin February 5, 2025 09:33
vaitkus
vaitkus previously approved these changes Feb 5, 2025
blokhin
blokhin previously approved these changes Feb 5, 2025
@blokhin
Copy link
Member

blokhin commented Feb 8, 2025

@merkys should we merge?

@merkys
Copy link
Member Author

merkys commented Feb 10, 2025

@merkys should we merge?

@blokhin I would like to get a review from @rartino as this change affects property definitions and might impact the way query filters are supposed to be handled by the backends.

@rartino
Copy link
Contributor

rartino commented Feb 10, 2025

Ok, I'll take a deeper look at this over the next couple of days.

@ml-evs
Copy link
Member

ml-evs commented Feb 10, 2025

Ok, I'll take a deeper look at this over the next couple of days.

Likewise, as I want to use this pretty much straight away!

Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up @merkys!

I only have one general comment; should we encourage the custom type to be described in the description of the fields that use it? Otherwise I don't see where this information would go (even if its only human-readable anyway). I've added two suggestions below along these lines.

Copy link
Contributor

@rartino rartino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work with this!

The one blocking issue I see is that the PR misses a corresponding edit under "Property Definition keys from JSON Schema" -> "type", where in the current edition of the PR (see below) should say that when x-optimade-type is set to a string that indicates a provider-specific datatype, "type" MUST be "string". (I see @ml-evs is "fixing" this on line 2171, but whether or not that is added, the change should also be done on in that part of the specification).

(I couldn't add this as an "edit" comment, it is too far away...)

I would be ok with merging this for string custom datatypes only. However, I think it is worthwhile to pause for one minute here to consider what I wrote in a reply to one of @ml-evs suggestions: can we lightly edit the PR to allow custom data types to be represented using any of the basic, list or dict types?

I foresee that basically the first thing that will happen once we implement this is someone coming along and asking "oh, but in our community we have this data thing that is always represented as an NxM matrix, and for this data the whole community obviously want the ">" operator to mean Z." ...

@merkys
Copy link
Member Author

merkys commented Feb 12, 2025

Thanks for reviews, @ml-evs and @rartino!

Data type description in both human-readable text and Property Definitions is noncontroversial, thus I will address it later on.

Now on the extension beyond the string-based data types. Historically, I drafted this PR having SMILES in mind, thus strings were enough for me. Moreover, we already have strings with internal semantics (timestamps). Now I do not see why we cannot extend this to all basic+list+dict data types, but yes, this needs some more thought.

Some issues coming to my mind as I am writing this:

  1. Intuition. A human reader is capable to perceive that a string has internal semantics. Just by looking at lists and dicts it is difficult to say whether they constitute a data type or not. And this risks becoming counterintuitive for number-based data types where, for example, > operator is overloaded to mean the opposite.

  2. Accessing constituents. Should filters on constituents of list (items) and dict (keys and values) be supported, re-using compound identifier syntax as everywhere else in OPTIMADE?

@rartino
Copy link
Contributor

rartino commented Apr 3, 2025

Just a note as I prepare for the meeting this afternoon: lets discuss the string-based vs. general custom data types at the meeting and decide whether it is fine with just string support for now.

@merkys Can you please look at @ml-evs edit suggestion above and act on it (either do it, reply, or close)

@merkys merkys dismissed stale reviews from blokhin and vaitkus via 1b282da April 3, 2025 14:54
@merkys
Copy link
Member Author

merkys commented Apr 8, 2025

During an online discussion on 2025-04-03 it has been decided to allow provider-specific data types to be based on any OPTIMADE data type, not just string. Thus the PR has to be adjusted to accommodate that.

@merkys
Copy link
Member Author

merkys commented Sep 9, 2025

During an online discussion on 2025-04-03 it has been decided to allow provider-specific data types to be based on any OPTIMADE data type, not just string. Thus the PR has to be adjusted to accommodate that.

I have updated the PR to reflect this decision.

Co-authored-by: Matthew Evans <git@ml-evs.science>
@merkys merkys requested a review from ml-evs September 18, 2025 06:27
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @merkys! I think this is in a good shape, even if there is no programmatic place to describe such types yet (e.g., if there are multiple properties that use the same custom type).

@merkys merkys merged commit aec060e into Materials-Consortia:develop Sep 18, 2025
4 checks passed
@merkys merkys deleted the custom-data-types branch September 18, 2025 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments