Description
Describe the bug
Calling the to_dict method of some of the ac_bioc dataclasses gives a different result from serialising the same object with the BioCJSON class (whose output is determined by BioCJSONEncoder).
To Reproduce
For example, serialising the BioCPassage dataclass returns a very different result:
>>> from autocorpus.ac_bioc import BioCJSON, BioCPassage
>>> p = BioCPassage()
>>> p
BioCPassage(text='', offset=0, infons={}, sentences=[], annotations=[], relations=[])
>>> p.to_dict() # Does not include "annotations" or "relations"
{'text': '', 'offset': 0, 'infons': {}, 'sentences': []}
>>> print(BioCJSON.dumps(p)) # Does not include "sentences"
{"offset": 0, "infons": {}, "text": "", "annotations": [], "relations": []}
Expected behavior
These two approaches to get a dictionary should yield the same result.
Suggested solution
I suggest changing the default method of BioCJSONEncoder to simply call each object's to_dict method, and adjusting the to_dict methods to match the desired behaviour.
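A minimal sketch of that idea, assuming nothing about the real implementation: Sentence, Passage and ToDictEncoder below are hypothetical stand-ins, not the actual ac_bioc classes or BioCJSONEncoder. The only point is that once the encoder's default method defers to to_dict, the JSON output and the dictionary cannot drift apart.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Sentence:  # hypothetical stand-in, not the real BioCSentence
    text: str = ""
    offset: int = 0

    def to_dict(self) -> dict:
        return {"text": self.text, "offset": self.offset}


@dataclass
class Passage:  # hypothetical stand-in, not the real BioCPassage
    text: str = ""
    offset: int = 0
    sentences: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "text": self.text,
            "offset": self.offset,
            "sentences": [s.to_dict() for s in self.sentences],
        }


class ToDictEncoder(json.JSONEncoder):
    """Hypothetical encoder whose default() defers to the object's to_dict()."""

    def default(self, o):
        to_dict = getattr(o, "to_dict", None)
        if callable(to_dict):
            return to_dict()
        # Anything without to_dict() falls through to the standard TypeError.
        return super().default(o)


passage = Passage(sentences=[Sentence("hello", 2)])
encoded = json.dumps(passage, cls=ToDictEncoder)

# The JSON form and the dictionary form now agree by construction.
assert json.loads(encoded) == passage.to_dict()
```

With this shape, to_dict is the single place that decides which fields are serialised, so adding or removing a field only needs to happen once.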
If you just want to include every field automatically and never have to update the to_dict method, I suggest using the asdict function from the dataclasses module. It recursively converts nested dataclasses, including those held in lists, tuples and dicts. For example:
>>> from autocorpus.ac_bioc import BioCPassage, BioCSentence
>>> p = BioCPassage(sentences=[BioCSentence("hello", 2)])
>>> p
BioCPassage(text='', offset=0, infons={}, sentences=[BioCSentence(text='hello', offset=2, infons={}, annotations=[], relations=[])], annotations=[], relations=[])
>>> p.to_dict() # Missing fields from both dataclasses
{'text': '', 'offset': 0, 'infons': {}, 'sentences': [{'text': 'hello', 'offset': 2, 'infons': {}, 'annotations': []}]}
>>> from dataclasses import asdict
>>> asdict(p) # All fields present and converts the nested "sentences" to a dict
{'text': '', 'offset': 0, 'infons': {}, 'sentences': [{'text': 'hello', 'offset': 2, 'infons': {}, 'annotations': [], 'relations': []}], 'annotations': [], 'relations': []}
To include this in a dataclass is as simple as:
    from dataclasses import dataclass, asdict

    @dataclass
    class MyClass:
        field1: int

        # Unannotated, so not a field; obj.to_dict() is equivalent to asdict(obj)
        to_dict = asdict
Context
Please, complete the following to better understand the system you are using to run Auto-CORPus.
- Operating system (eg. Windows 10): MacOS 14.7.6
- Auto-CORPus version (eg. 1.0.0): Current main branch
- Installation method (eg. pipx, pip, development mode): dev mode with poetry
- Python version (you can get this running python --version): 3.13.1