TODOs #20

@tobiasschweizer

TODOs for PoC:

  • complete endpoint configuration information for crawler config:

    • unique endpoint shortname (repository_suffix)
    • define metadata_prefix (in harvest_params?)
  • make FastAPI routes for:

    • reading the OAI-PMH endpoint config from the tables endpoints and repositories. This would create the config files that are currently under version control in https://github.com/EOSC-Data-Commons/metadata-harvester/tree/master/repos_config
    • pushing OAI-PMH harvesting results and additional metadata to the table harvest_events (columns raw_metadata and additional_metadata). Ideally, the route would accept batches of data as the crawler receives them.
  • clarify whether src/utils/normalize_datacite_json.py could be moved to https://github.com/EOSC-Data-Commons/metadata-harvester. For development, it is convenient to map the file into the transform container using a bind mount (automatic reload without a container rebuild). Possibly, src/utils/normalize_datacite_json.py could become part of a library that the Celery task uses.

  • use a different Python client for PostgreSQL, such as https://github.com/psycopg/psycopg2, and possibly a declarative class mapping tool such as SQLAlchemy, see Use SQLAlchemy #19
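
To make the first two TODOs more concrete, here is a rough sketch of how a crawler config could be assembled from one repositories row and one endpoints row, and how harvested records could be grouped into batches before being pushed to harvest_events. It uses only the standard library and leaves out the FastAPI and database wiring. Everything beyond repository_suffix, metadata_prefix and harvest_params is an assumption made for illustration: the row keys `suffix` and `base_url`, the names `EndpointConfig`, `build_endpoint_config` and `batched`, and the default of `oai_dc` when no prefix is configured.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Iterable, Iterator


@dataclass
class EndpointConfig:
    """Crawler config for one OAI-PMH endpoint (field layout is an assumption)."""
    repository_suffix: str  # unique endpoint shortname
    base_url: str           # hypothetical column holding the OAI-PMH base URL
    harvest_params: dict[str, Any] = field(default_factory=dict)


def build_endpoint_config(repository: dict[str, Any],
                          endpoint: dict[str, Any]) -> EndpointConfig:
    """Combine one `repositories` row and one `endpoints` row into a config."""
    # Copy so the caller's row is not modified.
    params = dict(endpoint.get("harvest_params") or {})
    # Assumption: metadata_prefix is stored inside harvest_params;
    # fall back to oai_dc if the row does not define one.
    params.setdefault("metadata_prefix", "oai_dc")
    return EndpointConfig(
        repository_suffix=repository["suffix"],
        base_url=endpoint["base_url"],
        harvest_params=params,
    )


def batched(records: Iterable[dict[str, Any]],
            size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield lists of at most `size` records, in the order they arrive."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: list[dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield batch


if __name__ == "__main__":
    config = build_endpoint_config(
        {"suffix": "demo"},
        {"base_url": "https://example.org/oai", "harvest_params": {"set": "x"}},
    )
    # This JSON is what a config-reading route could return to the crawler.
    print(json.dumps(asdict(config), indent=2))
```

A route that reads the config could return `asdict(config)` as its response body, and a route that accepts harvest results could take one such batch per request, with each record carrying raw_metadata and additional_metadata.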
