Add support for parsing CITATION.cff metadata files #4728
Summary
Implements a parser for CITATION.cff files used for software citation metadata. Supports CFF spec versions 1.0.0+ with best-effort extraction.
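To make "best-effort extraction" concrete, here is a minimal sketch of the behavior this PR describes. It is not the code from this PR: `CitationPackage` and `parse_cff` are hypothetical names, the input is assumed to be a mapping already loaded from the YAML file, and only the `generic` package type, the `extracted_license_statement` field, and the `cff-version` check come from the description.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


@dataclass
class CitationPackage:
    """Hypothetical stand-in for the package record a scanner would emit."""
    type: str = "generic"  # citation metadata, not an ecosystem package
    name: Optional[str] = None
    version: Optional[str] = None
    extracted_license_statement: Optional[str] = None
    authors: List[str] = field(default_factory=list)


def parse_cff(data: Any) -> Optional[CitationPackage]:
    """Best-effort extraction from an already-loaded CITATION.cff mapping."""
    # Following this PR, cff-version is the only field treated as mandatory.
    if not isinstance(data, Mapping) or not data.get("cff-version"):
        return None

    authors = []
    for author in data.get("authors") or []:
        if not isinstance(author, Mapping):
            continue  # skip malformed entries instead of failing the parse
        # Person entries use given-names/family-names; entity entries use name.
        parts = [author.get("given-names"), author.get("family-names")]
        full_name = " ".join(str(p) for p in parts if p) or author.get("name")
        if full_name:
            authors.append(str(full_name))

    # Keep the license text verbatim; no SPDX normalization happens here.
    license_value = data.get("license")
    if isinstance(license_value, (list, tuple)):
        # Flattened with commas purely for illustration.
        license_value = ", ".join(str(item) for item in license_value)

    title = data.get("title")
    version = data.get("version")
    return CitationPackage(
        name=str(title) if title is not None else None,
        version=str(version) if version is not None else None,
        extracted_license_statement=(
            str(license_value) if license_value is not None else None
        ),
        authors=authors,
    )


if __name__ == "__main__":
    example = {
        "cff-version": "1.2.0",
        "title": "Demo Tool",
        "license": "Custom terms, see LICENSE.txt",
        "authors": [{"given-names": "Ada", "family-names": "Lovelace"}],
    }
    print(parse_cff(example))
```

Returning `None` when `cff-version` is missing, and skipping malformed author entries rather than raising, is one way to keep extraction tolerant of incomplete files.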
Changes
Key Features
- `cff-version` is required, all other fields optional
- Package type `generic` (citation metadata ≠ ecosystem-specific packages)

Design Decisions

Why package type `generic`?
CFF describes citation metadata, not distributable artifacts. A CFF file can describe software, datasets, papers, or other citable objects. The `type: software` field in CFF ≠ package ecosystem (npm, pypi, etc.). Using `generic` avoids misleading over-inference.

Why `extracted_license_statement` and not `declared_license_expression`?
The CFF license field may contain free-form text or non-SPDX identifiers. Using `extracted_license_statement` allows ScanCode's license detection to normalize it later, following best-effort extraction principles.

Why is only `cff-version` required?
Per the CFF specification, `cff-version` is the only strictly required field. Other fields (`message`, `authors`, `title`) are recommended but context-dependent. This aligns with both the spec and ScanCode's philosophy.

Test Coverage
Fixes #3580 (Add support for citation file format)