Panel Discussion at TDWG2023 Hobart Tasmania

Arthur Chapman edited this page Apr 16, 2025 · 2 revisions

TDWG 2023 Panel Discussion: Data Quality Task Group 2: Tests and Assertions

October 25, 2023

Panel: Lee Belbin (Convenor), Arthur Chapman and Paul Morris

26 People in attendance

Slide Introduction

The Team

  • Lee Belbin
  • Arthur Chapman
  • Paul Morris
  • John Wieczorek

The Context

Build upon a solid base:

  • TG2 started in 2014 with Lee Belbin, Arthur Chapman, Paul Morris, John Wieczorek, Paula Zermoglio and Alex Thompson…and a host of GitHub commenters
  • DQ Task Group 1: Fitness For Use Framework
  • Darwin Core Occurrence records
  • ‘Specifications’
  • Multiply by > 3:

The Tests

The Vocabulary

  • 150 terms: https://github.com/tdwg/bdq/issues/152
  • Described by:
  • Namespaced term: bdqffdq:Validation
  • Term: Validation
  • Definition: A Data Quality Needs level concept that describes a run of a test for validity. The bdqffdq:Validation concept in the Tests consists of a run with a bdq:Response.result of bdq:COMPLIANT or bdq:NOT_COMPLIANT and a bdqffdq:Criterion that describes the conditions for validity that result in a status of bdq:COMPLIANT.
  • Context: FFU Framework: Class
  • Comment: Veiga et al. (2017)

Validation Data

  • 1191 Darwin Core records: https://github.com/tdwg/bdq/issues/152
  • Described by:
    • Test: VALIDATION_DAY_INRANGE
    • DataID: 1003
    • InTestID: 9
    • Input.Data: dwc:day="30", dwc:month="2", dwc:year="1952"
    • Response.status: RUN_HAS_RESULT
    • Response.result: NOT_COMPLIANT
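The validation data record above can be read as: given day 30 of February 1952, the day-in-range test runs (Response.status of RUN_HAS_RESULT) and fails (Response.result of NOT_COMPLIANT). A minimal sketch of such a test in Python, not the TG2 reference implementation, might look like this (the response statuses RUN_HAS_RESULT and INTERNAL_PREREQUISITES_NOT_MET follow the bdq vocabulary; the function name and dict shape are illustrative assumptions):

```python
from datetime import date

def validation_day_inrange(day: str, month: str, year: str) -> dict:
    """Illustrative sketch of a bdq-style day-in-range validation."""
    try:
        d, m, y = int(day), int(month), int(year)
    except (TypeError, ValueError):
        # Non-numeric input: the test cannot be meaningfully run.
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None}
    try:
        date(y, m, d)  # raises ValueError for impossible dates, e.g. 30 February
        return {"status": "RUN_HAS_RESULT", "result": "COMPLIANT"}
    except ValueError:
        return {"status": "RUN_HAS_RESULT", "result": "NOT_COMPLIANT"}

# The validation data case above: dwc:day="30", dwc:month="2", dwc:year="1952"
print(validation_day_inrange("30", "2", "1952"))
```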

Future Directions

  • Complete the coding of the tests
  • Test Validation Data
  • Refine Specifications
  • Finalize bdq-CORE documents
  • Identify a Review Manager

Introduction to the Panel Discussion

Data Quality Task Group 2 (TG2) builds upon the Fitness For Use Framework of Task Group 1 and the data quality Use Cases of Task Group 3. TG2 is based on a subset of Darwin Core terms and is focused mainly on occurrence records. Evaluating ‘data quality’ was limited largely by a lack of controlled vocabularies associated with Darwin Core terms.

The most significant aspect of TG2’s work over the past 7 years has been the ‘specifications’: the terms used in describing ‘data quality’/fitness for use tests. These are described in the vocabulary. While we do not believe that all 99 tests developed by TG2 will be implemented by all agencies or in all environments (from data capture to data mining), we do believe that the specifications will be a foundation for understanding ‘data quality’ tests.

Directions

Discussion

Wouter Addink: How can new tests be added in the future?

  • Use the template provided by the specifications and the existing 99 tests. The process should be easy.

Wouter Addink: Could the tests be extended to ABCD and not just Darwin Core?

  • No. The tests, and the work of TG2 generally, were based on Darwin Core terms. That said, tests based on our specifications could be applied to ABCD terms, in the same way that new tests could be added for Darwin Core terms not currently tested, or for tests that use different combinations of Darwin Core terms.

Dave Watts: What is the number of tests per record, and what if the number of records is high?

  • Each record gets up to three passes: all tests of type VALIDATION are run on each record, then potential AMENDMENTs are applied to the record, then the VALIDATION tests are re-run.
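The three-pass flow described above (validate, amend, re-validate) can be sketched as a small Python pipeline. This is a hedged illustration, not the TG2 implementation; the function names, and the example validation and amendment, are assumptions for demonstration:

```python
def run_validations(record: dict, validations: dict) -> dict:
    """Pass 1 and 3: run every VALIDATION test against a record."""
    return {name: test(record) for name, test in validations.items()}

def apply_amendments(record: dict, amendments: list) -> dict:
    """Pass 2: apply each AMENDMENT; amendments return changed fields or None."""
    amended = dict(record)
    for amend in amendments:
        amended.update(amend(amended) or {})
    return amended

def quality_control(record: dict, validations: dict, amendments: list):
    pre = run_validations(record, validations)       # pass 1: validate
    amended = apply_amendments(record, amendments)   # pass 2: amend
    post = run_validations(amended, validations)     # pass 3: re-validate
    return pre, amended, post

# Illustrative example: a validation requiring a two-digit dwc:month,
# and an amendment that zero-pads it ("2" -> "02").
validations = {
    "MONTH_STANDARD": lambda r: len(r.get("dwc:month", "")) == 2
                                and r["dwc:month"].isdigit()
}
amendments = [
    lambda r: {"dwc:month": r["dwc:month"].zfill(2)} if r.get("dwc:month") else None
]
pre, amended, post = quality_control({"dwc:month": "2"}, validations, amendments)
```

A record can thus fail a validation on the first pass and, after amendment, pass on the third.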

Dave Watts: Should you run all the tests against all 1 million records in a dataset for example?

  • Why not?

Robert Mesibov: Have you considered a score for datasets? What would the data quality score look like?

  • The 99 tests are assertions about a single record. The tests of type MEASURE do return a number; for example, one MEASURE is how many of the VALIDATION tests returned a status of "compliant". These MEASUREs can be accumulated across any number of records, e.g. a dataset. Paul also mentioned that the Framework allows for MultiRecord Measures, but we are not implementing them at this stage.
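A MEASURE of this kind, counting COMPLIANT validation results per record and then accumulating across a dataset, can be sketched as follows (a hedged illustration; the function name and report shape are assumptions, though the status and result values follow the bdq vocabulary):

```python
def measure_validation_compliance(report: list) -> int:
    """Count how many validation responses in one record's report are COMPLIANT."""
    return sum(
        1 for r in report
        if r["status"] == "RUN_HAS_RESULT" and r["result"] == "COMPLIANT"
    )

# Two records' (illustrative) validation reports.
reports = [
    [{"status": "RUN_HAS_RESULT", "result": "COMPLIANT"},
     {"status": "RUN_HAS_RESULT", "result": "NOT_COMPLIANT"}],
    [{"status": "RUN_HAS_RESULT", "result": "COMPLIANT"},
     {"status": "RUN_HAS_RESULT", "result": "COMPLIANT"}],
]
per_record = [measure_validation_compliance(r) for r in reports]
dataset_total = sum(per_record)  # accumulated across the dataset
```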

Robert Mesibov: If, for example, someone wants to use GBIF data for species distribution modelling, could a dataset have a score such as ‘level 2’ that I can use for SDM? How can you come up with a single score?

  • You can't come up with a generic score, as it depends on the specific requirements of the use case. Fitness for use means that a set of data is fit for a single specific purpose and will not necessarily be suitable for any other application.

QUESTION: Can the tests apply to data on traits (descriptive characteristics)?

  • The principles used in the development of our tests could be applied to any biodiversity-related data. For example, specifications such as ‘Source Authority’ apply in any environment.

Henry Englebrecht: What about testing combinations of terms such as date, collector, location and recordedBy?

  • Such a combination is not among the core tests, but it would be easy to implement using the current specifications as noted earlier.

Chandra Earl: What are the plans for the implementation of the tests?

  • Most of the tests are already implemented in FilteredPush and Kurator. We believe that the tests could be built into databases like Symbiota.

QUESTION: Any advice on data quality flags, and on the same tests being run by different aggregators?

  • Flags are not enough. There is a need for a formal structure for the responses. Can the test be run? What is the status of the result? Is the validation compliant or not? We have added human-readable comments.

QUESTION: Do you have any recommendations on how we can organise with end users, so that we can identify which traits are most useful and which should be prioritised?

  • In short, no, as this is outside the scope of our work.

QUESTION (attendee, possibly from Norway): What about the many issues with legacy data?

  • As a Task Group, we would always encourage the publication of all data, regardless of ‘age’ and ‘quality’ as this is the best way of a) knowing that the data exists and b) identifying and hopefully correcting at least some exposed errors. We need user feedback for correction, ideally with our data quality tests as a framework. Users can and should filter out data that is not fit for their purpose. She agreed that we want the data to be used and that it is important to document. She said we want better ways to document data quality, and this looks like a good solution.

Knut: Data is piped into forestry applications, where errors are not desirable. There is a need for a standard way to annotate quality; the W3C annotation model alone is insufficient.

  • An annotation could be as simple as "I've used this data in this way, and it works great"; the model is agnostic about how data quality reports are presented.

QUESTION: What is W3C annotation standard?

  • The W3C annotation model (https://www.w3.org/TR/annotation-model/). It allows for annotations of annotations. GBIF is looking to support W3C annotations, and there is also an annotation framework in DiSSCo. The model makes use of a selector to target individual fields in the data, sets of fields, and classes. It can also annotate the object based on its identifier.
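As a concrete illustration of the selector mechanism mentioned above, here is a minimal Web Annotation in the JSON-LD shape described by the W3C model, built as a Python dict. The target IRI and the selector value are hypothetical examples, not real identifiers:

```python
import json

# A hedged sketch of a W3C Web Annotation (https://www.w3.org/TR/annotation-model/)
# asserting a data quality comment against a single field of a record.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "bodyValue": 'dwc:day="30" is out of range for February 1952',
    "target": {
        # Hypothetical IRI identifying the annotated record.
        "source": "https://example.org/occurrence/1003",
        "selector": {
            # A selector narrows the target to part of the record;
            # the value here is an illustrative field reference.
            "type": "FragmentSelector",
            "value": "dwc:day",
        },
    },
}
print(json.dumps(annotation, indent=2))
```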
