
Searching for Interactions in GBIF/iDigBio Darwin Core Archives

José Augusto Salim edited this page Jun 15, 2021 · 8 revisions


System specification

Property Value
CPUs 16
RAM 30 GB
OS Ubuntu 20.04.2 LTS
SSD 0.5 TB
Elton version 0.10.13
Preston version 0.2.5
Nomer version 0.1.27

Steps to reproduce

The results presented here can be reproduced by the following steps:

  • Clone the repository:
git clone https://github.com/zedomel/globi-dwca-index
  • Download and install preston
  • Download and install elton
  • Run the preston-dwca-interactons.sh script (an output directory can optionally be passed as its argument):
bash preston-dwca-interactons.sh <output_dir>

The script will use the latest version of the biodiversity dataset graph at https://deeplinker.bio.

QUERY_HASH/PROVENANCE HASH: hash://sha256/810b22c16e1a3911c6eecfca348758d3ffd5b29fc36990015cda6427bdde2233
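The identifier above is a content hash: a SHA-256 digest written as a hash://sha256/<hex> URI. The sketch below illustrates how such an identifier could be computed for bytes in memory or on disk and compared against an expected value, for example to check that a locally stored copy of some content is unchanged. It assumes the URI is simply the lowercase hex SHA-256 digest of the raw bytes; the helper names content_hash_uri and file_matches_hash are hypothetical and not part of Preston.

```python
import hashlib
from pathlib import Path

# Prefix used by the content-hash URIs shown on this page.
HASH_PREFIX = "hash://sha256/"


def content_hash_uri(data: bytes) -> str:
    """Return a hash://sha256/<hex> identifier for an in-memory byte string."""
    return HASH_PREFIX + hashlib.sha256(data).hexdigest()


def file_matches_hash(path: Path, expected_uri: str) -> bool:
    """Hash a file in 1 MiB chunks and compare the result to an expected hash URI."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return HASH_PREFIX + digest.hexdigest() == expected_uri.strip().lower()
```

Streaming the file in chunks keeps memory use constant, which matters for large archives. Checking the QUERY_HASH above this way would require the exact bytes it refers to, which Preston manages; the sketch only illustrates the hashing scheme.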

Results

Number of datasets (DwC-A) 55,928
Number of records scanned 574,715,196
Number of potential interactions* XXX
Number of recognized interactions** XXX

* Potential interactions: all DwC-A records that have a non-empty value for any term indexed by elton.

** Recognized interactions: all DwC-A records that have valid values for any term indexed by elton. A record is valid if it contains a recognized interaction type (see the interaction type mappings for the list of recognized interaction types) and non-empty taxonomic names for both the sourceTaxon and the targetTaxon.
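The two definitions above can be sketched as predicates. This is a minimal illustration, not elton's implementation: a record is assumed to be a mapping from DwC term names to string values, INDEXED_TERMS holds only a few of the terms mentioned on this page, and RECOGNIZED_TYPES is a hypothetical subset standing in for elton's interaction type mappings.

```python
from typing import Mapping, Optional

# Hypothetical subset of the DwC terms elton indexes (illustration only).
INDEXED_TERMS = ("associatedTaxa", "associatedOccurrences", "occurrenceRemarks")

# Hypothetical subset of recognized interaction type labels (illustration only).
RECOGNIZED_TYPES = frozenset(
    {"eats", "parasiteOf", "pollinates", "visitsFlowersOf", "interactsWith"}
)


def is_potential_interaction(record: Mapping[str, str]) -> bool:
    """A record counts as potential if any indexed term has a non-blank value."""
    return any((record.get(term) or "").strip() for term in INDEXED_TERMS)


def is_recognized_interaction(
    interaction_type: Optional[str],
    source_taxon: Optional[str],
    target_taxon: Optional[str],
) -> bool:
    """A parsed interaction is recognized if its type is known and both taxon names are non-blank."""
    return (
        interaction_type in RECOGNIZED_TYPES
        and bool((source_taxon or "").strip())
        and bool((target_taxon or "").strip())
    )
```

Under these definitions every recognized interaction comes from a potential one, but not the reverse: an unknown type label or a missing taxon name leaves a record counted only as potential.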

A total of 38,268 DwC-Archives were downloaded and processed using elton. elton can find interaction records documented in many different ways in DwC-Archives (e.g. associatedTaxa, associatedOccurrences, resourceRelationship, occurrenceRemarks, description).

A total of 206,894,040 records were scanned from the DwC-Archives. The number of records containing potential interaction data (i.e. with interaction types either recognized or unrecognized by elton) was 2,448,044. The number of recognized interactions was 1,601,092, which corresponds to 65.4% of the potential interactions and 0.77% of all records scanned.
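Both percentages follow directly from the reported counts; a short check using only the numbers in the paragraph above:

```python
# Counts reported in the paragraph above.
records_scanned = 206_894_040
potential_interactions = 2_448_044
recognized_interactions = 1_601_092


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


share_of_potential = percent(recognized_interactions, potential_interactions)
share_of_scanned = percent(recognized_interactions, records_scanned)

print(f"{share_of_potential:.1f}% of potential interactions")  # 65.4% of potential interactions
print(f"{share_of_scanned:.2f}% of records scanned")  # 0.77% of records scanned
```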

DwC Terms and Interactions

There are many different ways to document biological interactions using DwC. elton recognizes many of them, but given the flexibility of the DwC standard it may be impractical to handle every possibility. Biological interactions may therefore be documented in ways that elton does not recognize, especially when they are expressed in natural language in terms such as occurrenceRemarks.
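To see why free-text terms are hard to handle exhaustively, consider a minimal sketch of splitting an associatedTaxa value into (interaction label, taxon name) pairs. It assumes, hypothetically, that entries are separated by "|" or ";" and written as "label: name"; it is not elton's parser, and any value that departs from this pattern is either left without a label or split incorrectly.

```python
import re
from typing import List, Optional, Tuple

# Assumed entry separators ("|" or ";"); real-world data uses many others.
ENTRY_SEPARATOR = re.compile(r"\s*[|;]\s*")


def split_associated_taxa(value: str) -> List[Tuple[Optional[str], str]]:
    """Split a free-text associatedTaxa value into (interaction label, taxon name) pairs.

    Entries shaped like "label: name" yield (label, name); entries without a
    colon yield (None, entry), i.e. a taxon with no interaction type.
    """
    pairs: List[Tuple[Optional[str], str]] = []
    for entry in ENTRY_SEPARATOR.split(value.strip()):
        if not entry:
            continue  # skip empty pieces, e.g. left by a trailing separator
        label, sep, name = entry.partition(":")
        if sep:
            pairs.append((label.strip() or None, name.strip()))
        else:
            pairs.append((None, entry))
    return pairs
```

For example, "eats: Quercus alba | parasiteOf: Apis mellifera" yields the pairs ("eats", "Quercus alba") and ("parasiteOf", "Apis mellifera"), while "Quercus alba" on its own yields a pair with no interaction label.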

The number of records per DwC term/class is shown below (only terms/classes recognized by elton are listed):

Term/Class #Records %
http://rs.tdwg.org/dwc/terms/associatedTaxa 2,246,646 91.77%
http://rs.tdwg.org/dwc/terms/associatedOccurrences 195,083 7.97%
http://rs.gbif.org/terms/1.0/Reference 4,064 0.17%
http://rs.tdwg.org/dwc/terms/Taxon 1,521 0.06%
http://rs.tdwg.org/dwc/terms/ResourceRelationship 677 0.03%
http://rs.tdwg.org/dwc/terms/occurrenceRemarks 35 <0.01%
TOTAL 2,448,026 100%
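The percentage column can be recomputed from the record counts. The sketch below sums the counts, formats each share to two decimals, and reports shares below 0.01% as "<0.01%", which reproduces the column above; it uses only the numbers in this table.

```python
# Record counts per DwC term/class, copied from the table above.
RECORDS_PER_TERM = {
    "http://rs.tdwg.org/dwc/terms/associatedTaxa": 2_246_646,
    "http://rs.tdwg.org/dwc/terms/associatedOccurrences": 195_083,
    "http://rs.gbif.org/terms/1.0/Reference": 4_064,
    "http://rs.tdwg.org/dwc/terms/Taxon": 1_521,
    "http://rs.tdwg.org/dwc/terms/ResourceRelationship": 677,
    "http://rs.tdwg.org/dwc/terms/occurrenceRemarks": 35,
}


def format_share(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals, collapsing tiny shares."""
    share = 100.0 * count / total
    return "<0.01%" if share < 0.01 else f"{share:.2f}%"


total = sum(RECORDS_PER_TERM.values())
table = {term: format_share(count, total) for term, count in RECORDS_PER_TERM.items()}

for term, share in table.items():
    print(f"{term}\t{RECORDS_PER_TERM[term]:,}\t{share}")
print(f"TOTAL\t{total:,}")
```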

Interactions by Taxa
