Searching for Interactions in GBIF iDigBio Darwin Core Archives
| Property | Value |
|---|---|
| CPUs | 16 |
| RAM | 30G |
| OS | Ubuntu 20.04.02 LTS |
| SSD | 0.5T |
| Elton version | 0.10.13 |
| Preston version | 0.2.5 |
| Nomer version | 0.1.27 |
The results presented here can be reproduced by the following steps:
- Clone the repository https://github.com/zedomel/globi-dwca-index:
git clone https://github.com/zedomel/globi-dwca-index
- Download and install preston
- Download and install elton
- Execute the preston-dwca-interactons.sh script (optionally, OUTPUT_DIR can be specified):
bash preston-dwca-interactons.sh <output_dir>
The script will use the latest version of the biodiversity dataset graph at https://deeplinker.bio.
QUERY_HASH/PROVENANCE HASH: hash://sha256/810b22c16e1a3911c6eecfca348758d3ffd5b29fc36990015cda6427bdde2233
| Metric | Value |
|---|---|
| Number of datasets (DwC-A) | 55,928 |
| Number of Records scanned | 574,715,196 |
| Number of potential interactions* | XXX |
| Number of recognized interactions** | XXX |
\* Total of potential interactions: includes all DwC-A records that have non-empty values for any term indexed by elton.
\*\* Total of recognized interactions: includes all DwC-A records that have valid values for any term indexed by elton. A record is valid if it contains a recognized interaction type (see the interaction type mappings for a list of recognized interaction types) and non-empty taxonomic names for both the sourceTaxon and the targetTaxon.
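The validity rule in the second footnote can be sketched as a small predicate. This is only an illustration under stated assumptions: the field names (`interactionTypeName`, `sourceTaxonName`, `targetTaxonName`) and the short `RECOGNIZED_TYPES` set are hypothetical stand-ins, not elton's actual schema or its full interaction type mappings.

```python
# Hypothetical, illustrative subset; elton's real interaction type mappings are larger.
RECOGNIZED_TYPES = {"eats", "parasiteOf", "pollinates", "hasHost"}


def _filled(record: dict, key: str) -> bool:
    """Return True if the record has a non-blank string value for key."""
    value = record.get(key) or ""
    return bool(str(value).strip())


def is_recognized(record: dict) -> bool:
    """Apply the footnote's rule: recognized type plus non-empty source and target names.

    The field names used here are assumptions made for this sketch.
    """
    interaction_type = str(record.get("interactionTypeName") or "").strip()
    return (
        interaction_type in RECOGNIZED_TYPES
        and _filled(record, "sourceTaxonName")
        and _filled(record, "targetTaxonName")
    )
```

Under this sketch, a record with a non-empty but unmapped interaction type, or with a blank taxon name on either side, would count only as a potential interaction.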
A total of 38,268 DwC-Archives were downloaded and processed using elton. elton can find interaction records documented in many different ways in DwC-Archives (e.g. associatedTaxa, associatedOccurrences, resourceRelationship, occurrenceRemarks, description).
A total of 206,894,040 records were scanned from the DwC-Archives. The total number of records that include potential interaction data (i.e. interaction types recognized or unrecognized by elton) was 2,448,044. The number of recognized interactions was 1,601,092, which corresponds to 65.4% of the total number of potential interactions and 0.77% of the total number of records scanned.
There are many different ways to document biological interactions using DwC. elton can recognize many of them, but given the flexibility of the DwC standard it may be impractical to handle every possibility. Thus, biological interactions may be documented in ways that elton does not recognize, especially when they are expressed in natural language using terms such as occurrenceRemarks.
The total number of records per DwC term/class is shown below (only terms/classes recognized by elton are listed):
| Term/Class | #Records | % |
|---|---|---|
| http://rs.tdwg.org/dwc/terms/associatedTaxa | 2,246,646 | 91.77% |
| http://rs.tdwg.org/dwc/terms/associatedOccurrences | 195,083 | 7.97% |
| http://rs.gbif.org/terms/1.0/Reference | 4,064 | 0.17% |
| http://rs.tdwg.org/dwc/terms/Taxon | 1,521 | 0.06% |
| http://rs.tdwg.org/dwc/terms/ResourceRelationship | 677 | 0.03% |
| http://rs.tdwg.org/dwc/terms/occurrenceRemarks | 35 | <0.01% |
| TOTAL | 2,448,026 | 100% |
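
The percentage column can be re-derived from the record counts. The following is a minimal sketch, not part of the original pipeline; the helper `format_share` and the two-decimal rounding rule are assumptions chosen to match how the table is printed.

```python
# Record counts per term/class, copied from the table above.
counts = {
    "http://rs.tdwg.org/dwc/terms/associatedTaxa": 2_246_646,
    "http://rs.tdwg.org/dwc/terms/associatedOccurrences": 195_083,
    "http://rs.gbif.org/terms/1.0/Reference": 4_064,
    "http://rs.tdwg.org/dwc/terms/Taxon": 1_521,
    "http://rs.tdwg.org/dwc/terms/ResourceRelationship": 677,
    "http://rs.tdwg.org/dwc/terms/occurrenceRemarks": 35,
}

# The TOTAL row is the sum of the per-term counts.
total = sum(counts.values())


def format_share(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals (hypothetical helper)."""
    pct = 100 * count / total
    return "<0.01%" if pct < 0.01 else f"{pct:.2f}%"


# Share of each term/class, formatted like the table's last column.
shares = {term: format_share(n, total) for term, n in counts.items()}

if __name__ == "__main__":
    for term, share in shares.items():
        print(f"{term}\t{counts[term]:,}\t{share}")
    print(f"TOTAL\t{total:,}\t100%")
```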