Searching for Interactions in GBIF iDigBio Darwin Core Archives

José Augusto Salim edited this page Jun 16, 2021 · 8 revisions

System specification

Property Value
CPUs 16
RAM 30 GB
OS Ubuntu 20.04.02 LTS
SSD 0.5 TB
Elton version 0.10.13
Preston version 0.2.5
Nomer version 0.1.27

Steps to reproduce

The results presented here can be reproduced by the following steps:

  • Clone the repository:
git clone https://github.com/zedomel/globi-dwca-index
  • Download and install preston
  • Download and install elton
  • Execute the preston-dwca-interactons.sh script (optionally, OUTPUT_DIR can be specified as the first argument):
bash preston-dwca-interactons.sh <output_dir>

The script will use the latest version of the biodiversity dataset graph at https://deeplinker.bio.

QUERY_HASH/PROVENANCE HASH: hash://sha256/810b22c16e1a3911c6eecfca348758d3ffd5b29fc36990015cda6427bdde2233

Results

Number of datasets (DwC-A) 55,928
Number of Records scanned 574,715,196
Number of potential interactions* 8,300,191
Number of recognized interactions** 5,080,837

* Total of potential interactions: includes all DwC-A records that have a non-empty value for any term indexed by elton.

** Total of recognized interactions: includes all DwC-A records that have valid values for any term indexed by elton. A record is valid if it contains a recognized interaction type (see interaction types mappings for a list of recognized interaction types) and non-empty names for both the sourceTaxon and the targetTaxon (even if those names are not valid taxon names).
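
The validity rule above can be sketched as a small filter. This is an illustration only, not elton's implementation: the three-column layout (sourceTaxon, interactionType, targetTaxon), the function name `filter_recognized`, and the short list of accepted types are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep only rows that satisfy the "recognized interaction" rule.
# Assumed input: tab-separated rows of sourceTaxon, interactionType, targetTaxon.
# The accepted types are a small sample taken from this report, not elton's full mapping.
filter_recognized() {
  awk -F'\t' '
    BEGIN {
      n = split("interactsWith hasHost hostOf parasiteOf eats eatenBy", types, " ")
      for (i = 1; i <= n; i++) ok[types[i]] = 1
    }
    # A row passes when both names are non-empty and the type is in the accepted set.
    $1 != "" && $3 != "" && ($2 in ok)
  '
}

# Example: the second row has an unmapped type, the third has an empty target name.
printf '%s\t%s\t%s\n' \
  'Carex grayi' 'interactsWith' 'Associated species include Fraxinus sp.' \
  'Pilea pumila' 'growsNear' 'Asarum canadense' \
  'Nymphaea odorata' 'interactsWith' '' \
  | filter_recognized
```

Only the first sample row passes. As in the report, the target name is accepted because it is non-empty, even though it is free text rather than a taxon name.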

Examples of recognized interactions:

sourceTaxon interactionType targetTaxon
Schoenoplectus pungens interactsWith Eupatorium coelestinum
Pilea pumila interactsWith Found in close proximity to Thalictrum dasycarpum and Asarum canadense
Nymphaea odorata interactsWith also in close proximity to Utricularia vulgaris and Zizania aquatica
Carex grayi interactsWith Associated species include Fraxinus sp.
EFI.INT | DFDE | 418 | 418 interactsWith Coniferae

DwC Terms and Interactions

There are many different ways to document biological interactions using DwC. elton can recognize many of them, but given the flexibility of the DwC standard it may be impractical to handle every possibility. Thus, biological interactions may be documented in ways that elton does not recognize, especially when they are expressed in natural language in terms such as occurrenceRemarks.

Total number of records with potential interactions

The total number of records with potential interactions per DwC term/class is shown below (only terms/classes indexed by elton are listed).

Term/Class #Records %
http://rs.gbif.org/terms/1.0/Reference 4,091 0.05%
http://rs.tdwg.org/dwc/terms/Occurrence 1 0.00%
http://rs.tdwg.org/dwc/terms/ResourceRelationship 1,492,374 17.98%
http://rs.tdwg.org/dwc/terms/ResourceRelationship | http://rs.tdwg.org/dwc/terms/Occurrence 86,969 1.05%
http://rs.tdwg.org/dwc/terms/ResourceRelationship | http://rs.tdwg.org/dwc/terms/Taxon 965 0.01%
http://rs.tdwg.org/dwc/terms/Taxon 1,151 0.01%
http://rs.tdwg.org/dwc/terms/associatedOccurrences | http://rs.tdwg.org/dwc/terms/Occurrence 271,383 3.27%
http://rs.tdwg.org/dwc/terms/associatedTaxa | http://rs.tdwg.org/dwc/terms/Occurrence 6,350,347 76.51%
http://rs.tdwg.org/dwc/terms/dynamicProperties | http://rs.tdwg.org/dwc/terms/Occurrence 14 0.00%
http://rs.tdwg.org/dwc/terms/occurrenceRemarks | http://rs.tdwg.org/dwc/terms/Occurrence 92,858 1.12%
TOTAL 8,300,153 100%
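
A per-term tally of this kind can be produced with standard tools. The following is a minimal sketch, assuming a hypothetical input with one term/class label per line; the function name `tally_with_percent` is invented here, and this is not the script used to produce the report.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count records per label and print each count with its share of the total.
# Assumed input: one term/class label per line on stdin.
tally_with_percent() {
  sort | uniq -c | awk '
    {
      count = $1
      # Rebuild the label from the remaining fields (labels may contain spaces or "|").
      label = $2
      for (i = 3; i <= NF; i++) label = label " " $i
      counts[label] = count
      order[NR] = label
      total += count
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s\t%d\t%.2f%%\n", order[i], counts[order[i]], 100 * counts[order[i]] / total
      printf "TOTAL\t%d\t100.00%%\n", total
    }
  '
}

# Made-up sample: three associatedTaxa records and one occurrenceRemarks record.
printf '%s\n' associatedTaxa associatedTaxa associatedTaxa occurrenceRemarks | tally_with_percent
```

The percentages are computed in a second pass (the `END` block) because the total is only known after every label has been counted.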

Total number of records with recognized interactions

The total number of records with recognized interactions (i.e., valid interactions) per DwC term/class is shown below (only terms/classes indexed by elton are listed).

Term/Class #Records %
http://rs.gbif.org/terms/1.0/Reference 63 0.00%
http://rs.tdwg.org/dwc/terms/ResourceRelationship | http://rs.tdwg.org/dwc/terms/Occurrence 4,399 0.09%
http://rs.tdwg.org/dwc/terms/ResourceRelationship | http://rs.tdwg.org/dwc/terms/Taxon 677 0.01%
http://rs.tdwg.org/dwc/terms/Taxon 4 0.00%
http://rs.tdwg.org/dwc/terms/associatedOccurrences | http://rs.tdwg.org/dwc/terms/Occurrence 77,464 1.52%
http://rs.tdwg.org/dwc/terms/associatedTaxa | http://rs.tdwg.org/dwc/terms/Occurrence 4,905,348 96.55%
http://rs.tdwg.org/dwc/terms/dynamicProperties | http://rs.tdwg.org/dwc/terms/Occurrence 14 0.00%
http://rs.tdwg.org/dwc/terms/occurrenceRemarks | http://rs.tdwg.org/dwc/terms/Occurrence 92,856 1.83%
null 12 0.00%
TOTAL 5,080,837 100.00%

Interaction Types

The list of recognized interaction types used in this report can be found at . The list includes all expressions used by dataset authors to express any association between documented organisms, but many of them are not valid interaction types. Expressions in the list that have no corresponding mapping to the Relations Ontology are ignored by elton and are therefore not valid interactions.

Interaction types of records with potential interactions

The records with potential interactions contain many interaction type expressions that are either not recognized by elton or not valid interactions. The list of all "interaction types" (valid and invalid) can be found at List of All Interaction Types.

Interaction types of records with recognized interactions

Interaction type Number of Records
adjacentTo 93,448
eatenBy 94
eats 67
ectoparasiteOf 4,399
hasHost 389,325
hostOf 38,883
interactsWith 4,515,410
parasiteOf 39,157
visits 12
visitsFlowersOf 42

Interactions by Taxa

Kingdom

We use nomer to find the kingdom of each sourceTaxonName in the records with recognized interactions:

cat interactions.tsv.bz2 | bunzip2 | cut -f8 | awk '{print "\t"$0}' | nomer replace ncbi-taxon -p nomer.properties | sort | uniq -c

The nomer.properties file keeps all default settings (nomer properties) except for the following line, which was changed to:

nomer.schema.output=[{"column":0,"type":"path.kingdom.id"},{"column": 1,"type":"path.kingdom.name"}]
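
The `cut`/`awk` stage of the pipeline above turns a single column of names into two-column rows with an empty first column, which appears to be the (id, name) layout that nomer reads by default. The sketch below isolates that step so it can be checked without nomer installed. The function name `to_nomer_input` and the sample rows are made up for illustration, and the column number is passed as an argument rather than fixed at 8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract one column of names from a TSV stream and
# prefix each name with an empty first column (a leading tab).
# $1 is the 1-based column number (8 in the command above).
to_nomer_input() {
  cut -f"$1" | awk '{ print "\t" $0 }'
}

# Made-up two-column sample; the name sits in column 2 here.
printf 'rec1\tPilea pumila\nrec2\tCarex grayi\n' | to_nomer_input 2
```

Each output line starts with a tab followed by the name, so a downstream tool that expects an identifier in the first column sees it as empty and works from the name alone.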

NCBI taxonomy

Kingdom Count %
Viridiplantae 2,426,455 47.76%
Metazoa 307,297 6.05%
Fungi 35,429 0.70%
Not found 2,311,656 45.50%
TOTAL 5,080,837 100%

Globi Taxon Cache

Kingdom Count %
Animalia 628,490 12.37%
Archaeplastida 4,624 0.09%
Bacteria 220 0.00%
Biota 194 0.00%
Chromista 2,662 0.05%
Fungi 56,062 1.10%
Hepaticae 18 0.00%
Metazoa 8,042 0.16%
Orthornavirae 301 0.01%
Pararnavirae 1 0.00%
Plantae 2,852,619 56.14%
Protista 679 0.01%
Protozoa 303 0.01%
Shotokuvirae 12 0.00%
Viridiplantae 95,533 1.88%
Not found 1,431,077 28.17%
TOTAL 5,080,837 100.00%