Searching for Interactions in GBIF iDigBio Darwin Core Archives
Environment used for the analysis:

| Property | Value |
|---|---|
| CPUs | 16 |
| RAM | 30 GB |
| OS | Ubuntu 20.04.02 LTS |
| SSD | 0.5 TB |
| Elton version | 0.10.13 |
| Preston version | 0.2.5 |
| Nomer version | 0.1.27 |

The results presented here can be reproduced with the following steps:

- Clone the repository https://github.com/zedomel/globi-dwca-index:
  `git clone https://github.com/zedomel/globi-dwca-index`
- Download and install preston.
- Download and install elton.
- Execute the `preston-dwca-interactons.sh` script (optionally, the `OUTPUT_DIR` can be specified):
  `bash preston-dwca-interactons.sh <output_dir>`

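Before running the script, it can help to confirm that the required tools are available. The sketch below assumes preston and elton are installed as commands named `preston` and `elton` on the `PATH`; the helper name `require_tools` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of globi-dwca-index): report every command
# that is not on PATH and return non-zero if at least one is missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumes the tools are installed under these command names):
# require_tools git preston elton && bash preston-dwca-interactons.sh ./results
```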
The script will use the latest version of the biodiversity dataset graph at https://deeplinker.bio.
QUERY_HASH/PROVENANCE HASH: hash://sha256/810b22c16e1a3911c6eecfca348758d3ffd5b29fc36990015cda6427bdde2233

| Metric | Value |
|---|---|
| Number of datasets (DwC-A) | 55,928 |
| Number of records scanned | 574,715,196 |
| Number of potential interactions* | 8,300,191 |
| Number of recognized interactions** | 5,080,837 |

\* Potential interactions: all DwC-A records that have a non-empty value for any term indexed by elton.

\*\* Recognized interactions: all DwC-A records that have valid values for any term indexed by elton. A record is valid if it contains a recognized interaction type (see the interaction type mappings for the list of recognized interaction types) and non-empty names for both sourceTaxon and targetTaxon (even if a name is not a valid taxon name).

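The rule for a recognized interaction can be sketched as a filter over tab-separated records. This illustrates the rule as stated above and is not elton's implementation: the three-column layout (sourceTaxon, interactionType, targetTaxon), the file of recognized types, and the function name `filter_recognized` are all assumptions.

```shell
#!/usr/bin/env bash
# Illustration of the "recognized interaction" rule, NOT elton's code.
# Assumed record layout (hypothetical): sourceTaxon<TAB>interactionType<TAB>targetTaxon
# $1: file listing one recognized interaction type per line (assumption)
# stdin: candidate records; stdout: records that satisfy the rule
filter_recognized() {
  awk -F'\t' '
    NR == FNR { known[$1] = 1; next }        # first input: recognized types
    ($2 in known) && $1 != "" && $3 != ""     # print records that pass the rule
  ' "$1" -
}
```

Note that the name check only asks for non-empty values, so a string such as "Associated species include Fraxinus sp." passes, which matches the examples below.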
Examples of recognized interactions:
| sourceTaxon | interactionType | targetTaxon |
|---|---|---|
| Schoenoplectus pungens | interactsWith | Eupatorium coelestinum |
| Pilea pumila | interactsWith | Found in close proximity to Thalictrum dasycarpum and Asarum canadense |
| Nymphaea odorata | interactsWith | also in close proximity to Utricularia vulgaris and Zizania aquatica |
| Carex grayi | interactsWith | Associated species include Fraxinus sp. |
| EFI.INT \| DFDE \| 418 \| 418 | interactsWith | Coniferae |

There are many different ways to document biological interactions using DwC. elton can recognize many of them, but given the flexibility of the DwC standard, it may be impractical to handle every possibility. Thus, biological interactions may be documented in many ways that elton does not recognize, especially when they are expressed in natural language in terms such as occurrenceRemarks.

The total number of records with potential interactions per DwC term/class is shown below (only terms/classes indexed by elton are listed).

| Term/Class | #Records | % |
|---|---|---|
| http://rs.gbif.org/terms/1.0/Reference | 4,091 | 0.05% |
| http://rs.tdwg.org/dwc/terms/Occurrence | 1 | 0.00% |
| http://rs.tdwg.org/dwc/terms/ResourceRelationship | 1,492,374 | 17.98% |
| http://rs.tdwg.org/dwc/terms/ResourceRelationship \| http://rs.tdwg.org/dwc/terms/Occurrence | 86,969 | 1.05% |
| http://rs.tdwg.org/dwc/terms/ResourceRelationship \| http://rs.tdwg.org/dwc/terms/Taxon | 965 | 0.01% |
| http://rs.tdwg.org/dwc/terms/Taxon | 1,151 | 0.01% |
| http://rs.tdwg.org/dwc/terms/associatedOccurrences \| http://rs.tdwg.org/dwc/terms/Occurrence | 271,383 | 3.27% |
| http://rs.tdwg.org/dwc/terms/associatedTaxa \| http://rs.tdwg.org/dwc/terms/Occurrence | 6,350,347 | 76.51% |
| http://rs.tdwg.org/dwc/terms/dynamicProperties \| http://rs.tdwg.org/dwc/terms/Occurrence | 14 | 0.00% |
| http://rs.tdwg.org/dwc/terms/occurrenceRemarks \| http://rs.tdwg.org/dwc/terms/Occurrence | 92,858 | 1.12% |
| TOTAL | 8,300,153 | 100% |
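A quick sanity check for tables like the one above is that the per-term rows add up to the TOTAL row. The awk sketch below does that for a markdown table read from stdin; the function name `check_table_total` is hypothetical, and it assumes each row starts and ends with `|` and that the count is always the second-to-last cell, so pipes inside the Term/Class cell do not affect it.

```shell
#!/usr/bin/env bash
# Hypothetical check: do the per-row counts of a markdown table add up to its
# TOTAL row? Assumes every row starts and ends with "|" and that the count is
# the second-to-last cell, so an escaped "\|" inside the first cell is harmless.
check_table_total() {
  awk -F'|' '
    NF < 4 { next }                           # blank lines and prose
    {
      c = $(NF - 2); gsub(/[ ,]/, "", c)      # count cell without spaces/commas
      if (c !~ /^[0-9]+$/) next               # header and separator rows
      if ($2 ~ /TOTAL/) stated = c + 0; else sum += c
    }
    END {
      printf "rows=%d stated=%d\n", sum, stated
      exit (sum == stated ? 0 : 1)
    }'
}
```

For the table above, the ten rows sum to 8,300,153, which matches its TOTAL row.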

The total number of records with recognized interactions (i.e., valid interactions) per DwC term/class is shown below (only terms/classes indexed by elton are listed).

| Term/Class | #Records | % |
|---|---|---|
| http://rs.gbif.org/terms/1.0/Reference | 63 | 0.00% |
| http://rs.tdwg.org/dwc/terms/ResourceRelationship \| http://rs.tdwg.org/dwc/terms/Occurrence | 4,399 | 0.09% |
| http://rs.tdwg.org/dwc/terms/ResourceRelationship \| http://rs.tdwg.org/dwc/terms/Taxon | 677 | 0.01% |
| http://rs.tdwg.org/dwc/terms/Taxon | 4 | 0.00% |
| http://rs.tdwg.org/dwc/terms/associatedOccurrences \| http://rs.tdwg.org/dwc/terms/Occurrence | 77,464 | 1.52% |
| http://rs.tdwg.org/dwc/terms/associatedTaxa \| http://rs.tdwg.org/dwc/terms/Occurrence | 4,905,348 | 96.55% |
| http://rs.tdwg.org/dwc/terms/dynamicProperties \| http://rs.tdwg.org/dwc/terms/Occurrence | 14 | 0.00% |
| http://rs.tdwg.org/dwc/terms/occurrenceRemarks \| http://rs.tdwg.org/dwc/terms/Occurrence | 92,856 | 1.83% |
| null | 12 | 0.00% |
| TOTAL | 5,080,837 | 100.00% |

The list of recognized interaction types used in this report can be found at . The list includes all expressions that dataset authors used to express any association between the documented organisms, but many of them are not valid interaction types. Expressions in the list that lack a corresponding mapping to the Relations Ontology are ignored by elton and are therefore not counted as valid interactions.
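Dropping expressions that lack a Relations Ontology mapping amounts to a lookup in a mapping table. A minimal sketch, assuming a hypothetical two-column TSV of expression and RO IRI (elton's own mapping files may be organized differently) and case-insensitive matching; the function name `map_to_ro` is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not elton's mapping logic.
# $1: TSV mapping file "expression<TAB>RO IRI" (assumed layout)
# stdin: one raw interaction expression per line
# stdout: "expression<TAB>RO IRI" for mapped expressions only;
#         expressions without a mapping are silently dropped.
map_to_ro() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { ro[tolower($1)] = $2; next }  # load mapping (case-insensitive keys)
    { key = tolower($0) }
    (key in ro) { print $0, ro[key] }
  ' "$1" -
}
```

For example, with a mapping line `parasite of<TAB>http://purl.obolibrary.org/obo/RO_0002444`, the expression "Parasite of" is kept, while "found in close proximity" has no mapping and is dropped.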
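
The potential-interaction records contain many interaction-type expressions that elton does not recognize or that are not valid interactions. The list of all "interaction types" (valid and invalid) can be found at List of All Interaction Types.

Number of recognized interaction records per interaction type (the counts sum to the 5,080,837 recognized interactions):
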
| Interaction type | Number of Records |
|---|---|
| adjacentTo | 93,448 |
| eatenBy | 94 |
| eats | 67 |
| ectoparasiteOf | 4,399 |
| hasHost | 389,325 |
| hostOf | 38,883 |
| interactsWith | 4,515,410 |
| parasiteOf | 39,157 |
| visits | 12 |
| visitsFlowersOf | 42 |
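Per-type counts like these can be tallied from a tab-separated interactions file with standard tools. The column holding the interaction type is passed as an argument because the layout of `interactions.tsv` is not described here; `tally_types` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count records per interaction type.
# $1: 1-based column holding the interaction type (an assumption; check the
#     header of the file being read). stdin: tab-separated records.
tally_types() {
  cut -f"$1" \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ c = $1; sub(/^ *[0-9]+ /, ""); printf "| %s | %d |\n", $0, c }'
}
```

`LC_ALL=C` makes the sort order byte-wise and therefore stable across locales; the final awk step strips the padded count that `uniq -c` prepends and prints it as a table row.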

We used nomer to find the kingdom of each sourceTaxonName in the recognized interaction records:

`bzcat interactions.tsv.bz2 | cut -f8 | awk '{print "\t"$0}' | nomer replace ncbi-taxon -p nomer.properties | sort | uniq -c`

The nomer.properties file keeps all default settings (nomer properties); only the following line was changed:

`nomer.schema.output=[{"column":0,"type":"path.kingdom.id"},{"column": 1,"type":"path.kingdom.name"}]`
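The `awk '{print "\t"$0}'` step prepends an empty first column so that each name lands in the second column; this sketch assumes nomer reads its input as `id<TAB>name`, which is why the id is left blank. The function below wraps the same pipeline so the nomer command can be swapped for a stub when testing; `kingdom_counts` and the `NOMER` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the nomer pipeline.
# $1: column that holds sourceTaxonName (8 in the original command).
# Reads the decompressed TSV on stdin, e.g.:
#   bzcat interactions.tsv.bz2 | kingdom_counts 8
# NOMER (hypothetical variable) defaults to the "nomer" command.
# The awk step emits "<empty id><TAB>name" for each input name.
kingdom_counts() {
  cut -f"$1" \
    | awk '{ print "\t" $0 }' \
    | "${NOMER:-nomer}" replace ncbi-taxon -p nomer.properties \
    | LC_ALL=C sort \
    | uniq -c
}
```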

Kingdoms found with the command above (NCBI Taxonomy via `ncbi-taxon`):

| Kingdom | Count | % |
|---|---|---|
| Viridiplantae | 2,426,455 | 47.76% |
| Metazoa | 307,297 | 6.05% |
| Fungi | 35,429 | 0.70% |
| Not found | 2,311,656 | 45.50% |
| TOTAL | 5,080,837 | 100% |
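To lay out counts as a table like the one above, with thousands separators and a TOTAL row, a small awk formatter is enough. This sketch assumes the counts have already been reshaped into `count<TAB>kingdom` lines and that an empty kingdom means the name was not found; `kingdom_table` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical formatter: "count<TAB>kingdom" lines on stdin -> markdown rows.
# An empty kingdom is reported as "Not found"; percentages use two decimals.
kingdom_table() {
  awk -F'\t' '
    function commas(n,   s, out) {        # 2426455 -> "2,426,455"
      s = sprintf("%d", n); out = ""
      while (length(s) > 3) {
        out = "," substr(s, length(s) - 2) out
        s = substr(s, 1, length(s) - 3)
      }
      return s out
    }
    { name[NR] = ($2 == "" ? "Not found" : $2); n[NR] = $1; total += $1 }
    END {
      if (total == 0) exit 1
      for (i = 1; i <= NR; i++)
        printf "| %s | %s | %.2f%% |\n", name[i], commas(n[i]), 100 * n[i] / total
      printf "| TOTAL | %s | 100%% |\n", commas(total)
    }'
}
```

Rows keep their input order, so placing the empty-kingdom line last reproduces the "Not found" row at the bottom of the table.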

| Kingdom | Count | % |
|---|---|---|
| Animalia | 628,490 | 12.37% |
| Archaeplastida | 4,624 | 0.09% |
| Bacteria | 220 | 0.00% |
| Biota | 194 | 0.00% |
| Chromista | 2,662 | 0.05% |
| Fungi | 56,062 | 1.10% |
| Hepaticae | 18 | 0.00% |
| Metazoa | 8,042 | 0.16% |
| Orthornavirae | 301 | 0.01% |
| Pararnavirae | 1 | 0.00% |
| Plantae | 2,852,619 | 56.14% |
| Protista | 679 | 0.01% |
| Protozoa | 303 | 0.01% |
| Shotokuvirae | 12 | 0.00% |
| Viridiplantae | 95,533 | 1.88% |
| Not found | 1,431,077 | 28.17% |
| TOTAL | 5,080,837 | 100.00% |