WoodKG is a knowledge graph for African wood charcoal studies. This repository contains the tools used to build the three linked graphs that together form WoodKG:
- a biological taxonomy providing IRIs for taxa and scientific names,
- a thesaurus of anatomical characteristics being observed,
- the observations of charcoal samples.
The data sources are the following:
- Plants of the World Online (POWO), which provides up-to-date taxonomic names and geolocation, together with the World Checklist of Vascular Plants (WCVP), the taxonomic name backbone that POWO relies on;
- the International Association of Wood Anatomists (IAWA) list of features. From this list we have derived a representation centered on the concepts of Feature of Interest (FoI), Observable Property (OP) and values, available as a shared document;
- charcoal observations coming from two sources:
  - InsideWood's charcoal descriptions;
  - the Southern African wood CHArcoal description provided by the research lab Cultures – Environnements. Préhistoire, Antiquité, Moyen Âge (CEPAM).

Descriptions coming from InsideWood and CEPAM both use IAWA's list of features.
Before starting, you must download two resources:
1. Morph-xR2RML
- Create the folder xr2rml.
- Change directory to xr2rml and install the necessary files and folders following the Docker installation instructions.
- Open the file mongo_tools/import-tools.sh.
- To increase the maximum import size, change the line
  MONGO_IMPORT_MAXSIZE=16000000
  to
  MONGO_IMPORT_MAXSIZE=160000000
- Start the Morph-xR2RML containers with docker-compose up -d.
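The size change described above can also be scripted. The following is a minimal sketch, not part of Morph-xR2RML: the function name bump_import_maxsize is hypothetical, and it assumes that the assignment sits at the very start of a line in mongo_tools/import-tools.sh, with no leading whitespace or export keyword.

```shell
# Hypothetical helper: rewrite the MONGO_IMPORT_MAXSIZE assignment in place.
# Assumes the assignment starts at column 1 of its line.
bump_import_maxsize() {
  script=${1:-mongo_tools/import-tools.sh}
  new_size=${2:-160000000}

  # Keep a backup so the original value can be restored.
  cp "$script" "$script.bak"

  # Replace whatever value is currently assigned with the new one.
  sed "s/^MONGO_IMPORT_MAXSIZE=.*/MONGO_IMPORT_MAXSIZE=$new_size/" \
    "$script.bak" > "$script"

  # Print the resulting line so the change can be checked visually.
  grep '^MONGO_IMPORT_MAXSIZE=' "$script"
}

# Usage, from the xr2rml folder:
#   bump_import_maxsize mongo_tools/import-tools.sh 160000000
```

Writing through a backup copy rather than using sed -i keeps the sketch portable between GNU and BSD sed.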
2. WCVP taxonomic data
Download the WCVP archive wcvp_dwca.zip, extract the file wcvp_taxon.csv from it and place it in input/powo/raw.
From the project root, run:
mkdir -p input/powo/raw input/powo/currated
cd input/powo/raw
wget https://sftp.kew.org/pub/data-repositories/WCVP/wcvp_dwca.zip
unzip wcvp_dwca.zip wcvp_taxon.csv
Then return to the project root and run the script ./tools/powo/split_wcvp.sh.
This splits the CSV file into chunks of at most 100000 lines each.
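To illustrate the kind of chunking involved, here is a minimal sketch of a line-based split that repeats the header in every chunk. It is not the content of split_wcvp.sh: the function name split_csv_with_header, the chunk_NNN.csv naming and the choice to count only data lines toward the limit are all assumptions made for the example. It also assumes that no record contains an embedded newline.

```shell
# Hypothetical helper: split a CSV into chunks of at most max_lines data
# lines, copying the header line to the top of every chunk.
split_csv_with_header() {
  input=$1
  max_lines=$2
  out_dir=$3

  mkdir -p "$out_dir"

  awk -v max="$max_lines" -v dir="$out_dir" '
    # Remember the header and do not count it as data.
    NR == 1 { header = $0; next }

    # Every max data lines, close the current chunk and open a new one.
    (NR - 2) % max == 0 {
      if (out) close(out)
      out = sprintf("%s/chunk_%03d.csv", dir, ++n)
      print header > out
    }

    # Append the current data line to the open chunk.
    { print > out }
  ' "$input"
}

# Usage, from the project root (output folder name is illustrative):
#   split_csv_with_header input/powo/raw/wcvp_taxon.csv 100000 /tmp/wcvp_chunks
```

Repeating the header makes each chunk usable on its own by tools that expect column names on the first line.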
The input/ folder contains the data sources: WCVP taxonomy (powo/), [IAWA thesaurus](input/iawa_thesaurus), InsideWood observations and CEPAM observations.
Each folder contains two subfolders:
- raw/ for the raw files downloaded from their respective sources,
- currated/ for the transformed versions ready to be used for RDF generation.
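The layout described above can be checked before running the tools. The sketch below is only an illustration: the function name check_input_layout is hypothetical, and the folder name insidewood_observations is assumed by analogy with the tools/ subfolder of the same name; powo, iawa_thesaurus and cepam_observations appear in paths elsewhere in this document.

```shell
# Hypothetical helper: report any missing raw/ or currated/ subfolder
# under the given input root (defaults to ./input).
check_input_layout() {
  root=${1:-input}
  status=0

  # insidewood_observations is an assumed folder name.
  for source in powo iawa_thesaurus insidewood_observations cepam_observations; do
    for sub in raw currated; do
      if [ ! -d "$root/$source/$sub" ]; then
        echo "missing: $root/$source/$sub" >&2
        status=1
      fi
    done
  done

  # Non-zero when at least one expected folder is absent.
  return "$status"
}

# Usage, from the project root:
#   check_input_layout input && echo "input layout looks complete"
```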
The output/ folder contains the generated RDF files:
- the POWO taxonomy (powo_taxonomy_*.ttl),
- the IAWA thesaurus (iawa_thesaurus.ttl),
- the InsideWood or CEPAM observations (observations.ttl).

The file unmatched_taxa.json lists the observations for which no taxonomic identifier was found in POWO.
The tools/ folder contains the scripts that transform raw files into curated files, and curated files into RDF files.
Launch the main menu with: ./menu.sh
The menu offers the following options, each of which calls .sh scripts located in tools/<subfolder>/scripts:
1. Generate IAWA thesaurus as JSON
   Transforms the IAWA thesaurus files from raw to curated using tools/iawa_thesaurus/scripts/thesaurus.sh and tools/iawa_thesaurus/scripts/iawa_properties.sh.
2. Generate IAWA thesaurus as RDF
   Generates iawa_thesaurus.ttl in output from the JSON files using tools/xr2rml/observation2xr2rml --thesaurus. Must be executed after option 1.
3. Generate POWO taxonomy as JSON
   Transforms the WCVP taxonomic files from raw to curated using tools/powo/scripts/powo.sh.
4. Generate POWO taxonomy as RDF
   Generates the RDF files powo_taxonomy_*.ttl from the JSON files using tools/xr2rml/observation2xr2rml --taxon.
5. Generate CEPAM observations as JSON
   Transforms the CEPAM observations from raw to curated using tools/cepam_observations/scripts/.cepam_csvtojson.sh.
6. Generate InsideWood observations as JSON
   Transforms the InsideWood observations from raw to curated using tools/insidewood_observations/scripts/insidewood_observations.sh.
7. Generate observations as RDF
   Requires a curated .json file as input and generates the RDF observations in output.
8. Quit
   Exits the menu.
Here is a complete execution example:
./menu.sh
Then, in the menu, enter:
- 1 → to generate the IAWA JSON thesaurus,
- 3 → to generate the curated POWO files,
- 5 → to transform the CEPAM observations,
- 7 → to generate the observations as RDF, then enter this path:
input/cepam_observations/currated/CEPAM_feature_net_taxa_and_numbers_homogene.json

The project relies on the following technologies:
- SPARQL, SOSA/SSN ontologies
- Morph-xR2RML
- Python 3.10.12
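
The execution example given earlier could also be run without typing each choice by hand. The sketch below is an assumption-laden illustration: the helper run_menu_choices is hypothetical, it only works if menu.sh reads its choices line by line from standard input, and it assumes that Quit is option 8 because it is the eighth entry of the menu.

```shell
# Hypothetical helper: feed a list of answers, one per line, to a menu
# command on its standard input.
run_menu_choices() {
  menu_cmd=$1
  shift

  # printf repeats the format for every remaining argument,
  # so each choice ends up on its own line.
  printf '%s\n' "$@" | "$menu_cmd"
}

# Usage, from the project root (assumes menu.sh reads stdin and 8 is Quit):
#   run_menu_choices ./menu.sh 1 3 5 7 \
#     input/cepam_observations/currated/CEPAM_feature_net_taxa_and_numbers_homogene.json 8
```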