WoodKG

WoodKG is a knowledge graph for African wood charcoal studies. This repository contains the tools used to build the three linked graphs that form WoodKG:

a biological taxonomy providing IRIs for taxa and scientific names,
a thesaurus of anatomical characteristics being observed,
the observations of charcoal samples.

The data sources are the following:

Plants of the World Online (POWO), which provides up-to-date taxonomic names and geolocation, and the World Checklist of Vascular Plants (WCVP), the taxonomic name backbone that POWO relies on;
The International Association of Wood Anatomists (IAWA) list of features. From this list we derived a representation centered on the concepts of Feature of Interest (FoI), Observable Property (OP) and values, available as a shared document;
Charcoal observations coming from 2 sources:
- InsideWood's charcoal descriptions;
- the Southern African wood CHArcoal description provided by the research lab Cultures – Environnements. Préhistoire, Antiquité, Moyen Âge (CEPAM).

Descriptions coming from InsideWood and CEPAM both use the IAWA list of features.

Installation

Before starting, you must download two resources:

1. Morph-xR2RML

Create a folder named xr2rml.
Change into xr2rml and install the necessary files and folders following the Docker installation instructions.
Open the file mongo_tools/import-tools.sh.

Modify the following line:

MONGO_IMPORT_MAXSIZE=16000000

to increase the maximum import size tenfold:

MONGO_IMPORT_MAXSIZE=160000000

Start Morph-xR2RML containers with docker-compose up -d.
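
The manual edit of mongo_tools/import-tools.sh described above can also be scripted. The following is a minimal sketch, assuming the variable is assigned on a line that starts with MONGO_IMPORT_MAXSIZE= exactly as shown above. The function name set_import_maxsize is hypothetical and not part of Morph-xR2RML. Apply it before starting the containers.

```shell
#!/bin/sh
# Hypothetical helper (not part of Morph-xR2RML): rewrite the
# MONGO_IMPORT_MAXSIZE assignment in a script file.
# Assumes the assignment starts at the beginning of a line.
set_import_maxsize() {
    file="$1"
    value="$2"
    tmp="$(mktemp)" || return 1
    # Write the modified copy to a temporary file, then copy it back
    # with cat so that the original file keeps its permissions.
    if sed "s/^MONGO_IMPORT_MAXSIZE=.*/MONGO_IMPORT_MAXSIZE=${value}/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage, from inside the xr2rml folder:
#   set_import_maxsize mongo_tools/import-tools.sh 160000000
```

Copying the result back with cat rather than moving the temporary file keeps the executable bit of the original script, if it has one.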

2. WCVP - Plant taxonomy

Run the commands below to download the WCVP taxonomic data archive wcvp_dwca.zip, extract the file wcvp_taxon.csv from it and place it in input/powo/raw.

From the project root, run:

mkdir -p input/powo/raw input/powo/currated
cd input/powo/raw
wget https://sftp.kew.org/pub/data-repositories/WCVP/wcvp_dwca.zip
unzip wcvp_dwca.zip wcvp_taxon.csv

Then return to the project root and run the script ./tools/powo/split_wcvp.sh. This splits the CSV file into chunks of at most 100000 lines each.
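
The content of split_wcvp.sh is not reproduced here. As a rough illustration of this kind of chunking, a line-based split can be sketched with the standard split utility. The function name split_into_chunks and the output prefix are hypothetical, and whether the real script repeats the CSV header in every chunk is not stated; this sketch does not.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual tools/powo/split_wcvp.sh:
# split a text file into pieces of at most max_lines lines each.
split_into_chunks() {
    input="$1"
    prefix="$2"
    max_lines="${3:-100000}"
    # split names the pieces <prefix>aa, <prefix>ab, ...
    # The header line ends up in the first piece only.
    split -l "$max_lines" "$input" "$prefix"
}

# Usage, from the project root (the output prefix is an assumption):
#   split_into_chunks input/powo/raw/wcvp_taxon.csv input/powo/raw/wcvp_taxon_part_
```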

Repository Structure

input

This folder contains the data sources: WCVP taxonomy (powo/), IAWA thesaurus (input/iawa_thesaurus), InsideWood observations, CEPAM observations.

Each folder contains two subfolders:
- raw/ for the raw files downloaded from their respective sources,
- currated/ for the transformed versions ready to be used for RDF generation.


output

Contains the generated RDF files:

the POWO taxonomy (powo_taxonomy_*.ttl),
the IAWA thesaurus (iawa_thesaurus.ttl),
the InsideWood or CEPAM observations (observations.ttl).

The file unmatched_taxa.json lists the observations for which no taxonomic identifier was found in POWO.

tools

Contains the scripts for transforming raw files to currated files, and currated files to RDF files.

Usage

Launch the main menu with: ./menu.sh

The menu offers different options by calling .sh scripts located in tools/<subfolder>/scripts:

1. Generate IAWA thesaurus as JSON
   Transforms the IAWA thesaurus files from raw to currated using tools/iawa_thesaurus/scripts/thesaurus.sh and tools/iawa_thesaurus/scripts/iawa_properties.sh.
2. Generate IAWA thesaurus as RDF
   Generates iawa_thesaurus.ttl in output from JSON files using tools/xr2rml/observation2xr2rml --thesaurus. Must be executed after option 1.
3. Generate POWO taxonomy as JSON
   Transforms WCVP taxonomic files from raw to currated using tools/powo/scripts/powo.sh.
4. Generate POWO taxonomy as RDF
   Generates RDF files powo_taxonomy_*.ttl from JSON files using tools/xr2rml/observation2xr2rml --taxon.
5. Generate CEPAM observations as JSON
   Transforms CEPAM observations from raw to currated using tools/cepam_observations/scripts/.cepam_csvtojson.sh
6. Generate InsideWood observations as JSON
   Transforms InsideWood observations from raw to currated using tools/insidewood_observations/scripts/insidewood_observations.sh.
7. Generate observations as RDF
   Requires a .json file (currated type) and generates RDF observations in output.
8. Quit
   Exits the menu.

Example of use

Here is a complete execution example:

./menu.sh

Then in the menu:

1 → to generate the IAWA JSON thesaurus
3 → to generate the currated POWO files
5 → to transform CEPAM observations
7 → to generate the RDF observations, then enter this path:

input/cepam_observations/currated/CEPAM_feature_net_taxa_and_numbers_homogene.json

Requirements

SPARQL, SOSA/SSN ontologies
Morph-xR2RML
Python 3.10.12
