forked from Sensaku/BurnWood

Extraction of charcoal data and generation of thesaurus and experimentation dataset


Wimmics/WoodKG

 
 

WoodKG

WoodKG is a knowledge graph for African wood charcoal studies. This repository contains the tools used to build the three linked graphs that together form WoodKG:

  • a biological taxonomy providing IRIs for taxa and scientific names,
  • a thesaurus of anatomical characteristics being observed,
  • the observations of charcoal samples.

The data sources are the WCVP plant taxonomy, the IAWA thesaurus, the InsideWood observations and the CEPAM observations.

Table of Contents

Installation

Before starting, you must set up two resources: Morph-xR2RML and the WCVP plant taxonomy.

1. Morph-xR2RML

  1. Create a folder named xr2rml.
  2. cd into xr2rml and install the necessary files and folders following the Docker installation instructions.
  3. Open the file mongo_tools/import-tools.sh.
  4. Change the following line:
    MONGO_IMPORT_MAXSIZE=16000000
    to raise the maximum import size tenfold:
    MONGO_IMPORT_MAXSIZE=160000000

Then start the Morph-xR2RML containers with docker-compose up -d.
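The edit in step 4 can also be scripted. The following is a minimal sketch, assuming the variable is assigned on a line of its own exactly as shown above; the function name raise_import_maxsize is hypothetical and not part of this repository or of Morph-xR2RML.

```python
import re

# Matches a whole assignment line such as "MONGO_IMPORT_MAXSIZE=16000000".
# Only spaces and tabs are allowed around it, so line breaks are left untouched.
_MAXSIZE_LINE = re.compile(r"^([ \t]*MONGO_IMPORT_MAXSIZE=)\d+[ \t]*$", re.MULTILINE)


def raise_import_maxsize(script_text: str, new_size: int = 160_000_000) -> str:
    """Return script_text with MONGO_IMPORT_MAXSIZE set to new_size."""
    updated, count = _MAXSIZE_LINE.subn(rf"\g<1>{new_size}", script_text)
    if count == 0:
        raise ValueError("MONGO_IMPORT_MAXSIZE assignment not found")
    return updated
```

To apply it, read mongo_tools/import-tools.sh from inside the xr2rml folder, pass its text to the function, and write the result back to the same file.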

2. WCVP - Plant taxonomy

Run the commands below to download the WCVP taxonomic data wcvp_dwca.zip, extract the file wcvp_taxon.csv and place it in input/powo/raw.

From the project root, run:

mkdir -p input/powo/raw input/powo/currated
cd input/powo/raw
wget https://sftp.kew.org/pub/data-repositories/WCVP/wcvp_dwca.zip
unzip wcvp_dwca.zip wcvp_taxon.csv

Then return to the project root and run the script ./tools/powo/split_wcvp.sh. This splits the CSV file into chunks of at most 100,000 lines each.
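To illustrate what splitting into chunks of at most 100,000 lines means, here is a minimal Python sketch of the chunking logic. It is not the code of split_wcvp.sh: the function name chunk_lines is hypothetical, and the sketch does not say whether the real script repeats the CSV header in each chunk.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[str], max_lines: int = 100_000) -> Iterator[List[str]]:
    """Yield consecutive lists holding at most max_lines lines each."""
    if max_lines < 1:
        raise ValueError("max_lines must be positive")
    iterator = iter(lines)
    while True:
        # islice takes up to max_lines items and returns fewer at the end of input.
        chunk = list(islice(iterator, max_lines))
        if not chunk:
            return
        yield chunk
```

Because the input is consumed lazily, an open file object can be passed directly and only one chunk is held in memory at a time.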

Repository Structure

input

This folder contains the data sources: WCVP taxonomy (powo/), IAWA thesaurus (input/iawa_thesaurus), InsideWood observations, CEPAM observations.

Each folder contains two subfolders:

  • raw/ for the raw files downloaded from their respective sources,
  • currated/ for the transformed versions ready to be used for RDF generation.
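The raw/ and currated/ layout can be checked before running the tools. The sketch below is only an illustration: the helper missing_subfolders is hypothetical, and the list of source folder names must be supplied by the caller.

```python
from pathlib import Path


def missing_subfolders(input_dir: Path, sources: list[str]) -> list[Path]:
    """Return the raw/ and currated/ subfolders that do not exist yet."""
    expected = [
        input_dir / source / sub
        for source in sources
        for sub in ("raw", "currated")
    ]
    return [path for path in expected if not path.is_dir()]
```

For example, missing_subfolders(Path("input"), ["powo"]) returns an empty list once both input/powo/raw and input/powo/currated exist.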

More details.

output

Contains the generated RDF files:

  • the POWO taxonomy (powo_taxonomy_*.ttl),
  • the IAWA thesaurus (iawa_thesaurus.ttl),
  • the InsideWood or CEPAM observations (observations.ttl).

The file unmatched_taxa.json lists the observations for which no taxonomic identifier was found in POWO.
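Since the taxonomy is written to several files matching powo_taxonomy_*.ttl, a later processing step may need to collect them. A minimal sketch, assuming the files sit directly in the output folder; the helper name taxonomy_files is hypothetical.

```python
from pathlib import Path


def taxonomy_files(output_dir: Path) -> list[Path]:
    """Return the powo_taxonomy_*.ttl files in output_dir, sorted by path."""
    return sorted(output_dir.glob("powo_taxonomy_*.ttl"))
```

Other Turtle files in the same folder, such as observations.ttl, do not match the pattern and are left out.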

tools

Contains the scripts for transforming raw files to currated files, and currated files to RDF files.

Usage

Launch the main menu with: ./menu.sh

Each menu option calls .sh scripts located in tools/<subfolder>/scripts:

  1. Generate IAWA thesaurus as JSON
    Transforms the IAWA thesaurus files from raw to currated using tools/iawa_thesaurus/scripts/thesaurus.sh and tools/iawa_thesaurus/scripts/iawa_properties.sh.

  2. Generate IAWA thesaurus as RDF
    Generates iawa_thesaurus.ttl in output from JSON files using tools/xr2rml/observation2xr2rml --thesaurus. Must be executed after option 1.

  3. Generate POWO taxonomy as JSON
    Transforms WCVP taxonomic files from raw to currated using tools/powo/scripts/powo.sh.

  4. Generate POWO taxonomy as RDF
    Generates RDF files powo_taxonomy_*.ttl from JSON files using tools/xr2rml/observation2xr2rml --taxon.

  5. Generate CEPAM observations as JSON
    Transforms CEPAM observations from raw to currated using tools/cepam_observations/scripts/.cepam_csvtojson.sh

  6. Generate InsideWood observations as JSON
    Transforms InsideWood observations from raw to currated using tools/insidewood_observations/scripts/insidewood_observations.sh.

  7. Generate observations as RDF
    Prompts for the path to a currated .json observations file and generates the RDF observations in output.

  8. Quit
    Exit the menu.
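The only ordering constraint stated in the list above is that option 2 must run after option 1. The sketch below shows one way to check a planned sequence of options against such constraints. The names PREREQUISITES and first_order_violation are hypothetical, and other options may have dependencies that are not documented here.

```python
# Prerequisites taken from the option descriptions: option 2 needs option 1.
# Any further entries would be assumptions and are deliberately left out.
PREREQUISITES = {2: {1}}


def first_order_violation(choices):
    """Return (option, missing_prerequisites) for the first option run too early, else None."""
    done = set()
    for option in choices:
        missing = PREREQUISITES.get(option, set()) - done
        if missing:
            return option, missing
        done.add(option)
    return None
```

Calling first_order_violation([2, 1]) reports option 2 with the missing prerequisite 1, while a sequence that runs option 1 first passes the check.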

Example of use

Here is a complete execution example:

./menu.sh

Then in the menu:

  • 1 → to generate the IAWA JSON thesaurus
  • 3 → to generate the currated POWO files
  • 5 → to transform CEPAM observations
  • 7 → to generate the RDF observations, then enter this path:
input/cepam_observations/currated/CEPAM_feature_net_taxa_and_numbers_homogene.json
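The same session could in principle be scripted. The sketch below only builds the sequence of answers. It assumes that menu.sh reads each answer from standard input, one per line, and that option 7 asks for the path immediately after being selected; neither is confirmed by this README. The function name menu_input is hypothetical.

```python
def menu_input(choices, observations_path):
    """Build the text to feed to ./menu.sh: one answer per line, then 8 to quit."""
    lines = []
    for choice in choices:
        lines.append(str(choice))
        if choice == 7:
            # Option 7 expects the currated JSON file to convert.
            lines.append(observations_path)
    lines.append("8")
    return "\n".join(lines) + "\n"
```

If the assumptions hold, the resulting text can be passed to the menu with subprocess.run(["./menu.sh"], input=text, text=True, check=True) from the project root.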

Requirements

  • SPARQL, SOSA/SSN ontologies
  • Morph-xR2RML
  • Python 3.10.12


Languages

  • Python 85.6%
  • Shell 14.4%