rossbowen edited this page Jun 29, 2021 · 2 revisions

A data cube typically corresponds to a single CSV file and is represented as an element in the CSVW's tables section. Some large cubes may be partitioned into multiple CSV files; this scenario is covered in the slices section.

We require the following input from the user:

| attribute | type | description | necessity |
| --- | --- | --- | --- |
| [DATACUBE_CSV] | string | a string which specifies the full filename of the CSV being described. | required |
| [DATACUBE_URI] | URI | a URI given to a specific data cube. | optional |
| [DATACUBE_ID] | string | a kebab-case string used to coin a URI for the data cube if no [DATACUBE_URI] is specifically provided. | optional |

Coining a URI for the cube

If no [DATACUBE_URI] is provided, the user may provide a [DATACUBE_ID] (i.e., a slug), which should be in lower kebab-case and is used to coin a URI:

[DATACUBE_URI] = http://gss-data.org.uk/data/[DATACUBE_ID]

If no [DATACUBE_ID] is provided, the title of the data cube (details below) should be transformed into a valid data cube ID and used to coin a URI.

[DATACUBE_ID]  = kebabcase([DATACUBE_TITLE])
[DATACUBE_URI] = http://gss-data.org.uk/data/[DATACUBE_ID]

To be clear, transformations of strings into kebab-case should also transform strings into lower case.
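
The fallback order above (explicit URI, then ID, then title) could be sketched in Python as follows. This is only an illustration: the exact kebab-case rule is not defined on this page, so the sketch assumes that runs of ASCII letters and digits are kept and joined with single hyphens. The function names `kebab_case` and `coin_datacube_uri` are hypothetical.

```python
import re
from typing import Optional

# Base taken from the formulas above.
BASE_URI = "http://gss-data.org.uk/data/"


def kebab_case(text: str) -> str:
    """Lower-case the text and join its alphanumeric runs with hyphens.

    Assumed rule: anything other than ASCII letters and digits is
    treated as a separator and dropped.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


def coin_datacube_uri(
    title: str,
    datacube_id: Optional[str] = None,
    datacube_uri: Optional[str] = None,
) -> str:
    """Return the explicit URI if given, else coin one from the ID or title."""
    if datacube_uri:
        return datacube_uri
    slug = datacube_id if datacube_id else kebab_case(title)
    return BASE_URI + slug


print(coin_datacube_uri("A Human-readable Title"))
# http://gss-data.org.uk/data/a-human-readable-title
```

Under this assumed rule, non-ASCII characters in a title are discarded rather than transliterated; a real implementation may want a different policy.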

Cube metadata

An entry should be added to the CSVW's tables array using the following example template. Additional metadata which may be provided by the user are listed below.

{

    //...

   "tables": [
       {
            "@id": "[DATACUBE_URI]",
            "url": "[DATACUBE_CSV]",
            "rdf:type": [
                {"@id": "dcat:Dataset"},
                {"@id": "qb:DataSet"}
            ],
            "dcterms:title": "A Human-readable Title",
            "dcterms:description": "A human-readable description.",
            "dcterms:publisher": {
                "@id": "https://example.org/a-publisher"
            },
            "dcat:contactPoint": [
                {
                    "vcard:fn": "Full Name",
                    "vcard:email": "something@example.com",
                    "vcard:tel": "+44 (0)1234 567890"
                }
            ],
            "dcterms:issued": {
                "@value": "YYYY-MM-DD",
                "@type": "xsd:date"
            },
            "dcat:theme": [
                {"@id": "https://example.com/a-theme"}
            ],
            "dcat:keyword": ["Some", "List", "Of", "Keywords"],
            "dcat:landingPage": {
                "@id": "https://example.com/a-landing-page"
            },
            "qb:structure": {
                "@id": "[DATACUBE_URI]#structure",
                "@type": "qb:DataStructureDefinition",
                "qb:component": [

                    //...

                ]
            },
            "tableSchema": {
                "columns": [

                    //...

                ]
            }
       }
   ]
}

The user may specify any of the following metadata attributes associated with the data cube.

| namespace | key | type | description | necessity |
| --- | --- | --- | --- | --- |
| dcterms | title | string | a human-readable title for the data cube. | required |
| dcterms | description | string | a human-readable description of the data cube. | required |
| dcterms | publisher | URI | a URI representing the publisher of the data cube. | recommended |
| vcard | fn | string | a human-readable full name of the contact point. | recommended |
| vcard | email | string | the email address of the contact point. | recommended |
| dcterms | issued | date | the date on which the data cube was published/issued. | recommended |
| dcat | theme | URI | an array of URIs representing the themes of the data cube. | recommended |
| dcat | keyword | string | an array of keyword strings relating to the data cube. | recommended |
| dcat | landingPage | URI | a URI representing the landing page associated with the data cube. | recommended |
| vcard | tel | string | a human-readable telephone number of the contact point. | optional |
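
As a rough illustration of how the template and the necessity column might be applied, the Python sketch below assembles a table entry and rejects input that lacks the two required attributes. The function name `build_table_entry` and the shape of the `metadata` argument (a dict keyed by prefixed names) are assumptions made for this example. The type declarations from the template are omitted for brevity, and the component and column lists are left empty.

```python
import json

# The two attributes marked "required" in the table above.
REQUIRED_KEYS = ("dcterms:title", "dcterms:description")


def build_table_entry(datacube_uri: str, datacube_csv: str, metadata: dict) -> dict:
    """Assemble one element of the CSVW "tables" array (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if not metadata.get(key)]
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))

    entry = {"@id": datacube_uri, "url": datacube_csv}
    entry.update(metadata)  # user-supplied attributes, copied as given
    entry["qb:structure"] = {
        "@id": datacube_uri + "#structure",
        "@type": "qb:DataStructureDefinition",
        "qb:component": [],  # components are described elsewhere
    }
    entry["tableSchema"] = {"columns": []}  # columns are described elsewhere
    return entry


entry = build_table_entry(
    "http://gss-data.org.uk/data/a-human-readable-title",
    "data.csv",
    {
        "dcterms:title": "A Human-readable Title",
        "dcterms:description": "A human-readable description.",
    },
)
print(json.dumps({"tables": [entry]}, indent=4))
```

Recommended and optional attributes are passed through unchanged when present; a stricter implementation could warn when a recommended attribute is absent.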

Additionally, the user may specify arbitrary RDF to express further information about the cube.

Specifying arbitrary metadata

The CSVW specification refers to "arbitrary RDF" as "common properties" and sets out the standards these must adhere to.

The user may specify arbitrary RDF using prefixed names (of the form prefix:name), so long as the prefix is one of those provided by the CSVW @context.

The specification states: "The prefixes that are recognized are those defined for [rdfa-core] within the RDFa 1.1 Initial Context and other prefixes defined within [csvw-context] and these MUST NOT be overridden. These prefixes are periodically extended; refer to [csvw-context] for details. Properties from other vocabularies MUST be named using absolute URLs."
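
A simple check of a user-supplied property name against this rule might look like the Python sketch below. The prefix set shown is a small, illustrative subset and not the authoritative list, which should be taken from [csvw-context]. The function name `is_valid_common_property` is hypothetical, and the sketch assumes that absolute URLs use the `http` or `https` scheme.

```python
# Illustrative subset only; the authoritative list lives in [csvw-context].
KNOWN_PREFIXES = {
    "dcat", "dcterms", "foaf", "prov", "qb", "rdf",
    "rdfs", "schema", "skos", "vcard", "xsd",
}


def is_valid_common_property(name: str) -> bool:
    """Accept absolute http(s) URLs, or prefix:name with a recognised prefix."""
    if name.startswith(("http://", "https://")):
        return True
    prefix, separator, local_name = name.partition(":")
    return bool(separator) and bool(local_name) and prefix in KNOWN_PREFIXES


print(is_valid_common_property("dcterms:title"))                 # True
print(is_valid_common_property("ex:thing"))                      # False
print(is_valid_common_property("https://example.org/ns#thing"))  # True
```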
