Medical Subject Headings (MeSH) is a controlled vocabulary produced by the National Library of Medicine (NLM) for cataloging biomedical information. The resource is structured as an ontology and is used to annotate PubMed/MEDLINE records. Here we provide user-friendly datasets derived from MeSH. Currently, two record types are processed, each by its own notebook: Descriptors and Supplementary Concept Records.
- descriptors.ipynb: processes Descriptors (also known as Main Headings)
- supplementary-concept-records.ipynb: processes Supplementary Concept Records (SCRs)
The data directory contains the created datasets:
- terms.tsv: table of Descriptor terms.
- descriptor-terms.tsv: table of Descriptor names.
- mesh.json: a JSON-formatted representation of the Descriptor ontology. Includes term identifiers, names, semantic types, parents, and tree numbers.
- ontology.gexf.gz: a GEXF representation of the Descriptor ontology that is compatible with networkx.
- symptoms.tsv: symptom Descriptors (the 438 descendants of D012816).
- tree-numbers.tsv: table of tree numbers for each Descriptor. A tree number represents a path to the root. This file is handy for mapping to external resources, which occasionally identify MeSH Descriptors by their tree numbers (a bad but prevalent practice).
- supplemental-records.tsv: table of SCR terms.
- supplemental-terms.tsv: table of SCR names.
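
Because a tree number encodes a path to the root, the ancestors of a node can be recovered from the string alone. The following is a minimal sketch, not code from this repository: it assumes the standard MeSH convention of dot-separated segments, the helper names `tree_number_path` and `tree_number_parent` are hypothetical, and the example tree number is purely illustrative.

```python
from typing import List, Optional


def tree_number_path(tree_number: str) -> List[str]:
    """Return the tree numbers from the top-level node down to `tree_number`.

    Assumes dot-separated segments, so every prefix ending at a dot
    boundary is the tree number of an ancestor.
    """
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def tree_number_parent(tree_number: str) -> Optional[str]:
    """Return the parent tree number, or None for a top-level node."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


# Illustrative tree number (hypothetical input, not read from tree-numbers.tsv).
example = "C23.888.592.612"
print(tree_number_path(example))
print(tree_number_parent(example))
```

One Descriptor can have several tree numbers, so when mapping an external resource back to Descriptors, it is safer to treat each tree number as one of possibly many locations for a term rather than as a unique identifier.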
This repository is released under CC0.