mgladkova edited this page Nov 14, 2015 · 21 revisions

Harvest is the format in which MathWebSearch accepts crawled data. It is an extension of MathML that introduces a few tags and attributes to specify meta-information about the crawled Content MathML data.

The introduced tags are harvest, expr and data, all defined in the mws namespace. The root element of a harvest document is mws:harvest. Its children are mws:expr and mws:data nodes.

mws:expr nodes contain the actual Content MathML and have the following attributes:

  • url specifies the URL+UUID of the m:math from which the content was extracted
  • data_id specifies the id of a mws:data node previously defined in this document. The respective data will be associated with this expression.

mws:data nodes contain arbitrary XML data and have the following attributes:

  • data_id specifies a unique identifier within this XML document.

Note that mws:data was introduced in version 2 of the harvest format. The change is backward compatible, so harvests that contain no mws:data nodes remain valid.
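The structure described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal illustration rather than MathWebSearch's own tooling: the mws namespace URI, the unqualified spelling of the `url` and `data_id` attributes, the `<title>` payload and the `build_harvest` helper are all assumptions made for the example and should be checked against your MathWebSearch installation.

```python
import xml.etree.ElementTree as ET

# The MathML namespace URI is the W3C standard one.
M_NS = "http://www.w3.org/1998/Math/MathML"
# ASSUMPTION: namespace URI for the mws prefix; verify against your setup.
MWS_NS = "http://search.mathweb.org/ns"

# Ask ElementTree to serialize with the familiar m: and mws: prefixes.
ET.register_namespace("m", M_NS)
ET.register_namespace("mws", MWS_NS)


def build_harvest(data_id, metadata_text, url, variable_name):
    """Build a harvest with one mws:data node and one mws:expr node.

    Hypothetical helper, written only to illustrate the document layout.
    """
    harvest = ET.Element(f"{{{MWS_NS}}}harvest")

    # mws:data is emitted first so the expression refers to a node
    # that was previously defined in the document.
    data = ET.SubElement(harvest, f"{{{MWS_NS}}}data", {"data_id": data_id})
    # Arbitrary XML payload; <title> is a made-up element.
    title = ET.SubElement(data, "title")
    title.text = metadata_text

    # ASSUMPTION: attributes are written unqualified, as named in the text.
    expr = ET.SubElement(
        harvest, f"{{{MWS_NS}}}expr", {"url": url, "data_id": data_id}
    )
    # Content MathML body: a single identifier.
    ci = ET.SubElement(expr, f"{{{M_NS}}}ci")
    ci.text = variable_name
    return harvest


harvest = build_harvest(
    "d1", "Sample document", "http://example.org/doc.html#m1", "x"
)
xml_text = ET.tostring(harvest, encoding="unicode")
print(xml_text)
```

Emitting the mws:data node before the mws:expr node that refers to it keeps the `data_id` reference pointing at something already defined, which is what the attribute description asks for.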

An example harvest document is provided here:

[Example harvest document not preserved; only rendered formula text survives: subscript → x 0 1, superscript x 2]
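As a hedged illustration of how a consumer might check the rule that every `data_id` on an mws:expr refers to a previously defined mws:data node, the sketch below parses a small made-up harvest and reports dangling references. The sample document is not the wiki's own example. The mws namespace URI, the unqualified attribute names and the `check_data_refs` helper are assumptions. The sketch also treats `data_id` as optional on mws:expr, which the text does not state either way.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI for the mws prefix; verify against your setup.
MWS_NS = "http://search.mathweb.org/ns"

# Illustrative harvest: the second expression refers to an undefined data_id.
SAMPLE = """<mws:harvest xmlns:mws="http://search.mathweb.org/ns"
             xmlns:m="http://www.w3.org/1998/Math/MathML">
  <mws:data data_id="d1"><title>Sample document</title></mws:data>
  <mws:expr url="http://example.org/doc.html#m1" data_id="d1">
    <m:apply><m:power/><m:ci>x</m:ci><m:cn>2</m:cn></m:apply>
  </mws:expr>
  <mws:expr url="http://example.org/doc.html#m2" data_id="d9">
    <m:ci>x</m:ci>
  </mws:expr>
</mws:harvest>"""


def check_data_refs(xml_text):
    """Return the URLs of mws:expr nodes whose data_id was not defined earlier."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{MWS_NS}}}harvest":
        raise ValueError("root element is not mws:harvest")

    seen = set()      # data_id values defined so far, in document order
    dangling = []     # URLs of expressions with an unresolved data_id
    for child in root:
        if child.tag == f"{{{MWS_NS}}}data":
            seen.add(child.get("data_id"))
        elif child.tag == f"{{{MWS_NS}}}expr":
            ref = child.get("data_id")
            # A missing data_id is tolerated here (assumption).
            if ref is not None and ref not in seen:
                dangling.append(child.get("url"))
    return dangling


dangling_urls = check_data_refs(SAMPLE)
print(dangling_urls)  # ['http://example.org/doc.html#m2']
```

Walking the children in document order is what lets the check distinguish a reference to an earlier mws:data node from one that is defined later or not at all.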
