MWS Harvests
mgladkova edited this page Nov 14, 2015 · 21 revisions
MWS Harvest is the format in which MathWebSearch accepts crawled data. It extends MathML with a few tags and attributes that carry meta-information about the crawled Content MathML.
The introduced tags are `harvest`, `expr`, and `data`, all defined in the `mws` namespace. The root element of an MWS Harvest is `mws:harvest`; its children are `mws:expr` and `mws:data` nodes.
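The required nesting can be checked with a short Python sketch built on the standard `xml.etree.ElementTree` module. It assumes the `mws` prefix is bound to `http://search.mathweb.org/ns` and the Content MathML to the MathML namespace; both URIs, the helper `check_structure`, and the sample URL are illustrative assumptions, not taken from this page.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; adjust them to match your harvests.
MWS_NS = "http://search.mathweb.org/ns"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def check_structure(xml_text: str) -> list[str]:
    """Return the local names of the root's children, raising ValueError
    if the root is not mws:harvest or a child is not mws:expr / mws:data."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{MWS_NS}}}harvest":
        raise ValueError(f"root must be mws:harvest, got {root.tag}")
    names = []
    for child in root:
        if child.tag not in (f"{{{MWS_NS}}}expr", f"{{{MWS_NS}}}data"):
            raise ValueError(f"unexpected child element {child.tag}")
        names.append(child.tag.split("}", 1)[1])  # strip "{uri}" prefix
    return names


# Hypothetical harvest with one data node and one expression (x + 1).
SAMPLE = f"""<mws:harvest xmlns:mws="{MWS_NS}" xmlns:m="{MATHML_NS}">
  <mws:data data_id="d1"><title>Example page</title></mws:data>
  <mws:expr url="http://example.org/page#math1" data_id="d1">
    <m:apply><m:plus/><m:ci>x</m:ci><m:cn>1</m:cn></m:apply>
  </mws:expr>
</mws:harvest>"""

print(check_structure(SAMPLE))  # ['data', 'expr']
```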
`mws:expr` nodes contain the actual Content MathML and have the following attributes:

- `url` specifies the URL+UUID of the `m:math` element from which the content was extracted.
- `data_id` specifies the `id` of an `mws:data` node previously defined in this document. The respective data will be associated with this expression.
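As a rough illustration of these attributes, the Python sketch below appends an `mws:expr` to a harvest with `ElementTree`. The helper `add_expr`, the URL, and the namespace URI are hypothetical, and the attributes are written without a namespace prefix, as listed above; real harvests may qualify them differently.

```python
import xml.etree.ElementTree as ET

MWS_NS = "http://search.mathweb.org/ns"  # assumed mws namespace URI
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize with the familiar prefixes instead of ns0/ns1.
ET.register_namespace("mws", MWS_NS)
ET.register_namespace("m", MATHML_NS)


def add_expr(harvest, url, cmml, data_id=None):
    """Append an mws:expr carrying `url` (and optionally `data_id`)
    that wraps the Content MathML element `cmml`."""
    expr = ET.SubElement(harvest, f"{{{MWS_NS}}}expr", {"url": url})
    if data_id is not None:
        expr.set("data_id", data_id)
    expr.append(cmml)
    return expr


harvest = ET.Element(f"{{{MWS_NS}}}harvest")
# The mws:data node must be defined before an expr refers to it.
data = ET.SubElement(harvest, f"{{{MWS_NS}}}data", {"data_id": "d1"})
ET.SubElement(data, "title").text = "Example page"  # arbitrary XML payload

# Content MathML for x + 1
plus = ET.Element(f"{{{MATHML_NS}}}apply")
ET.SubElement(plus, f"{{{MATHML_NS}}}plus")
ET.SubElement(plus, f"{{{MATHML_NS}}}ci").text = "x"
ET.SubElement(plus, f"{{{MATHML_NS}}}cn").text = "1"

add_expr(harvest, "http://example.org/page#math1", plus, data_id="d1")
print(ET.tostring(harvest, encoding="unicode"))
```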
`mws:data` nodes contain arbitrary XML data and have the following attribute:

- `data_id` specifies a unique identifier within this XML document.
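The two rules on `data_id` (unique among `mws:data` nodes, and referenced by an `mws:expr` only after the data node is defined) lend themselves to a simple pass over the root's children. This Python sketch assumes the same hypothetical namespace URI as before; `check_data_ids` is an illustrative helper, not part of MathWebSearch.

```python
import xml.etree.ElementTree as ET

MWS_NS = "http://search.mathweb.org/ns"  # assumed mws namespace URI
DATA = f"{{{MWS_NS}}}data"
EXPR = f"{{{MWS_NS}}}expr"


def check_data_ids(harvest: ET.Element) -> list[str]:
    """List problems with data_id usage in a parsed harvest root."""
    seen: set[str] = set()
    problems: list[str] = []
    for child in harvest:  # document order matters: data must come first
        data_id = child.get("data_id")
        if child.tag == DATA:
            if data_id is None:
                problems.append("mws:data without data_id")
            elif data_id in seen:
                problems.append(f"duplicate data_id {data_id!r}")
            else:
                seen.add(data_id)
        elif child.tag == EXPR and data_id is not None and data_id not in seen:
            problems.append(
                f"mws:expr {child.get('url')!r} uses undefined data_id {data_id!r}"
            )
    return problems


# The expr refers to d1 before it is defined, and d1 is defined twice.
doc = ET.fromstring(f"""<mws:harvest xmlns:mws="{MWS_NS}">
  <mws:expr url="http://example.org/a#m1" data_id="d1"><!-- Content MathML omitted --></mws:expr>
  <mws:data data_id="d1"/>
  <mws:data data_id="d1"/>
</mws:harvest>""")
for problem in check_data_ids(doc):
    print(problem)
```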
Note that `mws:data` was introduced in version 2 of the harvest format, which remains backward compatible with version 1.
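One way a reader can honor this compatibility is to treat `data_id` as optional, so a version 1 harvest without any `mws:data` still parses. The Python sketch below assumes that reading of "backward compatible"; the function `read_harvest` and the namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MWS_NS = "http://search.mathweb.org/ns"  # assumed mws namespace URI


def read_harvest(xml_text: str) -> list[tuple[str, Optional[ET.Element]]]:
    """Pair each mws:expr url with its associated mws:data element,
    or None when the expr has no data_id (as in version 1 harvests)."""
    root = ET.fromstring(xml_text)
    data_by_id: dict[str, ET.Element] = {}
    pairs = []
    for child in root:
        if child.tag == f"{{{MWS_NS}}}data":
            data_by_id[child.get("data_id")] = child
        elif child.tag == f"{{{MWS_NS}}}expr":
            data_id = child.get("data_id")
            pairs.append((child.get("url"), data_by_id.get(data_id) if data_id else None))
    return pairs


# A version-1-style harvest: expressions only, no mws:data.
V1 = f"""<mws:harvest xmlns:mws="{MWS_NS}">
  <mws:expr url="http://example.org/a#m1"><!-- Content MathML omitted --></mws:expr>
</mws:harvest>"""
print(read_harvest(V1))  # [('http://example.org/a#m1', None)]
```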
An example harvest document is provided here:
[Example harvest document; only rendered fragments survive: `subscript`, →, x, 0, 1; `superscript`, x, 2]