MWS Harvests
Harvest is the format in which MathWebSearch (MWS) accepts crawled data. It extends MathML with a few tags and attributes that carry meta-information about the crawled Content MathML data.
The introduced tags are harvest, expr and data, all defined in the mws namespace. The root element of an MWS harvest is mws:harvest. Its children are mws:expr and mws:data nodes.
mws:expr nodes contain the actual Content MathML and have the following attributes:
- url specifies the URL+UUID of the m:math from which the content was extracted
- data_id specifies the id of an mws:data node previously defined in this document. The respective data will be associated with this expression.
mws:data nodes contain arbitrary XML data and have the following attributes:
- data_id specifies a unique identifier within this XML document.
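The two data_id rules above (unique within the document, and referenced only after being defined) can be checked mechanically. The following is a minimal sketch, not part of MathWebSearch itself: the names get_data_id and check_harvest are hypothetical, the attribute is accepted both as data_id and as mws:data_id, and a missing data_id on mws:expr is assumed to be allowed.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the mws prefix, as used in harvest documents.
MWS_NS = "http://search.mathweb.org/ns"


def get_data_id(node):
    """Return the node's data_id, accepting the prefixed or unprefixed form (assumption)."""
    return node.get(f"{{{MWS_NS}}}data_id") or node.get("data_id")


def check_harvest(xml_text):
    """Return a list of human-readable problems found in a harvest document."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{MWS_NS}}}harvest":
        problems.append("root element is not mws:harvest")

    seen = set()  # data_ids defined so far, in document order
    for child in root:
        if child.tag == f"{{{MWS_NS}}}data":
            data_id = get_data_id(child)
            if data_id is None:
                problems.append("mws:data without data_id")
            elif data_id in seen:
                problems.append(f"duplicate data_id: {data_id}")
            else:
                seen.add(data_id)
        elif child.tag == f"{{{MWS_NS}}}expr":
            # Assumption: data_id is optional on mws:expr.
            data_id = get_data_id(child)
            if data_id is not None and data_id not in seen:
                problems.append(
                    f"mws:expr references undefined data_id: {data_id}"
                )
    return problems
```

An empty list means neither rule was violated. Because children are visited in document order, an mws:expr that appears before its mws:data node is reported as referencing an undefined id.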
Note that mws:data was introduced in version 2 of the harvest format, and the change is backward compatible.
An example harvest document is provided here:
<?xml version="1.0"?>
<mws:harvest xmlns:mws="http://search.mathweb.org/ns" xmlns:m="http://www.w3.org/1998/Math/MathML">
  <mws:data mws:data_id="foo">
    <!-- arbitrary XML data -->
  </mws:data>
  <mws:expr url="http://math.example.org/article123456#e123456" mws:data_id="foo">
    <m:apply>
      <m:eq/>
      <m:apply>
        <m:apply>
          <m:csymbol cd="ambiguous">subscript</m:csymbol>
          <m:limit/>
          <m:apply>
            <m:ci>→</m:ci>
            <m:ci>x</m:ci>
            <m:cn>0</m:cn>
          </m:apply>
        </m:apply>
        <m:apply>
          <m:divide/>
          <m:cn>1</m:cn>
          <m:apply>
            <m:csymbol cd="ambiguous">superscript</m:csymbol>
            <m:ci>x</m:ci>
            <m:cn>2</m:cn>
          </m:apply>
        </m:apply>
      </m:apply>
      <m:infinity/>
    </m:apply>
  </mws:expr>
  <!-- More mws:data and mws:expr nodes -->
</mws:harvest>
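A crawler would typically produce such a document programmatically. The sketch below shows one way to do that with Python's standard xml.etree.ElementTree. The helper build_harvest and its tuple-based input are hypothetical and not part of MathWebSearch; the namespace URIs and attribute names are taken from the example document, and the mws:data nodes are left empty here.

```python
import xml.etree.ElementTree as ET

MWS_NS = "http://search.mathweb.org/ns"
M_NS = "http://www.w3.org/1998/Math/MathML"

# Serialize with the same prefixes as the example document.
ET.register_namespace("mws", MWS_NS)
ET.register_namespace("m", M_NS)


def build_harvest(entries):
    """Build an mws:harvest element from (url, data_id, content_mathml) tuples.

    content_mathml is a Content MathML fragment given as a string whose
    root element declares the MathML namespace itself.
    """
    harvest = ET.Element(f"{{{MWS_NS}}}harvest")
    defined = set()
    for url, data_id, content_mathml in entries:
        # Emit each mws:data node once, before the first mws:expr that uses it.
        if data_id not in defined:
            data = ET.SubElement(harvest, f"{{{MWS_NS}}}data")
            data.set(f"{{{MWS_NS}}}data_id", data_id)
            defined.add(data_id)
        expr = ET.SubElement(harvest, f"{{{MWS_NS}}}expr")
        expr.set("url", url)
        expr.set(f"{{{MWS_NS}}}data_id", data_id)
        expr.append(ET.fromstring(content_mathml))
    return harvest


fragment = (
    f'<m:apply xmlns:m="{M_NS}">'
    "<m:eq/><m:ci>x</m:ci><m:cn>0</m:cn></m:apply>"
)
harvest = build_harvest(
    [("http://math.example.org/article123456#e123456", "foo", fragment)]
)
print(ET.tostring(harvest, encoding="unicode"))
```

Writing each mws:data node before the first expression that refers to it keeps the output consistent with the requirement that data_id point to a previously defined node.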