Hermes: A content oriented LaTeX to XML conversion/authoring tool

Talk @ OpenMath, Bremen, 7 Nov. 2003
Romeo Anghelache

Introduction

Hermes complements the LaTeX system: it helps authors enrich the semantics of their work while preserving the quality of rendering. This makes it a content-oriented tool for LaTeX authoring and LaTeX-to-XML conversion.
Its main purpose is to assist authors of scientific articles in making their work available to the public on the Internet.

Outline

Currently, Hermes has the following components:

Hermes v.0.7.1 consistently handles only those LaTeX mathematical expressions whose structures are covered by the current Content-MathML standard (MathML version 2.0); it also handles the text around them. A subset of these structures is inferred automatically from the unaltered LaTeX source, at the cost of semantic depth; "unaltered" here means that only the 'definitions' file is included and no formulae are edited by hand. The remaining structures are provided as supplementary macros, which reside in the 'definitions' file and are briefly commented there.
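
To make the Content-MathML target more concrete, the sketch below builds, in Python, the kind of markup that MathML 2.0 defines for a simple expression such as x raised to the power 1/2. It is not Hermes code and not Hermes output: the function name `power_to_content_mathml` is hypothetical, and the markup only illustrates what the standard allows for this expression.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def power_to_content_mathml(base: str, numerator: int, denominator: int) -> ET.Element:
    """Build Content-MathML for base^(numerator/denominator).

    Hypothetical helper for illustration; it is not part of Hermes.
    """
    math = ET.Element("math", {"xmlns": MATHML_NS})
    # <apply> wraps an operator followed by its arguments.
    apply_el = ET.SubElement(math, "apply")
    ET.SubElement(apply_el, "power")
    # First argument: the identifier being raised to a power.
    ci = ET.SubElement(apply_el, "ci")
    ci.text = base
    # Second argument: a rational number, written as numerator <sep/> denominator.
    cn = ET.SubElement(apply_el, "cn", {"type": "rational"})
    cn.text = str(numerator)
    sep = ET.SubElement(cn, "sep")
    sep.tail = str(denominator)
    return math


if __name__ == "__main__":
    print(ET.tostring(power_to_content_mathml("x", 1, 2), encoding="unicode"))
```

The point of the example is that the markup records the operation (a power with a rational exponent) rather than the layout (a raised, smaller "1/2"), which is what makes the expression searchable by meaning.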

Architecture

Hermes is developed along these lines:

Details about the source code

To do

In the real world, authors need, along with the most usual symbols and the macros Hermes provides, arbitrary mathematical expressions that they feel are most appropriate for rendering a particular meaning. Some of their choices become de facto standards (e.g. $x^{1/2}$), so Hermes has no difficulty generating the appropriate content-oriented XML. Others remain ambiguous from a machine's point of view (e.g. $Q^+$): there is not enough information for a machine to infer what they mean.
There is no realistic (i.e. easily acceptable to the user) alternative solution to the problem above: those arbitrary symbols have to be converted into Presentation-MathML, and the author can complement the source of the arbitrary symbol with simple annotations if they feel the need to do so, rather than being forced to obey a non-standard set of conventions external to their way of thinking. These annotated Presentation-MathML structures, along with the Content-MathML, enable a potential reader to locate mathematical expressions on the web by their meaning and not by their particular rendering (which, obviously, cannot be known before accessing the document itself).
Therefore, a truly viable and complete Hermes system should go beyond converting LaTeX to Content-MathML: it should be prepared to convert arbitrary mathematical expressions not yet covered by the current Content-MathML or OpenMath standards into Presentation-MathML and to annotate them. This is how Hermes will evolve.
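
One way such an annotated Presentation-MathML structure could look is sketched below in Python, for a symbol written as Q with a superscript plus sign. This is an assumption-laden illustration, not the Hermes design: the function name `annotated_superscript`, the `text/plain` encoding value and the annotation text are all hypothetical, while `semantics`, `msup` and `annotation` are standard MathML 2.0 elements.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def annotated_superscript(base: str, script: str, meaning: str) -> ET.Element:
    """Build Presentation-MathML for base^script with an author annotation.

    Hypothetical helper for illustration; it is not part of Hermes.
    """
    math = ET.Element("math", {"xmlns": MATHML_NS})
    # <semantics> pairs a rendering (first child) with annotations of its meaning.
    semantics = ET.SubElement(math, "semantics")
    # The rendering: a base identifier with a superscript operator.
    msup = ET.SubElement(semantics, "msup")
    mi = ET.SubElement(msup, "mi")
    mi.text = base
    mo = ET.SubElement(msup, "mo")
    mo.text = script
    # The author's note; the encoding value is an assumption made for this sketch.
    annotation = ET.SubElement(semantics, "annotation", {"encoding": "text/plain"})
    annotation.text = meaning
    return math


if __name__ == "__main__":
    # The meaning string stands in for whatever the author intends by the symbol.
    element = annotated_superscript("Q", "+", "set of positive rational numbers")
    print(ET.tostring(element, encoding="unicode"))
```

The rendering stays exactly what the author wrote, and the annotation carries the meaning a machine could not infer, so a search by meaning has something to match against.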