Levels of Semantic Encoding

Document encodings can be placed on a continuum according to the ease of machine extraction of semantic information:

  1. Bitmapped - GIF, TIFF
  2. Unstructured - HTML, bad TeX, text, PDF, Word
  3. Structured - XML+MathML, LaTeX
  4. Formally presented - Mathematica, Maple, Mizar, OpenMath, etc.

Digitized collections are level 1; most online research literature is level 2 or 3.
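
As a rough illustration, the four levels can be treated as an ordered scale and the listed formats looked up against it. The sketch below is only one way to write this down; the names EncodingLevel, FORMAT_LEVELS, encoding_level and easier_to_extract are hypothetical and chosen for illustration, and the table contains only the formats named in the list above.

```python
from enum import IntEnum


class EncodingLevel(IntEnum):
    """Levels of semantic encoding; a higher value means easier machine extraction."""
    BITMAPPED = 1
    UNSTRUCTURED = 2
    STRUCTURED = 3
    FORMALLY_PRESENTED = 4


# Hypothetical lookup table, filled in from the list above (keys are case-sensitive).
FORMAT_LEVELS = {
    "GIF": EncodingLevel.BITMAPPED,
    "TIFF": EncodingLevel.BITMAPPED,
    "HTML": EncodingLevel.UNSTRUCTURED,
    "bad TeX": EncodingLevel.UNSTRUCTURED,
    "text": EncodingLevel.UNSTRUCTURED,
    "PDF": EncodingLevel.UNSTRUCTURED,
    "Word": EncodingLevel.UNSTRUCTURED,
    "XML+MathML": EncodingLevel.STRUCTURED,
    "LaTeX": EncodingLevel.STRUCTURED,
    "Mathematica": EncodingLevel.FORMALLY_PRESENTED,
    "Maple": EncodingLevel.FORMALLY_PRESENTED,
    "Mizar": EncodingLevel.FORMALLY_PRESENTED,
    "OpenMath": EncodingLevel.FORMALLY_PRESENTED,
}


def encoding_level(fmt: str) -> EncodingLevel:
    """Return the level for a format name; raises KeyError for unlisted formats."""
    return FORMAT_LEVELS[fmt]


def easier_to_extract(a: str, b: str) -> bool:
    """True if format a sits strictly higher on the continuum than format b."""
    return encoding_level(a) > encoding_level(b)


if __name__ == "__main__":
    print(encoding_level("PDF").name)
    print(easier_to_extract("LaTeX", "PDF"))
```

An ordered enum is used here because the levels form a continuum, so comparisons such as "LaTeX is easier to extract from than PDF" fall out of ordinary integer ordering.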