3 Predicate Reference

3.2 Handling white-space

SGML2PL has five modes for handling white-space. The initial mode can be set using the space(SpaceMode) option of load_structure/3 and set_sgml_parser/2. In XML mode, the mode is further controlled by the xml:space attribute, which may be specified both in the DTD and in the document. The defined modes are:

space(sgml)
In SGML, newlines at the start and end of an element are removed. (In addition, newlines at the end of lines containing only markup should be deleted. This is not yet implemented.) This is the default mode for the SGML dialect.
space(preserve)
White space is passed literally to the application. This mode leaves most white-space handling to the application. This is the default mode for the XML dialect. Note that \r\n is still translated to \n. To preserve white space exactly, use space(strict), described below.
space(strict)
White space is passed strictly to the application. This mode leaves all white space handling to the application. This is useful for producing and verifying XML signatures.
space(default)
In addition to sgml space-mode, all consecutive white-space is reduced to a single space-character. This mode canonicalises all white space.
space(remove)
In addition to default, all leading and trailing white-space is removed from CDATA objects. If, as a result, the CDATA becomes empty, nothing is passed to the application. This mode is especially handy for processing 'data-oriented' documents, such as RDF. It is not suitable for normal text documents. Consider the HTML fragment below. When processed in this mode, the spaces between the three modified words are lost. This mode is not part of any standard; XML 1.0 allows only default and preserve.
Consider adjacent <b>bold</b> <ul>and</ul> <it>italic</it> words.
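
The effect of the modes can be explored with a short experiment. The sketch below is illustrative only: the helper names parse_with_space/3, parse_with_parser/3 and demo/0 are hypothetical and not part of the package, and the fragment is wrapped in a <p> element (an assumption) so that it forms a single well-formed XML document. The first helper passes space(SpaceMode) to load_structure/3; the second sets the same mode on an explicit parser using set_sgml_parser/2.

```prolog
:- use_module(library(sgml)).

% parse_with_space(+Text, +SpaceMode, -Content)
% Hypothetical helper: parse Text as XML, selecting the initial
% white-space mode through the space/1 option of load_structure/3.
parse_with_space(Text, SpaceMode, Content) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), Content,
                       [ dialect(xml), space(SpaceMode) ]),
        close(In)).

% parse_with_parser(+Text, +SpaceMode, -Content)
% Hypothetical helper: the same, but the mode is set on a parser
% object with set_sgml_parser/2 before parsing.
parse_with_parser(Text, SpaceMode, Content) :-
    setup_call_cleanup(
        ( open_string(Text, In),
          new_sgml_parser(Parser, []) ),
        ( set_sgml_parser(Parser, dialect(xml)),
          set_sgml_parser(Parser, space(SpaceMode)),
          sgml_parse(Parser, [ source(In), document(Content) ]) ),
        ( free_sgml_parser(Parser),
          close(In) )).

% demo/0 prints the parsed term for several modes so the results
% can be compared side by side.
demo :-
    Text = "<p>Consider adjacent <b>bold</b> <ul>and</ul> <it>italic</it> words.</p>",
    forall(member(Mode, [preserve, default, remove]),
           ( parse_with_space(Text, Mode, Content),
             format("~w:~n  ~q~n", [Mode, Content]) )).
```

Running demo/0 and comparing the printed terms should show that under space(remove) the white-space-only text between the <b>, <ul> and <it> elements is no longer present in the content list, which is why this mode is unsuitable for running text.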