3.6 Parsing Primitives
Availability: :- use_module(library(sgml)). (can be autoloaded)
sgml_parse(+Parser, +Options)
Parse an XML file. The parser can operate in two input and two output modes. Output is either a structured term, as described with load_structure/3, or call-backs on predefined events. The first is especially suitable for manipulating documents of moderate size, while the latter provides a primitive means for handling very large documents.

Input is a stream. A full description of the option-list is below.

document(-Term)
A variable that will be unified with a list describing the content of the document (see load_structure/3).
source(+Stream)
An input stream that is read. This option must be given.
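As an illustration of the document(-Term) and source(+Stream) options, the following is a minimal sketch that parses XML held in a string. The predicate names parse_xml_string/2 and parse_xml_string_/3 are hypothetical helpers introduced here, not part of library(sgml). The sketch assumes the SWI-Prolog built-in open_string/2 to obtain an input stream and selects the XML dialect with set_sgml_parser/2.

```prolog
:- use_module(library(sgml)).

%!  parse_xml_string(+String, -Content) is det.
%
%   Hypothetical helper (not part of library(sgml)): parse String as
%   XML and unify Content with the resulting document term.
parse_xml_string(String, Content) :-
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        parse_xml_string_(Parser, String, Content),
        free_sgml_parser(Parser)).      % always release the parser

parse_xml_string_(Parser, String, Content) :-
    set_sgml_parser(Parser, dialect(xml)),
    setup_call_cleanup(
        open_string(String, In),        % source(+Stream) needs a stream
        sgml_parse(Parser, [source(In), document(Content)]),
        close(In)).
```

A query such as `parse_xml_string("<a x=\"1\">hi</a>", Content)` should bind Content to a one-item list holding an element/3 term for a.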
content_length(+Characters)
Stop parsing after reading Characters characters. This option is useful for parsing input embedded in envelopes, such as in the HTTP protocol.
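A minimal sketch of content_length(+Characters), assuming the document occupies a known number of characters at the start of the input and is followed by unrelated data. The helper name parse_prefix/3 is hypothetical, and the string stands in for a stream positioned at the start of an enveloped body.

```prolog
:- use_module(library(sgml)).

%!  parse_prefix(+String, +Length, -Content) is det.
%
%   Hypothetical helper: parse only the first Length characters of
%   String as XML, leaving whatever follows unparsed.
parse_prefix(String, Length, Content) :-
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        ( set_sgml_parser(Parser, dialect(xml)),
          setup_call_cleanup(
              open_string(String, In),
              sgml_parse(Parser, [ source(In),
                                   content_length(Length),
                                   document(Content)
                                 ]),
              close(In))
        ),
        free_sgml_parser(Parser)).
```

For example, `parse_prefix("<a>hi</a>TRAILER", 9, Content)` hands the parser only the nine characters of the a element, so the trailing text is not treated as part of the document.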
cdata(+Representation)
Specify the representation of cdata elements. Supported values are atom (default) and string. See load_structure/3 for details.
parse(+Unit)
Defines how much of the input is parsed. This option is used to parse only parts of a file.
file
Default. Parse everything up to the end of the input.
element
The parser stops after reading the first element. Using source(Stream), this implies reading is stopped as soon as the element is complete, and another call may be issued on the same stream to read the next element.
content
The value content is like element but assumes the element has already been opened. It may be used in a call-back from call(begin, Pred) to parse individual elements after validating their headers.
declaration
This may be used to stop the parser after reading the first declaration. This is especially useful to parse only the doctype declaration.
input
This option is intended to be used in conjunction with the allowed(Elements) option of get_sgml_parser/2. It disables the parser's default behaviour of completing the parse tree by closing all open elements.
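The parse(element) unit can be sketched as follows: two elements are read one after the other from the same stream, each by a separate call to sgml_parse/2. The helper names read_element/2 and first_two_elements/3 are hypothetical. Creating a fresh parser per element is a simplification chosen for this sketch, not a requirement stated above.

```prolog
:- use_module(library(sgml)).

%!  read_element(+In, -Content) is det.
%
%   Hypothetical helper: read a single element from stream In using a
%   fresh XML parser and stop as soon as that element is complete.
read_element(In, Content) :-
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        ( set_sgml_parser(Parser, dialect(xml)),
          sgml_parse(Parser, [ source(In),
                               parse(element),
                               document(Content)
                             ])
        ),
        free_sgml_parser(Parser)).

%!  first_two_elements(+String, -First, -Second) is det.
%
%   Hypothetical helper: read two consecutive elements from one stream.
first_two_elements(String, First, Second) :-
    setup_call_cleanup(
        open_string(String, In),
        ( read_element(In, First),      % stops after the first element
          read_element(In, Second)      % continues on the same stream
        ),
        close(In)).
```

With the input `"<a>1</a><b>2</b>"`, First should describe the a element and Second the b element.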
max_errors(+MaxErrors)
Set the maximum number of errors. If this number is exceeded, further writes to the stream will yield an I/O error exception. Printing of errors is suppressed after reaching this value. The default is 50. Using max_errors(-1) makes the parser continue, no matter how many errors it encounters. The exception has the form:
error(limit_exceeded(max_errors, Max), _)
syntax_errors(+ErrorMode)
Defines how syntax errors are handled.
quiet
Suppress all messages.
print
Default. Pass messages to print_message/2.
style
Print dubious input such as attempts for redefinitions in the DTD using print_message/2 with severity informational.
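A minimal sketch combining max_errors(-1) with syntax_errors(quiet) to parse sloppy input without an error limit and without printed messages. The helper name parse_quietly/2 is hypothetical. How the parser repairs a particular malformed document is up to its error recovery, so the resulting term should be inspected rather than assumed.

```prolog
:- use_module(library(sgml)).

%!  parse_quietly(+String, -Content) is det.
%
%   Hypothetical helper: parse possibly malformed XML, suppressing
%   syntax messages and never stopping because of the error count.
parse_quietly(String, Content) :-
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        ( set_sgml_parser(Parser, dialect(xml)),
          setup_call_cleanup(
              open_string(String, In),
              sgml_parse(Parser, [ source(In),
                                   document(Content),
                                   max_errors(-1),        % no error limit
                                   syntax_errors(quiet)   % no messages
                                 ]),
              close(In))
        ),
        free_sgml_parser(Parser)).
```

For instance, `parse_quietly("<a><b>text</a>", Content)` omits the close-tag for b, yet should still yield a term rooted at the a element.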
xml_no_ns(+Mode)
Defines how an undefined XML namespace is handled. The default is to generate an error. If Mode is quiet, the error is suppressed. Can be used together with call(urlns, Closure) to provide external expansion of namespaces. See also section 3.3.1.
call(+Event, :PredicateName)
Issue call-backs on the specified events. PredicateName is the name of the predicate to call on this event, possibly prefixed with a module identifier. If the handler throws an exception, parsing is stopped and sgml_parse/2 re-throws the exception. The defined events are:
begin
An open-tag has been parsed. The named handler is called with three arguments: Handler(+Tag, +Attributes, +Parser).
end
A close-tag has been parsed. The named handler is called with two arguments: Handler(+Tag, +Parser).
cdata
CDATA has been parsed. The named handler is called with two arguments: Handler(+CDATA, +Parser), where CDATA is an atom representing the data.
pi
A processing instruction has been parsed. The named handler is called with two arguments: Handler(+Text, +Parser), where Text is the text of the processing instruction.
decl
A declaration (<!...>) has been read. The named handler is called with two arguments: Handler(+Text, +Parser), where Text is the text of the declaration with comments removed.

This option is especially useful for highlighting declarations and comments in editor support, where the location of the declaration is extracted using get_sgml_parser/2.

error
An error has been encountered. The named handler is called with three arguments: Handler(+Severity, +Message, +Parser), where Severity is one of warning or error and Message is an atom representing the diagnostic message. The location of the error can be determined using get_sgml_parser/2.

If this option is present, errors and warnings are not reported using print_message/2.

xmlns
When parsing in xmlns mode, a new namespace declaration is pushed on the environment. The named handler is called with three arguments: Handler(+NameSpace, +URL, +Parser). See section 3.3.1 for details.
urlns
When parsing in xmlns mode, this predicate can be used to map a URL into either a canonical URL for this namespace or another internal identifier. See section 3.3.1 for details.
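The call-back mode can be sketched with a begin handler that records each open-tag as it is parsed, without building a document term. The names seen_tag/1, on_begin/3 and tags_in_string/2 are hypothetical and introduced only for this sketch. Storing results in a dynamic predicate is one simple way to carry state out of the handler, chosen here for brevity.

```prolog
:- use_module(library(sgml)).

:- dynamic seen_tag/1.

%!  on_begin(+Tag, +Attributes, +Parser) is det.
%
%   Hypothetical handler for the begin event: remember the tag name.
on_begin(Tag, _Attributes, _Parser) :-
    assertz(seen_tag(Tag)).

%!  tags_in_string(+String, -Tags) is det.
%
%   Hypothetical helper: Tags is the list of element names in String,
%   in the order in which their open-tags were parsed.
tags_in_string(String, Tags) :-
    retractall(seen_tag(_)),
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        ( set_sgml_parser(Parser, dialect(xml)),
          setup_call_cleanup(
              open_string(String, In),
              sgml_parse(Parser, [ source(In),
                                   call(begin, on_begin)
                                 ]),
              close(In))
        ),
        free_sgml_parser(Parser)),
    findall(Tag, seen_tag(Tag), Tags).
```

Calling `tags_in_string("<a><b/><c>x</c></a>", Tags)` collects the names a, b and c. Handlers for the other events follow the same pattern with the argument lists given above.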