3.6 Parsing Primitives

3.6.1 Partial Parsing

Sometimes only part of a document needs to be processed. One option is to use load_structure/3 or one of its variations and extract the desired elements from the returned structure. This is a clean solution, especially for small and medium-sized documents. However, it is unsuitable for very large documents, because the complete document must first be built as a Prolog term in memory. Such documents can only be handled through the call-back interface provided by the call(Event, Action) option of sgml_parse/2. Event-driven processing, however, is not very natural in Prolog.
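For comparison, the load-everything approach might look as follows. This is a minimal sketch, not part of the library: rdf_descriptions/2 and rdf_content/2 are hypothetical helper names, and the options dialect(xml) and space(remove) are one reasonable choice. It loads the complete document with load_structure/3 and then walks the resulting term to collect the elements directly inside each RDF element. This is exactly the step that becomes impractical once the document no longer fits comfortably in memory.

```prolog
:- use_module(library(sgml)).
:- use_module(library(lists)).

%   rdf_descriptions(+Source, -Descriptions)
%   Hypothetical helper: load the whole document into memory and
%   return the elements found directly inside each 'RDF' element.
rdf_descriptions(Source, Descriptions) :-
        load_structure(Source, DOM, [dialect(xml), space(remove)]),
        findall(E,
                ( member(Top, DOM),
                  rdf_content(Top, Content),
                  member(E, Content),
                  E = element(_, _, _)          % skip text nodes
                ),
                Descriptions).

%   rdf_content(+Node, -Content)
%   Hypothetical helper: Content is the content list of an 'RDF'
%   element at or below Node. Text nodes (atoms) match no clause.
rdf_content(element('RDF', _, Content), Content).
rdf_content(element(_, _, Children), Content) :-
        member(Child, Children),
        rdf_content(Child, Content).
```

Because Source is passed unchanged to load_structure/3, the helper can be called with a file name or with stream(S) for an already opened stream.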

The SGML2PL library supports a mixed approach: the call-back interface is used to locate the elements of interest, and each of these is then parsed into an ordinary Prolog term. Suppose we want to process all descriptions inside the RDF elements of a document. The code below calls process_rdf_description(Element) on each element that is directly inside an RDF element.

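:- use_module(library(sgml)).

:- dynamic
        in_rdf/0.

%   load_rdf(+File)
%   Parse File as XML and call process_rdf_description/1 (to be
%   defined by the user) on each element directly inside RDF.
%   The stream and the parser are released even if parsing fails.
load_rdf(File) :-
        retractall(in_rdf),
        setup_call_cleanup(
            open(File, read, In),
            setup_call_cleanup(
                new_sgml_parser(Parser, []),
                ( set_sgml_parser(Parser, file(File)),
                  set_sgml_parser(Parser, dialect(xml)),
                  sgml_parse(Parser,
                             [ source(In),
                               call(begin, on_begin),
                               call(end, on_end)
                             ])
                ),
                free_sgml_parser(Parser)),
            close(In)).

%   Leaving the RDF element: stop handing elements to the user.
on_end('RDF', _) :-
        retractall(in_rdf).
on_end(_, _).

%   Entering the RDF element.
on_begin('RDF', _, _) :- !,
        assert(in_rdf).
%   An element inside RDF: a nested sgml_parse/2 call reads its
%   content into a term, then the complete element is handed to the
%   user. Its descendants are consumed by that nested call, so only
%   direct children of RDF reach this clause.
on_begin(Tag, Attr, Parser) :-
        in_rdf, !,
        sgml_parse(Parser,
                   [ document(Content),
                     parse(content)
                   ]),
        process_rdf_description(element(Tag, Attr, Content)).
%   Elements outside RDF are ignored.
on_begin(_, _, _).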
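The predicate process_rdf_description/1 is left to the application. A minimal hypothetical handler, assuming we only want to remember the tag and the about attribute of each description, could look like this; the description/2 table and the choice to check both 'rdf:about' and about are illustrative assumptions. With dialect(xml), which does not process namespaces, a prefixed attribute such as rdf:about is expected to appear under its prefixed name.

```prolog
:- use_module(library(lists)).

:- dynamic
        description/2.          % description(Tag, About), hypothetical

%   process_rdf_description(+Element)
%   Hypothetical handler: record the tag of the element and the value
%   of its 'rdf:about' or about attribute, or '' if neither is present.
process_rdf_description(element(Tag, Attrs, _Content)) :-
        (   memberchk('rdf:about'=About, Attrs)
        ->  true
        ;   memberchk(about=About, Attrs)
        ->  true
        ;   About = ''
        ),
        assertz(description(Tag, About)).
```

Storing results with assertz/1 keeps the handler independent of the parser call-backs; a real application would more likely process or write out each description immediately.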