3.1 The project source files

3.1.3 International source files

As discussed in section 2.19, SWI-Prolog supports international character handling. Its internal encoding is UNICODE, and I/O streams convert to and from this internal format. This section discusses the options for source files that contain characters outside US-ASCII.

SWI-Prolog can read files in any of the encodings described in section 2.19. Two kinds of encoding are of particular interest. The text encoding follows the current locale, which is the default this computer uses for representing text files. The encodings utf8, unicode_le and unicode_be are UNICODE encodings: they can represent characters of virtually any known language in the same file, and they do so unambiguously.
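As a minimal sketch of how a stream encoding is selected, the following writes a text to a file and reads it back using the encoding(Encoding) option of open/4. The predicate name roundtrip/4 is hypothetical and introduced only for this illustration; the example text uses \uXXXX escapes so the snippet itself stays 7-bit US-ASCII.

```prolog
% roundtrip(+File, +Encoding, +Text, -ReadBack)
% Hypothetical helper: write Text to File using Encoding, then read
% the whole file back as a string using the same Encoding.
roundtrip(File, Encoding, Text, ReadBack) :-
    setup_call_cleanup(
        open(File, write, Out, [encoding(Encoding)]),
        write(Out, Text),
        close(Out)),
    setup_call_cleanup(
        open(File, read, In, [encoding(Encoding)]),
        read_string(In, _, ReadBack),
        close(In)).
```

A possible query is ?- tmp_file(demo, F), roundtrip(F, utf8, "Gr\u00FC\u00DFe", S). Replacing utf8 with unicode_le or unicode_be should read back the same text, while an encoding that cannot represent the characters would not.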

If one wants to represent non US-ASCII text as Prolog terms in a source file, there are several options:

  • Use escape sequences
    This approach writes NON-ASCII characters as sequences of the form \octal\. The numerical argument is interpreted as a UNICODE character. (To my knowledge, the ISO escape sequence is limited to 3 octal digits, which means most characters cannot be represented.) The resulting Prolog file is strict 7-bit US-ASCII, but it becomes hard to read if it contains many NON-ASCII characters.

  • Use local conventions
    Alternatively, the file may be written using local conventions, such as the EUC encoding for Japanese text. The disadvantage is portability: if the file is moved to another machine, that machine must use the same locale or the file is unreadable. This technique also offers no elegant way to combine files from multiple locales in one application. In other words, it is fine for local projects in countries with uniform locale conventions.

  • Use UTF-8 files
    The best way to write source files with many NON-ASCII characters is to use the UTF-8 encoding. Prolog can be notified of this encoding in two ways: using a UTF-8 BOM (see section 2.19.1.1) or using the directive :- encoding(utf8). Many of today's text editors, including PceEmacs, are capable of editing UTF-8 files. Projects that were started using local conventions can be re-coded using the Unix iconv tool, or often using commands offered by the editor.
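The first and third options can be compared side by side. The sketch below assumes the file is saved as UTF-8; the predicate names (greeting_utf8/1, greeting_escaped/1, wide_utf8/1, wide_escaped/1, same_text/0) are hypothetical and exist only for this illustration. It also relies on SWI-Prolog accepting more than three octal digits in an escape.

```prolog
:- encoding(utf8).

% With the UTF-8 directive, NON-ASCII text is written directly.
greeting_utf8('Grüße').
wide_utf8('世').

% The same atoms in strict 7-bit US-ASCII using \octal\ escapes:
%   ü is U+00FC = octal 374, ß is U+00DF = octal 337,
%   世 is U+4E16 = octal 47026 (more than three octal digits).
greeting_escaped('Gr\374\\337\e').
wide_escaped('\47026\').

% Succeeds if both notations denote the same atoms.
same_text :-
    greeting_utf8(G), greeting_escaped(G),
    wide_utf8(W), wide_escaped(W).
```

After loading the file, the query ?- same_text. is expected to succeed, since both notations produce the same internal UNICODE atoms. The escaped form is portable across locales but noticeably harder to read.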