3 The HTTP server libraries

3.14 Running the server

The functionality of the server should be defined in one Prolog file (of course this file is allowed to load other files). Depending on the wanted server setup, this 'body' is wrapped into a small Prolog file combining the body with the appropriate server interface. The supported server setups are listed below. For most applications we advise the multi-threaded server. Examples of this server architecture are the PlDoc documentation system and the SeRQL Semantic Web server infrastructure.

All the server setups may be wrapped in a reverse proxy to make them available from the public web-server as described in section 3.14.7.

  • Using library(thread_httpd) for a multi-threaded server
    This server exploits the multi-threaded version of SWI-Prolog, running the user's body code in parallel using a pool of worker threads. As it avoids the state engine and copying required in the event-driven server, it is generally faster and capable of handling multiple requests concurrently.

    This server is harder to debug because of the threading involved, although the GUI tracer provides reasonable support for multi-threaded applications using the tspy/1 command. It can provide fast communication to multiple clients and can be used for more demanding servers.

  • Using library(inetd_httpd) for server-per-client
    In this setup the Unix inetd user-daemon is used to initialise a server for each connection. This approach is especially suitable for servers that have a short startup time. In this setup a crashing client does not influence other requests.

    This server is very hard to debug as the server is not connected to the user environment. It provides a robust implementation for servers that can be started quickly.

3.14.1 Common server interface options

All the server interfaces provide http_server(:Goal, +Options) to create the server. The lists of options differ, but the servers share common options:

port(?Port)
Specify the port to listen to for stand-alone servers. Port is either an integer or unbound. If unbound, it is unified to the selected free port.
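
As an illustration of the unbound case, the sketch below starts a threaded server on a free port and reports which port was chosen. It assumes library(http/thread_httpd) as the server interface and http_dispatch/1 as the goal; the predicate name start_on_free_port/1 is hypothetical.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% start_on_free_port(-Port): hypothetical helper.  Port is left
% unbound on entry, so the server selects a free port and unifies
% Port with it.
start_on_free_port(Port) :-
    http_server(http_dispatch, [port(Port)]),
    format("Server listening on port ~w~n", [Port]).
```

Calling start_on_free_port(P) from the toplevel binds P to the selected port, which can then be passed to the other predicates that identify a server by its port.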

3.14.2 Multi-threaded Prolog

The library(http/thread_httpd.pl) provides the infrastructure to manage multiple clients using a pool of worker threads. This realises a popular server design, also seen in Java Tomcat and Microsoft .NET. As a single persistent server process maintains communication to all clients, startup time is not an important issue and the server can easily maintain state information for all clients.

In addition to the functionality provided by the inetd server, the threaded server can also be used to realise an HTTPS server exploiting the library(ssl) library. See option ssl(+SSLOptions) below.

http_server(:Goal, +Options)
Create the server. Options must provide the port(?Port) option to specify the port the server should listen to. If Port is unbound an arbitrary free port is selected and Port is unified to this port number. The server consists of a small Prolog thread accepting new connections on Port and dispatching these to a pool of workers. Defined Options are:
port(?Address)
Address to bind to. Address is either a port (integer) or a term Host:Port. The port may be a variable, causing the system to select a free port and unify the variable with the selected port. See also tcp_bind/2.
workers(+N)
Defines the number of worker threads in the pool. Default is to use five workers. Choosing the optimal value for best performance is a difficult task depending on the number of CPUs in your system and how many resources are required for processing a request. Too high a number makes your system switch too often between threads, or even swap if there is not enough memory to keep all threads in memory, while too low a number causes clients to wait unnecessarily for other clients to complete. See also http_workers/2.
timeout(+SecondsOrInfinite)
Determines the maximum period of inactivity handling a request. If no data arrives within the specified time since the last data arrived, the connection raises an exception, and the worker discards the client and returns to the pool-queue for a new client. If it is infinite, a worker may wait forever on a client that doesn't complete its request. Default is 60 seconds.
keep_alive_timeout(+SecondsOrInfinite)
Maximum time to wait for new activity on Keep-Alive connections. Choosing the correct value for this parameter is hard. Disabling Keep-Alive is bad for performance if the clients request multiple documents for a single page. This may, for example, be caused by HTML frames, HTML pages with images, associated CSS files, etc. Keeping a connection open in the threaded model, however, prevents the thread servicing the client from servicing other clients. The default is 2 seconds.
local(+KBytes)
Size of the local-stack for the workers. Default is taken from the commandline option.
global(+KBytes)
Size of the global-stack for the workers. Default is taken from the commandline option.
trail(+KBytes)
Size of the trail-stack for the workers. Default is taken from the commandline option.
ssl(+SSLOptions)
Use SSL (Secure Socket Layer) rather than plain TCP/IP. A server created this way is accessed using the https:// protocol. SSL provides encrypted communication to prevent others from tapping the wire, as well as improved authentication of client and server. The SSLOptions option list is passed to ssl_context/3. The port option of the main option list is forwarded to the SSL layer. See the library(ssl) library for details.
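
The options above can be combined in a single call. The following is a minimal sketch; the predicate name start_tuned/1 and the chosen values (8 workers, a 30 second timeout, a 5 second Keep-Alive timeout) are illustrative assumptions, not recommendations.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% start_tuned(?Port): hypothetical helper that overrides some of
% the defaults described above.  The values are examples only.
start_tuned(Port) :-
    http_server(http_dispatch,
                [ port(Port),
                  workers(8),             % default is 5
                  timeout(30),            % default is 60 seconds
                  keep_alive_timeout(5)   % default is 2 seconds
                ]).
```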
http_server_property(?Port, ?Property)
True if Property is a property of the HTTP server running at Port. Defined properties are:
goal(:Goal)
Goal used to start the server. This is often http_dispatch/1.
scheme(-Scheme)
Scheme is one of http or https.
start_time(-Time)
Time-stamp when the server was created. See format_time/3 for creating a human-readable representation.
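
As a sketch of how these properties can be used, the code below prints one line for each running server. The predicate name list_servers/0 and the output layout are assumptions made for this example.

```prolog
:- use_module(library(http/thread_httpd)).

% list_servers: hypothetical helper that enumerates all running
% servers on backtracking and prints scheme, port and start time.
list_servers :-
    forall(http_server_property(Port, start_time(Stamp)),
           (   http_server_property(Port, scheme(Scheme)),
               format_time(atom(Started), '%F %T', Stamp),
               format("~w server at ~w, started ~w~n",
                      [Scheme, Port, Started])
           )).
```

If no server is running, list_servers/0 succeeds without printing anything.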
http_workers(+Port, ?Workers)
Query or manipulate the number of workers of the server identified by Port. If Workers is unbound it is unified with the number of running workers. If it is an integer greater than the current size of the worker pool, new workers are created with the same specification as the running workers. If the number is less than the current size of the worker pool, this predicate inserts a number of 'quit' requests in the queue, discarding the excess workers as they finish their jobs (i.e. no worker is abandoned while serving a client).

This can be used to tune the number of workers for performance. Another possible application is to reduce the pool to one worker to facilitate easier debugging.
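
The debugging scenario could be sketched as follows. The predicate name single_worker/1 is hypothetical; it reports the current pool size and then shrinks the pool to one worker.

```prolog
:- use_module(library(http/thread_httpd)).

% single_worker(+Port): hypothetical helper for debugging.
% The first call queries the pool size, the second sets it to 1.
single_worker(Port) :-
    http_workers(Port, Old),
    format("Pool had ~d workers~n", [Old]),
    http_workers(Port, 1).
```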

http_add_worker(+Port, +Options)
Add a new worker to the HTTP server for port Port. Options overrule the default queue options. The following additional options are processed:
max_idle_time(+Seconds)
The created worker will automatically terminate if there is no new work within Seconds.
http_stop_server(+Port, +Options)
Stop the HTTP server at Port. Halting a server is done gracefully, which means that requests being processed are not abandoned. The Options list is for future refinements of this predicate such as a forced immediate abort of the server, but is currently ignored.
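
One possible use is running a temporary server, for example in a test. The sketch below assumes http_dispatch/1 as the server goal; with_temp_server/1 is a hypothetical name. It uses setup_call_cleanup/3 so that the server is stopped even if the goal fails or raises an exception.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% with_temp_server(:Goal): hypothetical helper.  Starts a server on
% a free port, calls Goal with the port as extra argument and stops
% the server afterwards.
:- meta_predicate with_temp_server(1).

with_temp_server(Goal) :-
    setup_call_cleanup(
        http_server(http_dispatch, [port(Port)]),
        call(Goal, Port),
        http_stop_server(Port, [])).
```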
http_current_worker(?Port, ?ThreadID)
True if ThreadID is the identifier of a Prolog thread serving Port. This predicate exists to allow arbitrary interaction with the worker thread for development and statistics.
http_spawn(:Goal, +Spec)
Continue handling this request in a new thread running Goal. After http_spawn/2, the worker returns to the pool to process new requests. In its simplest form, Spec is the name of a thread pool as defined by thread_pool_create/3. Alternatively it is an option list, whose options are passed to thread_create_in_pool/4 if Spec contains pool(Pool) or to thread_create/3 if the pool option is not present. If the dispatch module is used (see section 3.2), spawning is normally specified as an option to the http_handler/3 registration.

We recommend the use of thread pools. They allow registration of a set of threads using common characteristics, specify how many can be active and what to do if all threads are active. A typical application may define a small pool of threads with large stacks for computation-intensive tasks, and a large pool of threads with small stacks to serve media. The declaration could be the one below, allowing for at most 3 concurrent solvers with a backlog of 5, and at most 30 concurrent tasks creating image thumbnails.

:- use_module(library(thread_pool)).
:- use_module(library(http/http_dispatch)).   % provides http_handler/3

:- thread_pool_create(compute, 3,
                      [ local(20000), global(100000), trail(50000),
                        backlog(5)
                      ]).
:- thread_pool_create(media, 30,
                      [ local(100), global(100), trail(100),
                        backlog(100)
                      ]).

:- http_handler('/solve',     solve,     [spawn(compute)]).
:- http_handler('/thumbnail', thumbnail, [spawn(media)]).

3.14.3 library(http/http_unix_daemon): Run SWI-Prolog HTTP server as a Unix system daemon

See also
The file <swi-home>/doc/packages/examples/http/linux-init-script provides a /etc/init.d script for controlling a server as a normal Unix service.
To be done
Cleanup issues wrt. loading and initialization of xpce.

This module provides the logic that is needed to integrate a process into the Unix service (daemon) architecture. It deals with the following aspects, all of which may be used/ignored and configured using commandline options:

  • Select the port(s) to be used by the server
  • Run the startup of the process as root to perform privileged tasks, for example to open ports below 1024, and run the server itself as an unprivileged user.
  • Fork and detach from the controlling terminal
  • Handle console and debug output using a file and/or the syslog daemon.
  • Manage a pid file

The typical use scenario is to write a file that loads the following components:

  1. The application code, including http handlers (see http_handler/3).
  2. This library

In the code below, ?- [load]. loads the remainder of the webserver code. This is often a sequence of use_module/1 directives.

:- use_module(library(http/http_unix_daemon)).

:- [load].

The program entry point is http_daemon/0, declared using initialization/2. This may be overruled using a new declaration after loading this library. The new entry point will typically call http_daemon/1 to start the server in a preconfigured way.

:- use_module(library(http/http_unix_daemon)).
:- initialization(run, main).

run :-
    ...
    http_daemon(Options).

Now, the server may be started using the command below. See http_daemon/0 for supported options.

% [sudo] swipl mainfile.pl [option ...]

Below are some examples. Our first example is completely silent, running on port 80 as user www.

% swipl mainfile.pl --user=www --pidfile=/var/run/http.pid

Our second example logs HTTP interaction with the syslog daemon for debugging purposes. Note that the argument to --debug= is a Prolog term and must often be escaped to avoid misinterpretation by the Unix shell. The debug option can be repeated to log multiple debug topics.

% swipl mainfile.pl --user=www --pidfile=/var/run/http.pid \
        --debug='http(request)' --syslog=http

Broadcasting

The library uses broadcast/1 to allow hooking certain events:

http(pre_server_start)
Run after fork, just before starting the HTTP server. Can be used to load additional files or perform additional initialisation, such as starting additional threads. Recall that it is not possible to start threads before forking.
http(post_server_start)
Run after starting the HTTP server.
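
These events can be received using listen/2 from library(broadcast). The sketch below starts a background thread just before the server starts and prints a message once it is running. The predicates start_housekeeping/0, housekeeping_loop/0 and announce_start/0 are hypothetical application code.

```prolog
:- use_module(library(broadcast)).

% Hypothetical periodic job; replace the body with real work.
housekeeping_loop :-
    repeat,
    sleep(60),
    garbage_collect_atoms,
    fail.

% Threads must be started after the fork, so we do this on
% http(pre_server_start) rather than while loading the file.
start_housekeeping :-
    thread_create(housekeeping_loop, _, [detached(true)]).

announce_start :-
    print_message(informational, format("HTTP server is up", [])).

:- listen(http(pre_server_start),  start_housekeeping).
:- listen(http(post_server_start), announce_start).
```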
http_daemon
Start the HTTP server as a daemon process. This predicate processes the commandline arguments below. Commandline arguments that specify servers are processed in the order they appear using the following schema:

  1. Arguments that act as default for all servers.
  2. --http=Spec or --https=Spec is followed by arguments for that server until the next --http=Spec or --https=Spec or the end of the options.
  3. If no --http=Spec or --https=Spec appears, one HTTP server is created from the specified parameters.

    Examples:

    --workers=10 --http --https
    --http=8080 --https=8443
    --http=localhost:8080 --workers=1 --https=8443 --workers=25

--port=Port
Start HTTP server at Port. It requires root permission and the option --user=User to open ports below 1024. The default port is 80. If --https is used, the default port is 443.
--ip=IP
Only listen to the given IP address. Typically used as --ip=localhost to restrict access to connections from localhost if the server itself is behind an (Apache) proxy server running on the same host.
--debug=Topic
Enable debugging Topic. See debug/3.
--syslog=Ident
Write debug messages to the syslog daemon using Ident.
--user=User
When started as root to open a port below 1024, this option must be provided to switch to the target user for operating the server. The following actions are performed as root, i.e., before switching to User:

  • open the socket(s)
  • write the pidfile
  • setup syslog interaction
  • Read the certificate, key and password file (--pwfile=File)
--group=Group
May be used in addition to --user. If omitted, the login group of the target user is used.
--pidfile=File
Write the PID of the daemon process to File.
--output=File
Send output of the process to File. By default, all Prolog console output is discarded.
--fork[=Bool]
If given as --no-fork or --fork=false, the process runs in the foreground.
--http[=(Bool|Port|BindTo:Port)]
Create a plain HTTP server. If the argument is missing or true, create at the specified or default address. Else use the given port and interface. Thus, --http creates a server at port 80, --http=8080 creates one at port 8080 and --http=localhost:8080 creates one at port 8080 that is only accessible from localhost.
--https[=(Bool|Port|BindTo:Port)]
As --http, but creates an HTTPS server. Use --certfile, --keyfile, --pwfile, --password and --cipherlist to configure SSL for this server.
--certfile=File
The server certificate for HTTPS.
--keyfile=File
The server private key for HTTPS.
--pwfile=File
File holding the password for accessing the private key. This is preferred over using --password=PW as it allows using file protection to avoid leaking the password. The file is read before the server drops privileges when started with the --user option.
--password=PW
The password for accessing the private key. See also --pwfile.
--cipherlist=Ciphers
One or more cipher strings separated by colons. See the OpenSSL documentation for more information. Starting with SWI-Prolog 7.5.11, the default value is always a set of ciphers that was considered secure enough to prevent all critical attacks at the time of the SWI-Prolog release.
--interactive[=Bool]
If true (default false) implies --no-fork and presents the Prolog toplevel after starting the server.
--gtrace[=Bool]
Use the debugger to trace http_daemon/1.
--sighup=Action
Action to perform on kill -HUP <pid>. Default is reload (running make/0). Alternative is quit, stopping the server.

Other options are converted by argv_options/3 and passed to http_server/1. For example, this allows for:

--workers=Count
Set the number of workers for the multi-threaded server.

http_daemon/0 is defined as below. The start code for a specific server can use this as a starting point, for example for specifying defaults.

http_daemon :-
    current_prolog_flag(argv, Argv),
    argv_options(Argv, _RestArgv, Options),
    http_daemon(Options).
See also
http_daemon/1
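
As an example of specifying defaults, the sketch below combines the command line options with application defaults before the server is started. The predicate name daemon_options/2 and the default values (port 8080, 4 workers) are assumptions for illustration. merge_options/3 from library(option) gives precedence to its first argument, so options given on the command line override the defaults.

```prolog
:- use_module(library(main)).      % argv_options/3
:- use_module(library(option)).    % merge_options/3

% daemon_options(+Argv, -Options): hypothetical helper.  Options
% from the command line take precedence over the defaults below.
daemon_options(Argv, Options) :-
    argv_options(Argv, _RestArgv, CmdOptions),
    merge_options(CmdOptions,
                  [ port(8080),    % assumed application default
                    workers(4)     % assumed application default
                  ],
                  Options).
```

An entry point declared with initialization/2 would then fetch Argv from the argv flag, call daemon_options/2 and pass the resulting list to http_daemon/1.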
http_daemon(+Options)
Start the HTTP server as a daemon process. This predicate processes a Prolog option list. It is normally called from http_daemon/0, which derives the option list from the command line arguments.

Error handling depends on whether or not interactive(true) is in effect. If so, the error is printed before entering the toplevel. In non-interactive mode this predicate calls halt(1).

[semidet,multifile]http_certificate_hook(+CertFile, +KeyFile, -Password)
Hook called before starting the server if the --https option is used. This hook may be used to create or refresh the certificate. If the hook binds Password to a string, this string will be used to decrypt the server private key as if the --password=Password option was given.
[semidet,multifile]http_server_hook(+Options)
Hook that is called to start the HTTP server. This hook must be compatible with http_server(Handler, Options). The default is provided by start_server/1.
[multi,multifile]http:sni_options(-HostName, -SSLOptions)
Hook to provide Server Name Indication (SNI) for TLS servers. When starting an HTTPS server, all solutions of this predicate are collected and a suitable sni_hook/1 is defined for ssl_context/3 to use different contexts depending on the host name of the client request. This hook is executed before privileges are dropped.

3.14.4 From (Unix) inetd

All modern Unix systems handle a large number of the services they run through the super-server inetd or one of its descendants (xinetd, systemd, etc.). Such a program reads a configuration file (for example /etc/inetd.conf) and opens server sockets on all ports defined in this file. As a request comes in, it accepts it and starts the associated server such that standard I/O is performed through the socket. This approach has several advantages:

  • Simplification of servers
    Servers don't have to know about sockets and socket operations.

  • Centralised authorisation
    Using tcpwrappers and similar tools, simple and effective firewalling of all services can be realised.

  • Automatic start and monitor
    The inetd automatically starts the server 'just-in-time' and starts additional servers or restarts a crashed server according to its configuration.

The very small generic script for handling inetd based connections is in inetd_httpd, defining http_server/1:

http_server(:Goal)
Initialises and runs http_wrapper/5 in a loop until failure or end-of-file. This server does not support the Port option as the port is specified with the inetd configuration. The only supported option is After.

Here is the example from demo_inetd:

#!/usr/bin/pl -t main -q -f
:- use_module(demo_body).
:- use_module(inetd_httpd).

main :-
        http_server(reply).

With the above file installed in /home/jan/plhttp/demo_inetd, the following line in /etc/inetd.conf enables the server at port 4001 guarded by tcpwrappers. After modifying the configuration file, send the inetd daemon the HUP signal to make it reload its configuration. For more information, please check inetd.conf(5).

4001 stream tcp nowait nobody /usr/sbin/tcpd /home/jan/plhttp/demo_inetd

3.14.5 MS-Windows

There are rumours that inetd has been ported to Windows.

3.14.6 As CGI script

To be done.

3.14.7 Using a reverse proxy

There are several options for public deployment of a web service. The main decision is whether to run it on a standard port (port 80 for HTTP, port 443 for HTTPS) or a non-standard port such as 8000 or 8080. Using a standard port below 1024 requires root access to the machine, and prevents other web services from using the same port. On the other hand, using a non-standard port may cause problems with intermediate proxy and/or firewall policies that may block the port when you try to access the service from some networks. In both cases, you can use either a physical machine or a virtual machine, running for example under VMware or Xen, to host the service. Using a dedicated (physical or virtual) machine to host a service isolates security threats. Isolation can also be achieved using a Unix chroot environment, which is however not a security feature.

To make several different web services reachable on the same (either standard or non-standard) port, you can use a so-called reverse proxy. A reverse proxy uses rules to relay requests to other web services that use their own dedicated ports. This approach has several advantages:

  • We can run the service on a non-standard port, but still access it (via the proxy) on a standard port, just as for a dedicated machine. We do not need a separate machine though: We only need to configure the reverse proxy to relay requests to the intended target servers.
  • As the main web server is doing the front-line service, the Prolog server is normally protected from malformed HTTP requests that could result in denial of service or otherwise compromise the server. In addition, the main web server can transparently provide encodings such as compression to the outside world.

Proxy technology can be combined with isolation methods such as dedicated machines, virtual machines and chroot jails. The proxy can also provide load balancing.

Setting up an Apache reverse proxy

The Apache reverse proxy setup is really simple. Ensure the modules proxy and proxy_http are loaded. Then add two simple rules to the server configuration. Below is an example that makes a PlDoc server on port 4000 available from the main Apache server at port 80.

ProxyPass        /pldoc/ http://localhost:4000/pldoc/
ProxyPassReverse /pldoc/ http://localhost:4000/pldoc/

Apache rewrites the HTTP headers passing by, but using the above rules it does not examine the content. This implies that URLs embedded in the (HTML) content must not refer to the address of the Prolog server. If the locations on the public and Prolog server are the same (as in the example above) it is allowed to use absolute locations, i.e. /pldoc/search is ok, but http://myhost.com:4000/pldoc/search is not. If the locations on the servers differ, locations must be relative (i.e. not start with /).

This problem can also be solved using the contributed Apache module proxy_html that can be instructed to rewrite URLs embedded in HTML documents. In our experience, this is not trouble-free as URLs can appear in many places in generated documents. JavaScript can create URLs on the fly, which makes rewriting virtually impossible.
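
On the Prolog side, a server behind the proxy rules above could be sketched as follows. It binds to localhost only, so that the service is reachable solely through the proxy, and it emits a link without a host part so that the link remains valid on the public server. The handler location /pldoc/hello, the handler hello/1 and the predicate start_behind_proxy/0 are hypothetical; port 4000 and /pldoc/search are taken from the example above.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/html_write)).

% Hypothetical handler living below /pldoc/, the location that the
% proxy rules forward.
:- http_handler('/pldoc/hello', hello, []).

% The link below has no scheme, host or port, so it is valid both
% on the Prolog server and on the public server.
hello(_Request) :-
    reply_html_page(title('Hello'),
                    [ p(a(href('/pldoc/search'), 'Search')) ]).

% Bind to localhost:4000, so only local processes such as the
% proxy can connect.
start_behind_proxy :-
    http_server(http_dispatch, [port(localhost:4000)]).
```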