Sometimes you would like to evaluate code that comes from an untrusted party. The safest way to do this is to buy a new computer, evaluate the code on that computer, then throw the machine away. However, if you are unwilling to take this simple approach, Guile does include a limited “sandbox” facility that can allow untrusted code to be evaluated with some confidence.
To use the sandboxed evaluator, load its module:
(use-modules (ice-9 sandbox))
Guile’s sandboxing facility starts with the ability to restrict the time and space used by a piece of code.
Scheme Procedure: call-with-time-limit limit thunk limit-reached
Call thunk, but cancel it if limit seconds of wall-clock time have elapsed. If the computation is canceled, call limit-reached in tail position. thunk must not disable interrupts or prevent an abort via a dynamic-wind unwind handler.
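A minimal sketch of the time limit in use, with call-with-time-limit taken from (ice-9 sandbox). The names spin-forever and outcome, and the 0.05 second limit, are made up for illustration.

```scheme
(use-modules (ice-9 sandbox))

;; A thunk that never returns on its own.
(define (spin-forever)
  (let loop () (loop)))

;; Cancel the thunk after 0.05 seconds of wall-clock time.  The value of
;; the limit-reached thunk becomes the value of the whole call.
(define outcome
  (call-with-time-limit 0.05
                        spin-forever
                        (lambda () 'timed-out)))

(display outcome)
(newline)
```

If the thunk finishes before the limit, its own values are returned and limit-reached is never called.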
Scheme Procedure: call-with-allocation-limit limit thunk limit-reached
Call thunk, but cancel it if limit bytes have been allocated. If the computation is canceled, call limit-reached in tail position. thunk must not disable interrupts or prevent an abort via a dynamic-wind unwind handler.
This limit applies to both stack and heap allocation. The computation will not be aborted before limit bytes have been allocated, but for the heap allocation limit, the check may be postponed until the next garbage collection.
Note that as a current shortcoming, the heap size limit applies to all threads; concurrent allocation by other unrelated threads counts towards the allocation limit.
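The allocation limit can be sketched the same way, using call-with-allocation-limit from (ice-9 sandbox). The names cons-forever and outcome and the one-megabyte limit are illustrative assumptions. As noted above, the heap check only runs after a garbage collection, so the thunk may allocate somewhat more than the limit before it is canceled.

```scheme
(use-modules (ice-9 sandbox))

;; Build an ever-growing list.  The loop is tail-recursive, so it is the
;; heap, not the stack, that grows.
(define (cons-forever)
  (let loop ((acc '()))
    (loop (cons 0 acc))))

;; Cancel the thunk once roughly one megabyte has been allocated.
(define outcome
  (call-with-allocation-limit #e1e6
                              cons-forever
                              (lambda () 'allocation-limit-reached)))

(display outcome)
(newline)
```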
Scheme Procedure: call-with-time-and-allocation-limits time-limit allocation-limit thunk
Invoke thunk in a dynamic extent in which its execution is limited to time-limit seconds of wall-clock time, and its allocation to allocation-limit bytes. thunk must not disable interrupts or prevent an abort via a dynamic-wind unwind handler.
If successful, return all values produced by invoking thunk. Any uncaught exception thrown by the thunk will propagate out. If the time or allocation limit is exceeded, an exception will be thrown to the limit-exceeded key.
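Since exceeding either limit throws to the limit-exceeded key, a caller will usually want a catch around the call. The following is a sketch only: run-limited is a hypothetical helper, and the limits of 0.1 seconds and ten million bytes are arbitrary choices for the example.

```scheme
(use-modules (ice-9 sandbox))

;; Hypothetical helper: run THUNK under both limits and turn a
;; limit-exceeded exception into the symbol limit-exceeded.
(define (run-limited thunk)
  (catch 'limit-exceeded
    (lambda ()
      (call-with-time-and-allocation-limits 0.1 #e10e6 thunk))
    (lambda (key . args)
      'limit-exceeded)))

;; A thunk that stays within the limits returns its value.
(define ok (run-limited (lambda () (+ 1 2))))

;; A thunk that never terminates is canceled by the time limit.
(define stopped (run-limited (lambda () (let loop () (loop)))))
```

Other exceptions thrown by the thunk are not caught by this helper and propagate to the caller, as described above.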
The time limit and stack limit are both very precise, but the heap limit only gets checked asynchronously, after a garbage collection. In particular, if the heap is already very large, the number of allocated bytes between garbage collections will be large, and therefore the precision of the check is reduced.
Additionally, due to the mechanism used by the allocation limit (the after-gc-hook), large single allocations like (make-vector #e1e7) are only detected after the allocation completes, even if the allocation itself causes garbage collection. It’s possible therefore for user code to not only exceed the allocation limit set, but also to exhaust all available memory, causing out-of-memory conditions at any allocation site. Failure to allocate memory in Guile itself should be safe and cause an exception to be thrown, but most systems are not designed to handle malloc failures. An allocation failure may therefore exercise unexpected code paths in your system, so it is a weakness of the sandbox (and therefore an interesting point of attack).
The main sandbox interface is eval-in-sandbox.
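Scheme Procedure: eval-in-sandbox exp [#:time-limit 0.1] [#:allocation-limit #e10e6] [#:bindings all-pure-bindings] [#:module (make-sandbox-module bindings)] [#:sever-module? #t]
Evaluate the Scheme expression exp within an isolated "sandbox". Limit its execution to time-limit seconds of wall-clock time, and limit its allocation to allocation-limit bytes.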
The evaluation will occur in module, which defaults to the result of calling make-sandbox-module on bindings, which itself defaults to all-pure-bindings. This is the core of the sandbox: creating a scope for the expression that is safe.
A safe sandbox module has two characteristics. Firstly, it will not allow the expression being evaluated to avoid being canceled due to time or allocation limits. This ensures that the expression terminates in a timely fashion.
Secondly, a safe sandbox module will prevent the evaluation from receiving information from previous evaluations, or from affecting future evaluations. All combinations of binding sets exported by (ice-9 sandbox) form safe sandbox modules.
The bindings should be given as a list of import sets. One import set is a list whose car names an interface, like (ice-9 q), and whose cdr is a list of imports. An import is either a bare symbol or a pair of (out . in), where out and in are both symbols and denote the name under which a binding is exported from the module, and the name under which to make the binding available, respectively. Note that bindings is only used as an input to the default initializer for the module argument; if you pass #:module, bindings is unused. If sever-module? is true (the default), the module will be unlinked from the global module tree after the evaluation returns, to allow module to be garbage-collected.
If successful, return all values produced by exp. Any uncaught exception thrown by the expression will propagate out. If the time or allocation limit is exceeded, an exception will be thrown to the limit-exceeded key.
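A short sketch of the three outcomes described above: a value returned, a limit exceeded, and an ordinary exception propagating out. The variable names are illustrative, and the last case assumes that open-file is not part of the default binding set, so the reference to it fails inside the sandbox.

```scheme
(use-modules (ice-9 sandbox))

;; A well-behaved expression: its value is returned to the caller.
(define sum (eval-in-sandbox '(+ 1 2 3)))

;; A non-terminating expression: the time limit is exceeded and an
;; exception is thrown to the limit-exceeded key.
(define looped
  (catch 'limit-exceeded
    (lambda ()
      (eval-in-sandbox '(let loop () (loop)) #:time-limit 0.05))
    (lambda (key . args) 'limit-exceeded)))

;; An expression that refers to a binding the sandbox module does not
;; provide: the resulting error propagates out of eval-in-sandbox.
(define blocked
  (catch #t
    (lambda ()
      (eval-in-sandbox '(open-file "/etc/passwd" "r")))
    (lambda (key . args) 'rejected)))
```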
Constructing a safe sandbox module is tricky in general. Guile defines an easy way to construct safe modules from predefined sets of bindings. Before getting to that interface, here are some general notes on safety.
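The time and allocation limits rely on the ability to interrupt and cancel a computation, so no binding in a sandbox module should be able to postpone interrupt handling indefinitely or to prevent an abort. In practice this means that dynamic-wind should not be included in any binding set.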
The time and allocation limits apply only to the eval-in-sandbox call. If the call returns a procedure which is later called, no limit is “automatically” in place. Users of eval-in-sandbox have to be very careful to reimpose limits when calling procedures that escape from sandboxes.
Similarly, the dynamic environment of the eval-in-sandbox call is not necessarily in place when any procedure that escapes from the sandbox is later called.
This detail prevents us from exposing primitive-eval to the sandbox, for two reasons. The first is that it’s possible for legacy code to forge references to any binding, if the allow-legacy-syntax-objects? parameter is true. The default for this parameter is true; see Syntax Transformer Helpers for the details. The parameter is bound to #f for the duration of the eval-in-sandbox call itself, but that will not be in place during calls to escaped procedures.
The second reason we don’t expose primitive-eval is that primitive-eval implicitly works in the current module, which for an escaped procedure will probably be different from the module that is current for the eval-in-sandbox call itself.
The common denominator here is that if an interface exposed to the sandbox relies on dynamic environments, it is easy to mistakenly grant the sandboxed procedure additional capabilities in the form of bindings that it should not have access to. For this reason, the default sets of predefined bindings do not depend on any dynamically scoped value.
Relatedly, set! may allow a sandbox to mutate a primitive, invalidating many system-wide invariants. Guile is currently quite permissive when it comes to imported bindings and mutability. Although set! to a module-local or lexically bound variable would be fine, we don’t currently have an easy way to disallow set! to an imported binding, so currently no binding set includes set!.
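The point about escaped procedures can be sketched as follows. The helper call-escaped is hypothetical, and the limit values are arbitrary; the idea is only that every later call to a procedure returned from the sandbox is wrapped in its own limits, because the limits of the original eval-in-sandbox call no longer apply.

```scheme
(use-modules (ice-9 sandbox))

;; The sandboxed expression returns a procedure, which thereby escapes
;; from the sandbox.
(define square
  (eval-in-sandbox '(lambda (x) (* x x))))

;; Hypothetical helper: reimpose time and allocation limits around each
;; call to an escaped procedure.
(define (call-escaped proc . args)
  (call-with-time-and-allocation-limits
   0.1 #e10e6
   (lambda () (apply proc args))))

(define result (call-escaped square 7))
```

This restores the limits only. It does not restore the rest of the dynamic environment of the original call, such as the binding of allow-legacy-syntax-objects? discussed above.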
If you, dear reader, find the above discussion interesting, you will enjoy Jonathan Rees’ dissertation, “A Security Kernel Based on the Lambda Calculus”.
Scheme Variable: all-pure-bindings
All “pure” bindings that together form a safe subset of those bindings available by default to Guile user code.
Scheme Variable: all-pure-and-impure-bindings
Like all-pure-bindings, but additionally including mutating primitives like vector-set!. This set is still safe in the sense mentioned above, with the caveats about mutation.
The components of these composite sets are as follows:
The components of all-pure-bindings.
The additional components of all-pure-and-impure-bindings.
Finally, what do you do with a binding set? What is a binding set anyway? make-sandbox-module is here for you.
Scheme Procedure: make-sandbox-module bindings
Return a fresh module that only contains bindings.
The bindings should be given as a list of import sets. One import set is a list whose car names an interface, like (ice-9 q), and whose cdr is a list of imports. An import is either a bare symbol or a pair of (out . in), where out and in are both symbols and denote the name under which a binding is exported from the module, and the name under which to make the binding available, respectively.
So you see that binding sets are just lists, and all-pure-and-impure-bindings is really just the result of appending all of the component binding sets.
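To make the shape of a binding set concrete, here is a sketch that writes one by hand. The set arithmetic-bindings and the renamed import div are invented for the example; they are not among the sets exported by (ice-9 sandbox), so the safety guarantee stated for the predefined sets does not automatically extend to them.

```scheme
(use-modules (ice-9 sandbox))

;; One import set: the interface (guile), three bare symbols, and one
;; (out . in) pair that makes quotient available under the name div.
(define arithmetic-bindings
  '(((guile) + - * (quotient . div))))

;; Let eval-in-sandbox build the module from the binding set.
(define a
  (eval-in-sandbox '(div (* 6 7) 2) #:bindings arithmetic-bindings))

;; Or build the module explicitly.  It is kept linked here
;; (#:sever-module? #f) because this sketch holds on to it afterwards.
(define m (make-sandbox-module arithmetic-bindings))
(define b
  (eval-in-sandbox '(- 10 (+ 1 2)) #:module m #:sever-module? #f))

;; Binding sets are plain lists, so they combine with append.
(define combined (append arithmetic-bindings all-pure-bindings))
```

Only the names listed in the binding set are visible to the expression; inside this module the name quotient itself is unbound, and only div refers to it.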