Chibi-Scheme

Alex Shinn

Fri Jul 31 23:14:28 2020

Minimal Scheme Implementation for use as an Extension Language

http://synthcode.com/wiki/chibi-scheme

Introduction

Chibi-Scheme is a very small library with no external dependencies, intended for use as an extension and scripting language in C programs. In addition to support for lightweight VM-based threads, each VM itself runs in an isolated heap allowing multiple VMs to run simultaneously in different OS threads.

The default language is the R7RS (scheme base) library, with support for all libraries from the small language. Support for additional languages such as JavaScript, Go, Lua and Bash is planned for future releases. Scheme is chosen as a substrate because its first-class continuations and guaranteed tail-call optimization make implementing other languages easy.

The system is designed in optional layers, beginning with a VM based on a small set of opcodes, a set of primitives implemented in C, a default language, a module system implementation, and a set of standard modules. You can choose whichever layer suits your needs best and customize the rest. Adding your own primitives or wrappers around existing C libraries is easy with the C FFI.

Chibi is known to build and run on 32 and 64-bit Linux, FreeBSD, DragonFly, OS X, iOS, Windows (under Cygwin) and Plan9.

Installation

To build, just run "make". This will provide a shared library "libchibi-scheme", as well as a sample "chibi-scheme" command-line repl. The "chibi-scheme-static" make target builds an equivalent static executable. If your make doesn't support GNU make conditionals, then you'll need to edit the top of the Makefile to choose the appropriate settings. On Plan9 just run "mk". You can test the build with "make test".

To install run "make install". If you want to try the executable out without installing, you will probably need to set LD_LIBRARY_PATH, depending on your platform. If you have an old version installed, run "make uninstall" first, or manually delete the directory.

You can edit the file chibi/features.h for a number of settings, mostly disabling features to make the executable smaller. You can specify standard options directly as arguments to make, for example

make CFLAGS=-Os CPPFLAGS=-DSEXP_USE_NO_FEATURES=1

to optimize for size, or

make LDFLAGS=-L/usr/local/lib CPPFLAGS=-I/usr/local/include

to compile against a library installed in /usr/local.

By default Chibi uses a custom, precise, non-moving GC (non-moving is important so you can maintain references from C code). You can link against the Boehm conservative GC by editing the features.h file, or directly from make with:

make SEXP_USE_BOEHM=1

To compile a static executable, use

make chibi-scheme-static SEXP_USE_DL=0

Note the static executable has no binary libraries, so the default language won't work and you'll need to run as

./chibi-scheme-static -q

To compile a static executable with all C libraries statically included (so that you can run the default language), first you need to create a clibs.c file, which can be done with:

make clibs.c

or edited manually. Be sure to run this with a non-static chibi-scheme. Then you can make the static executable with:

make -B chibi-scheme-static SEXP_USE_DL=0 CPPFLAGS="-DSEXP_USE_STATIC_LIBS -DSEXP_USE_STATIC_LIBS_NO_INCLUDE=0"

By default files are installed in /usr/local. You can optionally specify a PREFIX for the installation directory:

make PREFIX=/path/to/install/
sudo make PREFIX=/path/to/install/ install

Compile-Time Options

The include file "chibi/features.h" describes a number of C preprocessor values which can be enabled or disabled by setting to 1 or 0 respectively. For example, the above commands used the features SEXP_USE_BOEHM, SEXP_USE_DL and SEXP_USE_STATIC_LIBS. Many features are still experimental and may be removed from future releases, but the important features are listed below.

Installed Programs

The command-line programs chibi-scheme, chibi-doc and chibi-ffi are installed by default, along with manpages. chibi-scheme provides a REPL and way to run scripts. Run -? for a brief list of options, or see the man page for more details. chibi-doc is the command-line interface to the literate documentation system described in (chibi scribble), and used to build this manual. chibi-ffi is a tool to build wrappers for C libraries, described in the FFI section below.

Default Language

Scheme Standard

The default language is the (scheme base) library from R7RS, which is mostly a superset of R5RS.

The reader defaults to case-sensitive, like R6RS and R7RS but unlike R5RS. You can specify the -f option on the command-line to enable case-folding. The default configuration includes the full numeric tower: fixnums, flonums, bignums, exact rationals and complex numbers, though this can be customized at compile time.

Full continuations are supported, but currently continuations don't take C code into account. This means that you can call from Scheme to C and then from C to Scheme again, but continuations passing through this chain may not do what you expect. The only higher-order C functions (thus potentially running afoul of this) in the standard environment are load and eval. The result of invoking a continuation created by a different thread is also currently unspecified.

In R7RS (and R6RS) semantics it is impossible to use two macros from different modules which both use the same auxiliary keywords (like else in cond forms) without renaming one of the keywords. By default Chibi considers all top-level bindings effectively unbound when matching auxiliary keywords, so this case will "just work". This decision was made because different modules using the same keywords seems more likely than user code unintentionally matching a top-level keyword with a different binding. However, if you want to use R7RS semantics you can compile with SEXP_USE_STRICT_TOPLEVEL_BINDINGS=1.

load is extended to accept an optional environment argument, like eval. You can also load shared libraries in addition to Scheme source files - in this case the function sexp_init_library is automatically called with the following signature:

sexp_init_library(sexp context, sexp self, sexp_sint_t n, sexp environment, const char* version, sexp_abi_identifier_t abi);

Note, as R7RS (and earlier reports) states, "in contrast to other dialects of Lisp, the order of evaluation is unspecified [...]". Chibi is one of the few implementations which use a right-to-left evaluation order, which can be surprising to programmers coming from other languages.
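As a minimal sketch of why the evaluation order matters, the following uses a hypothetical helper named trace that prints its argument before returning it. Only standard R7RS procedures are used. The order of the printed lines in the first call is unspecified by the standard, so portable code should sequence side effects explicitly, for example with let*.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical helper: print x, then return it.
(define (trace x)
  (display x)
  (newline)
  x)

;; The two arguments may be evaluated in either order, so the
;; printed order is not portable (the resulting list is still (1 2)).
(list (trace 1) (trace 2))

;; let* evaluates its bindings in sequence, so 1 is always
;; printed before 2 here, on any implementation.
(let* ((a (trace 1))
       (b (trace 2)))
  (list a b))
```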

Module System

Chibi supports the R7RS module system natively, which is a simple static module system. The Chibi implementation is actually a hierarchy of languages in the style of the Scheme48 module system, allowing easy extension of the module system itself. As with most features this is optional, and can be ignored or completely disabled at compile time.

Module names are hierarchical lists of symbols or numbers. A module definition uses the following form:

  (define-library (foo bar baz)
    <library-declarations> ...)

where <library-declarations> can be any of

  (export <id> ...)                    ;; specify an export list
  (import <import-spec> ...)           ;; specify one or more imports
  (begin <expr> ...)                   ;; inline Scheme code
  (include <file> ...)                 ;; load one or more files
  (include-ci <file> ...)              ;; as include, with case-folding
  (include-shared <file> ...)          ;; dynamic load a library (non-R7RS)
  (alias-for <library>)                ;; a library alias (non-R7RS)

<import-spec> can either be a module name or any of

  (only <import-spec> <id> ...)
  (except <import-spec> <id> ...)
  (rename <import-spec> (<from-id> <to-id>) ...)
  (prefix <import-spec> <prefix-id>)
  (drop-prefix <import-spec> <prefix-id>)   ;; non-R7RS

These forms perform basic selection and renaming of individual identifiers from the given module. They may be composed to perform combined selection and renaming.
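As an illustration of composing import specs, the sketch below nests only inside prefix, and except inside rename. It relies only on the standard (scheme base), (scheme char) and (scheme write) libraries; the prefix ch: and the new name show are arbitrary choices made for this example.

```scheme
(import (scheme base)
        ;; take two identifiers from (scheme char), then prefix them
        (prefix (only (scheme char) char-upcase char-downcase) ch:)
        ;; drop write-shared, then rename display to show
        (rename (except (scheme write) write-shared) (display show)))

(show (ch:char-upcase #\a))   ; displays A
(newline)
```

The innermost form is applied first, so only and except select identifiers before prefix and rename change their names.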

Some modules can be statically included in the initial configuration, and even more may be included in image files. In general, however, modules are searched for in a module load path. The definition of the module (foo bar baz) is searched for in the file "foo/bar/baz.sld". The default module path includes the installed directories, "." and "./lib". Additional directories can be specified with the command-line options -I and -A (see the command-line options below) or with the add-module-directory procedure at runtime. You can search for a module file with (find-module-file <file>), or load it with (load-module-file <file> <env>).

Within the module definition, files are loaded relative to the .sld file, and are written with their extension (so you can use whatever suffix you prefer - .scm, .ss, .sls, etc.).

Shared modules, on the other hand, should be specified without the extension - the correct suffix will be added portably (e.g. .so for Unix and .dylib for OS X).
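To make the file layout concrete, here is a sketch of a complete module definition. The library name (my stack) and everything it exports are hypothetical. Saved as "my/stack.sld" under a directory on the module path (such as "."), it should be importable as (my stack). The body uses begin so the example is self-contained; the comments show where include and include-shared would go instead.

```scheme
;; File: my/stack.sld  (hypothetical library)
(define-library (my stack)
  (export make-stack stack-push! stack-pop! stack-empty?)
  (import (scheme base))
  ;; Alternatives to the inline body below:
  ;;   (include "stack.scm")     ; Scheme source, path relative to this file
  ;;   (include-shared "stack")  ; compiled library, written without suffix
  (begin
    ;; A stack is a one-element list holding the list of items.
    (define (make-stack) (list '()))
    (define (stack-empty? s) (null? (car s)))
    (define (stack-push! s x) (set-car! s (cons x (car s))))
    (define (stack-pop! s)
      (let ((x (car (car s))))
        (set-car! s (cdr (car s)))
        x))))
```

A client would then write (import (scheme base) (my stack)) and call make-stack, stack-push! and stack-pop! as ordinary procedures.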

You may also use cond-expand and arbitrary macro expansions in a module definition to generate <library-declarations>.

Macro System

syntax-rules macros are provided by default, with the extensions from SRFI-46. In addition, low-level hygienic macros are provided with a syntactic-closures interface, including sc-macro-transformer, rsc-macro-transformer, and er-macro-transformer. A good introduction to syntactic-closures can be found at http://community.schemewiki.org/?syntactic-closures.

identifier?, identifier->symbol, identifier=?, make-syntactic-closure and strip-syntactic-closures are also available.
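As a sketch of the explicit-renaming style, the macro below swaps the values of two variables. The name swap! is hypothetical, and the example assumes er-macro-transformer is visible in the current environment (in a library this may require importing a Chibi-specific module such as (chibi); treat that as an assumption). The transformer receives the whole form, a rename procedure and a compare procedure; every identifier the macro introduces is passed through rename so that it cannot capture or be captured by user bindings.

```scheme
;; (swap! a b) exchanges the values of variables a and b.
(define-syntax swap!
  (er-macro-transformer
   (lambda (expr rename compare)
     (let ((a (cadr expr))
           (b (car (cddr expr)))
           (tmp (rename 'tmp)))          ; hygienic temporary
       `(,(rename 'let) ((,tmp ,a))
          (,(rename 'set!) ,a ,b)
          (,(rename 'set!) ,b ,tmp))))))

(define x 1)
(define y 2)
(swap! x y)
(list x y)   ; => (2 1)
```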

Types

You can define new record types with SRFI-9, or inherited record types with SRFI-99. These are just syntactic sugar for the following more primitive type constructors:

(register-simple-type <name-string> <parent> <field-names>)
 => <type>    ; parent may be #f, field-names should be a list of symbols

(make-type-predicate <opcode-name-string> <type>)
 => <opcode>  ; takes 1 arg, returns #t iff that arg is of the type

(make-constructor <constructor-name-string> <type>)
 => <opcode>  ; takes 0 args, returns a newly allocated instance of type

(make-getter <getter-name-string> <type> <field-index>)
 => <opcode>  ; takes 1 arg, retrieves the field located at the index

(make-setter <setter-name-string> <type> <field-index>)
 => <opcode>  ; takes 2 args, sets the field located at the index

(type-slot-offset <type> <field-name>)
 => <index>   ; returns the index of the field with the given name
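The primitives above can be combined roughly as follows. This is a sketch that follows the signatures listed here; the type name "point" and all of the bound names are hypothetical, and it assumes the primitives are visible in the current environment. Since the constructor takes no arguments, fields are set before they are read.

```scheme
;; Hypothetical "point" type with fields x and y and no parent.
(define point-type (register-simple-type "point" #f '(x y)))

(define point?     (make-type-predicate "point?" point-type))
(define make-point (make-constructor "make-point" point-type))

;; Look the field index up by name rather than hard-coding it.
(define x-index (type-slot-offset point-type 'x))
(define point-x      (make-getter "point-x" point-type x-index))
(define set-point-x! (make-setter "set-point-x!" point-type x-index))

(define p (make-point))
(set-point-x! p 3)
(point-x p)    ; retrieves the value stored above
(point? p)     ; true for instances of point-type
(point? 42)    ; false for anything else
```

In everyday code the SRFI-9 define-record-type form is the more convenient way to get the same set of procedures.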

Unicode

Chibi supports Unicode strings and I/O natively. Case mappings and comparisons, character properties, formatting and regular expressions are all Unicode aware, supporting the latest version 13.0 of the Unicode standard.

Internally strings are encoded as UTF-8. This provides easy interoperability with many C libraries, but means that string-ref and string-set! are O(n), so they should be avoided in performance-sensitive code (unless you compile Chibi with SEXP_USE_STRING_INDEX_TABLE).

In general you should use high-level APIs such as string-map to ensure fast string iteration. String ports also provide a simple and portable way to efficiently iterate and construct strings, by looping over an input string or accumulating characters in an output string.
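As a sketch of the string-port approach, the procedure below keeps only the characters that satisfy a predicate. The name string-keep is hypothetical; everything else is standard R7RS. Each character is read and written once, so no indexed access with string-ref is needed.

```scheme
(import (scheme base) (scheme char))

;; Return a new string with only the characters satisfying pred.
(define (string-keep pred str)
  (let ((in (open-input-string str))
        (out (open-output-string)))
    (let loop ((ch (read-char in)))
      (cond
       ((eof-object? ch) (get-output-string out))
       (else
        (when (pred ch) (write-char ch out))
        (loop (read-char in)))))))

(string-keep char-alphabetic? "a1b2c3")   ; => "abc"
```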

The in-string and in-string-reverse iterators in the (chibi loop) module will also iterate over strings efficiently while hiding the low-level details.

In the event that you do need a low-level interface, such as when writing your own iterator protocol, you should use string cursors. (srfi 130) provides a portable API for this, or you can use (chibi string) which builds on the following core procedures:

Embedding in C

Quick Start

To use Chibi-Scheme in a program you need to link against the "libchibi-scheme" library and include the "eval.h" header file:

#include <chibi/eval.h>

All definitions begin with a "sexp_" prefix, or "SEXP_" for constants (deliberately chosen not to conflict with other Scheme implementations which typically use "scm_"). In addition to the prototypes and utility macros, this includes the following type definitions:

A simple program might look like:

void dostuff(sexp ctx) {
  /* declare and preserve local variables */
  sexp_gc_var2(obj1, obj2);
  sexp_gc_preserve2(ctx, obj1, obj2);

  /* load a file containing Scheme code */
  obj1 = sexp_c_string(ctx, "/path/to/source/file.scm", -1);
  sexp_load(ctx, obj1, NULL);

  /* eval a C string as Scheme code */
  sexp_eval_string(ctx, "(some scheme expression)", -1, NULL);

  /* construct a Scheme expression to eval */
  obj1 = sexp_intern(ctx, "my-procedure", -1);
  obj2 = sexp_cons(ctx, obj1, SEXP_NULL);
  sexp_eval(ctx, obj2, NULL);

  /* release the local variables */
  sexp_gc_release2(ctx);
}

int main(int argc, char** argv) {
  sexp ctx;
  sexp_scheme_init();
  ctx = sexp_make_eval_context(NULL, NULL, NULL, 0, 0);
  sexp_load_standard_env(ctx, NULL, SEXP_SEVEN);
  sexp_load_standard_ports(ctx, NULL, stdin, stdout, stderr, 1);
  dostuff(ctx);
  sexp_destroy_context(ctx);
  return 0;
}

Looking at main, sexp_make_eval_context and sexp_destroy_context create and destroy a "context", which manages the heap and VM state. The meaning of the arguments is explained in detail below, but these values will give reasonable defaults, in this case constructing an environment with the core syntactic forms, opcodes, and standard C primitives.

This is still a fairly bare environment, so we call sexp_load_standard_env to find and load the default initialization file.

The resulting context can then be used to construct objects, call functions, and most importantly evaluate code, as is done in dostuff. The default garbage collector for Chibi is precise, which means we need to declare and preserve references to any temporary values we may generate, which is what the sexp_gc_var2, sexp_gc_preserve2 and sexp_gc_release2 macros do (there are similar macros for values 1-6). Precise GCs prevent a class of memory leaks (and potential attacks based thereon), but if you prefer convenience then Chibi can be compiled with a conservative GC and you can ignore these.

The interesting part is then the calls to sexp_load, sexp_eval_string and sexp_eval, which evaluate code stored in files, C strings, or represented as s-expressions respectively.

Destroying a context runs any finalizers for all objects in the heap and then frees the heap memory (but has no effect on other contexts you or other users of the library may have created).

Contexts and Evaluation

Contexts represent the state needed to perform evaluation. This includes keeping track of the heap (when using precise GC), a default environment, execution stack, and any global values. A program being evaluated in one context may spawn multiple child contexts, such as when you call eval, and each child will share the same heap and globals. When using multiple interpreter threads, each thread has its own context.

You can also create independent contexts with their own separate heaps. These can run simultaneously in multiple OS threads without any need for synchronization.

Garbage Collection

Chibi uses a precise garbage collector by default, which means when performing multiple computations on the C side you must explicitly preserve any temporary values. You can declare variables to be preserved with sexp_gc_varn, for n from 1 to 6.

You can declare additional macros for larger values of n if needed.

sexp_gc_varn(obj1, obj2, ..., objn)

This is equivalent to the declaration

sexp obj1, obj2, ..., objn;

except it makes preservation possible. Because it is a declaration it must occur at the beginning of your function, and because it includes assignments (in the macro-expanded form) it should occur after all other declarations.

To preserve these variables for a given context, you can then use sexp_gc_preserven:

sexp_gc_preserven(ctx, obj1, obj2, ..., objn)

This can be delayed in your code until you know a potentially memory-allocating computation will be performed, but once you call sexp_gc_preserven it must be paired with a matching sexp_gc_releasen:

sexp_gc_releasen(ctx);

Note that each of these has a different signature. sexp_gc_varn just lists the variables to be declared. sexp_gc_preserven prefixes these with the context in which they are to be preserved, and sexp_gc_releasen just needs the context.

A typical usage for these is:

sexp foo(sexp ctx, sexp bar, sexp baz) {
  /* variable declarations */
  int i, j;
  ...
  sexp_gc_var3(tmp1, tmp2, res);

  /* asserts or other shortcut returns */
  sexp_assert_type(ctx, sexp_barp, SEXP_BAR, bar);
  sexp_assert_type(ctx, sexp_bazp, SEXP_BAZ, baz);

  /* preserve the variables in ctx */
  sexp_gc_preserve3(ctx, tmp1, tmp2, res);

  /* perform your computations */
  tmp1 = ...
  tmp2 = ...
  res = ...

  /* release before returning */
  sexp_gc_release3(ctx);

  return res;
}

If compiled with the Boehm GC, sexp_gc_varn just translates to the plain declaration, while sexp_gc_preserven and sexp_gc_releasen become no-ops.

When interacting with a garbage collection system from another language, or communicating between different Chibi managed heaps, you may want to manually ensure an object is preserved irrespective of any references to it from other objects in the same heap. This can be done with the sexp_preserve_object and sexp_release_object utilities.

sexp_preserve_object(ctx, obj)

Increment the absolute reference count for obj. So long as the reference count is above 0, obj will not be reclaimed even if there are no references to it from other objects in the Chibi managed heap.

sexp_release_object(ctx, obj)

Decrement the absolute reference count for obj.

C API Index

The above sections describe almost everything you need for embedding in a typical application, notably creating environments and evaluating code from sexps, strings or files. The following sections expand on additional macros and utilities for inspecting, accessing and creating different Scheme types, and for performing port and string I/O. It is incomplete - see the macros and SEXP_API annotated functions in the include files (sexp.h, eval.h, bignum.h) for more bindings.

Being able to convert from C string to sexp, evaluate it, and convert the result back to a C string forms the basis of the C API. Because Chibi is aimed primarily at minimal size, there are relatively few other utilities or helpers. It is expected most high-level code will be written in Scheme, and most low-level code will be written in pure, Scheme-agnostic C and wrapped via the FFI.

Type Predicates

The sexp represents different Scheme types with the use of tag bits for so-called "immediate" values, and a type tag for heap-allocated values. The following predicates can be used to distinguish these types. Note the predicates in C all end in "p". For efficiency they are implemented as macros, and so may evaluate their arguments multiple times.

Note also that the non-immediate type checks will segfault if passed a NULL value. At the Scheme level (and the return values of any exported primitives) NULLs are never exposed, however some unexposed values in C can in certain cases be NULL. If you're not sure you'll need to check manually before applying the predicate.

Constants

The following shortcuts for various immediate values are available.

String Handling

Scheme strings are length bounded C strings which can be accessed with the following macros:

Currently all Scheme strings also happen to be NULL-terminated, but you should not rely on this and be sure to use the size as a bounds check. The runtime does not prevent embedded NULLs inside strings, however data after the NULL may be ignored.

By default (unless you compile with -DSEXP_USE_UTF8_STRING=0), strings are interpreted as UTF-8 encoded on the Scheme side, as described in the Unicode section above. In many cases you can ignore this on the C side and just treat the string as an opaque sequence of bytes. However, if you need to, you can use the following macros to safely access the contents of the string regardless of the options Chibi was compiled with:

When UTF-8 support is not compiled in, the cursor and non-cursor variants are equivalent.

Accessors

The following macros provide access to the different components of the Scheme types. They do no type checking, essentially translating directly to pointer offsets, so you should be sure to use the above predicates to check types first. They only evaluate their arguments once.

Constructors

Constructors allocate memory and so must be passed a context argument. Any of these may fail and return the OOM exception object.

I/O

Utilities

Exceptions

Exceptions can be created with the following:

Returning an exception from a C function by default raises that exception in the VM. If you want to pass an exception as a first class value, you have to wrap it first:

sexp sexp_maybe_wrap_error (sexp ctx, sexp obj)

Customizing

You can add your own types and primitives with the following functions.

See the C FFI for an easy way to automate adding bindings for C functions.

C FFI

The "chibi-ffi" script reads in the C function FFI definitions from an input file and outputs the appropriate C wrappers into a file with the same base name and the ".c" extension. You can then compile that C file into a shared library:

chibi-ffi file.stub
cc -fPIC -shared file.c -lchibi-scheme

(or using whatever flags are appropriate to generate shared libs on your platform) and the generated .so file can be loaded directly with load, or portably using (include-shared "file") in a module definition (note that include-shared uses no suffix).

The goal of this interface is to make access to C types and functions easy, without requiring the user to write any C code. That means the stubber needs to be intelligent about various C calling conventions and idioms, such as return values passed in actual parameters. Writing C by hand is still possible, and several of the core modules provide C interfaces directly without using the stubber.

Includes and Initializations

Struct Interface

C structs can be bound as Scheme types with the define-c-struct form:

(define-c-struct struct_name
  [predicate: predicate-name]
  [constructor: constructor-name]
  [finalizer: c_finalizer_name]
  (type c_field_name getter-name setter-name) ...)

struct_name should be the name of a C struct type. If provided, predicate-name is bound to a procedure which takes one object and returns #t iff the object is of type struct_name.

If provided, constructor-name is bound to a procedure of zero arguments which creates and returns a newly allocated instance of the type.

If a finalizer is provided, c_finalizer_name must be a C function which takes one argument, a pointer to the struct, and performs any cleanup or freeing of resources necessary.

The remaining slots are similar to the SRFI-9 syntax, except they are prefixed with a C type (described below). The c_field_name should be a field name of struct_name. getter-name will then be bound to a procedure of one argument, a struct_name type, which returns the given field. If provided, setter-name will be bound to a procedure of two arguments to mutate the given field.

The variants define-c-class and define-c-union take the same syntax but define types with the class and union keywords respectively. define-c-type just defines accessors to an opaque type without any specific struct-like keyword.

;; Example: the struct addrinfo returned by getaddrinfo.

(c-system-include "netdb.h")

(define-c-struct addrinfo
  finalizer: freeaddrinfo
  predicate: address-info?
  (int              ai_family    address-info-family)
  (int              ai_socktype  address-info-socket-type)
  (int              ai_protocol  address-info-protocol)
  ((link sockaddr)  ai_addr      address-info-address)
  (size_t           ai_addrlen   address-info-address-length)
  ((link addrinfo)  ai_next      address-info-next))

Function and Constant Interface

C functions are defined with:

(define-c return-type name-spec (arg-type ...))

where name-spec is either a symbol name, or a list of (scheme-name c_name). If just a symbol is used, the C name is generated automatically by replacing any dashes (-) in the Scheme name with underscores (_).

Each arg-type is a type suitable for input validation and conversion as discussed below.

;; Example: define connect(2) in Scheme
(define-c int connect (int sockaddr int))
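The two forms of the name spec can be sketched as follows. The Scheme names socket-connect and my-lib-init, and the C function my_lib_init, are hypothetical and chosen only for illustration. The C name in the list form is written as a string here, following the style of the define-c-const examples in this manual; treat that spelling as an assumption.

```scheme
;; List form: expose the C function connect(2) under another Scheme name.
(define-c int (socket-connect "connect") (int sockaddr int))

;; Symbol form: dashes become underscores, so this is expected to
;; bind the Scheme name my-lib-init to the C function my_lib_init.
(define-c int my-lib-init ())
```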

Constants can be defined with:

(define-c-const type name-spec)

where name-spec is the same form as in define-c. This defines a Scheme variable with the same value as the C constant.

;; Example: define address family constants in Scheme
(define-c-const int (address-family/unix "AF_UNIX"))
(define-c-const int (address-family/inet "AF_INET"))

C Types

Basic Types

Integer Types

Float Types

String Types

Port Types

Struct Types

Struct types are by default just referred to by the bare struct_name from define-c-struct, and it is assumed you want a pointer to that type. To refer to the full struct, use the struct modifier, as in (struct struct_name).

Type modifiers

Any type may also be written as a list of modifiers followed by the type itself. The supported modifiers are:

Standard Modules

A number of SRFIs are provided in the default installation. Note that SRFIs 0, 6, 23, 46 and 62 are built into the default environment so there's no need to import them. SRFI 22 is available with the "-r" command-line option. This list includes popular SRFIs or SRFIs used in standard Chibi modules (many other SRFIs are available on snow-fort):

Additional non-standard modules are put in the (chibi) module namespace.

Snow Package Manager

Beyond the distributed modules, Chibi comes with a package manager based on Snow2 which can be used to share R7RS libraries. Packages are distributed as tar gzipped files called "snowballs," and may contain multiple libraries. The program is installed as snow-chibi. The "help" subcommand can be used to list all subcommands and options. Note that by default snow-chibi uses an image file to speed up loading (since it loads many libraries). If you have any difficulties with image files on your platform you can run

snow-chibi --noimage

to disable this feature.

Querying Packages and Status

By default snow-chibi looks for packages in the public repository http://snow-fort.org/, though you can customize this with the --repository-uri option. Packages can be browsed on the site, but you can also search and query from the command-line tool.

Managing Packages

The basic package management functionality: installing, upgrading and removing packages.

Authoring Packages

Creating packages can be done with the package command, though other commands allow for uploading to public repositories.

Easy Packaging

To encourage sharing code it's important to make it as easy as possible to create packages, while encouraging documentation and tests. In particular, you should never need to duplicate information anywhere. Thus the package command automatically locates and packages include files (and data and ffi files) and determines dependencies for you. In addition, it can automatically handle versions, docs and tests:

Other useful meta-info options include:

These three are typically always the same, so it's useful to save them in your ~/.snow/config.scm file. This file contains a single sexp and can specify any option, for example:

((repository-uri "http://alopeke.gr/repo.scm")
 (command
  (package
   (authors "Socrates <hemlock@aol.com>")
   (doc-from-scribble #t)
   (version-file "VERSION")
   (test-library (append-to-last -test))
   (license gpl))))

Top-level snow options are represented as a flat alist. Options specific to a command are nested under (command (name ...)), with most options here being for package. Here, unless overridden on the command-line, all packages will use the given author and license, try to extract literate docs from the code, look for a version in the file "VERSION", and try to find a test with the same library name appended with -test, e.g. for the library (socratic method), the test library would be (socratic method-test). This form is an alternative to using an explicit test-library name, and encourages you to keep your tests close to the code they test. In the typical case, if using these conventions, you can thus simply run snow-chibi package <lib-file> without any other options.

Other Implementations

Although the command is called snow-chibi, it supports several other R7RS implementations. The implementations command tells you which you currently have installed. The following are currently supported: