by Sebastian Egner
This copy of the SRFI 42 specification document is distributed as part of the Racket package srfi-doc.
The canonical source of this document is https://srfi.schemers.org/srfi-42/srfi-42.html.
This SRFI is currently in final status.  Here is an explanation of each status that a SRFI can hold.  To provide input on this SRFI, please send email to srfi-42@nospamsrfi.schemers.org.  To subscribe to the list, follow these instructions.  You can access previous messages via the mailing list archive.
This SRFI defines a modular and portable mechanism for eager comprehensions extending the algorithmic language Scheme [R5RS]. An eager comprehension is a convenient notation for one or more nested or parallel loops generating a sequence of values, and accumulating this sequence into a result. In its most simple form, a comprehension according to this SRFI looks like this:
(list-ec (: i 5) (* i i)) => (0 1 4 9 16).
Here, i is a local variable that is sequentially
bound to the values 0, 1, ..., 4,  and the squares of these
numbers are collected in a list.
The following example illustrates most conventions of this SRFI
with respect to nesting and syntax:
(list-ec (: n 1 4) (: i n) (list n i)) => ((1 0) (2 0) (2 1) (3 0) (3 1) (3 2)).
In the example, the variable n is first bound to 1
then to 2 and finally to 3, and for each binding of n
the variable i is bound to the values
0, 1, ..., n-1 in turn.
The expression (list n i) constructs a two-element list for
each bindings, and the comprehension list-ec collects
all these results in a list.
The mechanism defined in this SRFI has the following properties:
list-ec,
append-ec,
sum-ec,
min-ec,
every?-ec,
do-ec, and others.
Some other natural comprehensions (e.g. gcd-ec) have
not been included into this SRFI due to their low significance
for most applications.
On the other hand, a few comprehensions
(fold-ec,
fold3-ec)
not inspired by [R5RS] have been
included due to their broad applicability.
:list, :string, ...)
expecting certain types of objects for their arguments.
These generators usually produce code as efficient as
hand coded do-loops.
:
(read 'run through')
dispatching on the value of its argument list at runtime.
In the examples above, one or two integers were used to define a range.
The convenience of omitting the type comes at a certain
performance penalty, both per iteration and during startup of the loop.
:parallel),
and can be stopped early before (:while)
or after (:until)
producing the current value.
if, not,
and, or),
intermediate commands can be evaluated between
generators (begin), and
intermediate variables can be introduced (:let).
my-comprehension-ec,
:my-generator is used systematically.
The remainder of this document is organized as follows. In section Rationale a brief introduction is given what the motivation is for this SRFI. The following section Specification defines the mechanism. Section Suggestions for Application-specific Extensions presents some ideas how extensions using this SRFI could look like. The section Design Rationale contains some considerations that went into the design of the mechanism as defined in the specification. The following section Related Work and Acknowledgements briefly reviews other proposals related to Scheme and comprehensions or loops. Finally, the section Reference Implementation gives source code for a reference implementation together with a collection of runnable examples and a few examples on extensions.
The purpose of this SRFI is to provide a compact notation
for many common cases of loops, in particular those
constructing values for further computation.
The origin of this SRFI is my frustration that
there is no simple for the list of integers
from 0 to n-1.
With this SRFI it is (list-ec (: i n) i).
Refer to the collection of examples for the
reference implementation
what it can be used for, and what it should not be used for.
To give a practically useful example,
the following procedure computes the sorted
list of all prime numbers below a certain bound
(you may want to run it yourself to get an idea of
its efficiency):
(define (eratosthenes n) ; primes in {2..n-1} for n >= 1 (let ((p? (make-string n #\1))) (do-ec (:range k 2 n) (if (char=? (string-ref p? k) #\1)) (:range i (* 2 k) n k) (string-set! p? i #\0) ) (list-ec (:range k 2 n) (if (char=? (string-ref p? k) #\1)) k) ))
Apart from make simple things simple, there is no other paradigm involved for this SRFI. In particular, it is not the ambition to implement the powerful lazy list comprehensions of other functional programming languages in Scheme. If you are looking for that you may want to refer to [SRFI 40]. (The usual definition of the stream of all primes does in fact also use Eratosthenes' sieve method. It is instructive to compare.)
The main focus of the design of this SRFI is portability under [R5RS] and modularity for extension. Portability is achieved by limiting the features included. Modularity for generators is achieved by a special implementation technique using Continuation Passing Style for macros (which I learned from Richard Kelsey's implementation of "Macros for writing loops", [MWL]) and by limiting the expressive power of generators. Modularity for comprehensions requires no additional effort. As modularity was a major design goal, I hope many people will find it easy to define their own comprehensions and generators. As a starting point for doing so, I have included several suggestions for extensions.
For a detailed motivation of the design decisions, please refer to the Section "Design Rationale".
A comprehension is a hygienic, referentially transparent macro
in the sense of [R5RS, 4.3.].
The macros extend the <expression>-syntax
defined in [R5RS, 7.1.3.].
The main syntactic pattern used for defining a comprehension is
<qualifier>, representing a generator or a filter.
It is defined in Section "Qualifiers".
The most important instances of <qualifier> are generators.
These are defined in Section "Generators".
Generators come in three flavors,
as typed generators
(:list,
:range etc.),
as the dispatching generator :
(pronounced as 'run through'),
and as combined and modified generators
(:parallel,
:while,
:until).
Most generators in this SRFI also support an optional
index variable counting the
values being generated.
Finally, it is explained how to add a new application-specific comprehension, how to add a new application-specific typed generator, and how to add a new application-specific dispatching generator. As this concerns code unknown at the time this is being written, the explanation should not be taken as a specification in the literal sense. It rather suggests a convention to follow in order to ensure new comprehensions and generators blend seamlessly with the ones defined in this SRFI.
(do-ec <qualifier>* <command>)
<command> exactly once
for each binding in the sequence defined by the qualifiers.
If there are no qualifiers <command>
is evaluated exactly once.
The expression is evaluated for its side-effects only.
The result of the comprehension is unspecified.
(list-ec <qualifier>* <expression>)
<expression> once for each binding in the sequence
defined by the qualifiers.
If there are no qualifiers the result is the list with
the value of <expression>.
(append-ec <qualifier>* <expression>)
<expression>,
which must all be lists.(apply append (list-ec <qualifier>* <expression>)).
(string-ec <qualifier>* <expression>)
<expression>.(list->string (list-ec <qualifier>* <expression>)).
(string-append-ec <qualifier>* <expression>)
<expression>,
which must all be strings.(apply string-append (list-ec <qualifier>* <expression>)).
(vector-ec <qualifier>* <expression>)
<expression>.(list->vector (list-ec <qualifier>* <expression>)).
(vector-of-length-ec <k> <qualifier>* <expression>)
<expression>,
of which there must be exactly <k>.
This comprehension behaves like vector-ec
but can be implemented more efficiently.
(sum-ec <qualifier>* <expression>)
<expression>.(apply + (list-ec <qualifier>* <expression>)).
(product-ec <qualifier>* <expression>)
<expression>.(apply * (list-ec <qualifier>* <expression>)).
(min-ec <qualifier>* <expression>)(max-ec <qualifier>* <expression>)
The minimum and maximum of all values of <expression>.
The sequence of values must be non-empty.
Think of these as
(apply min (list-ec <qualifier>* <expression>))
(apply max (list-ec <qualifier>* <expression>)).
If you want to return a default value in case the sequence is empty
you may want to consider
(fold3-ec 'infinity <qualifier>* <expression> min min).
(any?-ec <qualifier>* <test>)
<test> in the sequence
of bindings specified by the qualifiers is non-#f.
If this is the case, #t is returned, otherwise #f.
If there are no bindings in the sequence specified by the qualifiers
at all then the result is #f.
The enumeration of values stops after the first non-#f
encountered.
(every?-ec <qualifier>* <test>)
<test> are non-#f.
If this is the case, #t is returned, otherwise #f.
If the sequence is empty the result is #t.
Enumeration stops after the first #f.
(first-ec <default> <qualifier>* <expression>)
(last-ec  <default> <qualifier>* <expression>)
<expression>
in the sequence of bindings specified by the qualifiers.
Before enumeration, the result is initialized
with the value of <default>;
so this will be the result if the sequence is empty.
Enumeration is terminated in first-ec
when the first value has been computed.
(fold-ec   <x0> <qualifier>* <expression>      <f2>)
(fold3-ec <x0> <qualifier>* <expression> <f1> <f2>)
Reduces the sequence x[0], x[1], ..., x[n-1]
of values obtained by evaluating <expression> once
for each binding as specified by <qualifier>*.
The arguments <x0>, <f2>,
and <f1>, all syntactically equivalent to
<expression>, specify the reduction process.
The reduction process for fold-ec is defined as follows.
A reduction variable x is initialized to the value
of <x0>,
and for each k in {0, ..., n-1} the command
(set! x (<f2> x[k] x))
is evaluated.
Finally, x is returned as the value of the comprehension.
The reduction process for fold3-ec is different.
If and only if n = 0, i.e. the sequence is empty, then
<x0> is evaluated and returned as the value
of the comprehension.
Otherwise, a reduction variable x is initialized
to the value of (<f1> x[0]),
and for each k in {1, ..., n-1} the command
(set! x (<f2> x[k] x))
is evaluated.
Finally, x is returned as the value of the comprehension.
As the order of the arguments suggests,
<x0> is evaluated outside the scope of the
qualifiers, whereas the reduction expressions involving
<f1> and <f2> are
inside the scope of the qualifiers (so they may depend on
any variable introduced by the qualifiers).
Note that <f2> is evaluated repeatedly,
with any side-effect or overhead this might have.
The main purpose of these comprehensions is
implementing other comprehensions as special cases.
They are generalizations of the procedures
fold and reduce in the sense of
[SRFI 1].
(Concerning naming and argument order, please refer to
the discussion archive of SRFI 1, in particular
the posting [Folds].)
Note that fold3-ec is defined such that
<x0> is only evaluated in case the
sequence is empty.
This allows raising an error for the empty sequence,
as in the example definition of min-ec below.
<application-specific comprehension>
An important aspect of this SRFI is a modular mechanism
to define application-specific comprehensions.
To create a new comprehension a hygienic macro
of this name is defined.
The macro transforms the new comprehension patterns
into instances of do-ec, which is the most
fundamental eager comprehension, or any other comprehension
already defined.
For example, the following code defines
list-ec and
min-ec
in terms of fold-ec
and fold3-ec:
(define-syntax list-ec (syntax-rules () ((list-ec etc1 etc ...) (reverse (fold-ec '() etc1 etc ... cons)) ))) (define-syntax min-ec (syntax-rules () ((min-ec etc1 etc ...) (fold3-ec (min) etc1 etc ... min min) )))
Note that the pattern etc1 etc ... matches
the syntax <qualifier>* <expression>
without separate access to <qualifier>*
and <expression>.
In order to define a comprehension that does need explicit
access to the <expression>-part,
the following method is used.
First, all qualifiers are collected into a
nested-qualifier, and then the
'exactly one qualifier'-case is implemented.
For illustration, the following code defines
fold3-ec in terms of
do-ec:
(define-syntax fold3-ec
  (syntax-rules (nested)
    ((fold3-ec x0 (nested q1 ...) q etc1 etc2 etc3 etc ...)
     (fold3-ec x0 (nested q1 ... q) etc1 etc2 etc3 etc ...) )
    ((fold3-ec x0 q1 q2 etc1 etc2 etc3 etc ...)
     (fold3-ec x0 (nested q1 q2) etc1 etc2 etc3 etc ...) )
    ((fold3-ec x0 expression f1 f2)
     (fold3-ec x0 (nested) expression f1 f2) )
    ((fold3-ec x0 qualifier expression f1 f2)
     (let ((result #f) (empty #t))
       (do-ec qualifier
              (let ((value expression)) ; don't duplicate code
                (if empty
                    (begin (set! result (f1 value))
                           (set! empty #f) )
                    (set! result (f2 value result)) )))
       (if empty x0 result) ))))
Finally, observe that the newly defined fold3-ec
comprehension inherits all types of qualifiers supported by
do-ec, including all application-specific generators;
no further definitions are necessary.
This section defines the syntax <qualifier>.
The nesting of qualifiers is from left (outer) to right (inner).
In other words, the rightmost generator spins fastest.
The nesting also defines the scope of the variables introduced
by the generators.
This implies that inner generators may depend on the variables
of outer generators.
The sequence of enumeration of values is strictly depth first.
These conventions are illustrated by the
first example.
The syntax <qualifier> consists of the following alternatives:
<generator>
<generator>-syntax is defined in
Section "Generators".
(if <test>)
<test> evaluates to non-#f.
Only for those bindings for which this is the case,
the subsequent qualifiers of the comprehension are evaluated.
(not <test>)
(and <test>*)
(or   <test>*)
(if (not <test>)),
(if (and <test>*)), and
(if (or <test>*)).
These represent frequent cases of filters.
(begin <sequence>)
<sequence>, consisting of
<command>* <expression>,
once for each binding of the variables defined by the
previous qualifiers in the comprehension.
Using this qualifier, side effects can be inserted
into the body of a comprehension.
(nested <qualifier>*)
nested-syntax is the same as
inserting <qualifier>*
into the enclosing comprehension.
This construct can be used to reduce comprehensions with several
qualifiers into a form with exactly one qualifier.
This section defines the syntax <generator>.
Each generator defines a sequence of bindings through
which one or more variables are run.
The scope of the variables begins after the closing
parenthesis of the generator expression and extends
to the end of the comprehension it is part of.
The variables defined by the generators are specified using the syntax
<vars> --> <variable1>[(index <variable2>)],
where <variable1> runs through the values in
the sequence defined by the generator, and the optional
<variable2> is an exact integer-valued
index variable counting the values (starting from 0).
The names of the variables must be distinct.
The following example illustrates the index variable:
(list-ec (: x (index i) "abc") (list x i)) => ((#\a 0) (#\b 1) (#\c 2))
Unless defined otherwise, all generators make sure that the expressions provided for their syntactic arguments are evaluated exactly once, before enumeration begins. Moreover, it may be assumed that the generators do not copy the code provided for their arguments, because that could lead to exponential growth in code size. Finally, it is possible to assign a value to the variables defined by a generator, but the effect of this assignment is unspecified.
The syntax <generator> consists of the following alternatives:
(: <vars> <arg1> <arg>*)
First the expressions <arg1> <arg>*
are evaluated into a[1] a[2] ... a[n]
and then a global dispatch procedure is used to dispatch on
the number and types of the arguments and run the resulting
generator.
Initially (after loading the SRFI), the following cases are recognized:
| :list | if | for all i in {1..n}: (list? a[i]). | 
| :string | if | for all i in {1..n}: (string? a[i]). | 
| :vector | if | for all i in {1..n}: (vector? a[i]). | 
| :range | if | n in {1..3} and
for all i in {1..n}: (integer? a[i])and(exact? a[i]). | 
| :real-range | if | n in {1..3} and
for all i in {1..n}: (real? a[i]). | 
| :char-range | if | n = 2 and
for all i in {1, 2}: (char? a[i]). | 
| :port | if | n in {1,2} and (input-port? a[1])and(procedure? a[2]). | 
The current dispatcher can be retrieved as
(:-dispatch-ref), a new dispatcher d
can be installed by
(:-dispatch-set! d)
yielding an unspecified result,
and a copy of the initial dispatcher can be obtained as
(make-initial-:-dispatch).
Please refer to the section below
for recommendation how to add cases to the dispatcher.
(:list     <vars> <arg1> <arg>*)
(:string <vars> <arg1> <arg>*)
(:vector <vars> <arg1> <arg>*)
<arg1> <arg>*
are evaluated and then all elements of the resulting values
are enumerated from left to right.
One can think of it as first appending all arguments and
then enumerating the combined object.
As a clarifying example, consider(list-ec (:string c (index i) "a" "b") (cons c i)) =>
((#\a . 0) (#\b . 1)).
(:integers <vars>)
:parallel,
:while, and
:until or with
a non-local exit in the body of the comprehension.
(:range <vars>         <stop>)
(:range <vars> <start> <stop>)
(:range <vars> <start> <stop> <step>)
Runs through a range of exact rational numbers.
The form (:range <vars> <stop>)
evaluates the expression <stop>,
which must result in an exact integer n,
and runs through the finite sequence 0, 1, 2, ..., n-1.
If n is zero or negative the sequence is empty.
The form (:range <vars> <start> <stop>)
evaluates the expressions <start> and
<stop>,
which must result in exact integers a and b,
and runs through the finite sequence
a, a+1, a+2, ..., b-1.
If b is less or equal a then the sequence is empty.
The form (:range <vars> <start> <stop> <step>)
first evaluates the expressions <start>,
<stop>, and <step>,
which must result in exact integers a, b, and s
such that s is unequal to zero.
Then the sequence
a, a + s, a + 2 s, ..., a + (n-1) s
is enumerated where n = ceil((b-a)/s).
In other words, the sequence starts at a, increments by s,
and stops when the next value would reach or cross b.
If n is zero or negative the sequence is empty.
(:real-range <vars>         <stop>)
(:real-range <vars> <start> <stop>)
(:real-range <vars> <start> <stop> <step>)
Runs through a range of real numbers using an explicit index variable. This form of range enumeration avoids accumulation of rounding errors and is the one to use if any of the numbers defining the range is inexact, not an integer, or a bignum of large magnitude.
Providing default value 0 for <start> and
1 for <step>, the generator first
evaluates <start>, <stop>,
and <step>, which must result in reals
a, b, and s such that
n = (b-a)/s is also representable
as a real.
Then the sequence 0, 1, 2, ... is enumerated while the
current value i is less than n, and the
variable in <vars> is bound to the
value a + i s.
If any of the values a, b, or s is
non-exact then all values in the sequence are non-exact.
(:char-range <vars> <min> <max>)
<min> and <max> are
evaluated, which must result in two characters a and b.
Then the sequence of characters
a, a+1, a+2, ..., b
is enumerated in the order defined by char<=?
in the sense of [R5RS, 6.3.4.].
If b is smaller than a then the sequence is empty.
(Note that b is included in the sequence.)
(:port <vars> <port>)(:port <vars> <port> <read-proc>)
read for
<read-proc>, the generator first
evaluates <port> and
<read-proc>, which must result
in an input port p and a procedure r.
Then the variable is run through the sequence obtained
by (r p)
while the result does not satisfy eof-object?.
(:dispatched <vars> <dispatch> <arg1> <arg>*)
Runs the variables through a sequence defined by <dispatch>
and <arg1> <arg>*.
The purpose of :dispatched is implementing
dispatched generators, in particular the predefined dispatching
generator :.
The working of :dispatched is as follows.
First <dispatch> and
<arg1> <arg>* are evaluated,
resulting in a procedure d (the 'dispatcher') and
the values a[1] a[2] ... a[n].
Then
(d (list a[1] a[2] ... a[n] ))
is evaluated, resulting in a value g.
If g is not a procedure then the dispatcher
did not recognize the argument list and an error is raised.
Otherwise the 'generator procedure' g is used
to run <vars> through a sequence of values.
The sequence defined by g is obtained by repeated
evaluation of (g empty)
until the result is empty.
In other words, g indicates the end of the sequence by
returning its only argument, for which the caller has provided
an object distinct from anything g can produce.
(Generator procedures are state based, they are no such noble things
as streams in the sense of SRFI 40.)
The definition of dispatchers is greatly simplified by the
macro :generator-proc that constructs a generator
procedure from a typed generator.
Let (g var arg1 arg ...) be an instance of
the <generator> syntax, for example an
application-specific typed generator,
with a single variable var and no index variable.
Then
(:generator-proc (g arg1 arg ...)) => g,
where the generator procedure g runs through the list
(list-ec (g var arg1 arg ...) var).
Adding an application-specific dispatching generator.
In order to define a new dispatching generator (say :my)
first a dispatching procedure (say :my-dispatch) is defined.
The dispatcher will be called with a single (!) argument
containing the list of all values to dispatch on.
To enable informative error messages, the dispatcher should
return a descriptive object (e.g. a symbol for the module name)
when it is called with the empty list.
Otherwise (if there is at least one value to dispatch on),
the dispatcher must either return a generator procedure
or #f (= no interest).
As an example, the following skeleton code defines a dispatcher
similar to the initial dispatcher of ::
(define (:my-dispatch args)
  (case (length args)
    ((0) 'SRFI-NN)
    ((1) (let ((a1 (car args)))
           (cond
            ((list? a1)
             (:generator-proc (:list a1)) )
            ((string? a1)
             (:generator-proc (:string a1)) )
            ...more unary cases...
            (else
             #f ))))
    ((2) (let ((a1 (car args)) (a2 (cadr args)))
           (cond
            ((and (list? a1) (list? a2))
             (:generator-proc (:list a1 a2)) )
            ...more binary cases...
            (else
             #f ))))
    ...more arity cases...
    (else
     (cond
      ((every?-ec (:list a args) (list? a))
       (:generator-proc (:list (apply append args))) )
      ...more large variable arity cases...
      (else
       #f )))))
Once the dispatcher has been defined, the following macro implements the new dispatching generator:
(define-syntax :my
  (syntax-rules (index)
    ((:my cc var (index i) arg1 arg ...)
     (:dispatched cc var (index i) :my-dispatch arg1 arg ...) )
    ((:my cc var arg1 arg ...)
     (:dispatched cc var :my-dispatch arg1 arg ...) )))
This method of extension yields complete control of the dispatching process.
Other modules can only add cases to :my
if they have access to :my-dispatch.
Extending the predefined dispatched generator.
An alternative to adding a new dispatched generators is extending
the predefined generator :.
Technically, extending : means
installing a new global dispatching procedure using
:-dispatch-set!
as described above.
In most cases, however, the already installed dispatcher should be extended by new cases. The following procedure is a utility for doing so:
(dispatch-union d1 d2) => d,
where the new dispatcher d recognizes the union of the
cases recognized by the dispatchers d1 and d2.
The new dispatcher always tries both component dispatchers
and raises an error in case of conflict.
The identification returned by (d)
is the concatenation of the component identifications
(d1) and
(d2), enclosed in lists
if necessary.
For illustration, consider the following code:
(define (example-dispatch args)
  (cond
   ((null? args)
    'example )
   ((and (= (length args) 1) (symbol? (car args)) )
    (:generator-proc (:string (symbol->string (car args)))) )
   (else
    #f )))
(:-dispatch-set! (dispatch-union (:-dispatch-ref) example-dispatch))
After evaluation of this code, the following example will work:
(list-ec (: c 'abc) c) => (#\a #\b #\c)
Adding cases to : is particularly useful
for frequent cases of interactive input.
Be warned, however, that the advantage of global extension also carries
the danger of conflicts, unexpected side-effects, and slow dispatching.
(:do (<lb>*) <ne1?> (<ls>*))
(:do (let (<ob>*) <oc>*) (<lb>*) <ne1?> (let (<ib>*) <ic>*) <ne2?> (<ls>*))
let,
optionally decorated with inner and outer lets.
This generator is for defining other generators.
(In fact, the reference implementation transforms any other
generator into an instance of fully decorated :do.)
The generator is a compromise between expressive power
(more flexible loops) and fixed structure (necessary
for merging and modifying generators).
In the fully decorated form, the syntactic variables
<ob> (outer binding),
<oc> (outer command),
<lb> (loop binding),
<ne1?> (not-end1?),
<ib> (inner binding),
<ic> (inner command),
<ne2?> (not-end2?), and
<ls> (loop step)
define the following loop skeleton:
(let (<ob>*)
  <oc>*
  (let loop (<lb>*)
    (if <ne1?>
        (let (<ib>*)
          <ic>*
          payload
          (if <ne2?>
              (loop <ls>*) ))))),
where <oc>* and
<ic>* are syntactically
equivalent to <command>*,
i.e. they do not begin with a <definition>.
The latter requirement allows the code generator to
produce more efficient code for special cases by
removing empty let-expressions altogether.
(:let <vars> <expression>)
<expression>, only.
This is the same as
(:list <vars> (list <expression>)).
If an index variable is specified, its value is 0.
The :let-generator can be used to introduce
an intermediate variable depending on outer generators.
(:parallel <generator>*)
<generator>*
by one step.
The parallel generator terminates when any of its component
generators terminates.
The generators share a common scope for the variables
they introduce.
This implies that the names of the variables introduced
by the various generators must be distinct.
(:while <generator> <expression>)
Runs <generator> while <expression>
evaluates to non-#f.
The guarding expression is included in the scope
of the variables introduced by the generator.
Note the distinction between the filter if and
the modified generator expressed by :while.
(:until <generator> <expression>)
Runs <generator> until after
<expression> has evaluated to non-#f.
The guarding expression is included in the scope
of the variables introduced by the generator.
Note the distinction between :while, stopping at
a certain condition, and :until, stopping after
a certain condition has occurred. The latter implies that the binding
that has triggered termination has been processed by the comprehension.
<application-specific typed generator>
An important aspect of this SRFI is a modular mechanism to
define new typed generators.
To define a new typed generator a hygienic referentially
transparent macro of the same name is defined to transform
the generator pattern into an instance of the
:do-generator.
The extension is fully modular, meaning that no other
macro has to be modified to add the new generator.
This is achieved by defining the new macro in
Continuation Passing Style, as in [MWL].
Technically, this works as follows.
Assume the generator syntax (:mygen <var> <arg>)
is to be implemented, for example running the variable <var>
through the list (reverse <arg>).
The following definition implements :mygen
in terms of :list
using the additional syntactic variable cc
(read current continuation):
(define-syntax :mygen
  (syntax-rules ()
    ((:mygen cc var arg)
     (:list cc var (reverse arg)) )))
After this definition, any comprehension will accept
the :mygen-generator and produce the
proper code for it.
This works as follows.
When a comprehension sees something of the form
(g arg ...) in the position of a
<qualifier> then it will
transform the entire comprehension into
(g (continue ...) arg ...).
This effectively 'transfers control' to the
macro g, for example :mygen.
The macro g has full control of
the transformation, but eventually it should
transform the expression into
(:do (continue ...) etc ...).
In the :mygen-example this is done
by the :list-macro.
The macro :do finally transforms
into (continue ... (:do etc ...)).
As continue has been chosen by the
macro implementing the comprehension,
it can regain control and proceed
with other qualifiers.
In order to ensure consistency of new generators
with the ones defined in this SRFI, a few conventions
are in order.
Firstly, the generator patterns begin with one or more
variables followed by arguments defining the sequence.
Secondly, each generator except :do
can handle an optional index variable.
This is most easily implemented using
:parallel
together with :integers.
In case the payload generator needs an index anyhow
(e.g. :vector)
it is more efficient to add an index-variable if
none is given and to implement the indexed case.
Finally, make sure that no syntactic variable of the
generator pattern ever gets duplicated in the code
(to avoid exponential code size in nested application),
and introduce sufficient intermediate variables to
make sure expressions are evaluated at the correct time.
In order to create an array from a sequence of elements, a comprehension with the following syntax would be useful:
(array-ec <shape> <qualifier>* <expression>).
The comprehension constructs a new array of the given shape by filling it row-major with the sequence of elements as specified by the qualifiers.
On the generator side, it would be most useful to have a generator of the form
(:array <vars> <arg>),
running through the elements of the array in row-major.
For the optional index variable, the extension
(index <k1> <k>*)
could be defined where <k1> <k>*
are variable names indexing the various dimensions.
In order to create a vector or list of random numbers, it would be convenient to have generators of the following form:
(:random-integer [ <range> [ <number> ] ] ) (:random-real [ <number> ] )
where <range> (default 2) indicates the range of
the integers and <number> (default infinity)
specifies how many elements are to be generated.
Derived from these basic generators, one could define several
other generators for other distributions (e.g. Gaussian).
As eager comprehensions are efficient, they can be useful for operations involving strings of bits. It could be useful to have the following comprehension:
(bitwise-ec <qualifier>* <expression>),
which constructs an integer from bits obtained as values
of <expression> in the ordering defined
by [SRFI33].
In other words, if the sequence of values is
x[0], x[1], ..., x[n-1] then
the result is x[0] + x[1] 2 + ... + x[n-1] 2^(n-1).
On the generator side, a generator of the form
(:bitwise <vars> <arg1> <arg>*)
runs through the sequence of bits obtained by appending the
binary digits of the integers <arg1> <arg>*.
It is possible to 'lazify' the eager comprehension
list-ec,
constructing a stream in the sense of [SRFI 40].
Clearly, such a comprehension (stream-ec)
is not eager at all since it only runs the loops when results are requested.
It is also possible to define a :stream-generator with
the same API as :list but running
through streams instead of lists.
For what it is worth,
the file srfi40-ec.scm implements
:stream and stream-ec and gives an example.
The implementation makes substantial use of
call-with-current-continuation to run the loop
only when necessary.
In some implementations of Scheme this may involve
considerable overhead.
Eager comprehensions can also be used to process files. However, bear in mind that an eager comprehension wants to read and process the entire file right away. Nevertheless, these generators would be useful for reading through the lines of a file or through the characters of a file:
(:lines-of-file <vars> <file>) (:chars-of-file <vars> [ (line <variable1>) ] [ (column <variable2>) ] <file>)
Here <file> is either an input port
or a string interpreted as a filename.
In a similar fashion, generators reading from sockets defined
by URLs or other communication facilities could be defined.
In the Scheme-shell Scsh it could be useful to have certain
comprehensions and generators.
Candidates for comprehensions are
begin-ec,
|-ec,
||-ec, and
&&-ec.
Concerning generators, it might be useful to have
:directory running through the
records of a directory, and maybe a
sophisticated :file-match-generator
could enumerate file record in a directory structure.
Optional variables of the generators could give
convenient access frequent components of the file records
(e.g. the filename).
Another candidate is :env to run through
the environment associations.
It is left to other authors and other SRFIs to
define a useful set of comprehensions and generators
for Scsh.
A lazy comprehension, for example stream-of in the
sense of [SRFI 40], constructs an object
representing a sequence of values.
Only at the time these values are needed that they
are actually produced.
An eager comprehension, on the other hand, is an instruction to
run through a certain sequence of values and do something with it,
for example as in do-ec.
In other words, it is nothing more sophisticated than a loop,
potentially with a more convenient notation.
This also explains why stream-of is the most
fundamental lazy comprehension, and all others can
be formulated in terms of it, whereas the most fundamental
eager comprehension is do-ec.
In principle, there are six possible orders in which the qualifiers and the expression of a comprehension can be written. We denote the different conventions with a pattern in which expr denotes the expression over which the comprehension ranges, inner denotes the generator spinning fastest, and outer denotes the generator spinning slowest. For example, [Haskell] and [Python] use [expr | outer .. inner]. (Probably with sufficient persistence, instances for any of the conventions can be found on the Internet.) In addition, there is the common mathematical notation '{f(x) | x in X}'.
It is important to understand that the notational convention does not only determine the order of enumeration but also the scope of the variables introduced by the generators. The scope of inner includes expr, and the scope of outer should include inner to allow inner generates depending on outer generators. Eventually, the choice for a particular syntactic convention is largely a matter of personal preferences. However, there are a few considerations that went into the choice made for this SRFI:
The mathematical notation is universally known and widely used. However, the mathematical notation denotes a set comprehension in which the order of the qualifiers is either irrelevant or must be deduced from the context. For the purpose of eager comprehensions as a programming language construct, the order does matter and a simple convention is a plus. For these reasons, the mathematical notation as such is undesirable, but its widespread use is in favor of [expr | inner .. outer] and [expr | outer .. inner].
It is desirable to have the scope of the variables increase into one direction, as in [expr | inner .. outer] and [outer .. inner | expr], and not change direction, as in [expr | outer .. inner] where expr is in the scope of inner but outer is not. This is even more important if the syntax in Scheme does not explicitly contain the '|'-separator.
More complicated comprehensions with several nested generators
eventually look like nested loops and Scheme always
introduces them outer .. inner as in
do and named-let.
This is in favor of
[expr | outer .. inner] and
[outer .. inner | expr].
Shorter comprehension may look more naturally the
other way around.
    
Regarding these contradicting preferences, I regard
linearity in scoping (2.) most important, followed by
readability for more complicated comprehensions (3.).
This leads to [outer .. inner | expr].
An example in Scheme-syntax is
(list-ec (: x 10) (: y x) (f x y)),
which looks acceptable to me even without similarity
to the mathematical notation.
As a downside, the convention clashes with other the
convention used in other languages (e.g. Haskell and Python).
I tried to construct a reasonably useful set of tools according to what [R5RS] specifies. Nevertheless, is the choice what to include and what to leave out eventually a matter of personal preference.
When 'packing the toolbox' I went for travelling light;
this SRFI does not include everything imaginable
or even everything useful.
I oriented myself at the standard procedures
of [R5RS],
with a few omissions and additions.
A notable omission are gcd-ec and
lcm-ec because they are one-liners,
and more severely, of questionable value in practice.
A notable addition are
fold-ec and
fold3-ec,
providing a mechanism to define lots of useful one-liners.
The other notable addition is
first-ec, which
is the fundamental 'early stopping' comprehension.
It is used to define
any?-ec and
every?-ec
which are among the most frequent comprehensions.
Concerning the generators, the same principle has been used.
Additions include :range
and friends because they are universally needed, and
:dispatched which is
primarily intended for implementing :.
For the purpose of this SRFI, every generator runs through its sequence of bindings in a well specified order, and nested generators run through the Cartesian product in the order of nested loops. The purpose of this definition is making the sequence as easily predictable as possible.
On the other hand, many mechanisms for lazy comprehensions do not specify the order in which the elements are enumerated. When it comes to infinite streams, this has the great advantage that a comprehension may interleave an inner and an outer enumeration, a method also known as 'dove-tailing' or 'diagonalizing'. Interleaving ensures that any value of the resulting stream is produced after a finite amount of time, even if one or more inner streams are infinite.
The reason for typed generators is runtime efficiency.
In fact, the code produced by :range and others
will run as fast as a hand-coded do-loop.
The primary purpose of the dispatching generator is convenience.
It comes at the price of reduced runtime performance,
both for loop iteration and startup.
-ec and :type naming?
The purpose of the :type convention is to keep
many common comprehensions down to one-liners.
In my opinion, the fundamental nature of eager comprehensions
justifies a single character naming convention.
The something-ec convention is primarily intended to
stay away from the widely used something-of.
It reduces confusion and conflict with related mechanisms.
The generators of this SRFI do two different things with a single syntactic construct: They define a sequence of values to enumerate and they specify a variable (within a certain scope) to run through that sequence. An alternative is to separate the two, for example as it has been done in SRFI 40.
The reason for combining sequence definition and enumeration for the purpose of this SRFI is threefold. Firstly, sequences of values are not explicitly represented as objects in the typed generators; the generators merely manipulate an internal state. Secondly, this SRFI aims at a most concise notation for common comprehensions and reduces syntax to the naked minimum. Thirdly, this SRFI aims at the highest possible performance for typed generators, which is achieved if the state being manipulated is represented by the loop variable itself.
(: <vars>) illegal?
It is reasonable and easy to define (: <vars>)
as (:integers <vars>),
enumerating the non-negative integers.
However, it turned out that a frequent mistake in using the
eager comprehensions is to forget either the variable
or an argument for the enumeration.
As this would lead to an infinite loop (not always
equally pleasant in interactive sessions), it is not allowed.
:sequential?
Just like :parallel
enumerates generators in
parallel, a :sequential generator could
enumerate a concatenation of several generator, starting
the next one when the previous has finished.
The reason for not having such a qualifier is
that the generators should use all the same variable name
and there is no hygienic and referentially transparent
way of enforcing this (or even knowing the variable).
let-qualifier?
It is easy to add let, let*,
and letrec as cases to <qualifier>.
This would allow more sophisticated local variables
and expressions than possible with
(:let <vars> <expression>) and
(begin <sequence>*).
In particular, a local <definition>
in the sense of [R5RS, 7.1.5.] would
be possible.
There are two reasons for not including let
and friends as qualifiers.
The first reason concerns readability.
A qualifier of the form
(let (<binding spec>*) <body>)
only makes sense if the scope of the new variables ends at the
end of the comprehension, and not already
after <body>.
The similarity with ordinary let-expressions
would be very confusing.
The second reason concerns the design rationale.
If sophisticated let-qualifiers involving
recursion or local definitions are needed, it is likely
that eager comprehensions are being overused.
In that case it might be better to define a procedure
for the task.
So including an invitation to overuse the mechanism would
be a serious violation of the
Keep It Simple and Stupid principle.
:nested generator?
The specification above defines nested
as a qualifier but :parallel
as a generator.
In particular, this makes it impossible to make parallel
generators from nested ones.
This design simply reflects an implementability limitation.
All component generators of :parallel are
transformed into :do-generators
and these can be merged into a parallel generator.
However, nested generators cannot be merged easily without
losing the type of the generator,
which would seriously hurt modularity and performance.
any?-ec eager?Yes, it is still eager because it immediately starts to run through the sequence.
In fact, the reference implementation makes sure
first-ec,
any?-ec, and
every?-ec
execute efficiently so they can be used conveniently
as in
(every?-ec (:list x my-list) (pred? x)).
:dispatched business?
It is specified above that the dispatching generator,
called :, is just a special case
of :dispatched using
a global dispatching procedure.
Alternatively, a simple fixed global mechanism to extend
: could have been used.
This is much simpler but does not support the definition
of new dispatched generators.
The purpose of :dispatched
and its utilities
(:generator-proc and
dispatch-union)
is the following.
Assume : is to be used inside a
module but it is essential that no other module can spoil it,
e.g. by installing a very slow dispatcher.
The recommended way to proceed in this case is to define a
local copy of the original dispatching generator
:,
for example with the following code
(define :my-dispatch
  (make-initial-:-dispatch) )
(define-syntax :my
  (syntax-rules (index)
    ((:my cc var (index i) arg1 arg ...)
     (:dispatched cc var (index i) :my-dispatch arg1 arg ...) )
    ((:my cc var arg1 arg ...)
     (:dispatched cc var :my-dispatch arg1 arg ...) ))),
and to use the new generator :my instead of
:.
An alternative for the dispatching mechanism as defined in
this SRFI is the use of parameter objects in the sense of
[SRFI 39].
The dispatching generator would then access a dynamically
scoped variable to find the dispatcher, allowing full
control over dispatching.
However, this approach does not solve the dilemma that it is
sometimes useful that : is global
and sometimes undesired.
The approach specified for this SRFI addresses this dilemma
by offering options.
Another alternative for dealing with the dispatching
problem is adding an optional argument to the syntax of
: through which the dispatcher
can be passed explicitly.
However, as : has variable
arity and the identifier for the variable cannot be
distinguished from any value for a dispatcher,
this is syntactically problematic.
:?
According to [R5RS, 7.1.6.] macros can only
be defined at the level of the <program> syntax.
This implies that the scope of typed generators cannot easily be
limited to local scopes.
As typed and dispatched generators go together,
there is also no strong need for a limited scope
of dispatched generators either.
Furthermore, locally extendable dispatchers are another major
headache to those trying to understand other people's code.
As defined in :dispatched,
a dispatching procedure is called with a single argument being
the list of values to dispatch on.
An alternative is to apply the dispatcher to the
list of values to dispatch on, which would be more natural in Scheme.
The reason for not using apply is a minor
improvement in efficiency.
Every time apply is used on a procedure of variable
arity, an object containing the argument list is allocated on
the heap.
As a dispatcher may call many other dispatchers, this will adds
to the overhead of dispatching, which is relevant in inner loops.
The reason for having two fold comprehensions
(fold-ec and
fold3-ec) is efficiency.
Clearly, the more general construction is
fold3-ec
as it allows individual treatment of the empty
sequence case and the singleton sequence case.
However, this comes at the price of more book-keeping
as can be seen from the
implementation example.
As the overhead is located within inner loops,
it makes sense to define another fold comprehension
for the case where the added flexibility is not needed.
This is fold-ec.
The names fold-ec and fold3-ec
have been chosen for the comprehensions in order to stay
clear any other 'fold' that may be around.
:char-range not defined by integer->char?
The definition of :char-range
specifies a sequence of adjacent characters ordered by char<=?.
The reason for not using char->integer and
integer->char is the fact that
[R5RS, 6.3.4.] leaves it to the implementation
whether the integers representing characters are consecutive or not.
In effect, this underspecification is inherited by :char-range.
Several other proposals related to the mechanism specified here exists. The following mechanisms are made for and in Scheme (or at least a specific dialect thereof):
First of all, the report [R5RS] of Scheme itself
defines two constructs for writing loops: do and
named-let.
Both constructs express a single loop (not nested),
possibly with several variables running in parallel,
based on explicit manipulation of the state variables.
For example (do ((x 0 (+ x 1))) ((= x 10)) (display x))
explicitly mentions how to obtain the next binding of x.
Richard Kelsey's "Macros for writing loops", [MWL]
are an extension to Scheme48 to simplify the formulation of loops.
The basic idea is to stick with a do-like syntax for
more sophisticated loop constructs, not necessarily manipulating
a state variable explicitly.
For example, (list* x '(1 2 3)) expresses an enumeration
of the variable x through the list (1 2 3)
without explicit state manipulation.
The iteration constructs of [MWL], named
iterate and reduce,
express a single (not nested) loop (iterate) or
comprehension (reduce) with any number of
parallel enumerations.
A most important feature of the [MWL]-concept
is a modular way to add sequence types (generators).
In effect, the addition of a new sequence type does not
require a modification of the existing macros.
This is achieved by carefully limiting the expressive
power of the loop constructs and by using the macros
in Continuation Passing Style to call other macros.
The [MWL]-concept, and its implementation,
were most influential for this SRFI.
Another related mechanism is the library of streams recently
submitted by Phil L. Bewig as [SRFI 40].
The library contains a data type to represent even
streams (both car and cdr potentially delayed) and
defines procedures for manipulating these streams.
Moreover, the macro stream-of defines a
lazy comprehension resulting in the stream of values of
an expression subject to generators and filters.
A fixed set of generators (lists, vector, string, port,
and naturally: streams) is supported; extending the
list of generators requires changing stream-of.
Nevertheless, modularity is high since it is easy to define
a procedure producing a stream object and this can be
used for enumeration.
The order of enumeration is left unspecified to allow
interleaving of generators (also refer to
above.)
Before Phil submitted his SRFIs, we had a short
discussion in which we clarified the semantic and syntactic
differences of our approaches.
It turned out that the mechanisms are sufficiently
different not to unify them.
The most important difference is the design rationale:
Phil created his library to support the stream-paradigm
in Scheme, inspired by the work done for Haskell and
other lazy languages, and intrigued by the beauty
of programming with infinite streams.
My work only aims at a convenient way of expressing
frequent patterns of loops in a compact way.
For what it is worth, section SRFI40-ec
contains a suggestion for extending the eager comprehension
mechanism for SRFI40-streams.
Phil's work on streams and lazy comprehensions in Scheme
triggered Eli Barzilay to implement a library of eager
comprehensions for PLT-Scheme, [Eli].
The mechanism implemented by Eli is in essence very
similar to the one proposed in this SRFI, and the two
efforts have been independent until recently.
Syntactically, Eli uses infix operators for generators,
whereas this SRFI is purely prefix, and Eli uses the
[expr | outer .. inner] convention
for nesting, whereas this SRFI uses the
[outer .. inner | expr]
convention.
Semantically, Eli's mechanism defines more flexible
loops than this SRFI.
Comprehensions are regarded as generalized collection
processes like fold and reduce.
The mechanism in this SRFI is more restricted with respect
to control flow (there is no general while)
and more extensive with respect to generators and
comprehensions.
Despite the strong conceptual similarity, the design
rationales are different.
This SRFI focuses on portability and modular extension,
whatever that may cost in terms of expressive power.
Finally, I would like to thank Mike Sperber for his encouragement to proceed with the SRFI and for several discussions of the matter. In particular, the dispatching mechanism evolved rapidly during discussions with Mike.
The reference implementation focuses on portability,
performance, readability and simplicity, roughly in this order.
It is written in [R5RS]-Scheme
(including macros) extended by [SRFI 23]
(error).
The reference implementation was developed
under Scheme48 (0.57),
PLT (202, 204), and
SCM (5d7).
The file ec.scm is the source of the reference implementation. It also contains comments concerning potential problems. Implementors might find the file design.scm helpful. It contains alternative implementations of certain comprehensions and generators in order to simplify tuning the implementation of this SRFI for different Scheme systems.
The file examples.scm contains a collection of examples, and some code checking their results. The purpose of most examples is detecting implementation errors, but the section 'Less artificial examples' contains a few real-world examples.
The file timing.scm contains some
code to measure an idea of performance of the comprehensions.
A hand-coded do-loop, the typed generator
(:range n),
and the dispatching generator
(: n)
are compared.
For each loop we compare the time needed
per iteration and the time needed to construct the loop (the startup delay).
As a rule of thumb, :range is as fast (or slightly faster)
as a hand-coded do-loop per iteration and needs about a
third more time for startup (due to type checking of the argument).
The dispatching generator needs about twice the time per iteration
(due to calling the generator procedure) and needs about five times
as long for startup (due to dispatching).
The file extension.scm contains examples for adding new generators and comprehensions.
| [R5RS] | Richard Kelsey, William Clinger, and Jonathan Rees (eds.): Revised(5) Report on the Algorithmic Language Scheme of 20 February 1998. Higher-Order and Symbolic Computation, Vol. 11, No. 1, September 1998. http://schemers.org/Documents/Standards/R5RS/. | 
| [MWL] | Richard Kelsey, Jonathan Rees: The Incomplete Scheme48 Reference Manual for Release 0.57 (July 15, 2001). Section "Macros for writing loops". http://s48.org/0.57/manual/s48manual_49.html | 
| [SRFI 1] | Olin Shivers: List Library. http://srfi.schemers.org/srfi-1/ | 
| [SRFI 23] | Stephan Houben: Error reporting mechanism http://srfi.schemers.org/srfi-23/ | 
| [SRFI 25] | Jussi Piitulainen: Multi-dimensional Array Primitives. http://srfi.schemers.org/srfi-25/ | 
| [SRFI 27] | Sebastian Egner: Sources of Random Bits. http://srfi.schemers.org/srfi-27/ | 
| [SRFI 33] | Olin Shivers: Integer Bitwise-operation Library. http://srfi.schemers.org/srfi-33/ | 
| [SRFI 39] | Marc Feeley: Parameter objects. http://srfi.schemers.org/srfi-39/ | 
| [SRFI 40] | Philip L. Bewig: A Library of Streams. http://srfi.schemers.org/srfi-40/ | 
| [Eli] | Eli Barzilay: Documentation for "misc.ss". 2002. http://www.cs.cornell.edu/eli/Swindle/misc-doc.html#collect | 
| [Folds] | John David Stone: Folds and reductions. Posting in relation to [SRFI 1] on 8-Jan-1999. http://srfi.schemers.org/srfi-1/mail-archive/msg00021.html | 
| [Haskell] | Simon L. Peyton Jones, John Hughes: The Haskell 98 Report 1 February 1999. Section 3.11 "List Comprehensions". http://www.haskell.org/onlinereport/exps.html#sect3.11 | 
| [Python] | Guido van Rossum, Fred L. Drake Jr. (eds.): Python Reference Manual. Section 5.2.4 "List displays". Release 2.2, December 21, 2001. http://python.org/doc/2.2/ref/lists.html | 
| [SICP] | Harold Abelson, Gerald J. Sussman, Julie Sussman: Structure and Interpretation of Computer Programs. MIT Press, 1985. http://mitpress.mit.edu/sicp/ | 
| [IFPL] | Philip Wadler: List Comprehensions (Chapter 7). In: Simon L. Peyton Jones: The Implementation of Functional Programming Languages. Prentice Hall, 1987. | 
| [Scheme48] | Richard Kelsey, Jonathan Rees: Scheme48 Release 0.57 (July 15, 2001). http://s48.org/ | 
| [SCM] | Aubrey Jaffer: SCM Scheme Implementation. Version 5d7 (November 27, 2002). http://www.swiss.ai.mit.edu/~jaffer/SCM.html | 
| [PLT] | PLT People: PLT Scheme, DrScheme Version 203. http://www.plt-scheme.org/ | 
| [Scsh] | Olin Shivers, Brian D. Carlstrom, Martin Gasbichler, Mike Sperber: Scsh Reference Manual. For scsh release 0.6.3. http://scsh.net/ | 
Copyright (C) Sebastian Egner (2003). All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.