Table of Contents
List of Tables
Orc is a programming language designed to make distributed and
concurrent programs simple and intuitive to write. Orc expresses
orchestration, a type of structured concurrency. It emphasizes the
flow of control and gives a global view of a concurrent system. Orc
is well-suited for task orchestration, a form of
concurrent programming with applications in workflow, business process
management, and web service orchestration. Orc provides constructs to
orchestrate the concurrent invocation of services while managing
time-outs, priorities, and failures of services or communication. To
learn more about Orc and run your own Orc programs, visit the website:
http://orc.csres.utexas.edu/
.
Unless otherwise noted, all material in this document pertains to the Orc language implementation version 1.0.0 .
Table of Contents
This chapter describes the Orc programming language in three steps. In Section 1.2, we discuss a small subset of Orc called Cor. Cor is a pure functional language, which has no features for concurrency, has no state, and does not communicate with external services. Cor introduces us to the parts of Orc that are most familiar from existing programming languages, such as arithmetic operations, variables, conditionals, and functions.
In Section 1.3, we consider Orc itself, which in addition to Cor, comprises external services and combinators for concurrent orchestration of those services. We show how Orc interacts with these external services, how the combinators can be used to build up complex orchestrations from simple base expressions, and how the functional constructs of Cor take on new, subtler behaviors in the concurrent context of Orc.
In Section 1.4, we discuss some additional features of Orc that extend the basic syntax. These are useful for creating large-scale Orc programs, but they are not essential to the understanding of the language.
In this section we introduce Cor, a pure functional subset of the Orc language. Users of functional programming languages such as Haskell and ML will already be familiar with many of the key concepts of Cor.
A Cor program is an expression. Cor expressions are built up recursively from smaller expressions. Cor evaluates an expression to reduce it to some simple value which cannot be evaluated further, for example a list of numbers or a Boolean truth value. This value is called the result of the expression.
In the following subsections we introduce the concepts of Cor. First, we talk about
simple constants, such as numbers and truth values, and the operations that we can perform
on those values. Then we introduce conditionals (if then else). Then we
introduce variables and binding, as a way to give a name to the value of an expression.
After that, we talk about constructing data structures, and examining those structures
using patterns. Lastly, we introduce functions.
Table 1.1 describes the syntax of Cor. Each part of the syntax is explained in a subsequent section.
Table 1.1. Syntax of the Functional Subset of Orc (Cor)
| E | ::= | Expression | ||
| C | constant value | |||
| | | X | variable | ||
| | |
( E , ... , E )
| tuple | ||
| | |
[ E , ... , E ]
| list | ||
| | | X( E , ... , E )
| function call | ||
| | | E op E | operator | ||
| | |
if E then E else E | conditional | ||
| | |
lambda ( P , ... , P ) = E | closure | ||
| | | D E | scoped declaration | ||
| C | ::= | Boolean | Number | String | Constant | |
| X | ::= | Identifier | Identifier | |
| D | ::= | Declaration | ||
val P = E | value declaration | |||
| | |
def X( P , ... , P ) = E | function declaration | ||
| P | ::= | Pattern | ||
| X | variable | |||
| | | C | constant | ||
| | |
_
| wildcard | ||
| | |
( P , ... , P )
| tuple | ||
| | |
[ P , ... , P ]
| list |
Where relevant, syntactic constructs are ordered by precedence, from highest to lowest. For example, among expressions, calls have higher precedence than operators, which in turn have higher precedence than conditionals.
The simplest expression one can write is a constant. It evaluates trivially to that constant value.
Cor has three types of constants, and thus for the moment three types of values:
true and false 5, -1, 2.71828, ... "orc", "ceci n'est pas une |"Cor has a standard set of arithmetic, logical, and comparison operators. As in most other programming languages, they are written in the usual infix style. They have Java-like operator precedence, which can be overridden by adding parentheses.
Examples
1+2 evaluates to 3.(98+2)*17 evaluates to 1700.4 = 20 / 5 evaluates to true.3-5 >= 5-3 evaluates to false.true && (false || true) evaluates to true."leap" + "frog" evaluates to "leapfrog".Here is the full set of operators that Cor supports:
Table 1.2. Operators of Cor
| Arithmetic | Comparison | Logical | String | ||||
|---|---|---|---|---|---|---|---|
+
| addition |
=
| equality |
&&
| logical and |
+
| concatenation |
-
| subtraction |
/=
| inequality |
||
| logical or | ||
*
| multiplication |
<
| less than |
~
| logical not | ||
/
| division |
>
| greater than | ||||
%
| modulus |
<=
| less than or equal | ||||
**
| exponent |
>=
| greater than or equal | ||||
There is also a unary negation operator, written -, for example -(2 ** 5).
The = operator can compare values of any type. Values of different type are always unequal; for example,
10 = true evaluates to false.
Numbers with no decimal part, such as 3, are treated as integers. Arithmetic operators with two integer arguments will perform
an integer operation and return an integer result; for example, 5 / 2 performs integer division and evaluates to 2.
However, if either argument to an operator has a decimal part (even if it is trivial, as in 3.0), the other argument will
be promoted, and a decimal operation will be peformed. For example, 5 / 2.0 and 5.0 / 2 both perform decimal
division and evaluate to 2.5.
There are situations where an expression evaluation is stuck, because it is attempting to perform
some impossible operation and cannot compute a value. In that case, the expression is silent.
An expression is also silent if it depends on the result of a silent subexpression. For example, the following
expressions are silent: 10/0, 6 + false, 3 + 1/0, 4 + true = 5.
Cor is a dynamically typed language. A Cor implementation does not statically check the type correctness of an expression; instead, an expression with a type error is simply silent when it is evaluated.
A conditional expression in Cor is of the form
if E then F else G.
Its meaning is similar to that in other languages: the value
of the expression is the value of F if and only if E evaluates to true,
or the value of G if and only if E evaluates to false. Note that G is not
evaluated at all if E evaluates to true, and similarly F is not evaluated
at all if E evaluates to false. Thus, for example, evaluation of
if true then 2+3 else 1/0 does not evaluate 1/0;
it only evaluates 2+3.
Unlike other languages, expressions in Cor may be silent. If E is silent, then the entire expression is silent. And if E evaluates to true but F is silent (or if E evaluates to false and G is silent) then the expression is silent.
Note that conditionals have lower precedence than any of the operators.
For example, if false then 1 else 2 + 3 is equivalent to
if false then 1 else (2 + 3), not (if false then 1 else 2) + 3.
The behavior of conditionals is summarized by the following table (v denotes a value).
Examples
if true then 4 else 5 evaluates to 4.if 2 < 3 && 5 < 4 then "blue" else "green" evaluates to "green".if true || "fish" then "yes" else "no" is silent.if false || false then 4+5 else 4+true is silent.if 0 < 5 then 0/5 else 5/0 evaluates to 0.
A variable can be bound to a value. A declaration binds one or
more variables to values. The simplest form of declaration is val, which
evaluates an expression and binds its result to a variable. Declarations follow the rules of
lexical scoping.
val x = 1 + 2 val y = x + xThese declarations bind variable
x to 3 and variable y to 6.
If the expression on the right side of a val is silent, then the variable is not bound, but
evaluation of other declarations and expressions continues. If an evaluated expression depends on that
variable, that expression is silent.
val x = 1/0 val y = 4+5 if false then x else yEvaluation of the declaration
val y = 4+5 and the expression if false then x else y
may continue even though x is not bound. The expression evaluates to 9.
Cor supports two basic data structures, tuples and lists.
A tuple expression is a comma-separated sequence of at least two expressions, enclosed by parentheses. Each expression is evaluated; the value of the whole tuple expression is a tuple containing each of these values in order. If any of the expressions is silent, then the whole tuple expression is silent.
Examples
(1+2, 7) evaluates to (3,7). ("true" + "false", true || false, true && false) evaluates to ("truefalse", true, false).(2/2, 2/1, 2/0) is silent.A list expression is a comma-separated sequence of expressions enclosed by square brackets. It may be of any length, including zero. Each expression is evaluated; the value of the whole list expression is a list containing each of these values in order. If any of the expressions is silent, then the whole list expression is silent.
Examples
[1,2+3] evaluates to [1,5]. [true && true] evaluates to [true].[] evaluates vacuously to [], the empty list.[5, 5 + true, 5] is silent.
There is also a concatenation (cons) operation on lists,
written F:G, where F and G are expressions. Its result is a new list whose first element is the
value of F and whose remaining elements are the list value of G. The : operator is right associative,
so F:G:H is F:(G:H).
Examples
(1+3):[2+5,6] evaluates to [4,7,6].2:2:5:[] evaluates to [2,2,5].t is bound to [3,5]. Then 1:t evaluates to [1,3,5].2:3 is silent, because 3 is not a list.We have seen how to construct data structures. But how do we examine them, and use them? We use patterns.
A pattern is a powerful way to bind variables. When writing val declarations, instead of just
binding one variable, we can replace the variable name with a more complex pattern that follows the structure of the
value, and matches its components. A pattern's structure is called its shape; a
pattern may take the shape of any structured value. A pattern can hierarchically match a value, going deep into its
structure. It can also bind an entire structure to a variable.
Examples
val (x,y) = (2+3,2*3) binds x to 5 and y to 6.val [a,b,c] = ["one", "two", "three"] binds a to "one",
b to "two", and c to "three".
val ((a,b),c) = ((1, true), (2, false)) binds a to 1, b to true,
and c to (2,false).
Patterns are linear; that is, a pattern may mention a variable name at most once.
For example, (x,y,x) is not a valid pattern.
Note that a pattern may fail to match a value, if it does not have the same shape as that value. In a val
declaration, this has the same effect as evaluating a silent expression. No variable in the pattern is bound, and
if any one of those variables is later evaluated, it is silent.
It is often useful to ignore parts of the value that are not relevant. We can use the
wildcard pattern, written _, to do this; it matches any shape and binds no
variables.
Examples
val (x,_,_) = (1,(2,2),[3,3,3]) binds x to 1.val [[_,x],[_,y]] = [[1,3],[2,4]] binds x to 3 and y to 4.
Like most other programming languages, Cor provides the capability to define functions,
which are expressions that have a defined name, and have some number of parameters. Functions are declared
using the keyword def, in the following way.
def add(x,y) = x+yThe expression on the right of the
= is called the body of the function.
After defining the function, we can call it. A call looks just like the left side of the declaration except that the variable names (the formal parameters) have been replaced by expressions (the actual parameters).
To evaluate a call, we treat it as a sequence of val declarations associating the formal
parameters with the actual parameters, followed by the body of the function.
{- Evaluation of add(1+2,3+4) -} val x = 1+2 val y = 3+4 x+y
Examples
add(10,10*10) evaluates to 110.add(add(5,3),5) evaluates to 13.Notice that the evaluation strategy of functions allows a call to proceed even if some of the actual parameters are silent, so long as the values of those actual parameters are not used in the evaluation of the body.
def cond(b,x,y) = if b then x else y cond(true, 3, 5/0)This evaluates to
3 even though 5/0 is silent, because y is not
needed.
A function definition or call may have zero arguments, in which case we write () for the arguments.
def Zero() = 0
Functions can be recursive; that is, the name of a function may be used in its own body.
def sumto(n) = if n < 1 then 0 else n + Sumto(n-1)Then,
sumto(5) evaluates to 15.
Mutual recursion is also supported.
def even(n) = if (n > 0) then odd(n-1) else if (n < 0) then odd(n+1) else true def odd(n) = if (n > 0) then even(n-1) else if (n < 0) then even(n+1) else falseThere is no special keyword for mutual recursion; any contiguous sequence of function declarations is assumed to be mutually recursive.
Functions are actually values, just like any other value. Defining a function creates a special value called a closure; the name of the function is a variable and its bound value is the closure. Thus, a closure can be put into a data structure, or bound to some other variable, just like any other value.
def a(x) = x-3 def b(y) = y*4 val funs = (a,b)
Like any other value, a closure can be passed as an argument to another function. This means that Cor supports higher-order functions.
def onetwosum(f) = f(1) + f(2) def triple(x) = x * 3Then,
onetwosum(triple) is triple(1) + triple(2), which is 1 * 3 + 2 * 3 which evaluates to 9.
Since all declarations (including function declarations) in Cor are lexically scoped, these closures are lexical closures. This means that when a closure is created, if the body of the function contains any variables other than the formal parameters, the bindings for those variables are stored in the closure. Then, when the closure is called, the evaluation of the function body uses those stored variable bindings.
Sometimes one would like to create a closure directly, without bothering to give it a name.
There is a special keyword lambda for this purpose. By writing a function
definition without the keyword def and replacing the function name with
the keyword lambda, that definition becomes an expression which evaluates to a closure.
def onetwosum(f) = f(1) + f(2) onetwosum( lambda(x) = x * 3 ) {- identical to: def triple(x) = x * 3 onetwosum(triple) -}Then,
onetwosum( lambda(x) = x * 3 ) evaluates to 9.
Since a function defined using lambda has no name, it is not possible to define
a recursive function in this way. Only def can create a recursive function.
The combination of functions and pattern matching offers a powerful capability: clausal definition of functions. We can define expressions which execute different code depending on the structure of their arguments.
Here's an example.
def sum([]) = 0 def sum(h:t) = h + sum(t)
sum(l) publishes the sum of the numbers in the list l. It has two clauses:
one which matches the empty list, and one which matches any nonempty list. If its argument is an empty
list, it returns 0, the appropriate sum for an empty list. If the argument is a nonempty list, it adds
the first element of that list to the sum of all of the other elements. In this way, it recursively finds
the sum of the list.
A function may have multiple clauses, each of which has a sequence of patterns to match each argument, and a body expression. Naturally, all clauses of a function must have the same number of arguments. Any contiguous sequence of definitions with the same name and different arguments is interpreted as a clausal definition, where each individual declaration is a clause of the larger function.
When the function is called, the clauses are tried in the order in which they appear until a match is found. If no clause matches, the call remains silent.
We allow a new form of pattern which is very useful in clausal definition of functions: a constant pattern. A constant pattern is a match only for the same constant value. We can use this to define the "base case" of a recursive function in a straightforward way.
{- Fibonacci numbers -} def fib(0) = 1 def fib(1) = 1 def fib(n) = if (n < 0) then 0 else fib(n-1) + fib(n-2)This definition of the Fibonacci function is straightforward, but slow, due to the repeated work in recursive calls to
fib. We can define a linear-time version, again with the help of pattern matching:
{- Alternate definition of the Fibonacci function -} {- A helper function: find the pair (Fibonacci(n-1), Fibonacci(n)) -} def H(0) = (1,1) def H(n) = val (x,y) = H(n-1) (y,x+y) def fib(n) = if (n < 0) then 0 else ( val (x,_) = H(n) x )
As a more complex example of matching, consider the following function which takes
a list argument and returns a new list containing only the first n
elements of the argument list.
def take(0,_) = [] def take(n,h:t) = if (n > 0) then h:(take(n-1,t)) else []
Mutual recursion and clausal definitions are allowed to occur together. Here are two functions,
stutter and mutter, which are each mutually recursive, and each have
multiple clauses. stutter(l) returns l with every odd element repeated.
mutter(l) returns l with every even element repeated.
def stutter([]) = [] def stutter(h:t) = h:h:mutter(t) def mutter([]) = [] def mutter(h:t) = h:stutter(t)
stutter([1,2,3]) evaluates to [1,1,2,3,3].
Clauses of mutually recursive functions may also be interleaved, to make them easier to read.
def even(0) = true def odd(0) = false def even(n) = odd(if n > 0 then n-1 else n+1) def odd(n) = even(if n > 0 then n-1 else n+1)
Cor has two kinds of comments.
A line which begins with two dashes (--), preceded only by whitespace, is a single line comment.
The region from the two dashes to the next encountered newline, inclusive, is ignored.
-- This is a single line comment. -- This is also a single line comment.
Multi-line comments are enclosed by matching braces of the form {- -}.
Multi-line comments may be nested. They may appear anywhere, even in the middle of an expression.
{- This is a multiline comment. -} {- Multiline comments {- can be nested -} -} {- They may appear anywhere, -} 1 + {- even in the middle of an expression. -} 2 + 3
Cor is a pure declarative language. It has no state, since variables are bound at most once and cannot be reassigned. Evaluation of an expression results in at most one value. It cannot communicate with the outside world except by producing a value. The full Orc language transcends these limitations by incorporating the orchestration of external services. We introduce the term site to denote an external service which can be called from an Orc program.
As in Cor, an Orc program is an expression; Orc expressions are built up recursively from smaller expressions. Orc is a superset of Cor, i.e., all Cor expressions are also Orc expressions. Orc expressions are executed, rather than evaluated; an execution may call external services and publish some number of values (possibly zero). Different executions of the same expression may have completely different behaviors; they may call different services, may receive different responses from the same site, and may publish different values. Expressions in the functional subset, though, will display the same behavior in all executions.
In the following sections we discuss the features of Orc. First, we discuss how Orc communicates with external services. Then we introduce Orc's concurrency combinators, which combine expressions into larger orchestrations and manage concurrent executions. We have already discussed the functional subset of Orc in our coverage of Cor, but we reprise some of those topics; some Cor constructs exhibit new behaviors in the concurrent, stateful context of Orc.
The following figure summarizes the syntax of Orc as an extension of the syntax of Cor. The original Cor grammar rules are abbreviated by ellipses (...).
Table 1.3. Basic Syntax of Orc
| D | ::= | ... | Declaration | |
| | |
site X = address
| site declaration | ||
| C | ::= | ... | Constant | |
| | |
signal
| signal value | ||
| E | ::= | ... | Expression | |
| | |
stop
| silent expression | ||
| | | E >P> E | sequential combinator | ||
| | | E | E | parallel combinator | ||
| | | E <P< E | pruning combinator | ||
| | | E ; E | otherwise combinator |
An Orc expression may be a site call. Sites are called using the same syntax as a function call, but with a slightly different meaning. Sites are introduced and bound to variables by a special declaration.
Suppose that the variable Google is bound to a site which invokes the
Google search engine service in "I'm Feeling Lucky" mode. A call to Google
looks just like a function call. Calling Google requests the URL of the top
result for the given search term.
{- Get the top search result for "computation orchestration" -} Google("computation orchestration")Once the Google search service determines the top result, it sends a response. The site call then publishes that response. Note that the service might not respond: Google's servers might be down, the network might be down, or the search might yield no result URL.
A site call sends only a single request to an external service and receives at most one response. These restrictions have two important consequences. First, all of the information needed for the call must be present before contacting the service. Thus, site calls are strict; all arguments must be bound before the call can proceed. If any argument is silent, the call never occurs. Second, a site call publishes at most one value, since at most one response is received.
A call to a site has exactly one of the following effects:
In the last two cases, the site call is said to be silent. However, unlike a silent expression in Cor, a silent site call in Orc might perform some meaningful computation or communication; silence does not necessarily indicate an error. Since halted site calls and pending site calls are both silent, they cannot usually be distinguished from each other; only the otherwise combinator can tell the difference.
A site is a value, just like an integer or a list. It may be bound to a variable, passed as an argument, or published by an execution, just like any other value.
{- Create a search site from a search engine URL, bind the variable Search to that site, then use that site to search for a term. -} val Search = SearchEngine("http://www.google.com/") Search("first class value")
A site is sometimes called only for its effect on the external world; its return
value is irrelevant. Many sites which do not need to return a meaningful value will
instead return a signal: a special value which carries no
information (analogous to the unit value () in ML). The signal value
can be written as signal within Orc programs.
{- Use the 'println' site to print a string, followed by a newline, to an output console. The return value of this site call is a signal. -} println("Hello, World!")
A site declaration makes some service available as a site and
binds it to a variable. The service might be an object in the host language
(e.g. a class instance in Java), or an external service on the web which is accessed
through some protocol like SOAP or REST, or even a primitive operation like addition.
Many useful sites are already defined in the Orc standard library, documented
in Appendix B. For now, we present a simple
type of site declaration: using an object in the host language as a site.
The following example instantiates a Java object to be used as a site in an Orc program, specifically a Java object which provides an asynchronous buffer service. The declaration uses a fully qualified Java class name to find and load a class, and creates an instance of that class to provide the service.
{- Define the Buffer site -} site Buffer = orc.lib.state.Buffer
Orc has four combinators: parallel, sequential, pruning, and otherwise. A combinator forms an expression from two component expressions. Each combinator captures a different aspect of concurrency. Syntactically, the combinators are written infix, and have lower precedence than operators, but higher precedence than conditionals or declarations.
Orc's simplest combinator is |, the parallel combinator. Orc executes
the expression F | G, where F and G are Orc expressions, by executing
F and G concurrently. Whenever F or G communicates with a service or publishes a value,
F | G does so as well.
-- Publish 1 and 2 in parallel 1 | 1+1
{- Access two search sites, Google and Yahoo, in parallel. Publish any results they return. Since each call may publish a value, the expression may publish up to two values. -} Google("cupcake") | Yahoo("cupcake")
The parallel combinator is fully associative: (F | G) | H and
F | (G | H) and F | G | H are all
equivalent.
It is also commutative: F | G is equivalent to G | F.
-- Publish 1, 2, and 3 in parallel 1+0 | 1+1 | 1+2
Now that we have expressions which publish multiple values, what can we do with those
publications? The sequential combinator, written F >x> G,
combines the expression F, which may publish some values, with another expression G,
which will use the values as they are published; x transmits the values
from F to G.
The execution of F >x> G starts by executing F. Whenever F publishes a value,
a new copy of G is executed in parallel with F (and with any previous copies of G); in that
copy of G, variable x is bound to the value published by F. Values published by copies
of G are published by the whole expression, but the values published by F are not published
by the whole expression; they are consumed by the variable binding.
-- Publish 1 and 2 in parallel (0 | 1) >n> n+1
-- Publish 3 and 4 in parallel 2 >n> (n+1 | n+2)
-- Publish 0, 1, 2 and 3 in parallel (0 | 2) >n> (n | n+1)
-- Prepend the site name to each published search result -- The cat site concatenates any number of arguments into one string Google("cupcake") >s> cat("Google: ", s) | Yahoo("cupcake") >s> cat("Yahoo: ", s)
The sequential combinator may be written as F >P> G,
where P is a pattern instead of just a variable name. Any value published by F is matched against the
pattern P. If this match is successful, a new copy of G is started with all of the bindings from the match.
Otherwise, the published value is simply ignored, and no new copy of G is executed.
-- Publish 3, 6, and 9 in arbitrary order. (3,6,9) >(x,y,z)> ( x | y | z )
-- Filter out values of the form (_,false) ( (4,true) | (5,false) | (6,true) ) >(x,true)> x -- Publishes 4 and 6
We may also omit the variable entirely, writing >> . This
is equivalent to using a wildcard pattern: >_>
We may want to execute an expression just for its effects, and hide all of its publications.
We can do this using >> together with the special expression
stop, which is always silent.
{- Print two strings to the console, but don't publish the return values of the calls. -} ( println("goodbye") | println("world") ) >> stop
The sequential combinator is right associative: F >x> G >y> H
is equivalent to F >x> (G >y> H). It has higher
precedence than the parallel combinator: F >x> G | H is equivalent to
(F >x> G ) | H.
The right associativity of the sequential combinator makes it easy to bind variables in sequence and use them together.
{- Publish the cross product of {1,2} and {3,4}: (1,3), (1,4), (2,3), and (2,4). -} (1 | 2) >x> (3 | 4) >y> (x,y)
The pruning combinator, written F <x< G, allows us to block
a computation waiting for a result, or terminate a computation. The execution of
F <x< G starts by executing F and G in parallel. Whenever
F publishes a value, that value is published by the entire execution. When G publishes
its first value, that value is bound to x in F, and then the execution of G is immediately
terminated. A terminated expression cannot call any sites or
publish any values.
During the execution of F, any part of the execution that depends on x
will be suspended until x is bound (to the first value published by G). If G
never publishes a value, that part of the execution is suspended forever.
-- Publish either 5 or 6, but not both x+2 <x< (3 | 4)
-- Query Google and Yahoo for a search result -- Print out the result that arrives first; ignore the other result println(result) <result< ( Google("cupcake") | Yahoo("cupcake") )
Though a terminated execution may not make any new calls, the calls that it has already made will continue normally; their responses are simply ignored. This may have surprising consequences when a call has side effects, as in the following example.
{- This example actually prints both "true" and "false" to the console, regardless of which call responds first. -} stop <x< println("true") | println("false")Both of the
println calls are initiated before either one of
them publishes a value and terminates the expression. Once the expression
is terminated, no new calls occur, but the other println call
still proceeds and still has the effect of printing its message to the
console.
The pruning combinator may include a full pattern P instead of just a variable name. Any value published by G is matched against the pattern P. If this match is successful, then G terminates and all of the bindings of the pattern P are made in F. Otherwise, the published value is simply ignored and G continues to execute.
-- Publish either 9 or 25, but not 16. x*x <(x,true)< ( (3,true) | (4,false) | (5,true) )
Note that even if (4,false) is published before (3,true) or (5,true),
it is ignored. The right side continues to execute and will publish one of (3,true) or
(5,true).
The pruning combinator is left associative: F <x< G <y< H
is equivalent to (F <x< G) <y< H. It has lower
precedence than the parallel combinator: F <x< G | H is equivalent to
F <x< (G | H).
Orc has a fourth concurrency combinator: the otherwise combinator,
written F ; G. The execution of F ; G proceeds as follows.
First, F is executed. If F completes, and has not published any values,
then G executes. If F did publish one or more values, then G is ignored. The publications of
F ; G are those of F if F publishes, or those of G otherwise.
We determine when an expression completes using the following rules.
| G completes when its subexpressions F and G have both completed. >x> G completes when F has completed and all instantiated copies of G have completed. <x< G completes when F has completed, and G has either completed or
published a value. Also, if G completes without publishing a value, then all expressions in F which
use x also complete, since they will never be able to proceed.; G completes either when F has published a value and subsequently completed,
or when F and G have both completed.stop completes immediately.
The otherwise combinator is fully associative, so F ; G ; H and
(F ; G) ; H and F ; (G ; H) are
all equivalent. It has lower precedence than the other three combinators.
The otherwise combinator was not present in the original formulation of the Orc concurrency
calculus; it has been added to support computation and iteration over strictly finite data.
Sequential programs conflate the concept of producing a value with the concept of completion.
Orc separates these two concepts; variable binding combinators like >x>
and <x< handle values, whereas ; detects the completion
of an execution.
Some Cor expressions have new behaviors in the context of Orc, due to the introduction of concurrency and of sites.
The arithmetic, logical, and comparison operators are actually calls to sites, simply
written in infix style with the expected operator symbols. For example, 2+3
is actually (+)(2,3), where (+) is a primitive site provided
by the language itself. All of the operators can be used directly as sites in this way;
the name of the site is the operator enclosed by parentheses, e.g. (**),
(>=), etc. Negation (unary minus) is named (0-).
The conditional expression if E then F else G
is actually a derived form based on a site named if. The if
site takes a boolean argument; it returns a signal if that argument is true,
or remains silent if the argument is false.
if E then F else G is equivalent
to ( if(b) >> F | if(~b) >> G) <b< E.
When the else branch of a conditional is unneeded, we can write if F then G,
with no else branch. This is equivalent to if(E) >> F.
The declaration val x = G, followed by expression F, is
actually just a different way of writing the expression F <x< G.
Thus, val shares all of the behavior of the pruning combinator,
which we have already described. (This is also true when a pattern is used instead
of variable name x).
The execution of an Orc expression may publish many values. What does such an expression
mean in a context where only one value is expected? For example, what does 2 + (3 | 4)
publish?
The specific contexts in which we are interested are as follows (where E is any Orc expression):
| E op E | operand |
if E then ... | conditional test |
X( ... , E , ... )
| call argument |
( ... , E , ... )
| tuple element |
[ ... , E , ... ]
| list element |
Whenever an Orc expression appears in such a context, it executes until it publishes its first value, and then it is terminated. The published value is used in the context as if it were the result of evaluating the expression.
-- Publish either 5 or 6 2 + (3 | 4)
-- Publish exactly one of 0, 1, 2 or 3 (0 | 2) + (0 | 1)
To be precise, whenever an Orc expression appears in such a context, it is treated as if it was
on the right side of a pruning combinator, using a fresh variable name to fill in the hole.
Thus, C[E] (where E is the Orc expression and C is the context) is equivalent to the expression
C[x] <x< E.
The body of a function in Orc may be any Orc expression; thus, function bodies in Orc are executed rather than evaluated, and may engage in communication and publish multiple values.
A function call in Orc, as in Cor, binds the values of its actual parameters to its formal parameters, and then executes the function body with those bindings. Whenever the function body publishes a value, the function call publishes that value. Thus, unlike a site call, or a pure functional Cor call, an Orc function call may publish many values.
-- Publish all integers in the interval 1..n, in arbitrary order. def range(n) = if (n > 0) then (n | range(n-1)) else stop -- Publish 1, 2, and 3 in arbitrary order. range(3)
In the context of Orc, function calls are not strict. When a function call executes, it begins to execute the function body immediately, and also executes the argument expressions in parallel. When an argument expression publishes a value, it is terminated, and the corresponding formal parameter is bound to that value in the execution of the function body. Any part of the function body which uses a formal parameter that has not yet been bound suspends until that parameter is bound to a value.
Orc is designed to communicate with the external world, and one of the
most important characteristics of the external world is the passage of
time. Orc implicitly interacts with the passage of time by calling external
services which take time to respond. However, Orc can also explicitly
wait for some amount of time, using the special site Rtimer.
The site Rtimer is a relative timer. It takes as an argument
a number of milliseconds to wait. It waits for exactly that amount of time,
and then responds with a signal.
-- Print "red", wait for 3 seconds (3000 ms), and then print "green" println("red") >> Rtimer(3000) >> println("green") >> stop
The following example defines a metronome, which publishes a signal once
every t milliseconds, indefinitely.
def metronome(t) = signal | Rtimer(t) >> metronome(t)
We can also use Rtimer together with the pruning combinator
to enforce a timeout.
{- Publish the result of a Google search. If it takes more than 5 seconds, time out. -} result <result< ( Google("impatience") | Rtimer(5000) >> "Search timed out.")We present many more examples of programming techniques using real time in Chapter 2.
In this section we introduce some advanced features of Orc. These include curried function definitions and curried calls, writing an arbitrary expression as the target of a call, a special syntax for writing calls in an object-oriented style, extensions to pattern matching, and new forms of declarations, and ML-style datatypes.
Table 1.4. Advanced Syntax of Orc
| E | ::= | ... | Expression | |
| | | E G+ | generalized call | ||
| G | ::= | Argument group | ||
( E , ... , E ) | curried arguments | |||
| | | .
field | field access | ||
| P | ::= | ... | Pattern | |
| | | P as X | as pattern | ||
| | |
=X | equality pattern | ||
| | | X( P , ... , P )
| datatype pattern | ||
| D | ::= | ... | Declaration | |
| | |
class X = classname
| class declaration | ||
| | |
type X = DT | ... | DT | datatype declaration | ||
| | |
include " filename "
| inclusion | ||
| DT | ::= | X(_, ... ,_) | Datatype |
In many object-oriented programming languages, one calls a method or accesses a field of an
object using the dot operator; for example, obj.m() calls the method
m of the object obj.
There is a special kind of site call in Orc which serves a similar purpose. One may write
x.msg, for any identifiers x and msg. This treats
the value bound to x as a site, and calls it with a special
message value msg.
If the site understands the message msg (for example, if x is
bound to a Java object with a field called msg), the site interprets the message
and responds with some appropriate value. If the site does not
understand the message sent to it, it does not respond, and no publication occurs.
If x cannot be interpreted as a site, no call is made.
Typically this capability is used so that sites may be syntactically treated like objects,
with multiple methods and fields. For example, a channel c might understand the messages
get and put, to get values from and put values on that channel,
respectively. Such calls would be written c.get(), or c.put(6).
A call such as c.put(6) actually occurs in two steps. First c.put sends the message
put to the site c; this publishes a site whose only purpose
is to put values on the channel. Next, that site is called on the argument
6, sending 6 on the channel. Readers familiar with functional programming
will recognize this technique as currying.
It is sometimes useful to stage the arguments to a function; that is, rather than writing a function on two arguments, one instead writes a function on one argument which returns a function taking the second argument and performing the remainder of the evaluation.
This technique is known as currying and it is common in functional programming languages. We can write curried functions using closures. Suppose we want to define a curried addition function on two arguments, and later apply that function to the arguments 3 and 4. We could write such a program in the following way:
def Sum(a) = ( lambda(b) = a+b ) val f = Sum(3) f(4)This defines a function
Sum which, given an argument a, creates a function
which will take an argument b and add a to b. It then creates the
function which adds 3 to its argument, binds that to f, and then invokes
f on 4 to yield 3+4.
When defining a curried function, we have abstracted it in two steps, and when applying it we have written two separate calls. However, this is verbose and not very clear. Orc has a special syntax for curried function definitions and curried applications that will simplify both of these steps. Function definitions may have multiple argument sequences; they are enclosed in parentheses and concatenated. Curried function calls chain together multiple applications in a similar way. Here is the previous program, written in this simplified syntax:
def Sum(a)(b) = a+b Sum(3)(4)Naturally, this syntax is backwards compatible; e.g. both of the following programs are also equivalent:
def Sum(a) = ( lambda(b) = a+b ) Sum(3)(4)
def Sum(a)(b) = a+b val f = Sum(3) f(4)
Clauses of a function may be defined with both curried arguments and patterns in arguments. However, the combination of these two features can produce unintuitive behavior. Consider the following clausal, curried function definition:
def sum(x)(0) = x def sum(x)(y) = x + y sum(2)(3)One might expect
sum(2)(3) to evaluate to 5. However, this is not
the case; instead sum(2)(3) remains silent due to a pattern matching failure.
This is because the compiler always expands curried definitions before considering patterns,
so the above program is in fact equivalent to:
def sum(x) = ( lambda(0) = x ) def sum(x) = ( lambda(y) = x + y ) sum(2)(3)Now the problem becomes apparent. The first clause will always match when the function is applied to its first argument, and then only one clause is available when the second argument is available: the clause with succeeds only on
0.
Therefore, use caution when writing function definitions that use both currying and clauses.
In taking apart a value with a pattern, it is often useful to capture intermediate results.
val (a,b) = ((1,2),(3,4)) val (ax,ay) = a val (bx,by) = bWe can use the
as keyword to simplify this program fragment, giving a name to an entire
sub-pattern. Here is an equivalent version of the above code.
val ((ax,ay) as a, (bx,by) as b) = ((1,2),(3,4))
When a variable occurs in a pattern, it is bound to the value being matched against that pattern. What
if instead we would like to use the value of a bound variable as a test pattern, in a way similar to
a literal value like 0?
We can use an equality pattern to do this. An equality pattern is a variable name preceded by =.
Consider the following definition, which compares its argument to the value bound to variable x,
returning true if they are equal and false otherwise.
def isx(=x) = true def isx(_) = false
The equality pattern becomes much more useful when it is embedded within other patterns. For example, consider
this definition withz. If one of the arguments passed to it is equal to z, then the function
returns its other argument. Otherwise it remains silent.
def withz(=z,y) = y def withz(x,=z) = x
We have seen Orc's predefined data structures: tuples and lists. Orc also provides the capability for programmers to define their own data structures, using a feature adopted from the ML/Haskell language family called datatypes (also called variants or tagged sums).
Datatypes are defined using the type declaration:
type Tree = Node(_,_,_) | Empty()
This declaration defines two new sites named Node and Empty.
Node takes three arguments, and publishes a tagged value
wrapping those arguments. Empty takes no arguments and does the same.
Once we have created these tagged values, we use a new pattern called a datatype pattern to match them and unwrap the arguments:
type Tree = Node(_,_,_) | Empty() {- Build up a small binary tree -} val l = Node(Empty(), 0, Empty()) val r = Node(Empty(), 2, Empty()) val t = Node(l,1,r) {- And then match it to extract its contents -} t >Node(l,j,r)> l >Node(_,i,_)> r >Node(_,k,_)> ( i | j | k )
One pair of datatypes is so commonly used that it is already predefined in the standard library:
Some(_) and None(). These are used as return values for calls that
need to distinguish between successfully returning a value (Some(v)), and successfully
completing but having no meaningful value to return (None()). For example, a lookup
function might return Some(result) if it found a result, or return None()
if it successfully performed the lookup but found no suitable result.
When Orc is run on top of an object-oriented programming language, classes
from that language may be used as sites in Orc itself, via the class
declaration.
{- Use the String class from Java's standard library as a site -} class String = java.lang.String val s = String("foo") s.concat("bar")
This program binds the variable String to Java's String class.
When it is called as a site, it constructs a new instance of String, passing
the given arguments to the constructor.
This instance of String is a Java object; its methods are called and its fields
are accessed using the dot (.) notation, just as one would expect
in Java. For complete details of how Orc interacts with Java, see
Java Integration.
It is often convenient to group related declarations into units that can be
shared between programs. The include declaration offers a simple
way to do this. It names a source file containing a sequence of Orc declarations;
those declarations are incorporated into the program as if they had textually
replaced the include declaration. An included file may itself contain
include declarations.
{- Contents of fold.inc -} def foldl(f,[],s) = s def foldl(f,h:t,s) = foldl(f,t,f(h,s)) def foldr(f,l,s) = foldl(f,rev(l),s)
{- This is the same as inserting the contents of fold.inc here -} include "fold.inc" def sum(L) = foldl(lambda(a,b) = a+b, L, 0) sum([1,2,3])
Note that these declarations still obey the rules of lexical scope. Also, Orc does not detect shared declarations; if the same file is included twice, its declarations occur twice.
Table of Contents
In Chapter 1, we described the syntax and semantics of the Orc language. Now, we turn our attention to how the language is used in practice, with guidelines on style and programming methodology, including a number of common concurrency patterns.
In this section we suggest some syntactic conventions for writing Orc programs. None of these conventions are required by the parser; newlines are used only to disambiguate certain corner cases in parsing, and other whitespace is ignored. However, following programming convention helps to improve the readability of programs, so that the programmer's intent is more readily apparent.
When the combined expressions are small, write them all on one line.
F | G | HNote that we do not need parentheses here, since
| is fully associative and commutative.
When the combined expressions are large enough to take up a full line, write one expression
per line, with each subsequent expression aligned with the first and preceded by |.
Indent the first expression to improve readability.
long expression | long expression | long expression
A sequence of parallel expressions often form the left hand side of a sequential combinator. Since the sequential combinator has higher precedence, use parentheses to group the combined parallel expressions together.
( expression | expression ) >x> another expression
When the combined expressions are small, write a cascade of sequential combinators all on the same line.
F >x> G >y> H
Remember that sequential is right associative; in this example, x is bound in both G and H, and y is
bound in H.
When the combined expressions are large enough to take up a full line, write one expression per line; each line ends with the combinator which binds the publications produced by that line.
long expression >x> long expression >y> long expression
For very long-running expressions, or expressions that span multiple lines, write the combinators on separate lines, indented, between each expression.
very long expression >x> very long expression >y> very long expression
When the combined expressions are small, write them on the same line:
F <x< GWhen multiple pruning combinators are used to bind multiple variables (especially when the scoped expression is long), start each line with a combinator, aligned and indented, and continue with the expression.
long expression <x< G <y< H
The pruning combinator is not often written in its explicit form
in Orc programs. Instead, the val declaration is often more
convenient, since it is semantically equivalent and mentions the variable
x before its use in scope, rather than after.
val x = G val y = H long expression
Additionally, when the variable is used in only one place, and the expression is small, it is often easier to use a nested expression. For example,
val x = G val y = H M(x,y)is equivalent to
M(G,H)
Sometimes, we use the pruning combinator simply for its capability to terminate
expressions and get a single publication; binding a variable is irrelevant. This
is a special case of nested expressions. We use the identity site let
to put the expression in the context of a function call.
For example,
x <x< F | G | His equivalent to
let(F | G | H)
The translation uses a pruning combinator, but we don't need to write the combinator, name an irrelevant variable, or worry about precedence (since the expression is enclosed in parentheses as part of the call).
Declarations should always end with a newline.
def add(x,y) = x + y val z = 7 add(z,z)While the parser does not require a newline to end a declaration, it uses the newline to disambiguate certain corner cases in parsing, such as function application.
When the body of a declaration spans multiple lines, start the body on a new line
after the = symbol, and indent the entire body.
def f(x,y) = declaration declaration body expression
Apply this style recursively; if a def appears within a def, indent its contents even further.
def f(x,y) = declaration def helper(z) = declaration in helper declaration in helper body of helper declaration body expression
In this section we give Orc implementations of some standard idioms from concurrent and functional programming. Despite the austerity of Orc's four combinators, we are able to encode a variety of idioms straightforwardly.
Orc has no communication primitives like pi-calculus channels[1] or Erlang mailboxes[2]. Instead, it makes use of sites to create channels of communication.
The most frequently used of these sites is Buffer. When called, it
publishes a new asynchronous FIFO channel. That channel is a site with two
methods: get and put. The call c.get()
takes the first value from channel c and publishes it, or blocks
waiting for a value if none is available. The call c.put(v) puts
v as the last item of c and publishes a signal.
A channel may be closed to indicate that it will not be sent any more values.
If the channel c is closed, c.put(v) always halts
(without modifying the state of the channel), and c.get() halts
once c becomes empty. The channel c may be closed by
calling either c.close(), which returns a signal once
c becomes empty, or c.closenb(), which returns a
signal immediately.
In the section on Cor, we were introduced to lists: how to construct them, and how to match them against patterns. While it is certainly feasible to write a specific function with an appropriate pattern match every time we want to access a list, it is helpful to have a handful of common operations on lists and reuse them.
One of the most common uses for a list is to send each of its elements through
a sequential combinator. Since the list itself is a single value, we want
to walk through the list and publish each one of its elements in parallel
as a value. The library function each does exactly that.
Suppose we want to send the message invite to each email
address in the list inviteList:
each(inviteList) >address> Email(address, invite)
Orc also adopts many of the list idioms of functional programming. The Orc library contains definitions
for most of the standard list functions, such as map and fold. Many of the
list functions internally take advantage of concurrency to make use of any available parallelism; for
example, the map function dispatches all of the mapped calls concurrently, and assembles
the result list once they all return using a fork-join.
Sometimes a source of data is not explicitly represented by a list or other data structure. Instead, it is made available through a site, which returns the values one at a time, each time it is called. We call such a site a stream. It is analogous to an iterator in a language like Java. Functions can also be used as streams, though typically they will not be pure functions, and should only return one value. A call to a stream may halt, to indicate that the end of the data has been reached, and no more values will become available. It is often useful to detect the end of a stream using the otherwise combinator.
Streams are common enough in Orc programming that there is a library function to take all of the
available publications from a stream; it is called repeat, and it is analogous to
each for lists.
def repeat(f) = f() >x> (x | repeat(f))
The repeat function calls the site or function f with no arguments,
publishes its return value, and recurses to query for more values. repeat should
be used with sites or functions that block until a value is available. Notice that if any
call to f halts, then repeat(f) consequently halts.
For example, it is very easy to treat a channel c as a stream, reading any
values put on the channel as they become available:
repeat(c.get)
While lists are a very useful data structure, they are not indexed; it is not possible to get the nth element of
a list in constant time. However, this capability is often needed in practice, so the Orc standard library
provides a function IArray to create immutable arrays. Once initialized, an immutable array cannot
be rewritten; it can only be read.
IArray takes two arguments: an array size, and a function to initialize the array. The function
takes the index being initialized as an argument (indices start at 0), and publishes the value to be stored
at that array position. Here are a few examples:
{- Create an array of 10 elements; element i is the ith power of 2 -} IArray(10, lambda(i) = 2 ** i)
{- Create an array of 5 elements; each element is a newly created buffer -} IArray(5, lambda(_) = Buffer())
The array is used like a function; the call A(i) returns the ith element of the
array A. A call with an out-of-bounds index halts.
{- Create an array of 3 buffers -} val A = IArray(10, lambda(_) = Buffer()) {- Send true on the 0th channel and listen for a value on the 0th channel. -} A(0).put(true) | A(0).get()
Since arrays are accessed by index, there is a library function specifically
designed to make programming with indices easier. The function upto(n)
publishes all of the numbers from 0 to
n-1 simultaneously; thus, it is very easy to access all of the elements of
an array simultaneously. Suppose we have an array A of n email
addresses and would like to send the message m to each one.
upto(n) >i> A(i) >address> Email(address, m)
Variables in Orc are immutable. There is no assignment operator, and there is no way to
change the value of a bound variable. However, it is often useful to have mutable state
when writing certain algorithms. The Orc library contains two sites that offer simple
mutable storage: Ref and Cell.
The Ref site creates rewritable reference cells.
val r = Ref(0) println(r.read()) >> r.write(2) >> println(r.read()) >> stop
These are very similar to ML's ref cells. r.write(v) stores
the value v in the reference r, overwriting any previous
value, and publishes a signal. r.read() publishes the current value stored
in r.
However, unlike in ML, a reference cell can be left initially empty by calling Ref
with no arguments. A read operation on an empty cell blocks until the cell is written.
{- Create a cell, and wait 1 second before initializing it. -} {- The read operation blocks until the write occurs. -} val r = Ref() r.read() | Rtimer(1000) >> r.write(1) >> stop
The Orc library also offers write-once reference cells, using the Cell site.
A write-once cell has no initial value. Read operations block until the cell has been
written. A write operation succeeds only if the cell is empty; subsequent write operations
simply halt.
{- Create a cell, try to write to it twice, and read it -} {- The read will block until a write occurs and only one write will succeed. -} val r = Cell() Rtimer(1000) >> r.write(2) >> println("Wrote 2") >> stop | Rtimer(1000) >> r.write(3) >> println("Wrote 3") >> stop | r.read()Write-once cells are very useful for concurrent programming, and they are often safer than rewritable reference cells, since the value cannot be changed once it has been written. The use of write-once cells for concurrent programming is not a new idea; they have been studied extensively in the context of the Oz programming language.
A word of caution: References, cells, and other mutable objects may be accessed concurrently by many different parts of an Orc program, so race conditions may arise.
Orc provides syntactic sugar for reading and writing mutable storage:
x? is equivalent to x.read(). This operator
is of equal precedence with the dot operator and function application, so
you can write things like x.y?.v?. This operator is very similar
to the C languages's * operator, but is postfix instead of prefix.
x := y is equivalent to x.write(y).
This operator has higher precedence than the concurrency combinators and if/then/else,
but lower precedence than any of the other operators.Here is a previous example rewritten using this syntactic sugar:
{- Create a cell, try to write to it twice, and read it -} {- The read will block until a write occurs and only one write will succeed. -} val r = Cell() Rtimer(1000) >> r := 2 >> println("Wrote 2") >> stop | Rtimer(1000) >> r := 3 >> println("Wrote 3") >> stop | r?
Orc does not have any explicit looping constructs. Most of the time, where a loop might be used in other languages, Orc programs use one of two strategies:
each, repeat, and upto.
foldl, applies. The library also defines a
function while, which handles many of the common use cases of
while loops.
Matching a value against multiple patterns, as we have seen it so far, is a linear
process, and requires a def whose clauses have patterns in their
argument lists. Such a match is linear; each pattern is tried in order until
one succeeds.
What if we want to match a value against multiple patterns in parallel, executing every clause that succeeds? Fortunately, this is very easy to do in Orc. Suppose we have an expression F which publishes pairs of integers, and we want to publish a signal for each 3 that occurs.
We could write:
F >(x,y)> ( if(x=3) >> signal | if(y=3) >> signal )But there is a more general alternative:
F >x> ( x >(3,_)> signal | x >(_,3)> signal )The interesting case is the pair
(3,3), which is counted twice
because both patterns match it in parallel.
This parallel matching technique is sometimes used as an alternative to pattern matching using function clauses, but only when the patterns are mutually exclusive.
For example,
def helper([]) = 0 def helper([_]) = 1 def helper(_:_:_) = 2 helper([4,6])is equivalent to
[4,6] >x> x >[]> 0 | x >[_]> 1 | x >_:_:_> 2whereas
def helper([]) = 0 def helper([_]) = 1 def helper(_) = 2 helper([5])is not equivalent to
[5] >x> x >[]> 0 | x >[_]> 1 | x >_> 2because the clauses are not mutually exclusive. Function clauses must attempt to match in linear order, whereas this expression matches all of the patterns in parallel. Here, it will match
[5] two different ways,
publishing both 1 and 2.
One of the most common concurrent idioms is a fork-join: run two processes concurrently,
and wait for a result from each one. This is very easy to express in Orc. Whenever we write a val
declaration, the process computing that value runs in parallel with the rest of the program. So if we write
two val declarations, and then form a tuple of their results, this performs a fork-join.
val x = F val y = G (x,y)
Fork-joins are a fundamental part of all Orc programs, since they are created by all nested expression translations. In fact, the fork-join we wrote above could be expressed even more simply as just:
(F,G)
In Orc programs, we often use fork-join and recursion together to dispatch many tasks in parallel and wait
for all of them to complete. Suppose that given a machine m, calling m.init()
initializes m and then publishes a signal when initialization is complete. The function
initAll initializes a list of machines.
def initAll([]) = signal def initAll(m:ms) = ( m.init() , initAll(ms) ) >> signal
For each machine, we fork-join the initialization of that machine (m.init()) with the initialization
of the remaining machines (initAll(ms)). Thus, all of the initializations proceed in parallel, and
the function returns a signal only when every machine in the list has completed its initialization.
Note that if some machine fails to initialize, and does not return a signal, then the initialization procedure will never complete.
We can also use a recursive fork-join to obtain a value, rather than just signaling completion. Suppose we
have a list of bidders in a sealed-bid, single-round auction. Calling b.ask() requests a bid
from the bidder b. We want to ask for one bid from each bidder, and then return the highest
bid. The function auction performs such an auction for a list of bidders (max
finds the maximum of its arguments):
def auction([]) = 0 def auction(b:bs) = max(b.ask(), auction(bs))
Note that all bidders are called simultaneously. Also note that if some bidder fails to return a bid, then the auction will never complete. Later we will see a different solution that addresses the issue of non-termination.
Consider an expression of the following form, where F and G are expressions and M and N are sites:
M() >x> F | N() >y> G
Suppose we would like to synchronize F and G, so that both start
executing at the same time, after both M() and N() respond. This is easily done
using the fork-join idiom. In the following, we assume that x does not occur
free in G, nor y in F.
( M() , N() ) >(x,y)> ( F | G )
Previous sections illustrate how Orc can use the fork-join idiom to process a
fixed set of expressions or a list of values. Suppose that instead we wish to
process all the publications of an expression F, and once this processing is
complete, execute some expression G. For example, F publishes the contents
of a text file, one line at a time, and we wish to print each line to the
console using the site println, then publish a signal after all lines
have been printed.
Sequential composition alone is not sufficient, because we have no way to
detect when all of the lines have been processed. A recursive fork-join
solution would require that the lines be stored in a traversable data structure
like a list, rather than streamed as publications from F. A better solution
uses the ; combinator to detect when processing is complete:
F >x> println(x) >> stop ; signal
Since ; only evaluates its right side if the left side does not publish,
we suppress the publications on the left side using stop. Here, we
assume that we can detect when F halts. If, for example,
F is publishing the lines of the file as it receives them over a socket,
and the sending party never closes the socket, then F never halts and no
signal is published.
The otherwise combinator is also useful for trying alternatives in sequence. Consider an expression of
the form F0 ; F1 ; F2 ; .... If Fi does not publish
and halts, then Fi+1 is executed. We can think of the Fi's as a series of
alternatives that are explored until a publication occurs.
Suppose that we would like to poll a list of buffers for available data. The list of buffers is ordered by priority. The first buffer in the list has the highest priority, so it is polled first. If it has no data, then the next buffer is polled, and so on.
Here is a function which polls a prioritized list of buffers in this way. It
publishes the first item that it finds, removing it from the originating
buffer. If all buffers are empty, the function halts. We use the getnb ("get non-blocking") method of the buffer, which retrieves the first
available item if there is one, and halts otherwise.
def priorityPoll([]) = stop def priorityPoll(b:bs) = b.getnb() ; priorityPoll(bs)
``Parallel or'' is a classic idiom of parallel programming. The ``parallel or'' operation executes two
expressions F and G in parallel, each of which may publish a single boolean,
and returns the disjunction of their publications as soon as possible.
If one of the expressions publishes true, then the disjunction is true,
so it is not necessary to wait for the other expression to publish a value.
This holds even if one of the expressions is silent.
The ``parallel or'' of expressions F and G may be expressed in Orc as follows:
let( val a = F val b = G (a || b) | if(a) >> true | if(b) >> true )
The expression (a || b) waits for both a and b to become
available and then publishes their disjunction. However if either a or
b is true we can publish true immediately regardless of whether the
other variable is available. Therefore we run if(a) >> true and if(b) >> true
in parallel to wait for either variable to become true and immediately
publish the result true. Since more than one of these expressions may
publish true, the surrounding let(...) is necessary to select and
publish only the first result.
Timeout, the ability to execute an expression for at most a specified
amount of time, is an essential ingredient of fault-tolerant and distributed
programming. Orc accomplishes timeout using pruning and the Rtimer site.
The following program runs F for at most one second, publishing its result if
available and the value 0 otherwise.
let( F | Rtimer(1000) >> 0 )
In the auction example given previously, the auction may never complete if one of the bidders does not respond. We can add a timeout so that a bidder has at most 8 seconds to provide a bid:
def auction([]) = 0 def auction(b:bs) = val bid = b.ask() | Rtimer(8000) >> 0 max(bid, auction(bs))
This version of the auction is guaranteed to complete within 8 seconds.
Sometimes, rather than just yielding a default value, we would like to
determine whether an expression has timed out, and if so, perform some other
computation. To detect the timeout, we pair the result of the original
expression with true and the result of the timer with false.
Thus, if the expression does time out, then we can distinguish that case
using the boolean value.
Here, we run expression F with a time limit t. If it publishes
within the time limit, we bind its result to r and execute G.
Otherwise, we execute H.
val (r, b) = (F, true) | (Rtimer(t), false) if b then G else HInstead of using a boolean and conditional, we could use pattern matching:
val s = Some(F) | Rtimer(t) >> None() s >Some(r)> G | s >None()> H
We can use a timer to give a window of priority to one computation over another. In this example, we run expressions F and G concurrently. For one second, F has priority; F's result is published immediately, but G's result is held until the time interval has elapsed. If neither F nor G publishes a result within one second, then the first result from either is published.
val x = F val y = G let( y | Rtimer(1000) >> x )
A timer can be used to execute an expression repeatedly at regular
intervals, for example to poll a service.
Recall the definition of metronome from the previous chapter:
def metronome(t) = signal | Rtimer(t) >> metronome()
The following example publishes "tick" once per second and "tock" once per
second after an initial half-second delay. The publications alternate: "tick
tock tick tock ...". Note that this program is not defined recursively;
the recursion is entirely contained within metronome.
metronome(1000) >> "tick" | Rtimer(500) >> metronome(1000) >> "tock"
The Orc combinators restrict the passing of values among their component
expressions. However, some programs will require greater
flexibility. For example, F <x< G provides F with the first
publication of G, but what if F needs the first n publications of G?
In cases like this we use channels or other stateful sites to redirect or
store publications. We call this technique routing
because it involves routing values from one execution to another.
The pruning combinator terminates an expression after it publishes its first
value. We have already seen how to use
pruning just for its termination capability, without binding a variable, using
the let site. Now, we use routing to terminate an expression
under different conditions, not just when it publishes a value; it may
publish many values, or none, before being terminated.
Our implementation strategy is to route the publications of the expression through a channel, so that we can put the expression inside a pruning combinator and still see its publications without those publications terminating the expression.
As a simple demonstration of this concept, we construct a more powerful form of timeout: allow an expression to execute, publishing arbitrarily many values (not just one), until a time limit is reached.
val c = Buffer() repeat(c.get) << F >x> c.put(x) >> stop | Rtimer(1000) >> c.closenb()
This program allows F to execute for one second and then terminates it. Each
value published by F is routed through channel c so that it does
not terminate F. After one second, Rtimer(1000) responds,
triggering the call c.closenb(). The call
c.closenb() closes c and publishes a signal,
terminating F. The library function repeat is used to repeatedly
take and publish values from c until it is closed.
We can also decide to terminate based on the values published. This expression executes F until it publishes a negative number, and then terminates it:
val c = Buffer() repeat(c.get) << F >x> (if x >= 0 then c.put(x) >> stop else c.closenb())
Each value published by F is tested. If it is non-negative, it is placed on
channel c (silently) and read by repeat(c.get).
If it is negative, the channel is closed, publishing a signal and causing
the termination of F.
We can use routing to interrupt an expression based on a signal from
elsewhere in the program. We set up the expression like a timeout, but instead
of waiting for a timer, we wait for the semaphore done to be released. Any
call to done.release will terminate the expression (because it will
cause done.acquire() to publish), but otherwise F executes as normal and
may publish any number of values.
val c = Buffer() val done = Semaphore(0) repeat(c.get) << F >x> c.put(x) >> stop | done.acquire() >> c.closenb()
We can limit an expression to n publications, rather than just one. Here is an expression which executes F until it publishes 5 values, and then terminates it.
val c = Buffer() val done = Semaphore(0) def allow(0) = done.release() >> stop def allow(n) = c.get() >x> (x | allow(n-1)) allow(5) << F >x> c.put(x) >> stop | done.acquire() >> c.closenb()
We use the auxiliary function allow to get only the first 5
publications from the channel c. When no more publications are allowed,
allow uses the interrupt idiom to halt F and close c.
We can use routing to create a modified version of the pruning combinator.
As in F <x< G, we'll run F and G in parallel and make the first
value published by G available to F. However instead of terminating G after
it publishes a value, we will continue running it, ignoring its remaining
publications.
val r = Ref() (F <x< r.read()) | (G >x> r.write(x))
We can also use routing to create a modified version of the otherwise combinator. We'll run F until it halts, and then run G, regardless of whether F published any values or not.
val c = Buffer() repeat(c.get) | (F >x> c.put(x) >> stop ; c.close() >> G)We use
c.close() instead of the more common c.closenb()
to ensure that G does not execute until all the publications of F have been
routed. Recall that c.close() does not return until c is
empty.
We can write a function interruptible that implements the interrupt idiom
to execute any function in an interruptible way. interruptible(g)
calls the function g, which is assumed to take no arguments, and
silences its publications. It immediately publishes another function, which we
can call at any time to terminate the execution of g. For simplicity,
we assume that g itself publishes no values.
Here is a naive implementation that doesn't quite work:
def interruptible(f) = val done = Semaphore(0) done.release << f() >> stop | done.acquire() >> c.closenb() {- wrong! -} val stopper = interruptible(g) ...
The function interruptible is correct, but the way it is used causes a strange error.
The function g executes, but is always immediately terminated! This happens because the
val declaration which binds stopper also kills all of the remaining
computation in interruptible(g), including the execution of g itself.
The solution is to bind the variable differently:
def interruptible(f) = val done = Semaphore(0) done.release << f() >> stop | done.acquire() >> c.closenb() interruptible(g) >stopper> ...
This idiom, wherein a function publishes some value that can be used to monitor or control its execution, arises occasionally in Orc programming. When using this idiom, always remember to avoid terminating that execution accidentally. Since Orc is a structured concurrent language, every process is contained with some other process; kill the containing process, and the contained processes die too.
def lifter() = val c = Buffer() def loop() = c.get() >f> ( f() >> stop | loop() ) c.put | loop() def delayedPrint() = Rtimer(1500) >> println("Delayed 1.5 seconds") lifter() >lift> ( println("Running...") << Rtimer(1000) | delayedPrint() | lift(delayedPrint) )The timeout stops the execution of
delayedPrint(), so it does not print a result.
However, the lifted execution of delayedPrint does succeed, since it is executing
within the loop of lifter(), unaffected by the timeout.
We consider various concurrent implementations of the classic "list fold" function from functional programming:
def fold(_, [x]) = x def fold(f, x:xs) = f(x, fold(xs))
This is a seedless fold (sometimes called fold1) which requires that the
list be nonempty and uses its first element as a seed. This implementation is
short-circuiting --- it may finish early if the reduction operator f does
not use its second argument --- but it is not concurrent; no two calls to f
can proceed in parallel. However, if f is associative, we can overcome this restriction
and implement fold concurrently. If f is also commutative, we can further increase concurrency.
We first consider the case when the reduction operator is associative. We
define afold(f,xs) where f is a binary associative function and
xs is a non-empty list. The implementation iteratively reduces xs
to a single value. Each step of the iteration applies the auxiliary function
step, which halves the size of xs by reducing disjoint pairs of
adjacent items.
def afold(_, [x]) = x def afold(f, xs) = def step([]) = [] def step([x]) = [x] def step(x:y:xs) = f(x,y):step(xs) afold(f, step(xs))
Notice that f(x,y):step(xs) is an implicit
fork-join. Thus, the call f(x,y)
executes in parallel with the recursive call step(xs).
As a result, all calls to f execute concurrently within
each iteration of afold.
We can make the implementation even more concurrent when the fold operator
is both associative and commutative. We define cfold(f,xs), where
f is a associative and commutative binary function and xs is a non-empty list.
The implementation initially copies all list items into a buffer in arbitrary
order using the auxiliary function xfer, counting the total
number of items copied. The auxiliary function combine repeatedly
pulls pairs of items from the buffer, reduces
them, and places the result back in the buffer. Each pair of items is reduced
in parallel as they become available. The last item in the buffer is the
result of the overall fold.
def cfold(f, xs) = val c = Buffer() def xfer([]) = 0 def xfer(x:xs) = c.put(x) >> stop | xfer(xs)+1 def combine(0) = stop def combine(1) = c.get() def combine(m) = c.get() >x> c.get() >y> ( c.put(f(x,y)) >> stop | combine(m-1)) xfer(xs) >n> combine(n)
The dining philosophers problem is a well known and intensely studied problem in concurrent programming. Five philosophers sit around a circular table. Each philosopher has two forks that she shares with her neighbors (giving five forks in total). Philosophers think until they become hungry. A hungry philosopher picks up both forks, one at a time, eats, puts down both forks, and then resumes thinking. Without further refinement, this scenario allows deadlock; if all philosophers become hungry and pick up their left-hand forks simultaneously, no philosopher will be able to pick up her right-hand fork to eat. Lehmann and Rabin's solution[3], which we implement, requires that each philosopher pick up her forks in a random order. If the second fork is not immediately available, the philosopher must set down both forks and try again. While livelock is still possible if all philosophers take forks in the same order, randomization makes this possibility vanishingly unlikely.
def shuffle(a,b) = if (random(2) = 1) then (a,b) else (b,a) def take((a,b)) = a.acquire() >> b.acquirenb() ; a.release() >> take(shuffle(a,b)) def drop(a,b) = (a.release(), b.release()) >> signal def phil(n,a,b) = def thinking() = println(n + " thinking") >> if (random(10) < 9) then Rtimer(random(1000)) else stop def hungry() = take((a,b)) def eating() = println(n + " eating") >> Rtimer(random(1000)) >> println(n + " done eating") >> drop(a,b) thinking() >> hungry() >> eating() >> phil(n,a,b) def philosophers(1,a,b) = phil(1,a,b) def philosophers(n,a,b) = val c = Semaphore(1) philosophers(n-1,a,c) | phil(n,c,b) val fork = Semaphore(1) philosophers(5,fork,fork)
The phil function simulates a single philosopher. It takes as arguments
two binary semaphores representing the philosopher's forks, and calls
the thinking, hungry, and eating functions in a continuous
loop. A thinking philosopher waits for a random amount of time, with a
10% chance of thinking forever. A hungry philosopher uses the take
function to acquire two forks. An eating philosopher waits for a random
time interval and then uses the drop function to relinquish ownership of
her forks.
Calling take(a,b) attempts to acquire a pair of forks (a,b) in two steps:
wait for fork a to become available, then immediately attempt to acquire fork b.
The call b.acquirenb() either acquires b and responds immediately, or halts if b is not available.
If b is acquired, signal success; otherwise, release a, and
then try again, randomly changing the order in which the forks are acquired
using the auxiliary function shuffle.
The function call philosophers(n,a,b) recursively creates a chain of n
philosophers, bounded by fork a on the left and b on the right. The
goal expression of the program calls philosophers to create a chain of
five philosophers bounded on the left and right by the same fork; hence, a
ring.
This Orc solution has several nice properties. The overall structure of the
program is functional, with each behavior encapsulated in its own function,
making the program easy to understand and modify. Mutable state is isolated to
the "fork" semaphores and associated take and get functions,
simplifying the implementation of the philosophers. The program never
manipulates threads explicitly, but instead expresses relationships between
activities using Orc's combinators.
Here we implement a different solution to the Dining Philosophers problem, described in "The Drinking Philosophers Problem", by K. M. Chandy and J. Misra. Briefly, this algorithm efficiently and fairly solves the dining philosophers problem for philosophers connected in an arbitrary graph (as opposed to a simple ring). The algorithm works by augmenting each fork with a clean/dirty state. Initially, all forks are dirty. A philosopher is only obliged to relinquish a fork to its neighbor if the fork is dirty. On receiving a fork, the philosopher cleans it. On eating, the philosopher dirties all forks. For full details of the algorithm, consult the original paper.
{- Start a philosopher actor; never publishes. Messages sent between philosophers include: - ("fork", p): philosopher p relinquishes the fork - ("request", p): philosopher p requests the fork - ("rumble", p): sent by a philosopher to itself when it should become hungry name: identify this process in status messages mbox: our mailbox; the "address" of this philosopher is mbox.put missing: set of neighboring philosophers holding our forks -} def philosopher(name, mbox, missing) = {- deferred requests for forks -} val deferred = Buffer() {- forks we hold which are clean -} val clean = Set() def sendFork(p) = {- remember that we no longer hold the fork -} missing.add(p) >> p(("fork", mbox.put)) def requestFork(p) = p(("request", mbox.put)) {- Start a timer which will tell us when we're hungry. -} def digesting() = println(name + " thinking") >> thinking() | Rtimer(random(1000)) >> mbox.put(("rumble", mbox.put)) >> stop {- Wait to become hungry -} def thinking() = def on(("rumble", _)) = println(name + " hungry") >> map(requestFork, missing) >> hungry() def on(("request", p)) = sendFork(p) >> thinking() on(mbox.get()) {- Eat once we receive all forks -} def hungry() = def on(("fork", p)) = missing.remove(p) >> clean.add(p) >> if missing.isEmpty() then println(name + " eating") >> eating() else hungry() def on(("request", p)) = if clean.contains(p) then deferred.put(p) >> hungry() else sendFork(p) >> requestFork(p) >> hungry() on(mbox.get()) {- Dirty forks, process deferred requests, then digest -} def eating() = clean.clear() >> Rtimer(random(1000)) >> map(sendFork, deferred.getAll()) >> digesting() {- All philosophers start out digesting -} digesting() {- Create an NxN 4-connected grid of philosophers. Each philosopher holds the fork for the connections below and to the right (so the top left philosopher holds both its forks). -} def philosophers(n) = {- A set with 1 item -} def Set1(item) = Set() >s> s.add(item) >> s {- A set with 2 items -} def Set2(i1, i2) = Set() >s> s.add(i1) >> s.add(i2) >> s {- create an NxN matrix of mailboxes -} val cs = uncurry(IArray(n, lambda (_) = IArray(n, ignore(Buffer)))) {- create the first row of philosophers -} philosopher((0,0), cs(0,0), Set()) | for(1, n) >j> philosopher((0,j), cs(0,j), Set1(cs(0,j-1).put)) {- create remaining rows -} | for(1, n) >i> ( philosopher((i,0), cs(i,0), Set1(cs(i-1,0).put)) | for(1, n) >j> philosopher((i,j), cs(i,j), Set2(cs(i-1,j).put, cs(i,j-1).put)) ) {- Simulate a 3x3 grid of philosophers for 10 seconds -} let( philosophers(3) | Rtimer(10000) ) >> "HALTED"
Our implementation is based on the actor model of concurrency. An actor is a state machine which reacts to messages. On receiving a message, an actor can send asynchronous messages to other actors, change its state, or create new actors. Each actor is single-threaded and processes messages sequentially, which makes some concurrent programs easier to reason about and avoids explicit locking. Erlang is one popular language based on the actor model.
Orc emulates the actor model very naturally. In Orc, an actor is an Orc thread
of execution, together with a Buffer which serves as a mailbox. To send a
message to an actor, you place it in the actor's mailbox, and to receive a
message, the actor gets the next item from the mailbox. The internal states of
the actor are represented by functions: while an actor's thread of execution is
evaluating a function, it is considered to be in the corresponding state.
Because Orc implements tail-call optimization,
state transitions can be encoded as function calls without running out of stack
space.
In this program, a philosopher is implemented by an actor with three primary
states: eating, thinking, and hungry.
An additional transient state, digesting, is used to start a timer
which will trigger the state change from thinking to
hungry. Each state is implemented by a function which reads a
message from the mailbox, selects the appropriate action using pattern
matching, performs the action, and finally transitions to the next state
(possibly the same as the current state) by calling the corresponding function.
Forks are never represented explicitly. Instead each philosopher identifies a fork with the "address" (sending end of a mailbox) of the neighbor who shares the fork. Every message sent includes the sender's address. Therefore when a philosopher receives a request for a fork, it knows who requested it and therefore which fork to relinquish. Likewise when a philosopher receives a fork, it knows who sent it and therefore which fork was received.
Here we present an Orc solution to the readers-writers problem. Briefly, the readers-writers problem involves concurrent access to a mutable resource. Multiple readers can access the resource concurrently, but writers must have exclusive access. When readers and writers conflict, different solutions may resolve the conflict in favor of one or the other, or fairly. In the following solution, when a writer tries to acquire the lock, current readers are allowed to finish but new readers are postponed until after the writer finishes. Lock requests are granted in the order received, guaranteeing fairness. Normally, such a service would be provided to Orc programs by a site, but it is educational to see how it can be implemented directly in Orc.
-- Queue of lock requests val m = Buffer() -- Count of active readers/writers val c = Counter() {-- Process requests in sequence --} def process() = -- Grant read request def grant((false,s)) = c.inc() >> s.release() -- Grant write request def grant((true,s)) = c.onZero() >> c.inc() >> s.release() >> c.onZero() -- Goal expression of process() m.get() >r> grant(r) >> process() {-- Acquire the lock: argument is "true" if writing --} def acquire(write) = val s = Semaphore(0) m.put((write, s)) >> s.acquire() {-- Release the lock --} def release() = c.dec() ------------------------------------------------- {-- These definitions are for testing only --} def reader(start) = Rtimer(start) >> acquire(false) >> println("START READ") >> Rtimer(1000) >> println("END READ") >> release() >> stop def writer(start) = Rtimer(start) >> acquire(true) >> println("START WRITE") >> Rtimer(1000) >> println("END WRITE") >> release() >> stop let( process() {- Output: -} | reader(10) {- START READ -} | reader(20) {- START READ -} {- END READ -} {- END READ -} | writer(30) {- START WRITE -} {- END WRITE -} | reader(40) {- START READ -} | reader(50) {- START READ -} {- END READ -} {- END READ -} -- halt after the last reader finishes | Rtimer(60) >> acquire(true) )
The lock receives requests over the channel m and processes them
sequentially with the function grant. Each request includes a
boolean flag which is true for write requests and false for read requests, and a
Semaphore which the requester blocks on. The lock grants access
by releasing the semaphore, unblocking the requester.
The counter c tracks the number of readers or writers currently
holding the lock. Whenever the lock is granted, grant increments
c, and when the lock is released, c is decremented.
To ensure that a writer has exclusive access, grant waits for the
c to become zero before granting the lock to the writer, and then
waits for c to become zero again before granting any more requests.
The original quicksort algorithm [4] was designed for efficient execution on a uniprocessor. Encoding it as a functional program typically ignores its efficient rearrangement of the elements of an array. Further, no known implementation highlights its concurrent aspects. The following program attempts to overcome these two limitations. The program is mostly functional in its structure, though it manipulates the array elements in place. We encode parts of the algorithm as concurrent activities where sequentiality is unneeded.
The following listing gives the implementation of the quicksort
function which sorts the array a in place.
The auxiliary function sort sorts the subarray given by indices
s through t by calling part to partition
the subarray and then recursively sorting the partitions.
def quicksort(a) = def swap(x, y) = a(x)? >z> a(x) := a(y)? >> a(y) := z def part(p, s, t) = def lr(i) = if i < t && a(i)? <= p then lr(i+1) else i def rl(i) = if a(i)? > p then rl(i-1) else i (lr(s), rl(t)) >(s', t')> ( if (s' + 1 < t') >> swap(s', t') >> part(p, s'+1, t'-1) | if (s' + 1 = t') >> swap(s', t') >> s' | if (s' + 1 > t') >> t' ) def sort(s, t) = if s >= t then signal else part(a(s)?, s+1, t) >m> swap(m, s) >> (sort(s, m-1), sort(m+1, t)) >> signal sort(0, a.length()-1)
The function part partitions the subarray given by indices
s through t into two partitions, one containing values
less than or equal to p and the other containing values > p. The last index of the lower partition is returned.
The value at a(s-1) is assumed to be less than or equal to p --- this is satisfied
by choosing p = a(s-1)? initially. To create the partitions, part
calls two auxiliary functions lr and rl concurrently. These
functions scan from the left and right of the subarray respectively, looking
for out-of-place elements. Once two such elements have been found, they are
swapped using the auxiliary function swap, and then the unscanned portion
of the subarray is partitioned further. Partitioning is complete when the
entire subarray has been scanned.
This program uses the syntactic sugar x? for x.read()
and x := y for x.write(y). Also note that the expression
a(i) returns a reference to the element of array a at index
i, counting from 0.
Orc makes very few assumptions about the behaviors of services it uses. Therefore it is straightforward to write programs which interact with human agents and network services. This makes Orc especially suitable for encoding workflows, the coordination of multiple activities involving multiple participants. The following program illustrates a simple workflow for scheduling a business meeting. Given a list of people and a date range, the program asks each person when they are available for a meeting. It then combines all the responses, selects a meeting time which is acceptable to everyone, and notifies everyone of the selected time.
include "net.inc" val during = Interval(LocalDate(2009, 9, 10), LocalDate(2009, 10, 17)) val invitees = ["john@example.com", "jane@example.com"] def invite(invitee) = Form() >f> f.addPart(DateTimeRangesField("times", "When are you available for a meeting?", during, 9, 17)) >> f.addPart(Button("submit", "Submit")) >> SendForm(f) >receiver> SendMail(invitee, "Meeting Request", receiver.getURL()) >> receiver.get() >response> response.get("times") def notify([]) = each(invitees) >invitee> SendMail(invitee, "Meeting Request Failed", "No meeting time found.") def notify(first:_) = each(invitees) >invitee> SendMail(invitee, "Meeting Request Succeeded", first.getStart()) map(invite, invitees) >responses> afold(lambda (a,b) = a.intersect(b), responses) >times> notify(times)
This program begins with declarations of during (the date range for the
proposed meeting) and invitees (the list of people to invite represented
by email addresses).
The invite function obtains possible meeting times from a given invitee, as
follows. First it uses library sites (Form, DateTimeRangesField,
Button, and SendForm) to construct a web form which may be used to
submit possible meeting times. Then it emails the URL of this form to the
invitee and blocks waiting for a response. When the invitee receives the
email, he or she will use a web browser to visit the URL, complete the form,
and submit it. The corresponding execution of invite receives the
response in the variable response and extracts the chosen meeting times.
The notify function takes a list of possible meeting times, selects the
first meeting time in the list, and emails everyone with this time. If the
list of possible meeting times is empty, it emails everyone indicating that no
meeting time was found.
The goal expression of the program uses the library function map to
apply notify to each invitee and collect the responses in a list. It
then uses the library function afold to intersect all of the responses.
The result is a set of meeting times which are acceptable to everyone. Finally,
notify is called to select one of these times and notify everyone of the result.
This program may be extended to add more sophisticated features, such as a quorum (to select a meeting as soon as some subset of invitees responds) or timeouts (to remind invitees if they don't respond in a timely manner). These modifications are local and do not affect the overall structure of the program. For complete details, see examples on our website.
[1] R. Milner. Communicating and Mobile Systems: the π-Calculus. Cambridge University Press, May 1999.
[2] J. Armstrong, R. Virding, C. Wikstr¨om, and M. Williams. Concurrent programming in ERLANG (2nd ed.). Prentice Hall International (UK) Ltd., Hertfordshire, UK, UK, 1996.
[3] D. J. Lehmann and M. O. Rabin. On the advantages of free choice: A symmetric and fully distributed solution to the dining philosophers problem. In POPL, pages 133–138, 1981.
[4] C. A. R. Hoare. Partition: Algorithm 63, Quicksort: Algorithm 64, and Find: Algorithm 65. Communications of the ACM, 4(7):321–322, 1961.
Table of Contents
There are two primary ways to create sites which can be used in Orc:
class declaration. This approach is easy for anyone
already familiar with Java, and such sites are straightforward to share
between Orc and Java code. However such sites are limited in how they
can interact with the Orc engine.
site declaration.
This approach provides full access to the features of the Orc engine.
However such sites are difficult to use from Java code.
External services (including web services) are handled using the Proxy pattern. A site implemented in Java (using one of the two techniques above) must act as the local proxy for the service, translating Orc site calls into the appropriate requests and translating responses into site return values, halts, or errors.
Java classes can be imported into Orc as sites using the
class declaration.
Imported classes must be in the classpath of the JVM running the Orc
interpreter. The following sections describe in detail how such imported
classes behave in Orc programs.
x.member, where x evaluates to a Java class or object, is evaluated as follows:
x has one or more methods named member,
a "method handle" site is returned which may be called like any other Orc site.
When a method handle is actually called with arguments, the appropriate Java
method is selected and called depending on the number and type of arguments, as
described in Method Resolution
below.x has a field named member,
the object's field is returned, encapsulated in a
Ref object. The
Ref object has read and write methods
which are used to get and set the value of the field.Note that no distinction is made between static and non-static members; it is an error to reference a non-static member through a class, but this does not change how members are resolved. Note also that if a field shares a name with one or more methods, there is no way to access the field directly.
The following (rather useless) example illustrates how the dot operator can be used to access both static and non-static methods and fields:
{- bind Integer to a Java class -} class Integer = java.lang.Integer {- call a static method -} val i = Integer.decode("5") {- read a field -} val m = Integer.MIN_VALUE.read() {- write a field -} Integer.MIN_VALUE.write(5) >> {- call a non-static method -} i.toString()
When x evaluates to a Java object (but not a Java class), the syntax
x(...) is equivalent to x.apply(...).
When x evaluates to a Java class, the syntax x(...)
calls the class's constructor. In case of overloaded constructors, the
appropriate constructor is chosen based on the number and types of arguments as
described in Method Resolution.
If C is bound to a Java class, it can be used as a pattern.
A pattern C(x) matches any Java object of that class or
any of its subclasses. The variable x is simply bound
to the object again; thus the matcher is just a partial identity function.
When a method handle is called, the actual Java method called is chosen based on the runtime types of the arguments, as follows:
BigDecimal,
the argument is implicitly narrowed to the formal parameter type.BigInteger,
the argument is implicitly narrowed to the formal parameter type.
The reason for the unusual implicit narrowing of BigDecimal and
BigInteger is that Orc numeric literals have these types, and it
would be awkward to have to perform an explicit conversion every time such a
value is passed to a Java method expecting a primitive.
Currently we do not implement specificity rules for choosing the best matching method; the first matching method (according to some unspecified order) is chosen. Note also that we do not support varargs methods explicitly, but instead varargs may be passed as an array of the appropriate type.
Orc values are implemented by Java objects, so in general any Orc value may be passed to a site implemented in Java. Standard Orc values have the following Java types:
java.lang.Stringjava.lang.Booleanjava.lang.Numberorc.runtime.values.TupleValueorc.runtime.values.ListValueorc.runtime.values.ClosureCurrently it is not possible to call Orc functions from Java code.
orc.runtime.values.SiteCurrently it is not possible to directly call Orc sites from Java code. However if you are implementing a site yourself, you may provide methods which can be called from Java code to invoke the behavior of the site.
Java objects may be used directly as values anywhere in an Orc program. Primitive Java values cannot be used directly in an Orc program, but are automatically boxed (and unboxed) as necessary.
When both arguments of an arithmetic or comparison operator are Java numeric types, the arguments are implicitly coerced to the widest of the two argument types. "Widest" is defined by the following relation, where ">" means "is wider than": BigDecimal > Double > Float > BigInteger > Long > Integer > Short > Byte
In order to support massive concurrency efficiently in Java, Orc uses cooperative threading. Orc programs are broken into discrete steps which are executed by a fixed-size thread pool. This approach works for Orc expressions, but the internals of an Orc site written in Java cannot be easily broken down. So Orc has two choices when it needs to call a local Java site:
Fortunately, in practice most Orc site calls take a short, deterministic amount
of time to complete. Of those that don't, it's usually only because they are
blocked waiting for some external event, like another site call. In this case,
instead of blocking the site can return control to the Orc engine, asking to
be run again when the external event occurs. In typical Java applications this
kind of non-blocking behavior is implemented using callbacks (or in its most
general form, continuation passing). Unfortunately this creates convoluted and
verbose code: what if you need to block in the middle of a for loop, for example?
Kilim resolves this issue by allowing programmers to write code in a natural blocking style and then rewriting the bytecode to perform a form of CPS conversion, so that instead of blocking the code actually returns control to a scheduler. Orc supports Kilim, allowing site authors to write sites with internal concurrency and blocking behavior which don't use Java threads and cooperate with the Orc engine.
For a full introduction to Kilim, see A Thread of One's Own, by Sriram Srinivasan. In the following, we will cover just enough to get you started writing Orc sites using Kilim.
Kilim introduces three key primitives:
Pausable
This checked exception marks methods which may block. You should never catch
this exception, and you should always declare it even if you have already
declared throws Exception or throws Throwable. In all other respects it follows the
normal rules for checked exceptions: an override can only throw it if the
superclass method throws it, and you must throw it if you call any method which
throws it.
Originally Kilim used annotations to mark pausable methods. It turned out that javac would sometimes manipulate code in ways which violate the invariants regarding use of the annotation. Annotations also have the disadvantage that the invariants on their use are not checked automatically by tools like Eclipse.
Mailbox
A multiple-producer, single-consumer queue used to communicate
between tasks. The most important methods are put(Object), which places an item in the
mailbox, and get(), which blocks until the mailbox
is non-empty and then returns the first item.
Task
Thread. Every Pausable
method must be run within a task. To create and run a task:
new Task() { public void execute() throws Pausable { // do some stuff } }.start();
Together, these primitives allow you to write highly-concurrent Java programs in a familiar threaded style. Under the hood, Kilim can schedule any number of concurrent tasks on a fixed number of physical threads, providing better scalability and performance than possible with Java threads.
When writing Orc sites you rarely need to use tasks explicitly because every Orc
site call is implicitly run inside a Kilim task if necessary. Therefore all you need to do is mark
Pausable methods. Here's a short example of a
buffer site written using Kilim Mailboxes:
public class KilimBuffer<V> { private LinkedList<Mailbox<V>> waiters = new LinkedList<Mailbox<V>>(); private LinkedList<V> buffer = new LinkedList<V>(); public synchronized void put(V o) { Mailbox<V> waiter = waiters.poll(); if (waiter != null) waiter.putnb(o); else buffer.add(o); } public synchronized V get() throws Pausable { V out = buffer.poll(); if (out != null) return out; else { Mailbox<V> waiter = new Mailbox<V>(); waiters.add(waiter); return waiter.get(); } } }
Classes using Kilim can be imported like other Java classes using the Orc
class declaration. For example, we can use the
KilimBuffer class defined above:
class Buffer = orc.lib.state.KilimBuffer val b = Buffer() Rtimer(1000) >> b.put("1 second later") >> null | b.get()
new Foo(bar.pausable());
The workaround is to use a temporary variable:
Object tmp = bar.pausable();
new Foo(tmp);
Sometimes blocking is unavoidable, for example if a site must perform blocking
IO. For such cases, Orc provides a utility method orc.runtime.Kilim#runThreaded(Callable) which farms work
out to a thread pool. The advantage of doing this over spawning your own
thread is that there is no chance of using too many threads; if no thread is
available, the method pauses until one becomes available. The disadvantage is
that if you have too many Java methods which must communicate with each other
concurrently and can't use Mailboxes, there won't be enough threads for them
all and execution will deadlock. This situation is sufficiently rare that
runThreaded is usually the correct approach.
Code which uses Kilim must be processed with the Kilim "weaver" after compiling and before running. The weaver is bundled with the Orc source code, or may be downloaded separately from the Kilim project site.
To process compiled classes, run kilim.tools.Weaver.
If your .class files are in the
build/ directory, you would run the weaver like this:
java -cp kilim.jar:build/ kilim.tools.Weaver -d build/ build/
The Orc source distribution includes an ant kilim
task (in build.xml) which weaves compiled classes in the
build directory. If you are compiling Orc in Eclipse, an
Eclipse "Builder" is included which does this automatically whenever any source
files change.
Orc sites imported with the site declaration are implemented as
Java classes which extend orc.runtime.sites.Site.
Here we will summarize the most important features of this and related classes;
for a detailed description, refer to the Javadoc.
Sites must implement a single method: callSite(Args args,
Token token). The Orc engine calls this method whenever the site is
called by the Orc program.
The args argument (of type Args) is used to get the arguments which were passed to
the site call by the Orc program. It has the following important methods:
getArg(int
n)
Object. For example, to get the value of the
first argument, call args.getArg(0).intArg(int n), stringArg(int n), boolArg(int
n), etc.ArgumentTypeMismatchException.
The token argument (of type Token) is a sort of callback which is used by the site
to return a value or signal that it has halted. It has the following important
methods:
resume(Object
value)
value from the site call. For example, to return the
number 5, call token.resume(5). To publish a
signal without returning any specific value, call token.resume().error(TokenException
problem)
callSite method; the engine will call
token.error on its behalf. To convert a Java
exception to the necessary TokenException type,
wrap it in an instance of orc.error.runtime.JavaException.die()
if uses this method to halt when called with a
false argument.
The callSite method must return control to its
caller within a short, deterministic amount of time. To implement a site which
blocks or waits for some condition, callSite must
save the token and return without calling any of
the above methods. Then when it is time for the site call to return, call the
appropriate method on the token.
Many sites are deterministic, always returning a value immediately when called.
Such sites can be implemented easily by extending orc.runtime.sites.EvalSite and implementing the method
evaluate(Args args). This method should read the
arguments from args as usual, and then return the
value to be returned from the site call, or throw a checked exception to
indicate an error.
Some sites need to perform blocking IO or use other Java blocking methods like
Object#wait(). This cannot be done directly in
the callSite or EvalSite#evaluate methods because that might block the
Orc engine. Instead, extend orc.runtime.sites.ThreadedSite and implement the method
evaluate(Args args). This method should read the
arguments from args as usual, and then return the
value to be returned from the site call, or throw a checked exception to
indicate an error. Your evaluate method will be
automatically run asynchronously in a new thread so that it does not interfere
with the Orc engine.
Beware that the Orc engine may be configured to only allow a fixed number of site threads. If too many sites need to use threads simultaneously, some will block until threads become available.
Recall that the notation x.msg corresponds to calling the site
x with the special message value msg. To implement a
site which responds to such messages, extend orc.runtime.sites.DotSite.
In your site's constructor, call the method addMember(String message, Object value) once for each
message you wish the site to respond to. The call addMember(m, v) instructs the site to return the value
v when it is called with the message m.
Any object can be used as a member value, even another site. Anonymous inner
classes extending Site are often used as member
values which behave like methods of the enclosing DotSite.
We will implement a simplified version of Orc's Buffer site,
called ExampleBuffer. The goal will be to be able
to run this Orc program:
-- Import the site definition site ExampleBuffer = ExampleBuffer -- Create a new buffer site val b = ExampleBuffer() -- wait for a value in the buffer b.get() -- put a value in the buffer | b.put(3) >> stop -- Publishes: 3
The first step is to create ExampleBuffer.java. The ExampleBuffer site is actually a discovery site: all
it does when called is create and return a reference to a new site (the buffer
instance).
public class ExampleBuffer extends EvalSite { public Object evaluate(Args args) { return new ExampleBufferInstance(); } }
Next we must implement ExampleBufferInstance,
which defines the behavior of an individual buffer. We can put this class
either in its own file or in ExampleBuffer.java, since
that is the only class which refers to it directly. How does an instance
behave? If we write b.get(), that means that b
responds to the message get with a site which we call to get a
value from the buffer. So we'll extend DotSite,
telling it to respond to the message "get" with a GetMethod site, and likewise for "put". We'll use a
linked list to store the actual contents of the buffer.
class ExampleBufferInstance extends DotSite { private LinkedList<Object> contents = new LinkedList<Object>(); public ExampleBufferInstance() { addMember("get", new GetMethod()); addMember("put", new PutMethod()); } }
The GetMethod site is implemented as an inner
class so it has easy access to the contents of the buffer. In the
implementation we first check if there is an item in the buffer. If so, we
return it. Otherwise, we must store the token in a queue to be notified when
the next item is put in the buffer. We synchronize on the outer object to
protect against concurrent modification.
private LinkedList<Token> waiters = new LinkedList<Token>(); private class GetMethod extends Site { public void callSite(Args args, Token token) { synchronized (ExampleBufferInstance.this) { if (!contents.isEmpty()) { token.resume(contents.removeFirst()); } else { waiters.add(token); } } } }
PutMethod is even simpler since it doesn't need to
block. If there is a token waiting for an item, we give it the item. Otherwise
we put the item in the buffer. We synchronize on the outer object to protect
against concurrent modification. When done we return a dummy "signal" value.
private class PutMethod extends EvalSite { public Object evaluate(Args args) { Object item = args.getArg(0); synchronized (ExampleBufferInstance.this) { if (!waiters.isEmpty()) { waiters.removeFirst().resume(item); } else { contents.add(item); } } return signal(); } }
Putting it all together, here is the entire implementation:
import java.util.LinkedList; import orc.runtime.sites.EvalSite; import orc.runtime.sites.DotSite; import orc.runtime.sites.Site; import orc.runtime.Token; public class ExampleBuffer extends EvalSite { public Object evaluate(Args args) { return new ExampleBufferInstance(); } } class ExampleBufferInstance extends DotSite { private LinkedList<Object> contents = new LinkedList<Object>(); private LinkedList<Token> waiters = new LinkedList<Token>(); private class GetMethod extends Site { public void callSite(Args args, Token token) { synchronized (ExampleBufferInstance.this) { if (!contents.isEmpty()) { token.resume(contents.removeFirst()); } else { waiters.add(token); } } } } private class PutMethod extends EvalSite { public Object evaluate(Args args) { Object item = args.getArg(0); synchronized (ExampleBufferInstance.this) { if (!waiters.isEmpty()) { waiters.removeFirst().resume(item); } else { contents.add(item); } } return signal(); } } public ExampleBufferInstance() { addMember("get", new GetMethod()); addMember("put", new PutMethod()); } }
While we believe Orc is an excellent language for web service scripting, currently the library support for such tasks is at a proof-of-concept level. The web service libraries are not bundled with the core Orc distribution, do not have stable APIs, and are not officially documented. However this section should provide you with enough information to get started.
All Orc sites related to web services are bundled in a separate "OrcSites" library. As with the core Orc distribution, you can either download this library as a prepackaged JAR or check out the source code from the "OrcSites" module in version control. We generally recommend the latter approach, since the source code provides several examples which can be used as a basis for creating your own web service sites.
Within the OrcSites source code you will find:
src/orc/lib/net/examples/Several of the examples require you to create
.properties files and place them in your
classpath. The simplest way to do this is to make sure the
examples/ directory is in your classpath, and
place the necessary .properties files there.
Web services use a variety of protocols, so there are a variety of ways to contact them. All of them boil down to creating a Java proxy site for the service and calling that. Previous sections explain how to implement such Java sites which can be called in Orc. Because web services tend to use blocking I/O, Java wrappers make frequent use of ThreadedSite and Kilim to ensure that web service calls don't block the Orc engine.
Some web services provide Java APIs specifically for the service. For example, Google Calendar. In these cases we just use the Java API from Orc, either directly or via a small Java wrapper which simplifies the interface.
OrcSites includes class orc.lib.net.GoogleCalendar as an
example of this type of web service.
OrcSites includes a generic site orc.lib.net.Webservice which allows you to connect to any SOAP RPC (specifically
rpc/encoded) service, without writing Java wrappers. Instead Apache Axis is used to
generate Java wrappers on-demand.
http://www.xmethods.net is a good place to find examples of SOAP RPC services:
Webservice site. E.g. Webservice("http://site.com/wsdl").Example:
site orc.lib.net.Webservice {- Find documentation of this service at: http://www.xmethods.net/ve2/WSDLRPCView.po? key=uuid:BF3EFCDD-FCD4-8867-3AAC-068985E7CB89 -} val service = Webservice( "http://www.ebob42.com/cgi-bin/" + "Romulan.exe/wsdl/IRoman") service.intToRoman(451)
Many services use ad-hoc REST/XML protocols. Unfortunately, there is no REST equivalent to WSDL, so there's no way to automatically generate an API for REST services. We're not trying to solve this problem with Orc, but when the web services community reaches some kind of consensus, Orc will support it.
For now you must write a Java wrapper for each service which handles marshalling and unmarshalling the data. There's no reason such marshalling code couldn't be written in Orc, calling low-level HTTP and XML libraries directly, but there would be no advantage to doing so, so we let each language play to its strengths. OrcSites includes utility classes to assist with submitting requests and parsing responses.
OrcSites includes the following examples of this type of service:
class orc.lib.net.Upcomingsite orc.lib.net.TrueRandomsite orc.lib.net.YahooSpellFactoryTable of Contents
The Orc language, as it is described in Chapter 1, is dynamically typed. If an operation occurs at runtime which is not type-correct, the call which attempted that operation becomes silent, and produces a type error that is reported on the console.
Orc also has an optional static typechecker, which will guarantee that a program is free of type errors before the program is run. For every expression in the program, the typechecker tries to find the types of values that it could publish, and checks that all such types are consistent. It performs a limited form of type inference, so it can discover many types automatically; however, the programmer must provide additional type information for defined functions, and in a few other specific cases.
The typechecker uses the local type inference algorithm described by Pierce and Turner in their paper,
Local Type Inference.
It extends this algorithm with polymorphic type constructors (e.g. List and
Buffer), user-defined datatypes (which may also be polymorphic),
and a typing strategy for external services. The typechecker supports both
parametric polymorphism (generics) and
inclusion polymorphism (subtyping),
though it does not currently implement the extended version
of the algorithm which allows both of them simultaneously (bounded polymorphism).
The typechecker is disabled by default, though typed syntax is still permitted (and types are still
checked for syntax errors) even when the typechecker is not used. It may be enabled as a project
property in the Eclipse plugin, or by using the -typecheck switch on the command line.
If the typechecker can verify that a program is correctly typed, it will display the message
... :: T
on the console. The symbol :: means "has type", and T is the type of any value that
the program might publish.
Each of the Orc constants has the expected type:
true , false :: Boolean -1, 0, 1 ... :: Integer "orc" , "ceci n'est pas une |" ... :: String 0.001, -1.5, 2.71828 ... :: Numbersignal has its own unique type, Signal.
Orc allows subtyping: some types are included in other types. For example, Integer is a subtype of Number, because all Integers are also Numbers.
Each of the primitive operators typechecks in the obvious way; for example && requires
two Boolean operands and returns a Boolean result. Some of the arithmetic operators also support
ad-hoc polymorphism; for example, a + expression with two Integer
operands has type Integer, but with one or more Number operands it instead has type Number.
There are two other important types: Top and Bot.
Top is the universal type; it is the type of any value, and all other types are subtypes
of it.
Bot is the empty type; no value has type Bot. It is a subtype of all other
types. Expressions with type Bot are expressions that the typechecker can verify will
never publish any values. In particular, stop will never publish, so it has type Bot.
Bot has an interesting status in Orc. In other typed languages, if an expression
has type Bot, this usually indicates a guaranteed error, infinite loop, or other failure
to return a value. Since sequential programming rarely involves subexpressions that are guaranteed
never to return, Bot is usually just a curiosity or a formal artifact of the type system,
and indeed many static type systems do not have a Bot type at all.
In Orc, however, Bot is very useful, since it is frequently the case that Orc expressions
are written to carry out ongoing concurrent activities but never publish any values, and the type system
can use the type Bot to indicate that no publications will ever be seen from such
expressions.
The type of a tuple is a tuple of the types of each of its elements.
(3+3, true) :: (Integer, Boolean)("orc", (2, "orc")) :: (String, (Integer, String))
The type of F | G is the join of the types of F and G.
A common supertype of two types S and T is any type U such that S and T are both subtypes of U. The join J is the least common supertype of S and T, that is, J is a common supertype, and J is also a subtype of every other common supertype.
The most common case is the join of a type T with itself. Then the join is just T. This happens when both parallel branches publish the same type of values.
3+4 | 5 :: IntegerIf S is a subtype of T, the join of S and T is T. This happens when one branch is publishing a less specific type of value. To be safe, the whole expression then has that less specific type.
3 | 3.1 :: Number
As a special case, the join of any type T with Bot is T. This occurs
when one branch publishes values and the other one never publishes. Thus, it is
safe to use the type of values from the first branch, since the second branch
contributes no values at all.
"all quiet on the western front" | EasternFront() >> stop :: String
In any other case, the typechecker will try to infer the join. However, it
may not always find the least supertype, and in some cases there may not
even be a useful common supertype. In these cases it will default to
the Top type, which is a supertype of all other types, and is
therefore always a safe join, if not a very useful one.
3 | "three" :: Top
The type of F >x> G is the type of G using the assumption x :: T, where T is the type of F.
If a pattern is used instead of a variable, the structure of the pattern must match the structure
of the type of F, and the types for each variable bound in G are taken from the corresponding
pieces of the type of F. For example, in the expression F >(x,b)> G,
where F :: (Integer, Boolean), G is typechecked with the assumptions x :: Integer
and b :: Boolean.
The type of F <x< G is the type of F using the assumption x :: T, where T is the type of G.
When a declaration val x = F occurs, the expression in scope for the declaration
is typechecked using the assumption x :: T, where T is the type of F.
Patterns in the pruning combinator and in val declarations are typechecked in the same way as in the sequential combinator.
In the conditional expression if E then F else G, the expression E must have type
Boolean, and the type of the whole expression is the join of the types of F and G, as described for the
parallel combinator.
Though the typechecker can infer the types of many expressions without additional information from the programmer, there are some
cases where the algorithm does need assistance. In particular, whenever a function is defined (using the def keyword),
the programmer must also include a signature for that function, supplying its argument types and return type.
A signature immediately precedes a function definition, and is written as follows:
def magsq(Number, Number) :: Number def magsq(x,y) = x*x + y*yArgument types replace the arguments,
:: replaces =, and the return type (the type of the body expression)
replaces the body expression. If the function is not recursive, then the return type is optional, because the typechecker can infer it from
the body expression without additional information. So, in this case, we could have just written:
def magsq(Number, Number) def magsq(x,y) = x*x + y*y
It is also possible to include the argument types and return type directly in a definition, rather than as a separate signature. The signature and definition above could be written together as:
def magsq(x :: Number, y :: Number) :: Number = x*x + y*yAn argument
x with type T is replaced by x :: T in the argument list. The return type is inserted
between the argument list and the = sign. This form is not typically used for functions with multiple clauses or pattern
matches, though it is allowed in those cases so long as only one clause has the type information.
lambda expressions are a special case: the type information must be included directly in this way,
since there is no way to declare the signature separately. Specifying the return type of a lambda is always optional,
since it is not possible for an anonymous function to make a recursive call. Written as a typed lambda function, the magsq
definition looks like:
lambda (x :: Number, y :: Number) = x*x + y*yWhen the typechecker has enough information to check the type of a lambda against a given function type, rather than needing to infer it, the argument types of the lambda are also optional. This occurs, for example, when a lambda is given as an argument to a defined higher-order function.
Once defined, functions are values, and thus have a special type of their own. The type of a function is written
very much like a signature, but using the lambda keyword instead of def and the function
name.
magsq, defined earlier, has type lambda (Number, Number) :: Numberlambda (x :: Integer) = x+1 has type lambda (Integer) :: Integerlambda () = signal has type lambda () :: Toplambda (a :: Boolean, b :: Boolean) = (b,a) has type lambda (Boolean,Boolean) :: (Boolean,Boolean)
In addition to annotation defined functions with a signature, the typechecker can also accept programmer-supplied type
information for any expression. An ascription, written E :: T, asks the typechecker
to verify that expression E has type T.
Normally, the typechecker would simply find the type of E by inference, but in certain situations it is easier to check a given type. For example, the typechecker may not be able to infer the correct join type for a parallel combinator, but it is always able to check that both branches are subtypes of an already provided type. Furthermore, adding extra type information makes it easier to pinpoint the source of a typechecking failure.
Ascriptions are also allowed in patterns; P :: T is a valid pattern, instructing the typechecker to verify
that the value fragment bound by the pattern P will always be of type T.
type
definition.
-- A rectangle is defined by its lower left and upper right coordinates. type Rect = ((Number, Number), (Number, Number)) -- Transpose such a rectangle, rotating it around an x=y axis def flip(Rect) :: Rect def flip( ((a, b), (c, d)) ) = ((a, b), (d, c))While it does not change the behavior of the typing algorithm, aliasing can substantially improve readability and make the intention of one's code much clearer.
_,
now contain the actual types that go in those slots. As an example, here is a datatype for geometric
shapes.
{- Rect(w,h): A rectangle with width w and height h. Circle(r): A circle with radius r. RegularPolygon(n,l): A regular polygon with n sides, each of length l. -} type Shape = Rect(Number, Number) | Circle(Number) | RegularPolygon(Integer, Number)We will see how to declare generic datatypes, like
Tree or Option, in the next section.
When an external site is made available to Orc with the site declaration, its type is also
made available to the Orc typechecker. The type of a site is itself treated like a service; it is passed
the types of its arguments, and responds with a return type for those arguments. Thus, a site call can typecheck
in ways not possible for function calls or other expressions. For example, sites can support ad-hoc polymorphism.
Also, sites can respond to messages sent using the dot operator. In fact, Orc has no native record type, because
the dot operator only applies to sites, so the type of the site can interpret the message internally
and determine the correct type to return, if any.
Additionally, a site can introduce new types into Orc. For example, the Semaphore site responds with
new instances of semaphores. And just as sites can be declared using site, these types can be
declared using type. Typically they are represented as Java classes, just as sites are. As an example,
here is a program which declares the Semaphore type, and then defines a function which "toggles" a pair
of semaphores by acquiring the first and releasing the second.
type Semaphore = orc.lib.state.types.SemaphoreType def toggle(Semaphore, Semaphore) :: Top def toggle(s, t) = s.acquire() >> t.release()Notice that it is not possible to typecheck
toggle without first giving a name to the Semaphore type,
because it is used in the signature of toggle.
type definitions occur alongside site definitions in the standard library, so the
types returned by many of the library functions are already available.
While the typechecker can be helpful, it will not accept programs that are not safe according to its algorithm, which can be burdensome when the programmer knows that an expression will have a certain type but the typechecker cannot verify it.
Since the typechecker is optional, it can always be turned off in these cases. But this is often too drastic a solution: typechecking difficulties often arise from small segments of a much larger program, and the rest of the program still benefits from typechecking.
Fortunately, the typechecker can be selectively disabled for parts of a program. For this purpose,
the typechecker supports a special operation called an assertion, written
E :!: T, where E is an expression and T is its asserted type. An assertion
is used like an ascription, but rather than verifying that E has type T, the typechecker instead
assumes that E has type T, without inspecting E at all. Thus, the programmer can supply the correct
type T without being restricted by the typechecking algorithm.
This feature should be used sparingly, with the knowledge that it does compromise the integrity of the typechecking algorithm. If the supplied type is wrong, runtime type errors could propagate to any part of the program that depends on that type. Assertions are useful for rapid prototyping, but they are not recommended for production code.
What is the type of [1,2,3]? It is a list, containing Integers. What about [true, true]?
Again a list, but it contains Booleans instead. It would be very cumbersome to have separate types such
as IntegerList and BooleanList for each of these possibilities, and in fact
infinitely many such types are possible. What we would like instead is one List type, with
a parameter which gives us the types contained in the list.
Such a type is called a parametric, or generic, type. It contains two parts: a type operator, such as
List, followed by a sequence of paramemters enclosed in brackets [...].
For example, [1,2,3] :: List[Integer], and [true,true] :: List[Boolean].
Lists are not the only parametric type. The standard library includes other parametric types, such as
Option, Buffer, and Cell. Type operators, such as List,
are declared using the same type declarations as any other type. For example, the
declaration
type Buffer = orc.lib.state.types.BufferTypeis part of the standard library; it declares the
Buffer type operator.
How do we write functions that use parametric types? Consider the following definition
of the append function, which appends two lists:
def append[T](List[T], List[T]) :: List[T] def append([], l) = l def append(h::t, l) = h::append(t,l)The function
append has a type argument, T, in its
signature. The type T is the type of elements in the lists that we are appending. Notice that both argument
lists must contain the same type of elements. The resulting list contains elements of that same type.
When calling the append function, in addition to providing its normal arguments, we must
also provide its type argument:
append[Integer]([1,2,3], [4,5])However, it would be very burdensome and verbose to provide type arguments to all such calls. Fortunately, in most cases, the type checker can infer the correct type arguments, in the same way that it infers the correct type for many expressions without any additional information. So in this case, we can simply write:
append([1,2,3], [4,5])and the typechecker infers that the parameter T is
Integer, since both argument lists are
of type List[Integer]. For a more thorough explanation of how this inference occurs, please
refer to Pierce and Turner's paper.
Inference of type arguments will always fail on certain kinds of calls, because the typechecker does not have enough information to infer the correct type. The most common case is a site call which constructs a parametric type without taking any arguments. For example:
val b = Buffer()will never typecheck, since there is no way for the typechecker to know what type of elements the buffer should contain. In other languages such as ML, the typechecker might be able to infer this information from the rest of the program, but Orc's typechecker is based on local type inference, which must find the information locally, such as from the types of the arguments. So, to construct a buffer that will contain Numbers, a type parameter must be given:
val b = Buffer[Number]()
A type alias may have type parameters. For example, here is a type alias for triples:
type Triple[T] = (T,T,T)
Triple is now a type operator that takes one parameter. Any occurrence of
Triple[T] is equivalent to (T,T,T).
Datatypes can also have type parameters. This allows a programmer to define very useful generic data structures. For example, this is necessary to define a reasonable datatype for trees:
type Tree[T] = Node(Tree[T], Tree[T], T) | Leaf()Notice that the datatype is recursive, and the type parameter
T is
passed recursively to Tree to define the type of the subtrees.
When a Java class is made accessible to Orc using the class declaration,
the Orc typechecker interacts with the Java type system to make Java types available
in Orc. This feature is experimental and does not allow access to all of Java's typing
features (for example, interfaces are not fully supported), but it does allow
the typechecker to handle many of the common cases.
The type returned by the constructor of a class has the same name in Orc as the class. It is the Orc type of Java objects of that class.
class File = java.io.File File("test.orc") :: File
If a class declaration binds a generic Java class, that type shows up
in Orc as a parametric type, and its constructor takes type arguments. For example:
class TreeSet = java.util.TreeSet TreeSet[String]() :: TreeSet[String]
There is a correspondence between Orc's primitive types and the equivalent classes
in Java; they are interchangeable. For example, a call expecting a java.lang.Integer
can be given an Orc Integer.
Subtyping between Java object types is determined by Java's subtyping relation: if one class is a subclass of the other, then one type will be a subtype of the other. The typechecker does not implement support a full join operation for Java types; it will not find the least common ancestor of two classes as the join.
The following figure summarizes the syntax of typed Orc as an extension of untyped Orc. The original Orc grammar rules are abbreviated by ellipses (...).
Table 4.1. Typed Orc
| T | ::= | Type | ||
| X | Type variable | |||
| | |
Integer | Boolean | String | Number | Signal | Top | Bot
| Ground type | ||
| | |
( T , ... , T )
| Tuple type | ||
| | |
lambda [ X , ... , X ]
( T , ... , T ) :: T | Function type | ||
| | | X[ T , ... , T ]
| Type application | ||
| D | ::= | ... | Declaration | |
| | |
type X = classname
| type import | ||
| | |
type X[ X , ... , X ] = T | type alias | ||
| | |
type X[ X , ... , X ] = TC | ... | TC | datatype declaration (typed) | ||
| | |
def X[ X , ... , X ]
( T , ... , T ) :: T | function signature | ||
| TC | ::= | X( T , ... , T ) | Constructor (typed) | |
| E | ::= | ... | Expression | |
| | | E :: T | type ascription | ||
| | | E :!: T | type assertion | ||
| | |
lambda
[ T , ... , T ]
( P , ... , P )
:: T = E | closure (typed) | ||
| G | ::= | ... | Argument group | |
[ T , ... , T ] ( E , ... , E ) | type parameters plus arguments | |||
| P | ::= | ... | Pattern | |
| | | P :: T | type ascription |
Table A.1. Complete Syntax of Orc
| E | ::= | Expression | ||
| C | constant value | |||
| | | X | variable | ||
| | |
stop
| silent expression | ||
| | |
( E , ... , E )
| tuple | ||
| | |
[ E , ... , E ]
| list | ||
| | | E G+ | call | ||
| | | op E | prefix operator | ||
| | | E op E | infix operator | ||
| | | E >P> E | sequential combinator | ||
| | | E | E | parallel combinator | ||
| | | E <P< E | pruning combinator | ||
| | | E ; E | otherwise combinator | ||
| | |
lambda ( P , ... , P ) = E | closure (untyped) | ||
| | |
lambda
[ T , ... , T ]
( P , ... , P )
:: T = E | closure (typed) | ||
| | |
if E then E else E | conditional | ||
| | | D E | scoped declaration | ||
| | | E :: T | type ascription | ||
| | | E :!: T | type assertion | ||
| G | ::= | Argument group | ||
( E , ... , E ) | arguments | |||
| | | [ T , ... , T ] ( E , ... , E ) | type parameters plus arguments | ||
| | | .
field | field access | ||
| | | ? | dereference | ||
| C | ::= |
Boolean | Number | String | signal | null
| Constant | |
| X | ::= | identifier | Variable | |
| D | ::= | Declaration | ||
val P = E | value declaration | |||
| | |
site X = address
| site declaration | ||
| | |
class X = classname
| class declaration | ||
| | |
include " filename "
| inclusion | ||
| | |
def X( P , ... , P ) = E | function declaration | ||
| | |
def X[ X , ... , X ]
( T , ... , T ) :: T | function signature | ||
| | |
type X = classname
| type import | ||
| | |
type X[ X , ... , X ] = T | type alias | ||
| | |
type X = UC | ... | UC | datatype declaration (untyped) | ||
| | |
type X[ X , ... , X ] = TC | ... | TC | datatype declaration (typed) | ||
| UC | ::= | X(_, ... ,_) | Constructor (untyped) | |
| TC | ::= | X( T , ... , T ) | Constructor (typed) | |
| P | ::= | Pattern | ||
| X | variable | |||
| | | C | constant | ||
| | |
_
| wildcard | ||
| | | X ( P , ... , P )
| datatype pattern | ||
| | |
( P , ... , P )
| tuple pattern | ||
| | |
[ P , ... , P ]
| list pattern | ||
| | | P : P | cons pattern | ||
| | | P as X | as pattern | ||
| | |
=X | equality pattern | ||
| | | P :: T | type ascription | ||
| T | ::= | Type | ||
| X | Type variable | |||
| | |
Integer | Boolean | String | Number | Signal | Top | Bot
| Ground type | ||
| | |
( T , ... , T )
| Tuple type | ||
| | |
lambda [ X , ... , X ] ( T , ... , T ) :: T | Function type | ||
| | | X[ T , ... , T ]
| Type application |
Where relevant, syntactic constructs are ordered by precedence, from highest to lowest. For example, among expressions, calls have higher precedence than any of the combinators, which in turn have higher precedence than conditionals.
Table of Contents
The standard library is a set of declarations implicitly available to all Orc programs. In this section we give an informal description of the standard library, including the type of each declaration and a short explanation of its use.
Orc programs are expected to rely on the host language and environment for
all but the most essential sites. For example, in the Java implementation of
Orc, the entire Java standard library is available to Orc programs via
class declarations. Therefore the Orc standard library aims only
to provide convenience for the most common Orc idioms, not the complete set of
features needed for general-purpose programming.
The standard library is fully compatible with the static type checker; all library declarations have associated type declarations, which also serve as helpful documentation.
The documentation of library functions uses special notation for parametric
types that have dot-accessible members.
Member names are written in the form Type.member, e.g. Foo.get
refers to the get member of an object of type Foo.
The object type can include type variables which are referenced by the member
type, so for example site Buffer[A].get() :: A means that
when the get method is called on a Buffer holding an
arbitrary element type A, it will return a value of the same type.
Fundamental sites and operators.
These declarations include both prefix and infix sites (operators). For
consistency, all declarations are written in prefix form, with the site name
followed by the operands. When the site name is surrounded in parentheses, as
in (+), it denotes an infix operator.
For a more complete description of the built-in operators and their syntax, see the Operators section of the User Guide.
|
When applied to no arguments, return a signal. |
|
When applied to a single argument, return that argument (behaving as the identity function). |
|
When applied to two or more arguments, return the arguments in a tuple. |
|
Fail silently if the argument is false. Otherwise return a signal. Example: -- Publishes: "Always publishes" if(false) >> "Never publishes" | if(true) >> "Always publishes" |
|
Halt with the given error message. Example, using def assert(b) = if b then signal else error("assertion failed") -- Fail with the error message: "assertion failed" assert(false) |
|
|
|
|
|
Return the additive inverse of the argument.
When this site appears as an operator, it is written in prefix form without the
zero, i.e. |
|
|
|
|
|
Example: 7/3 -- publishes 2 | 7/3.0 -- publishes 2.333... |
|
|
|
|
|
|
|
|
|
|
|
Two values with the same object identity are always considered equal. In addition, Cor constant values and data structures are considered equal if their contents are equal. Other types are free to implement their own equality relationship provided it conforms to the rules given here. Note that although values of different types may be compared with
|
|
|
|
Return the logical negation of the argument. |
|
Return the logical conjunction of the arguments. This is not a short-circuiting operator; both arguments must be evaluated and available before the result is computed. |
|
Return the logical disjunction of the arguments. This is not a short-circuiting operator; both arguments must be evaluated and available before the result is computed. |
|
The list Example: -- Publishes: (3, [4, 5]) 3:4:5:[] >x:xs> (x,xs) In patterns, the |
|
Return the absolute value of the argument. Implementation. def abs(Number) :: Number def abs(x) = if x < 0 then -x else x |
|
Implementation. def signum(Number) :: Number def signum(x) = if x < 0 then -1 else if x > 0 then 1 else 0 |
|
Return the lesser of the arguments. If the arguments are equal, return the first argument. Implementation. def min[A](A,A) :: A def min(x,y) = if y < x then y else x |
|
Return the greater of the arguments. If the arguments are equal, return the second argument. Implementation. def max[A](A,A) :: A def max(x,y) = if x > y then x else y |
|
Return the greatest integer less than or equal to this number. |
|
Return the least integer greater than or equal to this number. |
General-purpose supplemental data structures.
|
An optional value which is available. This site may also be used in a pattern. Example: -- Publishes: (3,4) Some((3,4)) >s> ( s >Some((x,y))> (x,y) | s >None()> signal ) | ||||||||||||||||||||
|
An optional value which is not available. This site may also be used in a pattern. | ||||||||||||||||||||
|
Return a semaphore with the given value. The semaphore maintains the invariant that its value is always non-negative. An example using a semaphore as a lock for a critical section: -- Prints: -- Entering critical section -- Leaving critical section val lock = Semaphore(1) lock.acquire() >> println("Entering critical section") >> println("Leaving critical section") >> lock.release()
| ||||||||||||||||||||
|
Create a new buffer (FIFO channel) of unlimited size. A buffer supports get, put and close operations. A buffer may be either empty or non-empty, and either open or closed. When
empty and open, calls to Example: -- Publishes: 10 val b = Buffer() Rtimer(1000) >> b.put(10) >> stop | b.get()
| ||||||||||||||||||||
|
Create a new buffer (FIFO channel) with the given number of slots. Putting an item into the buffer fills a slot, and getting an item opens a slot. A buffer with zero slots is equivalent to a synchronous channel. A bounded buffer may be empty, partly filled, or full, and either open or
closed. When empty and open, calls to Example: -- Publishes: "Put 1" "Got 1" "Put 2" "Got 2" val c = BoundedBuffer(1) c.put(1) >> "Put " + 1 | c.put(2) >> "Put " + 2 | Rtimer(1000) >> ( c.get() >n> "Got " + n | c.get() >n> "Got " + n )
| ||||||||||||||||||||
|
Create a synchronous channel, or rendezvous. Example: -- Publish: 10 val c = SyncChannel() c.put(10) | Rtimer(1000) >> c.get()
| ||||||||||||||||||||
|
Create a write-once storage location. Example: -- Publishes: 5 5 val c = Cell() c.write(5) >> c.read() | Rtimer(1) >> ( c.write(10) ; c.read() )
| ||||||||||||||||||||
|
Create a rewritable storage location without an initial value. Example: val r = Ref() Rtimer(1000) >> r := 5 >> stop | println(r?) >> r := 10 >> println(r?) >> stop
| ||||||||||||||||||||
|
Get the value held by a reference.
Implementation. def (?)[A](Ref[A]) :: A def (?)(r) = r.read() | ||||||||||||||||||||
|
Set the value held by a reference.
Implementation. def (:=)[A](Ref[A], A) :: Top def (:=)(r,v) = r.write(v) | ||||||||||||||||||||
|
Swap the values in two references. Implementation. def swap[A](Ref[A], Ref[A]) :: Top def swap(r,s) = (r?,s?) >(rval,sval)> (r := sval, s := rval) >> signal | ||||||||||||||||||||
|
Create a new native array of the given size. The array is initialized
to contain The resulting array can be called directly with an index, as if
its type were Simple example: -- Publishes: 3 val a = Array(1) a(0) := 3 >> a(0)? More complex example: -- Publishes: 0 1 2 val a = Array(3) for(0, a.length()) >i> a(i) := i >> stop ; a(0)? | a(1)? | a(2)?
| ||||||||||||||||||||
|
The call The user may also think of the call as returning an array whose
This function provides a simple form of memoisation; we avoid recomputing
the value of Example: val a = IArray(5, fib) -- Publishes the 4th number of the fibonnaci sequence: 5 a(3) Implementation. def IArray[A](Integer, lambda (Integer) :: A)(Integer) :: A def IArray(n, f) = val a = Array[A](n) def fill(Integer, lambda (Integer) :: A) :: Top def fill(i, f) = if i < 0 then signal else (a.set(i, f(i)), fill(i-1, f)) >> signal fill(n-1, f) >> a.get | ||||||||||||||||||||
|
Construct an empty mutable set. The set considers two
values
| ||||||||||||||||||||
|
Construct an empty mutable map from keys to values. Each key contained in the
map is associated with exactly one value. The mapping considers two keys
| ||||||||||||||||||||
|
Create a new counter initialized to the given value.
| ||||||||||||||||||||
|
Create a new dictionary (a mutable map from field names to values), initially empty. The first time each field of the dictionary is accessed (using dot notation), the dictionary creates and returns a new empty Ref which will also be returned on subsequent accesses of the same field. Dictionaries allow you to easily create object-like data structures. Example: -- Prints: 1 2 val d = Dictionary() println(d.one.read()) >> println(d.two.read()) >> stop | d.one.write(1) >> d.two.write(2) >> stop Here is the same example rewritten using Orc's reference syntax to improve clarity: -- Prints: 1 2 val d = Dictionary() println(d.one?) >> println(d.two?) >> stop | d.one := 1 >> d.two := 2 >> stop To create a multi-level dictionary, you must explicitly create sub-dictionaries for each field. For example: -- Prints: 2 val d = Dictionary() d.one := Dictionary() >> d.one?.two := 2 >> println(d.one?.two?) >> stop Note that you cannot write | ||||||||||||||||||||
|
Create a new record (an immutable map from field names to values). Arguments are consumed in pairs; the first argument of each pair is the key, and the second is the value for that key. To access the value in record -- Publishes: 1 val r = Record( "one", 1, "two", 2) r.one | ||||||||||||||||||||
|
Return the first element of a pair. Implementation. def fst[A,B]((A,B)) :: A def fst((x,_)) = x | ||||||||||||||||||||
|
Return the second element of a pair. Implementation. def snd[A,B]((A,B)) :: B def snd((_,y)) = y | ||||||||||||||||||||
|
|