Chapter 4. Type Checking

Table of Contents

4.1. Simple Typing
4.1.1. Values and Operators
4.1.2. Top and Bot
4.1.3. Tuples
4.1.4. Combinators
4.1.5. Conditionals
4.1.6. Defined Functions
4.1.7. Ascriptions
4.1.8. Aliasing
4.1.9. Datatypes
4.1.10. Interacting with External Services
4.1.11. Assertions
4.2. Polymorphism
4.2.1. Parametric types
4.2.2. Parametric functions
4.2.3. Generic calls
4.2.4. Polymorphic aliases
4.2.5. Polymorphic datatypes
4.3. Interacting with Java
4.4. Syntax of Typed Orc

The Orc language, as it is described in Chapter 1, is dynamically typed. If an operation occurs at runtime which is not type-correct, the call which attempted that operation becomes silent, and produces a type error that is reported on the console.

Orc also has an optional static typechecker, which will guarantee that a program is free of type errors before the program is run. For every expression in the program, the typechecker tries to find the types of values that it could publish, and checks that all such types are consistent. It performs a limited form of type inference, so it can discover many types automatically; however, the programmer must provide additional type information for defined functions, and in a few other specific cases.

The typechecker uses the local type inference algorithm described by Pierce and Turner in their paper, Local Type Inference. It extends this algorithm with polymorphic type constructors (e.g. List and Buffer), user-defined datatypes (which may also be polymorphic), and a typing strategy for external services. The typechecker supports both parametric polymorphism (generics) and inclusion polymorphism (subtyping), though it does not currently implement the extended version of the algorithm which allows both of them simultaneously (bounded polymorphism).

The typechecker is disabled by default, though typed syntax is still permitted (and types are still checked for syntax errors) even when the typechecker is not used. It may be enabled as a project property in the Eclipse plugin, or by using the -typecheck switch on the command line.

If the typechecker can verify that a program is correctly typed, it will display the message

... :: T

on the console. The symbol :: means "has type", and T is the type of any value that the program might publish.

4.1. Simple Typing

In this section we will see how Orc programs are typechecked, starting with simple expressions like constants and arithmetic operations, and working up to more interesting cases such as defined functions, the concurrency combinators, and calls to external services.

4.1.1. Values and Operators

Each of the Orc constants has the expected type:

  • true , false :: Boolean
  • ... -1, 0, 1 ... :: Integer
  • ... "orc" , "ceci n'est pas une |" ... :: String
  • ... 0.001, -1.5, 2.71828 ... :: Number
The expression signal has its own unique type, Signal.

Orc allows subtyping: some types are included in other types. For example, Integer is a subtype of Number, because all Integers are also Numbers.

Each of the primitive operators typechecks in the obvious way; for example && requires two Boolean operands and returns a Boolean result. Some of the arithmetic operators also support ad-hoc polymorphism; for example, a + expression with two Integer operands has type Integer, but with one or more Number operands it instead has type Number.

4.1.2.  Top and Bot

There are two other important types: Top and Bot.

Top is the universal type; it is the type of any value, and all other types are subtypes of it.

Bot is the empty type; no value has type Bot. It is a subtype of all other types. Expressions with type Bot are expressions that the typechecker can verify will never publish any values. In particular, stop will never publish, so it has type Bot.

Bot has an interesting status in Orc. In other typed languages, if an expression has type Bot, this usually indicates a guaranteed error, infinite loop, or other failure to return a value. Since sequential programming rarely involves subexpressions that are guaranteed never to return, Bot is usually just a curiosity or a formal artifact of the type system, and indeed many static type systems do not have a Bot type at all. In Orc, however, Bot is very useful, since it is frequently the case that Orc expressions are written to carry out ongoing concurrent activities but never publish any values, and the type system can use the type Bot to indicate that no publications will ever be seen from such expressions.

4.1.3. Tuples

The type of a tuple is a tuple of the types of each of its elements.

  • (3+3, true) :: (Integer, Boolean)
  • ("orc", (2, "orc")) :: (String, (Integer, String))

4.1.4. Combinators

4.1.4.1. Parallel

The type of F | G is the join of the types of F and G.

A common supertype of two types S and T is any type U such that S and T are both subtypes of U. The join J is the least common supertype of S and T, that is, J is a common supertype, and J is also a subtype of every other common supertype.

The most common case is the join of a type T with itself. Then the join is just T. This happens when both parallel branches publish the same type of values.

  • 3+4 | 5 :: Integer

If S is a subtype of T, the join of S and T is T. This happens when one branch is publishing a less specific type of value. To be safe, the whole expression then has that less specific type.

  • 3 | 3.1 :: Number

As a special case, the join of any type T with Bot is T. This occurs when one branch publishes values and the other one never publishes. Thus, it is safe to use the type of values from the first branch, since the second branch contributes no values at all.

  • "all quiet on the western front" | EasternFront() >> stop :: String

In any other case, the typechecker will try to infer the join. However, it may not always find the least supertype, and in some cases there may not even be a useful common supertype. In these cases it will default to the Top type, which is a supertype of all other types, and is therefore always a safe join, if not a very useful one.

  • 3 | "three" :: Top

4.1.4.2. Sequential

The type of F >x> G is the type of G using the assumption x :: T, where T is the type of F.

If a pattern is used instead of a variable, the structure of the pattern must match the structure of the type of F, and the types for each variable bound in G are taken from the corresponding pieces of the type of F. For example, in the expression F >(x,b)> G, where F :: (Integer, Boolean), G is typechecked with the assumptions x :: Integer and b :: Boolean.

4.1.4.3. Pruning

The type of F <x< G is the type of F using the assumption x :: T, where T is the type of G.

When a declaration val x = F occurs, the expression in scope for the declaration is typechecked using the assumption x :: T, where T is the type of F.

Patterns in the pruning combinator and in val declarations are typechecked in the same way as in the sequential combinator.

4.1.4.4. Otherwise

The type of F ; G is the join of the types of F and G, as described for the parallel combinator.

4.1.5. Conditionals

In the conditional expression if E then F else G, the expression E must have type Boolean, and the type of the whole expression is the join of the types of F and G, as described for the parallel combinator.

4.1.6. Defined Functions

Though the typechecker can infer the types of many expressions without additional information from the programmer, there are some cases where the algorithm does need assistance. In particular, whenever a function is defined (using the def keyword), the programmer must also include a signature for that function, supplying its argument types and return type.

A signature immediately precedes a function definition, and is written as follows:

def magsq(Number, Number) :: Number
def magsq(x,y) = x*x + y*y
Argument types replace the arguments, :: replaces =, and the return type (the type of the body expression) replaces the body expression. If the function is not recursive, then the return type is optional, because the typechecker can infer it from the body expression without additional information. So, in this case, we could have just written:
def magsq(Number, Number)
def magsq(x,y) = x*x + y*y

It is also possible to include the argument types and return type directly in a definition, rather than as a separate signature. The signature and definition above could be written together as:

def magsq(x :: Number, y :: Number) :: Number = x*x + y*y
An argument x with type T is replaced by x :: T in the argument list. The return type is inserted between the argument list and the = sign. This form is not typically used for functions with multiple clauses or pattern matches, though it is allowed in those cases so long as only one clause has the type information.

lambda expressions are a special case: the type information must be included directly in this way, since there is no way to declare the signature separately. Specifying the return type of a lambda is always optional, since it is not possible for an anonymous function to make a recursive call. Written as a typed lambda function, the magsq definition looks like:

lambda (x :: Number, y :: Number) = x*x + y*y
When the typechecker has enough information to check the type of a lambda against a given function type, rather than needing to infer it, the argument types of the lambda are also optional. This occurs, for example, when a lambda is given as an argument to a defined higher-order function.

Once defined, functions are values, and thus have a special type of their own. The type of a function is written very much like a signature, but using the lambda keyword instead of def and the function name.

  • magsq, defined earlier, has type lambda (Number, Number) :: Number
  • lambda (x :: Integer) = x+1 has type lambda (Integer) :: Integer
  • lambda () = signal has type lambda () :: Top
  • lambda (a :: Boolean, b :: Boolean) = (b,a) has type lambda (Boolean,Boolean) :: (Boolean,Boolean)

4.1.7. Ascriptions

In addition to annotation defined functions with a signature, the typechecker can also accept programmer-supplied type information for any expression. An ascription, written E :: T, asks the typechecker to verify that expression E has type T.

Normally, the typechecker would simply find the type of E by inference, but in certain situations it is easier to check a given type. For example, the typechecker may not be able to infer the correct join type for a parallel combinator, but it is always able to check that both branches are subtypes of an already provided type. Furthermore, adding extra type information makes it easier to pinpoint the source of a typechecking failure.

Ascriptions are also allowed in patterns; P :: T is a valid pattern, instructing the typechecker to verify that the value fragment bound by the pattern P will always be of type T.

4.1.8. Aliasing

When adding function signatures and ascriptions, it is often helpful to have simpler names for complex types that frequently occur. In the same way that variable binding can be used to give a name to a complex value, there is a type declaration that binds a complex type to a type variable. This is called aliasing, and it is accomplished using the type definition.
-- A rectangle is defined by its lower left and upper right coordinates.
type Rect = ((Number, Number), (Number, Number))

-- Transpose such a rectangle, rotating it around an x=y axis
def flip(Rect) :: Rect
def flip( ((a, b), (c, d)) ) = ((a, b), (d, c))
While it does not change the behavior of the typing algorithm, aliasing can substantially improve readability and make the intention of one's code much clearer.

4.1.9. Datatypes

We have already seen user-defined datatypes in the untyped Orc language. In the typed language, they are declared in the same way, except that the slots in the datatype, which previously were always _, now contain the actual types that go in those slots. As an example, here is a datatype for geometric shapes.
{- 
  Rect(w,h): A rectangle with width w and height h. 
  Circle(r): A circle with radius r.
  RegularPolygon(n,l): A regular polygon with n sides, each of length l.
-}
type Shape = Rect(Number, Number) | Circle(Number) | RegularPolygon(Integer, Number)
We will see how to declare generic datatypes, like Tree or Option, in the next section.

4.1.10. Interacting with External Services

In addition to verifying the correctness of code written within Orc itself, the typechecker must also ensure that external services are used correctly in the context of the Orc program.

When an external site is made available to Orc with the site declaration, its type is also made available to the Orc typechecker. The type of a site is itself treated like a service; it is passed the types of its arguments, and responds with a return type for those arguments. Thus, a site call can typecheck in ways not possible for function calls or other expressions. For example, sites can support ad-hoc polymorphism. Also, sites can respond to messages sent using the dot operator. In fact, Orc has no native record type, because the dot operator only applies to sites, so the type of the site can interpret the message internally and determine the correct type to return, if any.

Additionally, a site can introduce new types into Orc. For example, the Semaphore site responds with new instances of semaphores. And just as sites can be declared using site, these types can be declared using type. Typically they are represented as Java classes, just as sites are. As an example, here is a program which declares the Semaphore type, and then defines a function which "toggles" a pair of semaphores by acquiring the first and releasing the second.

type Semaphore = orc.lib.state.types.SemaphoreType

def toggle(Semaphore, Semaphore) :: Top
def toggle(s, t) = s.acquire() >> t.release()
Notice that it is not possible to typecheck toggle without first giving a name to the Semaphore type, because it is used in the signature of toggle.

type definitions occur alongside site definitions in the standard library, so the types returned by many of the library functions are already available.

4.1.11. Assertions

While the typechecker can be helpful, it will not accept programs that are not safe according to its algorithm, which can be burdensome when the programmer knows that an expression will have a certain type but the typechecker cannot verify it.

Since the typechecker is optional, it can always be turned off in these cases. But this is often too drastic a solution: typechecking difficulties often arise from small segments of a much larger program, and the rest of the program still benefits from typechecking.

Fortunately, the typechecker can be selectively disabled for parts of a program. For this purpose, the typechecker supports a special operation called an assertion, written E :!: T, where E is an expression and T is its asserted type. An assertion is used like an ascription, but rather than verifying that E has type T, the typechecker instead assumes that E has type T, without inspecting E at all. Thus, the programmer can supply the correct type T without being restricted by the typechecking algorithm.

This feature should be used sparingly, with the knowledge that it does compromise the integrity of the typechecking algorithm. If the supplied type is wrong, runtime type errors could propagate to any part of the program that depends on that type. Assertions are useful for rapid prototyping, but they are not recommended for production code.