OrcWiki: Deparallelization
https://orc.csres.utexas.edu/wiki/
The Orc programming language project wiki (wiki changes feed)en-usJSPWiki 2.10.2
https://orc.csres.utexas.edu/wiki/Wiki.jsp?page=Deparallelization&version=1
Deparallelization, version 1AdrianQuark created this page on Sat Jan 17 17:01:40 CST 2009:<br /><hr /><br /><p><div class="toc">
<div class="collapsebox">
<h4 id="section-TOC">Table of Contents</h4>
<ul>
<li class="toclevel-1"><a class="wikipage" href="#section-Deparallelization-Overview">Overview</a></li>
<li class="toclevel-1"><a class="wikipage" href="#section-Deparallelization-Theory">Theory</a></li>
<li class="toclevel-1"><a class="wikipage" href="#section-Deparallelization-Limitations">Limitations</a></li>
<li class="toclevel-1"><a class="wikipage" href="#section-Deparallelization-NestedAnd">Nested | and ;</a></li>
</ul>
</div>
</div>
</p><h2 id="section-Deparallelization-Overview"> Overview<a class="hashlink" href="#section-Deparallelization-Overview">#</a></h2>
<p>The main purpose of deparallelization is to remove pruning and parallel combinators to produce an expression which uses only sequencing and "otherwise" (which can be interpreted efficiently or compiled into bytecode). The goal is to deparallelize first-order expressions written in Cor + | + >>.
</p>
<h2 id="section-Deparallelization-Theory"> Theory<a class="hashlink" href="#section-Deparallelization-Theory">#</a></h2>
<p>An expression is <b>contained</b> if it calls only contained sites. A contained site is any site whose observed behavior is not affected by, and cannot affect, other site calls. Some examples of contained sites:
</p>
<ul><li><tt>+</tt> is contained: obviously
</li><li><tt>if</tt> is contained: contained sites are allowed to halt
</li><li><tt>/</tt> is contained: contained sites may trigger an error
</li><li><tt>println</tt> is contained: although it prints to stdout, this side-effect cannot affect the behavior of the program
</li><li><tt>random</tt> is contained: although it is non-functional, any relationship to concurrent calls cannot be observed
</li><li><tt>x := 3</tt> is not contained: the side-effect of updating x may be observed
</li><li><tt>x?</tt> is not contained: the behavior depends on writes to x
</li><li><tt>Rtimer</tt> is not contained: the behavior depends on a shared clock (this case shows the definition of containment needs clarification)
</li></ul>The predicate contained[] is defined by:
<pre>
contained[M(x)] = M is contained
contained[f <x< g] = contained[f] and contained[g]
contained[f >x> g] = contained[f] and contained[g]
contained[f ; g] = contained[f] and contained[g]
contained[f | g] = contained[f] and contained[g]
</pre>
<p>An expression is <b>pure</b> if it calls only pure sites. A pure site is any site whose publication is determined only by the values of its arguments, and has no observable behavior aside from publication (or lack thereof). This corresponds to the mathematical notion of a function. All pure expressions are contained, but not all contained expressions are pure:
</p>
<ul><li><tt>/</tt> is not pure because it may print an error message
</li><li><tt>println</tt> is not pure because it prints to stdout
</li><li><tt>random</tt> is not pure because its publications vary even with the same arguments
</li><li><tt>if</tt> is pure: pure sites are allowed to halt (without error)
</li></ul>The predicate pure[] is defined by:
<pre>
pure[M(x)] = M is pure
pure[f <x< g] = pure[f] and pure[g]
pure[f >x> g] = pure[f] and pure[g]
pure[f ; g] = pure[f] and pure[g]
pure[f | g] = pure[f] and pure[g]
</pre>
<p>An expression is <b>sequential</b> (maybe "deterministic" is a better term?) if it does not use the parallel combinator. The predicate seq[] is defined by:
</p>
<pre>
seq[M(x)] = true
seq[f <x< g] = seq[f]
seq[f >x> g] = seq[f] and seq[g]
seq[f ; g] = seq[f] and seq[g]
seq[f | g] = false
</pre>
<p>An expression is <b>safe</b> if it publishes at least one value. The predicate safe[] is defined by:
</p>
<pre>
safe[M(x), E] = E[x] and M is a safe site
safe[f <x< g, E] = safe[f, E+x:safe[g, E]]
safe[f >x> g, E] = safe[f] and safe[g, E+x:true]
safe[f ; g, E] = safe[f, E] or safe[g, E]
safe[f | g, E] = safe[f, E] or safe[g, E]
</pre>
<p>Here's a more refined definition of safe[] which evaluates to a statement of
predicate logic; if the statement is a theorem (= true) then the expression is
safe. This definition is necessary, for example, to recognize that "if then
else" will always execute either the then branch or the else branch.
</p>
<pre>
value[M(x), V] = fresh symbol
value[~a, V] = not V[a]
value[a && b, V] = V[a] and V[b]
value[a || b, V] = V[a] or V[b]
value[f >x> g, V] = value[g, V+x:value[f, V]]
value[f <x< g, V] = value[f, V+x:value[g, V]]
value[f | g, V] = value[f, V] or value[g, V]
value[f ; g, V] = value[f, V] or value[g, V]
safe[if(x), P, V] = P[x] and V[x]
safe[M(x), P, V] = P[x] and (M is a safe site)
safe[f <x< g, P, V] = safe[f, P+x:safe[g, P], V+x:value[g, V]]
safe[f >x> g, P, V] = safe[f, P, V] and safe[g, P+x:true, V+x:value[f, V]]
safe[f ; g, P, V] = safe[f, P, V] or safe[g, P, V]
safe[f | g, P, V] = safe[f, P, V] or safe[g, P, V]
</pre>
(When evaluating | and ;, should we take advantage of safety to prove that one branch can never publish?)
<p>Now we will enumerate the identities which we will use to deparallelize expressions. First, some standard identities:
</p>
<pre>
(f | g) >> h
<=> f >> h | g >> h
(f >> g) <x< h
<=> (f <x< h) >> g
if g does not refer to x
<=> (f | g) <x< h
(f <x< h) | g
if g does not refer to x
<=> (f | g) <x< h
f | (g <x< h)
if f does not refer to x
<=> (f ; g) <x< h
(f <x< h) ; g
if g does not refer to x
</pre>
<p>Define a transformation O[] on contained expressions:
</p>
<pre>
O[M(...)] = M(...)
O[f | g] = O[f] ; O[g]
O[f ; g] = O[f] ; O[g]
O[f >> g] = f >> O[g]
O[f << g] = O[f] << g
</pre>
<p>This transformation obeys the identity:
</p>
<pre>
f << g
<=> f << O[g]
if g is contained
</pre>
<p>Some new identities:
</p>
<pre>
f <x< h
<=> f
if h is pure
and f does not refer to x
(f | g) <x< h
<=> (f <x< h) | (g <x< h)
if h is pure
(is this necessary/desirable?)
(f ; g) <x< h
<=> (f <x< h) ; (g <x< h)
if h is pure
(is this necessary/desirable?)
f <x< h
<=> h >x> f
if h is contained, safe and sequential
f ; g
<=> f
if f is safe
(f >> g) | h
<=> f >> (g | h)
if f is safe and sequential
(f >> g) <x< h
<=> f >> (g <x< h)
if (h is pure or (f is safe and sequential))
and f does not refer to x
(f_0 | f_1 | ... | f_n) <x< g
<=> g >x> (f_0 | f_1 | ... | f_n)
if g is sequential
and every f_i has the form "M(x)" or "M(x) >> g"
(can we generalize this better?)
</pre>
<p>We can apply these identities in a bottom-up fashion to:
</p>
<ol><li>pull parallel combinators up (until they can be removed by O[])
</li><li>push pruning combinators down (to the point where the variable is used)
</li><li>remove parallel combinators (to make expressions sequential)
</li><li>remove pruning combinators (to deparallelize)
</li></ol><h2 id="section-Deparallelization-Limitations"> Limitations<a class="hashlink" href="#section-Deparallelization-Limitations">#</a></h2>
<p>What are some interesting expressions we cannot deparallelize?
</p>
<h2 id="section-Deparallelization-NestedAnd"> Nested | and ;<a class="hashlink" href="#section-Deparallelization-NestedAnd">#</a></h2>
<pre>
x <x< ((f | g) ; h) >> j
</pre>
<p>If f succeeds and j fails, then we must still try g, but we must not try h; there is no way to express this without |, and therefore no way to remove the pruning combinator. Fortunately, Cor + | + >> expressions, lacking ";", cannot have this form<a class="footnoteref" href="#ref-Deparallelization-1">[1]</a>.
</p>
<hr />
<p><a class="footnote" name="ref-Deparallelization-1">[#1]</a> This assumes that <tt>if b then f else g</tt> is encoded as <tt>if(b) >> f | if(~b) >> g</tt> rather than <tt>if(b) >> f ; if(~b) >> g</tt>.
</p>Sat, 17 Jan 2009 23:01:40 GMT