!!! Orc's External Value API

Orc needs a powerful API for accessing external objects, sites, and values.
The existing API does not provide the flexibility and performance that the new implementations need.

!! The Orc 2.x API

The Orc 2.x API dispatches each call based on runtime values.
A cake-pattern stack of traits tries various approaches to make the call, and it repeats this search every time a call is made.
All calls go through the same {{invoke}} method on the cake, and all responses return through a defined handle interface.
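
To make the per-call cost concrete, here is a minimal Java sketch of value-based dispatch in the spirit of the 2.x design. It is not the Orc 2.x code: {{Handle}}, {{Site}}, {{~DispatchingSite}}, {{Adder}} and the convention that Java callees expose a public {{apply}} method are all hypothetical stand-ins for the layers of the cake. The point is that every {{invoke}} re-runs the whole chain of checks and the reflective method search on the runtime values.

```java
import java.lang.reflect.Method;
import java.math.BigInteger;

// Hypothetical response handle: every result comes back through it.
interface Handle {
  void publish(Object value);
  void halt();
}

// Hypothetical native Orc site, one of the "layers" the dispatcher tries.
interface Site {
  void call(Handle h, Object[] args);
}

// Value-based dispatch: every call repeats the whole chain of checks.
class DispatchingSite {
  static void invoke(Handle h, Object callee, Object... args) {
    // Strategy 1: the callee is a native site.
    if (callee instanceof Site) {
      ((Site) callee).call(h, args);
      return;
    }
    // Strategy 2 (hypothetical convention): a Java object with a public
    // "apply" method whose parameters accept the runtime argument values.
    for (Method m : callee.getClass().getMethods()) {
      if (m.getName().equals("apply") && accepts(m.getParameterTypes(), args)) {
        try {
          h.publish(m.invoke(callee, args));
        } catch (ReflectiveOperationException e) {
          h.halt();
        }
        return;
      }
    }
    // No strategy applies.
    h.halt();
  }

  private static boolean accepts(Class<?>[] params, Object[] args) {
    if (params.length != args.length) return false;
    for (int i = 0; i < params.length; i++) {
      if (!params[i].isInstance(args[i])) return false;
    }
    return true;
  }
}

// Hypothetical external Java class.
class Adder {
  public BigInteger apply(BigInteger a, BigInteger b) { return a.add(b); }
}
```

Nothing here can be cached, because the decision depends on the values passed to each individual call.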

! Pros:

* Very simple in terms of implementation and execution.

! Cons:

* Extremely slow due to a huge number of megamorphic call sites and the fact that much of the dispatch has to be repeated at every call.
* Does not enable static/JIT optimization since dispatch has to happen at the same time as the call itself.

! Field Support

The Orc 2.x site API does not have any specific support for fields.
However, support could be added fairly simply with another method, {{getField}}, similar to {{invoke}} but implementing field extraction semantics.
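
A sketch of what such a {{getField}} could look like, assuming (hypothetically) that Orc fields map to public Java fields and that results are returned through the same kind of {{Handle}} as {{invoke}}. The names {{~DynamicFields}} and {{Point}} are illustrative only. Like {{invoke}}, it resolves the field by name from the runtime class on every access.

```java
import java.lang.reflect.Field;

// Hypothetical response handle, the same shape as the one used by invoke.
interface Handle {
  void publish(Object value);
  void halt();
}

class DynamicFields {
  // Looks the field up by name on the runtime class every time it is called.
  static void getField(Handle h, Object target, String name) {
    try {
      Field f = target.getClass().getField(name); // public fields only
      h.publish(f.get(target));
    } catch (NoSuchFieldException | IllegalAccessException e) {
      h.halt(); // no such field: the call halts without publishing
    }
  }
}

// Hypothetical Java object exposed to Orc.
class Point {
  public final int x;
  public final int y;
  Point(int x, int y) { this.x = x; this.y = y; }
}
```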

!! Accessor API (Proposed)

The accessor API is based on class-based dispatch, which is decoupled from the call itself.
The API would provide a {{getInvoker}} method which takes the callee class and argument classes.
The argument classes are required because Java overload resolution picks a method using the static types of all the arguments (static multiple dispatch), which becomes dynamic multiple dispatch in a partially dynamic language like Orc, where argument types may only be known at runtime.
{{getInvoker}} returns an {{Invoker}} object with a single method {{invoke(handle, callee, args...)}} which executes the call.
The invoker can be reused on any {{callee}} and {{args}} with the same classes.

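For example, if the types are known at compile time, the compiler might be able to generate something like this:

```java
// The invoker is looked up once, since the classes are known statically
static final Invoker inv = runtime.getInvoker(Adder.class, BigInt.class, BigInt.class);

void run() {
  inv.invoke(handle, adder, one, two); // only the call itself happens here
}
```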
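
A minimal reflection-based sketch of how {{getInvoker}} could separate class-based dispatch from the call. It assumes, hypothetically, that the callee class exposes the operation as a public Java method named {{apply}}, uses {{java.math.BigInteger}} in place of {{BigInt}}, and replaces {{runtime.getInvoker}} with a static {{Accessors.getInvoker}}. The first applicable method wins, which is a simplification of Java's most-specific-method rules.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.math.BigInteger;

// Hypothetical response handle.
interface Handle {
  void publish(Object value);
  void halt();
}

// Executes one call; reusable for any callee/args with the same classes.
interface Invoker {
  void invoke(Handle h, Object callee, Object... args);
}

class Accessors {
  // Class-based dispatch, done once per combination of classes.
  static Invoker getInvoker(Class<?> calleeClass, Class<?>... argClasses) {
    for (Method m : calleeClass.getMethods()) {
      if (m.getName().equals("apply") && applicable(m.getParameterTypes(), argClasses)) {
        // Value-based execution: no lookup is left, just the call.
        return (h, callee, args) -> {
          try {
            h.publish(m.invoke(callee, args));
          } catch (IllegalAccessException | InvocationTargetException e) {
            h.halt();
          }
        };
      }
    }
    // Unknown combination: an invoker that always halts. A real runtime
    // could return an invoker that falls back to the Orc 2.x path here.
    return (h, callee, args) -> h.halt();
  }

  private static boolean applicable(Class<?>[] params, Class<?>[] argClasses) {
    if (params.length != argClasses.length) return false;
    for (int i = 0; i < params.length; i++) {
      if (!params[i].isAssignableFrom(argClasses[i])) return false;
    }
    return true;
  }
}

// Hypothetical external class from the example.
class Adder {
  public BigInteger apply(BigInteger a, BigInteger b) { return a.add(b); }
}
```

Because the {{Method}} is captured when the invoker is built, repeated calls through the same invoker skip the search entirely.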

! Pros:

* Can be implemented as a wrapper over the old API by returning an invoker that falls back to the old API when an unknown class is encountered.
* The {{Invoker}} can be a custom generated class for the specific set of argument types, so the invocation can be ''very'' fast.
* The Java overloading and dispatch rules are entirely type-based, so they only need to run at {{getInvoker}} time.
* Polymorphic operations such as {{+}} could dispatch to type-specific implementations at {{getInvoker}} time.
* Compiled code (or even the interpreter) can cache or pre-generate invokers and reuse them whenever the types at the call site match the expected (or previously seen) types.
* With full compile-time type information for the Orc program, all of the invokers could be constructed statically (or at program start) without any need for runtime types. With support from the sites, the invokers could even be eliminated and {{invoke}} calls replaced with inlined dispatch logic.
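
The caching idea in the list above can be sketched as a per-call-site cache. This hypothetical {{~CachedCallSite}} remembers the classes from its last lookup and the invoker built for them: while the classes match it calls the cached invoker directly, otherwise it asks the factory again. {{Invoker}} and {{~InvokerFactory}} are simplified stand-ins for the accessor API (no {{Handle}}), and the cache holds a single entry to keep the sketch short.

```java
// Simplified stand-in for the accessor API's Invoker (Handle omitted).
interface Invoker {
  Object invoke(Object callee, Object... args);
}

// Stand-in for runtime.getInvoker; receives the callee class followed by the argument classes.
interface InvokerFactory {
  Invoker getInvoker(Class<?>[] classes);
}

// Monomorphic inline cache for one call site in compiled code or the interpreter.
class CachedCallSite {
  private final InvokerFactory factory;
  private Class<?>[] cachedClasses; // classes seen on the last miss, null before the first call
  private Invoker cachedInvoker;
  int lookups = 0; // slow-path lookups so far, for illustration

  CachedCallSite(InvokerFactory factory) { this.factory = factory; }

  // Assumes a non-null callee and non-null arguments.
  Object call(Object callee, Object... args) {
    if (!matches(callee, args)) {
      // Slow path: the types differ from last time, so redo the class-based dispatch.
      Class<?>[] classes = new Class<?>[args.length + 1];
      classes[0] = callee.getClass();
      for (int i = 0; i < args.length; i++) classes[i + 1] = args[i].getClass();
      cachedInvoker = factory.getInvoker(classes);
      cachedClasses = classes;
      lookups++;
    }
    // Fast path: only class identity checks happen before the call.
    return cachedInvoker.invoke(callee, args);
  }

  private boolean matches(Object callee, Object[] args) {
    if (cachedClasses == null || cachedClasses.length != args.length + 1) return false;
    if (cachedClasses[0] != callee.getClass()) return false;
    for (int i = 0; i < args.length; i++) {
      if (cachedClasses[i + 1] != args[i].getClass()) return false;
    }
    return true;
  }
}
```

A polymorphic version would keep a small list of (classes, invoker) pairs instead of a single entry.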

! Cons:

* Sites supporting this interface would be harder to write, since they need to separate class-based dispatch from value-based execution.
* This approach cannot be ported to targets that lack strong runtime type information (such as ~JavaScript) if the input Orc program lacks types.

! Field Support

Fields can be implemented using a {{getFieldAccessor}} method similar to {{getInvoker}}.
{{getFieldAccessor}} takes a class and a field name and returns a {{~FieldAccessor}} with a single {{get(object)}} method.
{{get}} returns the value of the selected field in the given object.
{{get}} can be applied to any object of any subtype of the class passed to {{getFieldAccessor}}.

The {{~FieldAccessor}} returned from {{getFieldAccessor}} can be generated at runtime to provide the best performance by "statically" referencing the field name and object type.
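
A reflective sketch of {{getFieldAccessor}}, assuming (hypothetically) that Orc fields map to public Java fields. The {{Field}} is resolved once from the class alone, so {{get}} works on instances of that class and of its subclasses. A generated accessor class would replace the reflective {{Field.get}} with a direct field read; that code generation step is not shown, and {{~FieldAccessors}}, {{Point}} and {{~ColoredPoint}} are illustrative names.

```java
import java.lang.reflect.Field;

// Reads one named field; reusable for any instance of the class it was made for.
interface FieldAccessor {
  Object get(Object object);
}

class FieldAccessors {
  static FieldAccessor getFieldAccessor(Class<?> cls, String name) {
    final Field field;
    try {
      field = cls.getField(name); // resolved once, from the class alone
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("no public field " + name + " in " + cls.getName(), e);
    }
    // Per-access work: just the read, no name lookup.
    return object -> {
      try {
        return field.get(object);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    };
  }
}

// Hypothetical Java classes exposed to Orc.
class Point {
  public final int x, y;
  Point(int x, int y) { this.x = x; this.y = y; }
}

class ColoredPoint extends Point {
  public final String color;
  ColoredPoint(int x, int y, String color) { super(x, y); this.color = color; }
}
```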

! Further Optimization

For even more performance, the {{~FieldAccessor}} and {{Invoker}} classes could be replaced by Java method handles.
Using {{invokedynamic}}, call sites in the Orc program could then maintain a polymorphic inline cache across several different external invocation or field lookup schemes.
This approach would make Orc calls into Java code perform almost exactly like Java-to-Java calls (4-10ns, depending on how polymorphic the site is).
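
In its simplest form, an {{Invoker}} becomes a {{~MethodHandle}} looked up once with {{MethodHandles.Lookup.findVirtual}} and then called with {{invokeExact}}. This sketch corresponds to caching a method handle and calling it each time; the inline cache additionally needs a call site on top of it. The {{Adder}} class and the {{~HandleInvokers}} holder are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.math.BigInteger;

// Hypothetical external class.
class Adder {
  public BigInteger apply(BigInteger a, BigInteger b) { return a.add(b); }
}

class HandleInvokers {
  // Resolved once: plays the role of getInvoker(Adder.class, BigInteger.class, BigInteger.class).
  static final MethodHandle ADD;

  static {
    try {
      ADD = MethodHandles.lookup().findVirtual(
          Adder.class, "apply",
          MethodType.methodType(BigInteger.class, BigInteger.class, BigInteger.class));
    } catch (NoSuchMethodException | IllegalAccessException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // Called per invocation: no lookup, just the handle call.
  static BigInteger add(Adder adder, BigInteger a, BigInteger b) {
    try {
      // invokeExact requires the static types here to match the handle's type exactly.
      return (BigInteger) ADD.invokeExact(adder, a, b);
    } catch (Throwable t) {
      throw new RuntimeException(t);
    }
  }
}
```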

This optimization does not require generating the whole program as bytecode, so the Orc compiler can still target Java.
Instead, the {{invokedynamic}} code can be produced at runtime as small generated subclasses of a known superclass.
Each subclass would contain a single {{invokedynamic}} instruction, which invokes an appropriate bootstrap method to get and cache the appropriate {{Invoker}}/{{~FieldAccessor}}.
The ~HotSpot compiler can inline the generated method (complete with its {{invokedynamic}} instruction) into the calling Java class, as long as that call site is monomorphic.
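
The call-site side can be sketched with {{java.lang.invoke.MutableCallSite}} and {{MethodHandles.guardWithTest}}. The bootstrap method has the {{(Lookup, String, MethodType)}} shape that {{invokedynamic}} expects, but here it is called directly, since emitting the {{invokedynamic}} instruction itself requires bytecode generation that is out of scope; {{InlineCache.call}} stands in for executing that instruction. The cache guards on the receiver's class and grows without bound, and {{~PicCallSite}}, {{Square}} and {{Circle}} are hypothetical; a real cache would cap its depth and fall back to a megamorphic path.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.reflect.Method;

// A call site whose target is a chain of class-guarded handles (a polymorphic inline cache).
class PicCallSite extends MutableCallSite {
  final MethodHandles.Lookup lookup;
  final String name;
  int misses = 0; // slow-path lookups so far, for illustration

  PicCallSite(MethodHandles.Lookup lookup, String name, MethodType type) {
    super(type);
    this.lookup = lookup;
    this.name = name;
    // Initially every call misses: collect the arguments into an array and go to InlineCache.miss.
    setTarget(InlineCache.MISS.bindTo(this)
        .asCollector(Object[].class, type.parameterCount())
        .asType(type));
  }
}

class InlineCache {
  static final MethodHandle MISS;       // (PicCallSite, Object[]) -> Object
  static final MethodHandle SAME_CLASS; // (Class, Object) -> boolean

  static {
    try {
      MethodHandles.Lookup l = MethodHandles.lookup();
      MISS = l.findStatic(InlineCache.class, "miss",
          MethodType.methodType(Object.class, PicCallSite.class, Object[].class));
      SAME_CLASS = l.findStatic(InlineCache.class, "sameClass",
          MethodType.methodType(boolean.class, Class.class, Object.class));
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // Bootstrap method in the shape invokedynamic expects: (Lookup, String, MethodType) -> CallSite.
  static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type) {
    return new PicCallSite(lookup, name, type);
  }

  static boolean sameClass(Class<?> expected, Object receiver) {
    return receiver.getClass() == expected;
  }

  // Slow path: resolve the method for this receiver class, then prepend a guarded entry.
  static Object miss(PicCallSite site, Object[] args) throws Throwable {
    site.misses++;
    Class<?> receiverClass = args[0].getClass();
    MethodHandle target = null;
    for (Method m : receiverClass.getMethods()) {
      if (m.getName().equals(site.name) && m.getParameterCount() == args.length - 1) {
        target = site.lookup.unreflect(m).asType(site.type());
        break;
      }
    }
    if (target == null) throw new NoSuchMethodException(site.name + " on " + receiverClass);
    MethodHandle test = SAME_CLASS.bindTo(receiverClass)
        .asType(MethodType.methodType(boolean.class, site.type().parameterType(0)));
    // Single-threaded sketch: no MutableCallSite.syncAll publication to other threads.
    site.setTarget(MethodHandles.guardWithTest(test, target, site.getTarget()));
    return target.invokeWithArguments(args);
  }

  // Stand-in for executing the generated invokedynamic instruction on a one-argument site.
  static Object call(CallSite site, Object receiver) {
    try {
      return site.dynamicInvoker().invoke(receiver);
    } catch (Throwable t) {
      throw new RuntimeException(t);
    }
  }
}

// Hypothetical receiver classes sharing a method name.
class Square {
  final double side;
  Square(double side) { this.side = side; }
  public double area() { return side * side; }
}

class Circle {
  final double r;
  Circle(double r) { this.r = r; }
  public double area() { return Math.PI * r * r; }
}
```

After one miss per receiver class, later calls only run the chain of class checks before the resolved method handle.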

Some simple microbenchmarks show that dynamic call sites using {{invokedynamic}} perform almost identically to normal Java method calls (probably at the cost of slightly more compiled code).
Both cost about 4ns for monomorphic calls and ~10ns for polymorphic calls (5-6 possible targets).
This outperforms the alternatives (monomorphic/polymorphic times):
* Looking up the target on every call: ~200ns/~480ns (~40x slower)
* Caching the reflected method and calling it each time: 13ns/16ns (~2x slower); this is how Scala implements structural-type method calls
* Caching a method handle and calling it each time: 10ns/not tested (1.6x slower)
In addition, the {{invokedynamic}} implementation scales somewhat better to larger numbers of threads.
For example, monomorphic calls in 6 threads are ~8ns with {{invokedynamic}}, 52ns (6.5x) with cached method reflection, and 20ns (2.5x) with cached method handles.

The relative differences are significant, but the total time spent in these invocations is likely to be small at first.
However, once other optimizations are in place, site invocation and field access may become a more important part of overall performance, so it is important to plan for these optimizations.

! Static Metadata

Compile-time versions of the {{Invoker}} and {{~FieldAccessor}} classes, together with a parallel compile-time site loading API, provide the compiler with site metadata at various levels.

The compile-time and run-time APIs should be distinct because the runtime should not pay the time and memory cost of generating metadata that is not needed.
It may be useful to have some metadata (such as whether a site is non-blocking) on invokers at runtime.
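
One way to keep the two APIs separate, sketched with hypothetical names that do not exist in Orc: the compile-time {{~InvokerInfo}} carries the metadata the compiler wants (here a result class, which is an assumption, and the non-blocking flag), while the runtime {{Invoker}} carries only the call plus the one cheap flag worth keeping at run time.

```java
// Runtime side: the call itself, plus cheap metadata that is worth having at run time.
interface Invoker {
  Object invoke(Object callee, Object... args);

  // Conservative default; a site that knows it never blocks can override this.
  default boolean isNonBlocking() { return false; }
}

// Compile-time side: richer metadata that the runtime never computes or loads.
interface InvokerInfo {
  Class<?> resultClass(); // hypothetical: static result type of the call
  boolean isNonBlocking();
}

// Hypothetical compile-time counterpart of getInvoker.
interface CompileTimeSiteLoader {
  InvokerInfo getInvokerInfo(Class<?> calleeClass, Class<?>... argClasses);
}
```

Because only {{Invoker}} is loaded at runtime, the cost of building {{~InvokerInfo}} objects is paid by the compiler alone.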

!!! See Also

* JavaScriptExternalAPI