Raku Dispatch and Compiler Improvements: Grant Report Jonathan Worthington 
        
        
      
      
         
Tue, 14-Sep-2021 by Matthias Bloch
        
        
        
          
          
        
        
        
      
      
Jonathan reports a lot of progress on his grant. We would like to thank the sponsors for their support and Jonathan for his work.
Here is his report:
---
# Raku Dispatch and Compiler Improvements Grant Update
Since the [approval](https://news.perlfoundation.org/post/grants_may_2021_votes)
of my [grant](https://news.perlfoundation.org/post/grant_proposal_raku_dispatch_compiler_improvements)
in late June, I have been making a lot of progress with it. The grant allowed
me to dedicate the vast majority of my working time in July and August to Raku
(although I was away for 2 weeks of August on vacation). This report covers
the work done from grant approval up to the end of August.
The key goal of the grant is to bring my work on a new generalized dispatch
mechanism to the point where it can be merged and delivered to Raku users.
In summary, the new dispatch mechanism:
* Delivers greatly improved performance for a number of constructs that
  are very slow in Rakudo/MoarVM today, including deferral with `callsame`
  and other such functions (thus also aiding code using `wrap`), multiple
  dispatch involving `where` clauses or named arguments, method calls on
  roles that are punned into classes, invocation of objects that implement
  `CALL-ME`, and others.
* Replaces many special-case performance mechanisms with a single, general,
  programmable one. This simplifies MoarVM internally, while simultaneously
  allowing it to do more optimization.
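To make the constructs named above concrete, here is a minimal sketch of what they look like in user code. All names (`classify`, `Base`, `Derived`, `Doubler`, `Describable`) are hypothetical and chosen for illustration only; the sketch shows the language features, not how the dispatcher implements them.

```raku
# Multiple dispatch with a `where` clause and a required named argument.
multi classify(Int $n where * < 0) { "negative" }
multi classify(Int $n)             { "non-negative" }
multi classify(Str $s, :$loud!)    { $s.uc }
multi classify(Str $s)             { $s }

# Deferral with `callsame`: call the parent method with the same arguments.
class Base { method greet($name) { "Hello, $name" } }
class Derived is Base {
    method greet($name) { callsame() ~ "!" }
}

# An object that implements CALL-ME can be invoked like a routine.
class Doubler { method CALL-ME($x) { $x * 2 } }

# Calling a method on a role puns the role into a class.
role Describable { method describe() { "punned" } }

say classify(-3);               # negative
say classify(3);                # non-negative
say classify("hi", :loud);      # HI
say Derived.new.greet("Raku");  # Hello, Raku!
my $double = Doubler.new;
say $double(21);                # 42
say Describable.describe;       # punned
```

Each of these currently goes through one of the slower paths listed above; the code itself does not need to change to benefit from the new mechanism.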
Far more details can be found in the presentation I gave about this work at
The Raku Conference 2021 ([slides](https://jnthn.net/papers/2021-trc-dispatch.pdf),
[video](https://www.youtube.com/watch?v=yRFyGDVHl0E)).
When the grant got underway, the new dispatch mechanism was looking
promising, but was still some distance from being ready to ship. The work so
far under this grant has decisively changed that: I expect it to be merged
shortly after the September monthly releases (of Rakudo and MoarVM) and thus
delivered to Raku users in the October releases.
Key tasks performed under the grant up to the end of August are as follows:
* Switch all method and subroutine dispatches in both NQP and Raku over to
  using the new dispatch mechanism, taking care of cross-language calls
  (for example, where the compiler calls bits of Raku code at `BEGIN` time)
* Switch over all implicit calls emitted during compilation to use the new
  dispatch mechanism also
* Switch the regex compiler over to emitting its calls using the new dispatch
  mechanism
* Move the boolification mechanism and complex `if`/`unless` object ops,
  which previously involved an opaque chunk of C code, over to the new
  dispatch mechanism; this eliminated a bunch of code in the optimizer too
* Replace NQP's stringification and numification - which also involved a
  bunch of custom logic in MoarVM - with a dispatcher
* Bring the implementation of Raku multiple dispatch using the new dispatch
  mechanism to completion, including handling of required named arguments,
  typed exceptions on dispatch failure, `Junction` failover, `Proxy` args,
  dispatch based on argument unpacking, and `nextcallee` support in complex
  dispatch cases
* Add support for `callwith` to the method, wrap, and multiple dispatchers
* Various fixes to `lastcall` handling
* Switch NQP's multiple dispatch over to the new dispatcher
* Implement support for `CALL-ME`, which can be handled far more efficiently
  using the new dispatch mechanism (current Rakudo has an intermediate
  invocation that leads to slurping and re-flattening arguments, which in turn
  frustrates optimization; with the new dispatcher, the `CALL-ME` body can even
  be a candidate for inlining)
* Handle coercions using the new dispatch mechanism, again with some
  performance wins
* Replace the `findmethod`, `tryfindmethod`, and `can` ops with a
  dispatcher-based solution; while the use of `nqp::` ops in modules is
  discouraged, these are among the more common ones, so retaining the API
  compatibility is good for the module ecosystem
* Implement a dispatcher-based solution for `istype`: if the answer cannot be
  given by the type cache, then a dispatcher is now used for the fallback. This
  opens the door to a range of future optimizations.
* Implement sink handling in Raku using a dispatcher, which in turn allows us
  to avoid a huge number of method calls in the common no-op situation, by
  instead using a type guard and mapping it directly to `Nil`
* Eliminate lots of superseded mechanisms in MoarVM: the multiple dispatch
  cache, smart coercion ops, the method cache, the legacy argument capture
  data structure, the invocation protocol mechanism, and the legacy calling
  conventions
* Replace a number of Rakudo extension ops with dispatcher-based solutions
  (these are C extensions to MoarVM, which we are seeking to fully eliminate;
  while this is not a goal for the new dispatcher work, we are now down to
  around 10 of them, putting it in reach in the near future; this is of some
  end user interest as it is currently a blocker for making a single executable
  that bundles MoarVM, Rakudo, and a program)
* Reinstate type statistics collection when using the new dispatcher, so the
  type specializer can start to do its optimization work again
* Start translating dispatch programs built at callsites into sequences of
  ops, including guards. This means that, in specialized code, we can very
  often avoid interpreting dispatch programs, and instead have JITted guard
  sequences (with the guards potentially being eliminated), and also exposes
  dispatches resulting in bytecode invocation for further optimization
* Reinstate specialization linking for bytecode invocations (this is where
  one piece of specialized code can directly call a specialized form of the
  callee without additional type checks); this is restricted so far to
  calls that don't have potential resumptions, so it doesn't yet work for
  method or multi calls, for example
* Reinstate inlining, with the same restrictions as for specialization
  linking
* Reinstate OSR (On Stack Replacement, used to switch hot loops into their
  optimized form when it is available)
* Design and implement a solution for better handling of megamorphic method
  callsites, and make use of it in the NQP method dispatcher
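As an illustration of the `callwith` support mentioned in the list above, the following sketch shows deferral with changed arguments in both a method and a multi. The class and routine names (`Logger`, `ShoutingLogger`, `describe`) are hypothetical; this shows only the user-facing behaviour, not the dispatcher internals.

```raku
# `callwith` in a method: defer to the parent method with different arguments.
class Logger {
    method log($msg) { "log: $msg" }
}
class ShoutingLogger is Logger {
    method log($msg) { callwith($msg.uc) }
}

# `callwith` in a multi: defer to the next matching candidate with new arguments.
multi describe(Int $n where * > 10) { "big, then " ~ callwith($n - 10) }
multi describe(Int $n)              { "Int $n" }

say ShoutingLogger.new.log("merged");   # log: MERGED
say describe(12);                       # big, then Int 2
say describe(5);                        # Int 5
```

Note that `callwith` continues along the candidate list of the original dispatch rather than starting a fresh dispatch on the new arguments, which is why the second `describe` candidate receives `2` here.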
A few other improvements were made that are not directly related to the new
dispatch mechanism, but came about because the opportunity for improvement
was spotted during performance analysis:
* Rework how action methods are invoked, such that most such invocations are
  monomorphic rather than all going through a megamorphic site; this should
  allow simple action methods to even be inlined in the future
* Make specializer statistics cleanup much cheaper, meaning the specializer
  thread can spend more time doing useful work
The total time worked up to the end of August on the grant is **144 hours
42 minutes**, meaning that 55 hours and 18 minutes remain.
      
      
      
      