Programming with tail call optimization combines high performance with algorithmic clarity. Javascript does not optimize tail calls, so I implemented tail call optimization in Javascript itself, and tested it on various browsers.
Hurried readers can go directly to the GCD example, check the cross-browser test results, or jump straight to the conclusion(s). Those interested can also check the production examples.
I am greatly indebted to Aubrey Jaffer for his inspiring work, to Thomas Lahn for his insightful comment on syntax, and to the people on comp.lang.javascript
for their stimulating discussions, which led to a clarification of the current context (2010).
Update 2018-06-28: fext.js
supports mutual tail recursion, generates very fast code, and includes easy debugging tools.
You might also like the transfun.js tool for fast map/filter/reduce/...
Javascript engines are getting faster and faster, leading to bigger and bigger front-end applications, as well as Javascript servers (see e.g. node.js). Both cases involve large and complex data structures and processes. Resource-efficient implementations are thus needed, especially with the rising trend of mobile platforms.
Recursive processes often lead to an explosive resource cost. A toy example first:
function fact_rec(n) { // Recursive implementation
    return n > 1 ? n * fact_rec(n - 1) : 1;
}

function fact_iter(n) { // Iterative implementation
    var a, ret = 1;
    for (a = 2; a <= n; a++) {
        ret = ret * a;
    }
    return ret;
}
| | total number of function calls | memory: maximum stack depth | CPU: for n=13, duration in seconds |
| fact_rec(n) | n | n | 1.758e-6 s (100: base) |
| fact_iter(n) | 1 | 1 | 1.299e-7 s (7) |
(Whenever you want, you can run the duration tests on this page.)
For this simple task, the recursive implementation already uses about one order of magnitude more CPU time than the iterative one, on top of the n-deep stack.
Whereas the recursive implementation directly expresses a mathematical definition, the iterative implementation directly describes a run-time process. Converting from the former to the latter may well be difficult for more complex tasks.
A compromise between ease of implementation and resource savings is tail call optimization, where a compiler executes a certain type of recursive implementation in an iterative manner (example below).
Unfortunately, most Javascript engines do not implement tail optimization. Therefore, I am proposing to optimize Javascript functions on the fly using... Javascript itself (tailopt.js
), inspired by the great works of Eugene Lazutkin and Aubrey Jaffer:
| Implementation | → Generated code |
| Verbose: fact_tail | fact_tailopt.toString() (shown live on the original page) |
| Concise: fact_tail2 | fact_tailopt2.toString(): same as above |
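The verbose implementation itself is not reproduced in this extract; based on the anonymous variant shown further below, it presumably looked like the following sketch (the exact body is an assumption, though the names fact_tail and fact_sub do appear elsewhere on this page):

```javascript
// Sketch of the verbose tail-recursive factorial (assumed shape):
// the named function expression fact_sub calls itself in tail position,
// carrying the partial product in the accumulator tmp.
function fact_tail(n) {
    return (function fact_sub(p, tmp) {
        return p ? fact_sub(p - 1, p * tmp) : tmp; // tail call: nothing left to do after it
    })(n, 1); // the initial parameters
}
```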
Note: the parentheses around the sub-function expression are not strictly necessary; however, I tend to write them to suggest right away the presence of an actual call. A matter of taste.
| | total number of function calls | memory: maximum stack depth | CPU: for n=13, duration in seconds |
| fact_rec(n) | n | n | 1.758e-6 s (100: base) |
| fact_tailopt(n) | 1 | 1 | 2.208e-7 s (13) |
| fact_tailopt2(n) | 1 | 1 | 2.203e-7 s (13) |
| fact_iter(n) | 1 | 1 | 1.299e-7 s (7) |
Interpretation:
- The function calls disappear: the stack depth drops from n to 1.
- fact_tailopt's speed lies between fact_rec's and fact_iter's, much closer to the latter.

How tailopt works: tailopt(fun) detects in fun.toString() the Javascript equivalent of Scheme "named lets" and unrolls them into loops.
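The generated code is not reproduced in this extract; conceptually, the unrolling turns the named let into a loop, along these lines (a sketch of the idea, not tailopt's literal output):

```javascript
// Conceptual sketch of the unrolled factorial: the sub-function's
// parameters become loop variables, and each tail call becomes a
// parameter update followed by another loop iteration.
function fact_unrolled(n) {
    var p = n, tmp = 1;      // initial parameters of the named let
    while (p) {              // the recursion condition
        tmp = p * tmp;       // compute the new accumulator first...
        p = p - 1;           // ...then update p (order matters)
    }
    return tmp;              // the base case
}
```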
Note about very buggy JS engines (e.g. IE7, maybe IE8; sorry, I am losing interest in IE...): if named function expressions are not correctly supported, you can still define your sub-function in an anonymous, more verbose manner:
var fact_tailopt = tailopt(function fact_tail2(n) {
    var fact_sub;
    return (fact_sub = function (p, tmp) { // Anonymous function expression on the RHS
        return p ? fact_sub(p - 1, p * tmp) : tmp;
    })(n, 1); // The initial parameters
});
The use case is tail call optimization (TCO) in Javascript without a trampoline.
ECMAScript 3 & 5 neither require the engine to do TCO, nor fully specify function decompilation, so one has to compromise, e.g. on tailopt()'s parameters. Whoever is not willing to compromise has to wait until all engines implement TCO.
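For comparison, here is what the trampoline alternative looks like (a minimal illustrative sketch; the names trampoline and fact_trampo are not from the original page). A trampoline also caps the stack depth, but pays for one closure allocation per step, an overhead that in-place unrolling avoids:

```javascript
// Minimal trampoline sketch: the tail-recursive function returns a
// thunk (a zero-argument function) instead of calling itself, and
// trampoline() keeps invoking thunks until a plain value comes back.
function trampoline(f /* , ...initialArgs */) {
    var value = f.apply(null, Array.prototype.slice.call(arguments, 1));
    while (typeof value === 'function') {
        value = value();
    }
    return value;
}

function fact_trampo(p, tmp) {
    return p
        ? function () { return fact_trampo(p - 1, p * tmp); } // a thunk, not a call
        : tmp;
}

var f13 = trampoline(fact_trampo, 13, 1); // constant stack depth, at the cost of n thunks
```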
In the case of a Javascript server, things might be simpler, since one knows which engine is running (e.g. Google's V8 engine used by the node.js server).
(Inspired by the page: "Refactoring - Replace Iteration with Recursion".)
The iterative version has no stack depth issue, but lacks algorithmic clarity:
function gcd_iter(a, b) {
    while (true) {
        if (a > b) {
            a -= b;
        } else if (b > a) {
            b -= a;
        } else {
            break;
        }
    }
    return a;
}
In the recursive code, we have the stack depth issue, but the algorithm is clearly written:
function gcd_rec(a, b) {
    if (a > b) {
        return gcd_rec(a - b, b);
    } else if (a < b) {
        return gcd_rec(a, b - a);
    }
    // a === b
    return a;
}
Could we combine the best of both? Clear code and constant stack depth? That's where tailopt
comes in:
var gcd_tailopt = tailopt(gcd_rec);
In this case, we need neither an extra auxiliary variable nor a sub-function (as opposed to fact_tail).
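Conceptually, the generated code replaces each tail call with an in-place update of the parameters (again a sketch of the idea, not tailopt's literal output):

```javascript
// Conceptual sketch of the unrolled GCD: each tail call of gcd_rec
// becomes a parameter update plus another loop iteration.
function gcd_unrolled(a, b) {
    while (true) {
        if (a > b) { a = a - b; continue; }   // was: return gcd_rec(a - b, b);
        if (a < b) { b = b - a; continue; }   // was: return gcd_rec(a, b - a);
        return a;                             // a === b
    }
}
```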
| | total number of function calls | memory: maximum stack depth | CPU: for a=28974330 and b=310200 (gcd: 330), duration in seconds |
| gcd_rec(a, b) | n | n | 1.450e-5 s (100: base) |
| gcd_tailopt(a, b) | 1 | 1 | 1.215e-6 s (8) |
| gcd_iter(a, b) | 1 | 1 | 4.323e-7 s (3) |
Note about IE7 and such, i.e. if you need to use anonymous function expressions:
var gcd_tailopt = tailopt("gcd_rec", function (a, b) {
    if (a > b) {
        return gcd_rec(a - b, b);
    } else if (a < b) {
        return gcd_rec(a, b - a);
    }
    // a === b
    return a;
});
The task is to replicate a string many times, as fast as possible, and without blowing up the memory (first work on arrays, then do a join('')).
Recursive code:
function string_repli_rec(s, n) {
    if (!((n > 0) && s)) { return ''; }
    return (function sub(s, n) {
        if (n < 1) { return ['']; }
        if (n === 1) { return [s]; }
        var half = Math.floor(n / 2), a = sub(s, half);
        return a.concat(a, sub(s, n - 2 * half));
    })(s, n).join('');
}
Tail-recursive code:
function string_repli_tail(s0, n0) {
    if (!((n0 > 0) && s0)) { return ''; }
    return (function sub(s, n, tmp) {
        if (n & 1) { tmp.push(s); }
        return n ? sub(s + s, n >> 1, tmp) : tmp.join('');
    })(s0, n0, []);
}
Tail-optimized code (= iterative process):
var string_repli_tailopt = tailopt( string_repli_tail );
Iterative code (taken from dojo 1.4.1):
function string_repli_iter(str, num) {
    if (num <= 0 || !str) { return ''; }
    var buf = [];
    for (;;) {
        if (num & 1) {
            buf.push(str);
        }
        if (!(num >>= 1)) { break; }
        str += str;
    }
    return buf.join('');
}
The iterative and tail-recursive versions look very similar. However, the tail-recursive style *forces* a clear separation of initialization, update and termination, whereas the iterative style does not clearly separate all three.
Average iteration duration over 500 runs on a linux platform, in seconds (relative values in brackets) - the lower, the better:

| Stack depth | case name | Firefox 3.6 | Google Chrome 4.0 | Google V8 Engine | Rhino Engine |
| n | string_repli_rec | 5.972e-5 (100.0) | 5.312e-5 (100.0) | 3.324e-5 (100.0) | 1.337e-2 (100.0) |
| n | string_repli_tail | 1.582e-5 (26.5) | 4.075e-6 (7.7) | 2.381e-6 (7.2) | 1.243e-3 (9.3) |
| 1 | string_repli_tailopt | 1.105e-5 (18.5) | 3.479e-6 (6.5) | 2.231e-6 (6.7) | 1.100e-3 (8.2) |
| 1 | string_repli_iter | 9.818e-6 (16.4) | 3.361e-6 (6.3) | 2.166e-6 (6.5) | 8.072e-4 (6.0) |
Detailed statistical results are in the appendix.
The speed improvement of the iterative version over the tail-optimized version seems marginal at best, and the tail-optimized version forces a clear separation of initialization, update and termination, which can help a lot, especially for more complex code.
We have a tree of categories, whose leaf nodes have counts:

ooi
    tour
        winter
            skiFreeride (count: 3)
            sledging (count: 156)
            winterHiking (count: 157)
            ...
        summer
            ...
    points of interest
        ...
For a given non-leaf node, we want to sum the counts of its children, so we need to traverse the subtree.
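The treeforeach functions themselves are not reproduced in this extract; a tail-recursive traversal along the lines of treeforeach_tail could look like this sketch (the node structure with a children array, and the function name, are assumptions):

```javascript
// Sketch of a tail-recursive tree traversal: the pending nodes are
// carried in an explicit todo list, so the only recursive call is in
// tail position and can be unrolled into a loop.
function treeforeach_tail_sketch(root, callback) {
    return (function walk(todo) {
        if (!todo.length) { return; }
        var node = todo[0];
        callback(node);
        // children first (depth-first order), then the remaining siblings
        return walk((node.children || []).concat(todo.slice(1)));
    })([root]);
}
```

Summing the counts of a subtree is then a matter of passing a callback that accumulates node.count.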
| | total number of function calls | memory: maximum stack depth | CPU: duration in seconds |
| treeforeach_rec() | total number of tree nodes (245) | maximum tree depth (4) | 1.870e-4 s (100: base) |
| treeforeach_tailopt() | 1 | 1 | 3.608e-4 s (193) |
Let us list the DOM nodes on this web page that (1) are text nodes and (2) contain the string ".js":
var node_list = [],
    node_check = function (node) {
        if ((node.nodeName === '#text') && (node.textContent.indexOf('.js') > -1)) {
            node_list.push(node);
        }
    };
DOMtreeforeach_rec(document, node_check);
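DOMtreeforeach_rec is not reproduced in this extract; a plain recursive version presumably follows this shape (a sketch; any object exposing a childNodes list would do):

```javascript
// Sketch of a plain recursive DOM traversal: call the callback on the
// node, then recurse into each child (stack depth = tree depth).
function DOMtreeforeach_rec_sketch(node, callback) {
    callback(node);
    var children = node.childNodes, i;
    for (i = 0; i < children.length; i++) {
        DOMtreeforeach_rec_sketch(children[i], callback);
    }
}
```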
| | total number of function calls | memory: maximum stack depth | CPU: duration in seconds |
| DOMtreeforeach_rec() | total number of DOM nodes (210) | maximum tree depth (12) | 1.312e-3 s (100: base) |
| DOMtreeforeach_tail() | 1 | 1 | 1.462e-3 s (111) |
System: windows 7 professional, dell optiplex 760, intel core quad cpu Q9650 3.00GHz, RAM 8.00 GB, 64 bits
Results: Duration of a single call, in seconds: the lower, the better.
Remember that we are hugely saving on stack depth, so a small duration increase may not be that terrible!
Warning: with Firefox, I have observed slight variations in CPU time depending on whether Firebug is activated or not; it likely has an impact on Tracemonkey optimizations. All results reported on this page were obtained with Firebug turned off. A further discussion of this theme can be found here.
| Stack depth | case name | Firefox 3.6 | Safari 4.0.4 | Google Chrome 3.0 | IE8 |
| n | fact_rec | 1.758e-6 s (100: base) | 2.155e-7 s (100: base) | 1.540e-7 s (100: base) | 5.635e-6 s (100: base) |
| n | fact_tail | 2.834e-6 s (161) | 6.681e-7 s (310) | 4.056e-7 s (263) | 8.785e-6 s (156) |
| 1 | fact_tailopt | 2.208e-7 s (13) | 1.532e-7 s (71) | 9.653e-8 s (63) | 3.548e-6 s (63) |
| 1 | fact_iter | 1.299e-7 s (7) | 1.155e-7 s (54) | 6.055e-8 s (39) | 1.381e-6 s (24) |
| n | fact_tail2 | 2.891e-6 s (165) | 6.807e-7 s (316) | 4.084e-7 s (265) | 8.785e-6 s (156) |
| 1 | fact_tailopt2 | 2.203e-7 s (13) | 1.577e-7 s (73) | 9.593e-8 s (62) | 3.612e-6 s (64) |
| n | gcd_rec | 1.450e-5 s (100: base) | 1.798e-6 s (100: base) | 1.445e-6 s (100: base) | 5.294e-5 s (100: base) |
| 1 | gcd_tailopt | 1.215e-6 s (8) | 7.652e-7 s (43) | 5.343e-7 s (37) | 1.921e-5 s (36) |
| 1 | gcd_iter | 4.323e-7 s (3) | 6.823e-7 s (38) | 5.355e-7 s (37) | 1.046e-5 s (20) |
| n | treeforeach_rec() | 1.870e-4 s (100: base) | 5.935e-5 s (100: base) | 1.303e-4 s (100: base) | 6.286e-4 s (100: base) |
| n | treeforeach_tail() | 2.139e-4 s (114) | 8.308e-5 s (140) | 2.116e-4 s (162) | 6.229e-4 s (99) |
| 1 | treeforeach_tailopt() | 3.608e-4 s (193) | 7.600e-5 s (128) | 3.094e-4 s (238) | 4.638e-4 s (74) |
| n | DOMtreeforeach_rec() | 1.312e-3 s (100: base) | 7.761e-4 s (100: base) | 1.566e-3 s (100: base) | 7.231e-3 s (100: base) |
| 1 | DOMtreeforeach_tail() | 1.191e-3 s (91) | 5.068e-4 s (65) | 7.021e-4 s (45) | 6.333e-3 s (88) |
| 1 | DOMtreeforeach_tailopt() | 1.462e-3 s (111) | 4.822e-4 s (62) | 7.671e-4 s (49) | 9.923e-3 s (137) |
Interpretation:
- Results are very similar for fact_tail and fact_tail2, as well as for fact_tailopt & fact_tailopt2.
- For the simple cases (fact_rec and gcd_rec), the tail-optimized version _tailopt *always* runs faster than the original tail & recursive versions (_tail, _rec).
- For tree traversals, _tailopt *sometimes* runs slower than _tail and/or _rec: treeforeach_tailopt() on Firefox 3.6/Google Chrome 3.0.
- Elsewhere, the constant stack depth comes at a comparable duration: treeforeach_tailopt() for Safari 4.0.4 and IE8, DOMtreeforeach_tailopt() for Firefox 3.6 and IE8.
- Best of all, we can obtain a constant stack depth *and* a much smaller duration than the recursive version: fact_tailopt() and gcd_tailopt() for all browsers (almost as fast as the iterative code), and DOMtreeforeach_tailopt() for Safari 4.0.4 and Google Chrome 3.0.

System: Ubuntu 8.04 on a Thinkpad T43 (Intel Pentium M, 2 GHz, 1.5 GB RAM)
Java version: 1.5.0, gij (GNU libgcj) version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
Rhino version: 1.6 release 7 2008 10 14
Test: 10 runs
rhino -e "load('tailopt-js-rerun.js');rerun(10);"
Results: Duration of a single call, in seconds: the lower, the better.
Stack depth: test case
n: fact_rec_____________ iter_sec: 4.002e-4 (100: base)
n: fact_tail____________ iter_sec: 5.825e-4 (146)
1: fact_tailopt_________ iter_sec: 4.899e-4 (122)
n: fact_tail2___________ iter_sec: 5.931e-4 (148)
1: fact_tailopt2________ iter_sec: 4.748e-4 (119)
1: fact_iter____________ iter_sec: 1.207e-4 (30)
n: gcd_rec______________ iter_sec: 3.042e-3 (100: base)
1: gcd_tailopt__________ iter_sec: 3.003e-3 (99)
1: gcd_iter_____________ iter_sec: 7.957e-4 (26)
n: string_repli_rec_____ iter_sec: 1.340e-2 (100: base)
n: string_repli_tail____ iter_sec: 1.215e-3 (9)
1: string_repli_tailopt_ iter_sec: 1.123e-3 (8)
1: string_repli_iter____ iter_sec: 8.237e-4 (6)
n: treeforeach_rec______ iter_sec: 5.478e-2 (100: base)
n: treeforeach_tail_____ iter_sec: 5.668e-2 (103)
1: treeforeach_tailopt__ iter_sec: 6.306e-2 (115)
Stack depth savings aside, the tailopt results do not always bring a speed improvement here, probably because Rhino runs on top of Java.
System: Ubuntu 8.04 on a Thinkpad T43 (Intel Pentium M, 2 GHz, 1.5 GB RAM)
V8 version: 2.1.0
Test: 10 runs
d8 -e "load('tailopt-js-rerun.js');rerun(10);"
Results: Duration of a single call, in seconds: the lower, the better.
Stack depth: test case
n: fact_rec_____________ iter_sec: 2.674e-7 (100: base)
n: fact_tail____________ iter_sec: 4.149e-7 (155)
1: fact_tailopt_________ iter_sec: 1.904e-7 (71)
n: fact_tail2___________ iter_sec: 4.160e-7 (156)
1: fact_tailopt2________ iter_sec: 1.904e-7 (71)
1: fact_iter____________ iter_sec: 1.338e-7 (50)
n: gcd_rec______________ iter_sec: 2.676e-6 (100: base)
1: gcd_tailopt__________ iter_sec: 1.085e-6 (41)
1: gcd_iter_____________ iter_sec: 1.215e-6 (45)
n: string_repli_rec_____ iter_sec: 3.352e-5 (100: base)
n: string_repli_tail____ iter_sec: 2.486e-6 (7)
1: string_repli_tailopt_ iter_sec: 2.411e-6 (7)
1: string_repli_iter____ iter_sec: 2.214e-6 (7)
n: treeforeach_rec______ iter_sec: 2.247e-4 (100: base)
n: treeforeach_tail_____ iter_sec: 2.922e-4 (130)
1: treeforeach_tailopt__ iter_sec: 2.813e-4 (125)
Here, tailopt brings an additional speed improvement, as in the browser cases.
We can write clear Javascript recursive code that runs almost as fast as the iterative code.
This seems particularly adequate for deep recursions over large and complex data structures, or on resource-limited platforms.
Why are Javascript engines *not already* optimizing this?
function gcd_rec(a, b) {
    if (a > b) {
        return gcd_rec(a - b, b);
    } else if (a < b) {
        return gcd_rec(a, b - a);
    }
    // a === b
    return a;
}
Done: the new implementation (fext.js, 2018) supports mutual recursion and solves name collisions.
The detailed statistical results are on another page. They include standard deviation statistics and a comparison with trampoline implementations.
Produced on 2018-06-28 by index.scm - by Guillaume Lathoud (glathoud _at_ yahoo _dot_ fr)