flatorize_asmjs: Generate fast TypedArray code that is compatible with asm.js
by Guillaume Lathoud [1], September 2014
This page presents a plugin method flatorize.getAsmjsGen() (GitHub source) that goes on top of flatorize (see the main article, GitHub source).
Examples describe how to use flatorize.getAsmjsGen() to generate asm.js/TypedArray code that runs very fast in at least Firefox and Chrome.
Each input or output can be a number or an array of numbers. All arrays (inputs and/or output) are grouped into a single one, so they must have the same type: double, float or int in flatorize notation, i.e. respectively Float64Array, Float32Array or Int32Array in JavaScript notation.
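The type correspondence above can be sketched as a small lookup table. This is only an illustration: TYPED_ARRAY_BY_TYPE and bufferBytes are hypothetical names introduced here, not part of flatorize.

```javascript
// Hypothetical lookup table (not part of flatorize):
// flatorize type name -> JavaScript Typed Array constructor.
var TYPED_ARRAY_BY_TYPE = {
  double: Float64Array,
  float: Float32Array,
  int: Int32Array
};

// Hypothetical helper: number of bytes needed to store `count` numbers
// of the given flatorize type in one shared buffer.
function bufferBytes(type, count) {
  return TYPED_ARRAY_BY_TYPE[type].BYTES_PER_ELEMENT * count;
}

// Example: four arrays of 2 floats each (a, b, c and d) grouped together.
var bytes = bufferBytes('float', 4 * 2); // 8 numbers * 4 bytes each
var view = new Float32Array(new ArrayBuffer(bytes));
```

Because all arrays share one buffer, a single element type has to be chosen for all of them, which is why mixed declarations are not possible.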
See also: asm.js, C and D (source on GitHub).
Here is an expression definition that uses complex numbers (details in the main article):
// f:
A call to flatorize():
// note the type declarations, ignored by flatorize but used later for asm.js
f2 = flatorize('a:[2 float],b:[2 float],c:[2 float]->d:[2 float]',f);
...generates flatorized JavaScript code:
// f2.getDirect():
Then, a call to flatorize.getAsmjsGen():
...returns an asm.js generator:
// f2_asmjsGen:
The generator can be used as follows to compile and use the asm.js code:
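For orientation, the sketch below shows the general shape of an asm.js-style module and how client code shares a heap buffer with it. It is hand-written for illustration and is not the output of f2_asmjsGen: the name complexMulModule, the heap layout and the use of double instead of float are all assumptions made to keep the example short.

```javascript
// Hand-written asm.js-style module (illustration only, not generated code).
// Assumed heap layout, in Float64 slots: a = [0,1], b = [2,3], d = [4,5].
function complexMulModule(stdlib, foreign, heap) {
  "use asm";
  var float64 = new stdlib.Float64Array(heap);

  // d = a * b for complex numbers stored as [real, imaginary] pairs.
  function mul() {
    var ar = 0.0, ai = 0.0, br = 0.0, bi = 0.0;
    ar = +float64[0];
    ai = +float64[1];
    br = +float64[2];
    bi = +float64[3];
    float64[4] = ar * br - ai * bi;
    float64[5] = ar * bi + ai * br;
  }

  return { mul: mul };
}

// Client code: allocate the heap, write the inputs, call, read the output.
var heap = new ArrayBuffer(0x10000); // 64 KiB, a heap size accepted by asm.js
var io = new Float64Array(heap);
io.set([1, 2, 3, 4]); // a = 1 + 2i, b = 3 + 4i
complexMulModule(globalThis, {}, heap).mul();
var d = [io[4], io[5]]; // (1 + 2i) * (3 + 4i) = -5 + 10i
```

Inputs and outputs travel through the shared buffer rather than through function arguments and return values, which is what makes the output "in-place".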
We used two steps to create the asm.js generator f2_asmjsGen. First, we called flatorize, then we called flatorize.getAsmjsGen():
// Note the type declarations, ignored by flatorize but used later for asm.js
f2 = flatorize('a:[2 float],b:[2 float],c:[2 float]->d:[2 float]',f);
// Now the type declarations will matter
Having the intermediate flatorized implementation f2 can be useful to build other flatorized implementations, i.e. to write well-encapsulated, maintainable code using many small functions.
We only need the second step, a faster asm.js implementation, for the functions actually used in massive computations.
If an intermediate flatorized implementation is not needed, one can directly create the asm.js generator in a single step:
A call to flatorize() (details in the main article):
...generates flatorized JavaScript code:
// matmulrows_zip_342.getDirect():
Then, a call to flatorize.getAsmjsGen():
...returns an asm.js generator:
// matmulrows_zip_342_asmjsGen:
The generator can be used as follows to compile and use the asm.js code:
A call to flatorize() (details in the main article):
...generates flatorized JavaScript code:
// dftreal16flat.getDirect():
Then, a call to flatorize.getAsmjsGen():
...returns an asm.js generator:
// dftreal16flat_asmjsGen:
The generator can be used as follows to compile and use the asm.js code:
asmjs_dftrealflat_check( 16 );
A call to flatorize() (details in the main article):
...generates flatorized JavaScript code:
Then, a call to flatorize.getAsmjsGen():
var dftreal1024flat_asmjsGen = flatorize.getAsmjsGen(
{ switcher: dftreal1024flat, name: "dftreal1024flat" }
);
...returns an asm.js generator:
The generator can be used as follows to compile and use the asm.js code:
asmjs_dftrealflat_check( 1024 );
"use asm"
We compare the speed with and without the "use asm" statement, on DFT1024. The only difference is whether or not the "use asm" statement appears; the rest of the code remains the same.
Example of result:
__________ Firefox 32:
without "use asm": speed: 2.15e+3 iterations/second. with "use asm": speed: 2.41e+4 iterations/second. -> speedup: +1018%
without "use asm": speed: 2.13e+3 iterations/second. with "use asm": speed: 2.41e+4 iterations/second. -> speedup: +1030%
without "use asm": speed: 2.10e+3 iterations/second. with "use asm": speed: 2.34e+4 iterations/second. -> speedup: +1013%
__________ Chrome 38:
without "use asm": speed: 1.12e+4 iterations/second. with "use asm": speed: 1.23e+4 iterations/second. -> speedup: +10%
without "use asm": speed: 3.48e+4 iterations/second. with "use asm": speed: 3.45e+4 iterations/second. -> speedup: -1%
without "use asm": speed: 3.47e+4 iterations/second. with "use asm": speed: 3.52e+4 iterations/second. -> speedup: +1%
speedup: as expected, Chrome does not care about "use asm", whereas in Firefox having the "use asm" statement leads to a +1000% speedup.
speed: at first, Chrome runs slower than Firefox, but afterwards, Chrome has the highest speed. Most likely the repeated use of the code triggers an extra optimization in Chrome after it "warms up".
Conclusion: Use asm.js for a dramatic speedup in Firefox (+1000%).
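The figures above combine a speed in iterations/second and a relative speedup. A minimal sketch of how such numbers can be obtained follows. The helpers measureSpeed and speedupPercent are hypothetical, and the speedup formula is an assumption that is roughly consistent with the percentages shown, not necessarily the page's own code.

```javascript
// Hypothetical helper: call `fn` repeatedly for at least `minDurationMs`
// milliseconds and return the observed speed in iterations/second.
function measureSpeed(fn, minDurationMs) {
  var iterations = 0;
  var start = Date.now();
  var elapsed = 0;
  do {
    for (var i = 0; i < 1000; i++) fn();
    iterations += 1000;
    elapsed = Date.now() - start;
  } while (elapsed < minDurationMs);
  return iterations / (elapsed / 1000);
}

// Assumed formula: relative gain of `improved` over `baseline`, in percent.
function speedupPercent(baseline, improved) {
  return Math.round((improved / baseline - 1) * 100);
}

// Example with a trivial stand-in workload.
var acc = 0;
var speed = measureSpeed(function () { acc += Math.sqrt(2); }, 20);
var gain = speedupPercent(2000, 22000); // 11 times faster is +1000%
```

Running the measurement several times and discarding the first run helps to separate the "warm-up" phase from the steady state.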
We compare the speed with Typed Arrays & with normal arrays, on DFT1024. We replace:
// Using Typed Arrays
var float64 = new stdlib.Float64Array( heap );
...
dftrealflat_buffer =
new ArrayBuffer( dftrealflat_asmjsGen.buffer_bytes )
with:
// Using normal arrays
var float64 = heap;
...
dftrealflat_buffer =
new Array( dftrealflat_asmjsGen.count )
To have a meaningful comparison, we remove "use asm" in both cases, because the "normal array" version cannot be compiled anyway.
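The replacement works because the computation only reads and writes its memory by index. A minimal sketch with a stand-in kernel (not the DFT; kernel, COUNT, typedMem and normalMem are hypothetical names):

```javascript
// Stand-in kernel: it only reads and writes `mem` by index, so it works
// on a Typed Array and on a normal array alike.
function kernel(mem) {
  var a = +mem[0], b = +mem[1];
  mem[2] = a + b;
  mem[3] = a - b;
}

var COUNT = 4; // number of numeric slots (inputs + outputs)

// Using a Typed Array on top of an ArrayBuffer
var typedMem = new Float64Array(
  new ArrayBuffer(COUNT * Float64Array.BYTES_PER_ELEMENT)
);
// Using a normal array of the same length
var normalMem = new Array(COUNT);

[typedMem, normalMem].forEach(function (mem) {
  mem[0] = 5;
  mem[1] = 3;
  kernel(mem);
});
```

Only the storage differs between the two runs, so any speed difference can be attributed to Typed Arrays.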
Example of result:
__________ Firefox 32:
with normal array: speed: 2.00e+3 iterations/second. with Typed Array: speed: 2.14e+3 iterations/second. -> speedup: +7%
with normal array: speed: 2.08e+3 iterations/second. with Typed Array: speed: 2.14e+3 iterations/second. -> speedup: +3%
with normal array: speed: 2.11e+3 iterations/second. with Typed Array: speed: 2.12e+3 iterations/second. -> speedup: +1%
__________ Chrome 38:
with normal array: speed: 1.06e+4 iterations/second. with Typed Array: speed: 1.13e+4 iterations/second. -> speedup: +6%
with normal array: speed: 2.95e+4 iterations/second. with Typed Array: speed: 3.42e+4 iterations/second. -> speedup: +16%
with normal array: speed: 2.95e+4 iterations/second. with Typed Array: speed: 3.48e+4 iterations/second. -> speedup: +18%
speedup: almost none in Firefox, and about +15% to +20% in Chrome.
speed: Since "use asm" was removed for this comparison, Firefox runs slower than previously. Chrome exhibits the same "warm up" behaviour.
Conclusion: Use Typed Arrays for a speedup in Chrome (+15% to +20%). Coding for asm.js brings you this speedup as a by-product.
We compare the speed of flatorize, which outputs a new array at each call,
return [ _1k, _c3, _4b ];
...with the speed of flatorize.getAsmjsGen(), which generates an in-place implementation with Typed Arrays:
float64[ 0 ] = _1k;
float64[ 1 ] = _c3;
float64[ 2 ] = _4b;
The speed tests run on DFT1024. Based on the two previous results, to ensure a meaningful comparison, since flatorize uses normal arrays, we modify the code generated by flatorize.getAsmjsGen() to have it use normal arrays as well.
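The two output styles being compared can be sketched as follows, using normal arrays in both cases. The functions sumDiffNew and sumDiffInPlace are hypothetical stand-ins, not code generated by flatorize.

```javascript
// Style 1: allocate and return a new output array at each call.
function sumDiffNew(a, b) {
  return [a + b, a - b];
}

// Style 2: in-place output into a preallocated normal array.
// Assumed layout: slots 0-1 hold the inputs, slots 2-3 the outputs.
var mem = new Array(4);
function sumDiffInPlace() {
  var a = +mem[0], b = +mem[1];
  mem[2] = a + b;
  mem[3] = a - b;
}

var outNew = sumDiffNew(5, 3);

mem[0] = 5;
mem[1] = 3;
sumDiffInPlace();
var outInPlace = [mem[2], mem[3]];
```

The in-place style avoids one allocation per call, which matters when the function runs many thousands of times per second.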
Example of result:
__________ Firefox 32:
with new output array: speed: 2.05e+3 iterations/second. with in-place array: speed: 2.15e+3 iterations/second. -> speedup: +5%
with new output array: speed: 2.09e+3 iterations/second. with in-place array: speed: 2.15e+3 iterations/second. -> speedup: +3%
with new output array: speed: 2.09e+3 iterations/second. with in-place array: speed: 2.15e+3 iterations/second. -> speedup: +3%
__________ Chrome 38:
with new output array: speed: 3.38e+3 iterations/second. with in-place array: speed: 4.81e+3 iterations/second. -> speedup: +42%
with new output array: speed: 1.17e+4 iterations/second. with in-place array: speed: 2.94e+4 iterations/second. -> speedup: +151%
with new output array: speed: 1.19e+4 iterations/second. with in-place array: speed: 3.01e+4 iterations/second. -> speedup: +153%
speedup: very little in Firefox, but quite high in Chrome.
speed: Chrome exhibits the same "warm-up" behaviour as above. Interestingly, after the "warm-up", in-place arrays are even better optimized.
We compare the speed of flatorize with the speed of flatorize.getAsmjsGen(), with all improvements activated ("use asm", Typed Arrays, in-place output).
Example of result:
__________ Firefox 32:
flatorize : speed: 2.12e+3 iterations/second. flatorize.getAsmjsGen(): speed: 2.49e+4 iterations/second. -> speedup: +1075%
flatorize : speed: 2.12e+3 iterations/second. flatorize.getAsmjsGen(): speed: 2.42e+4 iterations/second. -> speedup: +1045%
flatorize : speed: 2.12e+3 iterations/second. flatorize.getAsmjsGen(): speed: 2.39e+4 iterations/second. -> speedup: +1028%
__________ Chrome 38:
flatorize : speed: 3.55e+3 iterations/second. flatorize.getAsmjsGen(): speed: 6.05e+3 iterations/second. -> speedup: +70%
flatorize : speed: 1.23e+4 iterations/second. flatorize.getAsmjsGen(): speed: 3.51e+4 iterations/second. -> speedup: +186%
flatorize : speed: 1.14e+4 iterations/second. flatorize.getAsmjsGen(): speed: 1.33e+4 iterations/second. -> speedup: +17%
flatorize : speed: 1.23e+4 iterations/second. flatorize.getAsmjsGen(): speed: 3.47e+4 iterations/second. -> speedup: +183%
flatorize : speed: 1.25e+4 iterations/second. flatorize.getAsmjsGen(): speed: 3.54e+4 iterations/second. -> speedup: +183%
Not much left to say: huge speedups everywhere.
Writing asm.js code brings high speedups in Firefox and Chrome. flatorize.getAsmjsGen() conveniently generates such code for you.
See also: more speed tests of the various solutions (JS, asm.js, C...).
flatorize vs. flatorize.getAsmjsGen()
flatorize already generates very fast code (see the main article), and flatorize.getAsmjsGen() generates even faster code.
Usage trade-off: while flatorize always creates a new output array, flatorize.getAsmjsGen() uses side effects (in-place output), which requires slightly more care but provides an extra speedup.
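The "slightly more care" can be illustrated with a stand-in in-place function: a result that is only viewed, not copied, is overwritten by the next call. The names below (squareInPlace, view, copy) are hypothetical and the layout is an assumption for this sketch.

```javascript
// Shared heap with an assumed layout: slot 0 = input, slot 1 = output.
var heap = new ArrayBuffer(0x10000);
var io = new Float64Array(heap);

// Stand-in in-place implementation: writes the square of the input
// into the output slot.
function squareInPlace() {
  io[1] = io[0] * io[0];
}

io[0] = 3;
squareInPlace();
var view = io.subarray(1, 2); // shares memory with the heap
var copy = io.slice(1, 2);    // independent copy of the result

io[0] = 4;
squareInPlace();
// `view` now reflects the second call (16), `copy` still holds 9.
```

Copying costs an allocation, so it is best done only for results that must outlive the next call.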
asm.js and small tasks
All the above showed an excellent extra speedup brought by flatorize.getAsmjsGen() compared to flatorize on a computationally intensive task like DFT1024.
However, on a smaller task like DFT16, in the Firefox case, you would get such a speedup only when called from asm.js client code, and not when called from non-asm.js client code. See also this stackoverflow page.
On an Ubuntu laptop, for the (heavy) DFT1024 case, I measured a speed of about 47800 iterations per second for Chrome 39 and about 60000 iterations per second for clang (had to forget the increasingly unreliable GCC).
This is fast enough for me to do scientific computation in the browser with a much faster and simpler development process (JavaScript) than in C.
How to run the speed test:
flatorize_asmjs on Chrome 39: open the present page with Chrome 39 and go to any performance section, for example the first performance test, and run the test 3-4 times by clicking on the button that says "Measure the speed!". After the first run, Chrome should have optimized the code. Pick the median speed of the next few runs.
flatorize_c with clang: install V8, Python 3 and clang (for the latter two something like sudo apt-get install python3 and sudo apt-get install clang should be enough). In the command line do this:
jars@jars-desktop:~/gl/flatorize$ cd test/
jars@jars-desktop:~/gl/flatorize/test$ ./test_c_v8_speed.py
...and wait. This runs all necessary unit tests, then at the end the DFT1024 speed test. The final two lines should look like this:
test_v8_c_speed: (3) evaluate the speed of the C implementation of asmjs_dftreal1024
test_v8_c_speed done, speed in clang: 59856.2169386728 iterations/second = 65536 iterations / 1.094890445 seconds
More speed tests and comparisons
→ More speed tests of the various solutions & languages (JS, asm.js, C...).
Detail: