New Application implementation for Emscripten
If you build your Magnum apps for the web, you can now make use of a new feature-packed, smaller and more power-efficient application implementation. It uses the Emscripten HTML5 APIs directly instead of going through compatibility layers.
Until now, Platform::Sdl2Application was the go-to solution for most platforms, including the web and mobile. However, not everybody needs all the features SDL provides and, especially on Emscripten, apart from simplifying porting it doesn’t really add anything extra on top. On the contrary, the additional layer of translation between HTML5 and SDL APIs increases the executable size and makes some features unnecessarily hard to access.
To solve that, the new Platform::EmscriptenApplication, contributed in mosra/magnum#300 by @Squareys, uses the Emscripten HTML5 APIs directly, opening new possibilities while making the code smaller and more efficient.
“SDL2” vs SDL2
Since there’s some confusion about SDL among Emscripten users, let’s clarify that first. Using SDL in Emscripten is actually possible in two ways. The implicit support, implemented in library_sdl.js, gives you a slightly strange hybrid of SDL1 and SDL2 in a relatively small package. Not all SDL2 APIs are present there; on the other hand, it has enough from SDL2 to make it a viable alternative to the SDL2 everyone is used to. This is what Platform::Sdl2Application is using.
The other way is a “full SDL2”, available if you pass -s USE_SDL=2 to the linker. Two years ago we tried to remove all Emscripten-specific workarounds from Platform::Sdl2Application by switching to this full SDL2, but quickly realized it was a bad decision: in total it removed 30 lines of code, but made the resulting code almost 600 kB larger. The size increase was so serious that it didn’t warrant the very minor improvements in code maintainability. For the record, the original pull request is archived at mosra/magnum#218.
The SDL-free EmscriptenApplication
All application implementations in Magnum strive for almost full API compatibility, with the goal of making it possible to use the implementation optimal for the chosen platform and use case. This was already the case with Platform::GlfwApplication and Platform::Sdl2Application, where switching from one to the other is in 90% of cases just a matter of using a different #include and passing a different component to CMake’s find_package().
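To illustrate what this kind of API compatibility buys, here is a toy C++ model, not Magnum code: two hypothetical stand-in classes (sdl2_like::Application and emscripten_like::Application) expose the same interface, and the application only ever derives from an alias, so picking the other implementation is a one-line change. The namespaces and the USE_EMSCRIPTEN_LIKE macro are made up for this sketch; in Magnum the equivalent switch is the #include plus the CMake component.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for two application implementations exposing the
// same API. These are NOT Magnum classes; they only model the idea.
namespace sdl2_like {
struct Application {
    int windowWidth() const { return 800; }
    std::string backend() const { return "SDL2"; }
};
}

namespace emscripten_like {
struct Application {
    int windowWidth() const { return 800; }
    std::string backend() const { return "HTML5"; }
};
}

// Switching implementations is a one-line change here; the application
// code below stays exactly the same either way.
#ifdef USE_EMSCRIPTEN_LIKE
using Application = emscripten_like::Application;
#else
using Application = sdl2_like::Application;
#endif

// Application code written only against the shared API.
struct MyApplication: Application {
    int halfWidth() const {
        assert(windowWidth() > 0);
        return windowWidth()/2;
    }
};
```

Compiling with -DUSE_EMSCRIPTEN_LIKE swaps the base class while MyApplication stays untouched, which is the property that makes porting between implementations cheap.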
The new Platform::EmscriptenApplication continues in this fashion, and we ported all existing examples and tools that formerly used Platform::Sdl2Application to it to ensure it works across a broad range of use cases. Apart from that, the new implementation fixes some long-standing issues, such as miscalculated event coordinates on mobile web browsers or the Delete key leaking through text input events.
Power-efficient idle behavior
Since the very beginning, all Magnum application implementations default to redrawing only when needed in order to save power. Magnum is not just for games that have to animate something every frame, so it doesn’t make sense to use up all system resources by default. While this is simple to implement efficiently in desktop apps, where the application has full control over the main loop (and thus can block indefinitely waiting for an input event), it’s harder in the callback-based browser environment.
The original Platform::Sdl2Application makes use of emscripten_set_main_loop(), which periodically calls window.requestAnimationFrame() in order to maintain a steady frame rate. For apps that redraw only when needed, this means the callback gets called 60 times per second only to be a no-op. While that’s still significantly more efficient than drawing everything each time, it still means the browser has to wake up 60 times per second to do nothing.
Platform::EmscriptenApplication instead makes use of requestAnimationFrame() directly: the next animation frame is implicitly scheduled, but cancelled again after the draw event if the app doesn’t wish to redraw immediately again. That takes the best of both worlds. Redraws are still VSync’d, but the browser is not looping needlessly if the app just wants to wait with a redraw for the next input event. To give you some numbers, below is a ten-second output of Chrome’s performance monitor comparing the SDL and Emscripten app implementations waiting for an input event. You can reproduce this with the Magnum Player: no matter how complex an animated scene you throw at it, if you pause the animation it will use as much CPU as a plain static text web page.
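The difference between the two approaches can be modelled in a few lines. The following is a simplified simulation under stated assumptions, not the actual Magnum or browser implementation: each loop iteration stands for one vsync tick, the hypothetical ToyApp redraws only in response to input, and the two functions count how often the frame callback has to run.

```cpp
#include <cassert>
#include <set>

// Toy model only (neither Magnum nor browser code): counts how many times a
// "browser" wakes up to run a frame callback over a number of vsync ticks,
// for a static app that redraws only in response to input events.
struct ToyApp {
    bool redrawRequested = false;
    int draws = 0;

    void redraw() { redrawRequested = true; }
    // A static app: it never calls redraw() from inside the draw event.
    void drawEvent() { ++draws; }
};

// emscripten_set_main_loop()-like: the callback runs on every tick and is a
// no-op unless the app asked for a redraw.
int wakeupsWithMainLoop(ToyApp& app, int ticks, const std::set<int>& inputTicks) {
    assert(ticks >= 0);
    int wakeups = 0;
    for(int t = 0; t != ticks; ++t) {
        if(inputTicks.count(t)) app.redraw();
        ++wakeups;
        if(app.redrawRequested) {
            app.redrawRequested = false;
            app.drawEvent();
        }
    }
    return wakeups;
}

// requestAnimationFrame()-like: an input event schedules a frame; after each
// draw event the next frame stays scheduled only if the app called redraw()
// again, otherwise it is cancelled and the browser stays idle.
int wakeupsWithAnimationFrame(ToyApp& app, int ticks, const std::set<int>& inputTicks) {
    assert(ticks >= 0);
    int wakeups = 0;
    bool frameScheduled = false;
    for(int t = 0; t != ticks; ++t) {
        if(inputTicks.count(t)) {
            app.redraw();
            frameScheduled = true;
        }
        if(!frameScheduled) continue; // nothing to do this tick
        ++wakeups;
        app.redrawRequested = false;
        app.drawEvent();
        frameScheduled = app.redrawRequested; // keep or cancel the next frame
    }
    return wakeups;
}
```

For ten simulated seconds at 60 Hz (600 ticks) with input at three ticks, wakeupsWithMainLoop() returns 600 while wakeupsWithAnimationFrame() returns 3, and in both cases the app draws exactly three times.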
DPI awareness revisited
Arguably to simplify porting, the Emscripten SDL emulation recalculates all input event coordinates to match framebuffer pixels. The actual DPI scaling (or device pixel ratio) is then exposed through dpiScaling(), making it behave the same as on Linux, Windows and Android on high-DPI screens. In contrast, HTML5 APIs behave like macOS / iOS, and Platform::EmscriptenApplication follows that behavior: framebufferSize() thus matches device pixels, while windowSize() (to which all event coordinates are relative) is smaller on HiDPI systems. For more information, check out the DPI awareness docs.
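Since event coordinates are relative to windowSize() while rendering happens in framebufferSize() pixels, an app that needs the framebuffer position of an event has to scale it by the ratio of the two sizes. A minimal sketch, using a hypothetical Vector2i struct and a hypothetical free function in place of Magnum’s own types:

```cpp
#include <cassert>

// Hypothetical minimal vector type, standing in for Magnum's Vector2i.
struct Vector2i { int x, y; };

// Event positions are in window coordinates, the framebuffer is in device
// pixels. Scaling by framebufferSize/windowSize maps one onto the other.
Vector2i eventToFramebuffer(Vector2i eventPosition, Vector2i windowSize,
                            Vector2i framebufferSize) {
    assert(windowSize.x > 0 && windowSize.y > 0);
    return {eventPosition.x*framebufferSize.x/windowSize.x,
            eventPosition.y*framebufferSize.y/windowSize.y};
}
```

With a device pixel ratio of 2 (an 800x600 window backed by a 1600x1200 framebuffer), an event at (100, 50) maps to pixel (200, 100). The integer division truncates for fractional ratios such as 1.5; floating-point vectors avoid that.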
Executable size savings
Because we didn’t end up using the heavyweight “full SDL2” in the first place, the difference in executable size is nothing extreme: in total, in a Release WebAssembly build, the JS size got smaller by about 20 kB, while the WASM file stayed roughly the same.
Minimal runtime, or brain surgery with a chainsaw
On the other hand, since the new application doesn’t use any of the emscripten_set_main_loop() APIs from library_browser.js, it is a good candidate for playing with the relatively recent MINIMAL_RUNTIME feature of Emscripten.
Now, while Magnum is moving in the right direction, it’s not yet in a state where this would “just work”. Supporting MINIMAL_RUNTIME requires either moving fast and breaking lots of things, or letting the APIs slowly evolve into a state that makes it possible. Because reliable backwards compatibility and a painless upgrade path are valuable assets in our portfolio, we chose the latter. It will eventually happen, but not right now. Another reason is that while Magnum itself can be highly optimized to be compatible with the minimal runtime, usual application code is not able to satisfy those requirements without removing and rewriting most third-party dependencies.
That being said, why not spend one afternoon with a chainsaw and try demolishing the code to see what could come out? It’s important to note, however, that MINIMAL_RUNTIME is still a very fresh feature, so it’s very likely that a lot of code will simply not work with it. All the discovered problems are listed below, because at this point googling them gives no results at all. Hopefully this helps other people stuck in similar places:
- std::getenv() or the environ variable (used by Utility::Arguments) results in writeAsciiToMemory() being called, which is right now explicitly disabled for the minimal runtime (and thus you either get a failure at runtime or the Closure Compiler complaining about these names being undefined). Since Emscripten’s environment is just a bunch of hardcoded values and Magnum is using Node.js APIs to get the real values for command-line apps anyway, the solution is to simply not use those functions.
- Right now, Magnum is using C++ iostreams in three isolated places (Utility::Debug being the most prominent) and those uses are gradually being phased out. On Emscripten, using anything that even remotely touches them will make the backend emit calls to llvm_stacksave() and llvm_stackrestore(). The JavaScript implementations then call stackSave() and stackRestore(), which however do not get pulled in with MINIMAL_RUNTIME, again resulting in either a runtime error every time you call into JS (which includes functions such as emscripten_set_mousedown_callback()) or an error when you use the Closure Compiler. After wasting a few hours trying to convince Emscripten to emit these two by adding _llvm_stacksave__deps: ['$stackSave'], the ultimate solution was to kill everything stream-related. Considering that everyone interested in MINIMAL_RUNTIME has probably done that already, it explains why this is another ungoogleable error.
- If you use C++ streams, the generated JS driver file contains a full JavaScript implementation of strftime(), and the only way to get rid of it is removing all stream usage as well. Grep your JS file for Monday: if it’s there, you have a problem.
- JavaScript Emscripten APIs like dynCall() or allocate() are not available, and putting them into either EXTRA_EXPORTED_RUNTIME_METHODS or RUNTIME_FUNCS_TO_IMPORT either didn’t do anything or moved the error into a different place. For the former, it was possible to work around it by directly calling one of its specializations (in that particular case dynCall_ii()); the latter resulted in a frustrated tableflip and the relevant piece of code getting cut off.
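In application code, the first two problems above boil down to not calling the offending functions on Emscripten. A minimal sketch of what that could look like, with hypothetical helper names: environmentValue() skips std::getenv() when __EMSCRIPTEN__ is defined, and logMessage() writes through C stdio instead of iostreams, using std::fputs() rather than std::printf() to stay clear of the floating-point formatting code. Whether this alone is enough for a particular MINIMAL_RUNTIME build is an assumption to verify with your own toolchain.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: on Emscripten, don't touch std::getenv() at all, so
// that writeAsciiToMemory() never gets pulled in. Elsewhere, behave normally.
const char* environmentValue(const char* name) {
#ifdef __EMSCRIPTEN__
    static_cast<void>(name);
    return nullptr; // Emscripten's environment is hardcoded values anyway
#else
    return std::getenv(name);
#endif
}

// Hypothetical logging helper: plain C stdio instead of <iostream>, and
// fputs() instead of printf() so no formatting machinery is needed.
bool logMessage(const char* message) {
    assert(message);
    return std::fputs(message, stdout) != EOF &&
           std::fputc('\n', stdout) != EOF;
}
```

On a native build, environmentValue() still returns real values, so the same code keeps working outside of the browser.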
Below is a breakdown of various optimizations on a minimal application that does just a framebuffer clear, each step chopping another bit off the total download size. All sizes are uncompressed, built in Release mode with -Oz, --llvm-lto 1 and --closure 1. Later on in the process, the experimental WebAssembly support in Bloaty McBloatFace was used to discover which functions contribute the most to the final code size.
| Operation | JS size | WASM size |
|---|---|---|
| Initial state | 52.1 kB | 226.3 kB |
| Enabling minimal runtime [1] | 36.3 kB | 224.5 kB |
| Additional slimming flags [2] | 35.7 kB | 224.5 kB |
| Disabling filesystem [3] | 19.4 kB | 224.5 kB |
| Chopping off all C++ stream usage | 14.7 kB | 83.6 kB |
| Enabling CORRADE_NO_ASSERT | 14.7 kB | 75.4 kB |
| Removing a single use of std::sort() [4] | 14.7 kB | 69.3 kB |
| Removing one std::unordered_map [4] | 14.7 kB | 62.6 kB |
| Using emmalloc instead of dlmalloc [5] | 14.7 kB | 56.3 kB |
| Removing all printf() usage [6] | 14.7 kB | 44 kB (estimate) |
[1] -s MINIMAL_RUNTIME=2 -s ENVIRONMENT=web -lGL, plus temporarily also enabling -s IGNORE_CLOSURE_COMPILER_ERRORS=1 in order to make the Closure Compiler survive undefined variable errors due to iostreams and the other problems mentioned above.
[2] -s SUPPORT_ERRNO=0 -s GL_EMULATE_GLES_VERSION_STRING_FORMAT=0 -s GL_EXTENSIONS_IN_PREFIXED_FORMAT=0 -s GL_SUPPORT_AUTOMATIC_ENABLE_EXTENSIONS=0 -s GL_TRACK_ERRORS=0 -s DISABLE_DEPRECATED_FIND_EVENT_TARGET_BEHAVIOR=1, basically disabling what’s enabled by default. In particular, GL_EXTENSIONS_IN_PREFIXED_FORMAT=0 is not supported by Magnum right now, causing it to not report any extensions, but that can be easily fixed. The result of disabling all these is … underwhelming.
[3] -s FILESYSTEM=0 makes Emscripten not emit any filesystem-related code. Magnum provides filesystem access through various APIs (Utility::Directory, GL::Shader::addFile(), Trade::AbstractImporter::openFile(), …) and at the moment there’s no possibility to compile all these out, so this is a nuclear option that works.
[4] GL::Context uses a std::sort() and a std::unordered_map to check for extension presence and print their list in the engine startup log. It was frightening to see the removal of a single std::sort() causing a 10% drop in executable size. Since WebGL has roughly two dozen extensions (compared to over 200 on desktop and ES), maybe a space-efficient alternative implementation could be done for this target instead.
[5] Doug Lea’s malloc() is a general-purpose allocator, used by glibc among others. It’s very performant and a good choice for code that does many small allocations (std::unordered_map, I’m looking at you). The downside is its larger size, and code doing fewer, larger allocations might want to use -s MALLOC=emmalloc instead. We don’t pretend Magnum is at that state yet, but other projects successfully switched to it, shaving more bytes off the download size.
[6] After removing all of the above, std::printf() internals started appearing at the top of Bloaty’s size report, totalling about 10% of the executable size. Magnum doesn’t use it anywhere directly and all transitive usage of it was killed together with iostreams; further digging revealed that it gets called from libc++’s abort_message(), for example when aborting due to a pure virtual function call. Independent measurement showed that std::printf() is around 12 kB of additional code compared to std::puts(), mainly due to the inherent complexity of floating-point string conversion. The plan is to use the much simpler and smaller Ryū algorithm for Magnum’s std::printf() replacement, additionally ensuring that float-to-string conversions can be DCE’d when not used. We might also look into patching Emscripten’s libc++ to not use the expensive implementation in its abort messages.
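Footnote 4 wonders about a space-efficient alternative for the extension check. One possible direction, sketched here as an assumption and not what GL::Context actually does: keep the short list of known WebGL extensions sorted directly in the source, look names up with a hand-written binary search instead of std::sort() and a hash map, and store support as one flag per known extension. The six extension names are real WebGL extensions, but the list is abridged, and KnownExtensions, extensionIndex() and ExtensionSupport are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, abridged list of WebGL extension names, kept sorted (by
// strcmp) in the source so no std::sort() is needed at runtime.
constexpr const char* KnownExtensions[] = {
    "ANGLE_instanced_arrays",
    "EXT_color_buffer_float",
    "EXT_texture_filter_anisotropic",
    "OES_texture_float",
    "WEBGL_compressed_texture_s3tc",
    "WEBGL_depth_texture",
};
constexpr std::size_t KnownExtensionCount =
    sizeof(KnownExtensions)/sizeof(KnownExtensions[0]);

// Binary search over the sorted list; returns the index or -1 if unknown.
int extensionIndex(const char* name) {
    assert(name);
    std::size_t lo = 0, hi = KnownExtensionCount;
    while(lo < hi) {
        const std::size_t mid = lo + (hi - lo)/2;
        const int cmp = std::strcmp(name, KnownExtensions[mid]);
        if(cmp == 0) return int(mid);
        if(cmp < 0) hi = mid;
        else lo = mid + 1;
    }
    return -1;
}

// One flag per known extension replaces the std::unordered_map.
struct ExtensionSupport {
    bool supported[KnownExtensionCount]{};

    void markSupported(const char* name) {
        const int i = extensionIndex(name);
        if(i != -1) supported[i] = true; // unknown names are ignored
    }

    bool isSupported(const char* name) const {
        const int i = extensionIndex(name);
        return i != -1 && supported[i];
    }
};
```

Marking OES_texture_float as supported and then querying it returns true, while a name missing from the list is silently ignored. Whether this actually ends up smaller than the std::sort() plus std::unordered_map combination would need to be measured with Bloaty, as above.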
While all of the above size reductions were done in a hack-and-slash manner, the final executable still initializes and executes properly, clearing the framebuffer and reacting to input events. For reference, check out the diffs of the chainsaw-surgery branches in corrade and magnum.
The above is definitely not all that can be done. Especially considering that removing two uses of semi-heavy STL APIs led to an almost 20% saving in code size, there are most probably more such low-hanging fruits. The above tasks were added to mosra/magnum#293 (if not there already) and will get gradually integrated into master.
Conclusion
Bright times ahead! The new Platform::EmscriptenApplication is the first step to truly minimal WebAssembly builds, and the above hints that it’s possible to have download sizes not too far from code carefully written in plain C. To give a fair comparison, the basic framebuffer clear sample from @floooh’s Sokol Samples is 42 kB in total, while the above equivalent is roughly 59 kB. That is using C++(11), but not overusing it, and it’s just the beginning.