A hot topic for Firefox at the moment is the new out-of-process rendering, but is it common knowledge that this has already been in Firefox Mobile for a long time? For mobile, there’s what we call a ‘chrome’ process (this processes and renders the main UI) and then ‘content’ processes, which handle the rendering of the pages in your tabs. There are lots of fun and tricky issues when you choose to do things like this, mostly centering around synchronisation – and recently, I was trying to add a feature that’s lead me to writing this post.
You may have already heard about how Firefox accelerates the rendering of web content. In a nut-shell, a page is separated into a series of layers (say, background, content, canvases, plug-ins, etc.) These layers are then pasted onto each other, in what tends to get called composition. If you’re lucky and have decent drivers, or you run on Windows, this process of composition is accelerated by your video card. Turns out video cards are very good at composition, so this is often a nice bonus. We also try to accelerate the rendering of these layers too, but that’s another topic…
These layers are arranged in what’s known as a layer-tree – when something on the screen needs to update, this tree is traversed, and painted to the screen. But how is this affected by out-of-process rendering? You can’t have both processes painting to the screen simultaneously without some kind of coordination, and often there are various rules on memory sharing/protection that limit how sharing happens too. We choose to let the chrome process handle getting things to the screen. It’s important, however, that the content process not be able to hold up the chrome process too readily. But if we want the page to render correctly and respond to user input, we need the page’s layer tree… So how do we go about solving this?
We use what we’ve called ‘shadow’ layers – the chrome process has a mirror-copy of the content process’s layer tree, and the content process can update it when it’s ready. In the meantime, we have something we can paint and the page continues to be reactive, to the extent at least that you can read it, you can scroll it and you can zoom it. We render a larger area of the page than is visible so that while the content process is busy rendering, we don’t appear to ‘fall behind’ (when we do, you see the checker-board background, similar to the iPhone).
We have various implementations of these layers for different platforms, so we can take advantage of platform-specific features. There’s a GL implementation (GL[X] and EGL), a Direct3D implementation (9 and 10) and a ‘basic’ implementation that uses cairo and runs in software. When the content process changes its layer tree, it sends a transaction representing that change over to the chrome process. Part of this transaction is likely to involve updating of visible buffers. If both processes use basic layers (the default case, on android at least), we use shared memory and page-flipping. That is, the content process renders into one buffer while the chrome process renders out of another buffer, and when the content process updates, they swap around.
For accelerated layers, this is a slightly different and more complicated story. As we can’t share textures across processes and we don’t currently have a remote cairo implementation, the content process always uses basic layers and renders into memory (though there is work going on to allow remote access to acceleration). The chrome process is free to use whatever implementation it likes though, and not all of these implementations allow for page-flipping. The GL layers implementation only uses a single buffer on the content side, and when this is updated, it is synchronously uploaded to the GPU on the chrome side (and the content has to wait). Thankfully, on Maemo and X11, there are extensions that make this very fast (EglLockSurface on Maemo, texture-from-pixmap on GL/X11), though it’s still quite a large, synchronous copy. On Android, this copy is very slow – we have no fast-path due to the API we need not currently being advertised (and possibly not implemented yet).
There are things that we could do to avoid this speed hit though. I thought, for example, we could use EGLImage (which, thankfully, is available on Android) and asynchronously update textures in a thread (or even in chunks in the main-loop). I still think this is a sound idea, but there are caveats. This would require, for example, that either we double-buffer, or we make the content process wait for the asynchronous update to complete. The latter would involve adding asynchronous shadow layer transactions. Not an easy task. If we double-buffer, we then double the system memory cost of storing a layer (and bear in mind, that layer is mirrored in graphics memory, so we’re talking 1.5 times the cost vs. basic layers). We also have to synchronise the updating of the layer coordinates with asynchronous updating to avoid what would otherwise be a huge and visible rendering glitch, and if we want the update to not be viewable while it’s happening, we have to double-buffer the layer’s texture too. We now have twice the memory cost we had before, and these tend to be quite large buffers!
Altogether, not an easy problem to solve. So I’ve given up for now. There are other, easier and less disrupting changes that can be made, that I’ll be trying out next. I’m disappointed that this didn’t pan out as I thought it would, but I’m pleased to have learnt something. I hope this is useful/interesting to someone.]]>