I’m starting to type this up as EdgeConf draws to a close. I spoke on the performance panel, with Shane O’Sullivan, Rowan Beentje and Pavel Feldman, moderated by Matt Delaney, and tried to bring a platform perspective to the affair. I found the panel very interesting, and it reminded me how little I know about the high level of web development. Similarly, I think it also highlighted how little consideration there usually is for the platform when developing for the web. On the whole, I think that’s a good thing (platform details shouldn’t be important, and they have a habit of changing), but a little platform knowledge can help in structuring things in a way that will be fast today, and as long as it isn’t too much of a departure from your design, it doesn’t hurt to think about it. At one point in the panel, I listed a few things that are particularly slow from a platform perspective today. While none of these are intractable problems, they may not be fixed in the near future, and feedback indicated that they aren’t all common knowledge. So what follows are a few things to avoid, and a few things to do, that will help make your pages scroll smoothly on both desktop and mobile. I’m going to try not to write lies, but I hope that if I get anything slightly or totally wrong, people will correct me in the comments and I can update the post accordingly 🙂
When I mentioned this at the conference, I prefaced it with a quick explanation of how rendering a web page works. It’s probably worth reiterating this. After network and such have happened and the DOM tree has been created, this tree gets translated into what we call the frame tree. This tree is similar to the DOM tree, but it’s structured in a way that better represents how the page will be drawn. This tree is then iterated over and the size and position of these frames are calculated. The act of calculating these positions and sizes is referred to as reflow. Once reflow is done, we translate the frame tree into a display list (other engines may skip this step, but it’s unimportant), then we draw the display list into layers. Where possible, we keep layers around and only redraw parts that have changed/newly become visible.
Reflow is actually quite fast, or at least it can be, but it often forces things to be redrawn (and drawing is often slow). Reflow happens when the size or position of things changes in such a way that dependent positions and sizes of elements need to be recalculated. Reflow usually isn’t something that will happen to the entire page at once, but incautious structuring of the page can result in this. There are quite a few things you can do to help avoid large reflows: set widths and heights to absolute values where possible, don’t reposition or resize things, and don’t unnecessarily change the style of things. Obviously these things can’t always be avoided, but it’s worth thinking about whether there are other ways to achieve the result you want that don’t force reflow. If positions of things must be changed, consider using a CSS translate transform, for example; transforms don’t usually cause reflow.
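As a rough sketch of that last suggestion, here is how moving something with a translate transform might look. The `Styled` type and the `translateValue` and `moveBy` helpers are names I made up for illustration. The `panel` object is a stand-in so the snippet runs outside a browser; in a page you would pass a real element instead.

```typescript
// Minimal structural type; a real HTMLElement also has style.transform.
interface Styled {
  style: { transform: string };
}

// Build a CSS translate() value for a pixel offset.
function translateValue(dxPx: number, dyPx: number): string {
  return `translate(${dxPx}px, ${dyPx}px)`;
}

// Move an element visually without touching top/left, width or height.
function moveBy(el: Styled, dxPx: number, dyPx: number): void {
  el.style.transform = translateValue(dxPx, dyPx);
}

// Stand-in for a real element (assumption: in a page this would come
// from something like document.getElementById).
const panel: Styled = { style: { transform: "" } };
moveBy(panel, 120, 0);
console.log(panel.style.transform); // translate(120px, 0px)
```

The point is only that the element’s size and its position in the layout stay untouched, so nothing that depends on them needs recalculating.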
This sounds silly, but you should really make the browser do only as much drawing as is absolutely necessary. Most of the time, drawing will happen on reflow, when new content appears on the screen and when style changes. Some practical advice here would be to avoid making DOM changes near the root of the tree, avoid changing the size of things and avoid changing text (text drawing is especially slow). Repositioning doesn’t always force redrawing, and you can make sure it doesn’t by repositioning using CSS translate transforms instead of top/left/bottom/right style properties. Especially avoid causing redraws to happen while the user is scrolling. Browsers try their hardest to keep up the refresh rate while scrolling, but there are limits on memory bandwidth (especially on mobile), so every little helps.
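One way to keep redraws out of the scrolling path is to hold back non-urgent changes until scrolling has stopped. What follows is only a sketch of that idea: `DeferredUpdates` is a hypothetical helper, not a browser API, and it leaves it to you to decide when scrolling has started and ended (a scroll listener plus a short timeout is one common approach).

```typescript
// Hypothetical helper: queues updates while the user is scrolling.
class DeferredUpdates {
  private scrolling = false;
  private pending: Array<() => void> = [];

  // Run the update now if idle, otherwise hold it until scrolling ends.
  schedule(update: () => void): void {
    if (this.scrolling) {
      this.pending.push(update);
    } else {
      update();
    }
  }

  scrollStarted(): void {
    this.scrolling = true;
  }

  // Mark scrolling as finished and apply everything that was held back.
  scrollEnded(): void {
    this.scrolling = false;
    const queued = this.pending;
    this.pending = [];
    for (const update of queued) {
      update();
    }
  }
}

// Usage: the array stands in for real DOM or style changes.
const updates = new DeferredUpdates();
const applied: string[] = [];

updates.scrollStarted();
updates.schedule(() => { applied.push("change label text"); });
console.log(applied.length); // 0, nothing is touched mid-scroll
updates.scrollEnded();
console.log(applied.length); // 1, applied once scrolling stops
```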
Thinking of things that are slow to draw, radial gradients are very slow. This is really just a bug that we should fix, but if you must use CSS radial gradients, try not to change them, and try not to put them in the background of elements that frequently change.
Avoid unnecessary layers
One of the reasons scrolling can be fast at all on mobile is that we reduce the page to a series of layers, and we keep redrawing on these layers down to a minimum. When we need to redraw the page, we just paste these layers that have already been drawn. While the GPU is pretty great at this, there are limits. Specifically, there is a limit to the number of pixels that can be drawn on the screen in a certain time (fill-rate). When you draw to the same pixel multiple times, this is called overdraw, and it counts towards the fill-rate. Having lots of overlapping layers often causes lots of overdraw, and can mean a frame needs more fill-rate than the GPU has available.
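To put some rough numbers on overdraw, here is a back-of-the-envelope calculation. The layer sizes and the 480x800 viewport are invented for the example, and it assumes every layer lies wholly inside the viewport and is composited in full, with no culling.

```typescript
interface LayerRect {
  label: string;
  width: number;
  height: number;
}

// Pixels written when every layer is composited in full (no culling).
function pixelsComposited(layers: LayerRect[]): number {
  return layers.reduce((sum, l) => sum + l.width * l.height, 0);
}

// Ratio of pixels written to pixels on screen; 1 means no overdraw.
function overdrawFactor(
  layers: LayerRect[],
  viewportW: number,
  viewportH: number
): number {
  return pixelsComposited(layers) / (viewportW * viewportH);
}

// Made-up layers on a 480x800 viewport.
const exampleLayers: LayerRect[] = [
  { label: "page content", width: 480, height: 800 },
  { label: "fixed header", width: 480, height: 100 },
  { label: "transformed overlay", width: 480, height: 400 },
];

console.log(pixelsComposited(exampleLayers)); // 624000
console.log(overdrawFactor(exampleLayers, 480, 800)); // 1.625
```

So with just two extra layers, each frame writes over 60% more pixels than the screen actually has.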
This is all well and good, but how does one avoid layers at a high level? It’s worth being vaguely aware of what causes stacking contexts to be created. While layers usually don’t exactly correspond to stacking contexts, trying to reduce stacking contexts will often end up reducing the number of resulting layers, and is a reasonable exercise. Even simpler, anything with position: fixed, background-attachment: fixed or any kind of CSS transformed element will likely end up with its own layer, and anything with its own layer will likely force a layer for anything below it and anything above it. So unless they’re necessary, avoid them.
What can we do at the platform level to mitigate this? Firefox already culls areas of a layer that are made inaccessible by occluding layers (at least to some extent), but this won’t work if any of the layers end up with transforms, or aren’t opaque. We could be smarter with culling for opaque, transformed layers, and we could likely do a better job of determining when a layer is opaque. I’m pretty sure we could be smarter about the culling we already do too.
Another thing that slows down drawing is blending. This is when the visual result of an operation relies on what’s already there. This requires the GPU (or CPU) to read what’s already there and perform a calculation on the result, which is of course slower than just writing directly to the buffer. Blending also doesn’t interact well with deferred rendering GPUs, which are popular on mobile.
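To make the difference concrete, here is a sketch for a single colour channel. It uses the usual “source over” formula with a plain 0 to 1 alpha; real GPUs and compositors have more variations than this (premultiplied alpha, for one), so treat it as an illustration rather than a description of what any particular engine does. The function names are mine.

```typescript
// One colour channel, 0-255. An opaque write ignores what is already there.
function writeOpaque(src: number): number {
  return src;
}

// A "source over" blend has to read the destination before it can write.
function blendOver(src: number, dst: number, alpha: number): number {
  return Math.round(src * alpha + dst * (1 - alpha));
}

console.log(writeOpaque(200)); // 200, the old value is never read
console.log(blendOver(200, 100, 0.25)); // 125, depends on the old value
```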
This alone isn’t so bad, but combining it with text rendering is not so great. If you have text that isn’t on a static, opaque background, that text will be rendered twice (on desktop at least). First we render it on white, then on black, and we use those two buffers to maintain sub-pixel anti-aliasing as the background varies. This is much slower than normal, and also uses up more memory. On mobile, we store opaque layers in 16-bit colour, but translucent layers are stored in 32-bit colour, doubling the memory requirement of a non-opaque layer.
We could be smarter about this. At the very least, we could use multi-texturing and store non-opaque layers as a 16-bit colour + 8-bit alpha, cutting the memory requirement by a quarter and likely making it faster to draw. Even then, this will still be more expensive than just drawing an opaque layer, so when possible, make sure any text is on top of a static, opaque background.
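The arithmetic behind those memory figures is simple enough to check. The 480x800 layer size below is just an example I picked; the bit depths are the ones mentioned above.

```typescript
// Bytes needed to store one layer at a given colour depth.
function layerBytes(
  widthPx: number,
  heightPx: number,
  bitsPerPixel: number
): number {
  return widthPx * heightPx * (bitsPerPixel / 8);
}

const opaque16 = layerBytes(480, 800, 16); // opaque layer, 16-bit colour
const translucent32 = layerBytes(480, 800, 32); // translucent, 32-bit colour
const split24 = layerBytes(480, 800, 16 + 8); // 16-bit colour + 8-bit alpha

console.log(opaque16); // 768000
console.log(translucent32); // 1536000, double the opaque layer
console.log(split24); // 1152000
console.log(1 - split24 / translucent32); // 0.25, a quarter saved
```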
Avoid overflow scrolling
The way we make scrolling fast on mobile, and I believe the way it’s fast in other browsers too, is that we render a much larger area than is visible on the screen and we do that asynchronously to the user scrolling. This works because the relationship between time and size of drawing is not linear (on the whole, the more you draw, the cheaper it is per pixel). We only do this for the content document, however. That isn’t strictly true, as I think there are situations where whole-page scrollable elements that aren’t the body can take advantage of this, but it’s best not to rely on that. This means that any scrollable element that isn’t the body can’t take advantage of this, and will redraw synchronously with scrolling. For small, simple elements, this doesn’t tend to be a problem, but if your entire page is in an iframe that covers most or all of the viewport, scrolling performance will likely suffer.
On desktop, currently, drawing is synchronous and we don’t buffer area around the page like on mobile, so this advice doesn’t apply there. But on mobile, do your best to avoid using iframes or having elements that have overflow that aren’t the body. If you’re using overflow to achieve a two-panel layout, or something like this, consider using position:fixed and margins instead. If both panels must scroll, consider making the largest panel the body and using overflow scrolling in the smaller one.
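Here is a sketch of the two-panel suggestion, written as a function that returns plain style objects so it can be read (and run) without a page. `twoPanelStyles` and the property choices are my own illustration, not a recipe from any library; in a browser you would copy these properties onto the sidebar element’s style and the body’s style.

```typescript
type StyleMap = Record<string, string>;

// Styles for a fixed sidebar next to body content that scrolls normally.
// The body stays the scrollable element; the sidebar only gets overflow
// scrolling if it really needs to scroll as well.
function twoPanelStyles(
  sidebarWidthPx: number,
  sidebarScrolls: boolean
): { sidebar: StyleMap; bodyContent: StyleMap } {
  const sidebar: StyleMap = {
    position: "fixed",
    top: "0",
    left: "0",
    bottom: "0",
    width: `${sidebarWidthPx}px`,
  };
  if (sidebarScrolls) {
    sidebar["overflowY"] = "auto";
  }
  return {
    sidebar,
    // A margin keeps the body content clear of the fixed sidebar.
    bodyContent: { marginLeft: `${sidebarWidthPx}px` },
  };
}

const layout = twoPanelStyles(240, false);
console.log(layout.sidebar["position"]); // fixed
console.log(layout.bodyContent["marginLeft"]); // 240px
```

Bear in mind that a position: fixed element will likely get its own layer, so this trades an overflow-scrolling panel for an extra layer rather than getting something for nothing.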
I hope we’ll do something clever to fix this sometime. It’s been at the back of my mind for quite a while, but I don’t think scrolling on sub-elements of the page can ever really be as good as the body without considerable memory cost.
Take advantage of the platform
This post sounds all doom and gloom, but I’m purposefully highlighting what we aren’t yet good at. There are a lot of things we are good at (or reasonable, at least), and having a fast page need not necessarily be viewed as lots of things to avoid, so much as lots of things to do.
Although computing power continues to increase, the trend now is to bolt on more cores and more hardware threads, and the speed increase of individual cores tends to be more modest. This affects how we improve performance at the application level. Performance increases, more often than not, come from being smarter about when we do work and from doing things concurrently, rather than just from finding faster algorithms and micro-optimisation.
This relates to the asynchronous scrolling mentioned above, where we do the same amount of work, but at a more opportune time, and in a way that better takes advantage of the resources available. There are other optimisations that are similar with regards to video decoding/drawing, CSS animations/transitions and WebGL buffer swapping. A frequently occurring question at EdgeConf was whether it would be sensible to add ‘hints’, or expose more internals to web developers so that they can instrument pages to provide the best performance. On the whole, hints are a bad idea, as they expose platform details that are liable to change or be obsoleted, but I think a lot of control is already given by current standards.
I hope some of this is useful to someone. I’ll try to write similar posts if I find out more, or there are significant platform changes in the future. I deliberately haven’t mentioned profiling tools, as there are people far more qualified to write about them than I am. That said, there’s a wiki page about the built-in Firefox profiler, some nice documentation on Opera’s debugging tools and Chrome’s tools look really great too.