WebKit frame delivery

Part of my work on WebKit at Igalia is keeping an eye on performance, especially when it concerns embedded devices. In my experience, there tend to be two types of changes that cause the biggest performance gains – either complete rewrites (e.g. WebRender for Firefox) or tiny, low-hanging fruit that get overlooked because the big picture is so multi-faceted and complex (e.g. making unnecessary buffer copies, removing an errant sleep(5) call). I would say that it’s actually the latter bug that’s harder to diagnose and fix, even though the code produced at the end of it tends to be much smaller. What I’m writing about here falls firmly into the latter group.

For a good while now, Alejandro GarcĂ­a has been producing internal performance reports for WPE WebKit. A group of us will gather and do some basic testing of WPE WebKit on the same hardware – a Raspberry Pi 3B+. This involves running things like MotionMark and going to popular sites like YouTube and Google Maps and noting how well they work. We do this periodically and it’s a great help in tracking down regressions or obvious, user-facing issues. Another part of it is monitoring things like memory and CPU usage during this testing. Alex noted that we have a lot of idle CPU time during benchmarks, and at the same time our benchmark results fall markedly behind Chromium. Some of this is expected, Chromium has a more advanced rendering architecture that better makes use of the GPU, but a well-written benchmark should certainly be close to saturating the CPU and we certainly have CPU time to spare to improve our results. Based on this idea, Alex crafted a patch that would pre-emptively render frames (this isn’t quite what it did, but for the sake of brevity, that’s how I’m going to describe it). This showed significant improvements in MotionMark results and proved at the very least that it was legitimately possible to improve our performance in this synthetic test without any large changes.

This patch couldn’t land as it was due to concerns with it subtly changing the behaviour of how requestAnimationFrame callbacks work, but the idea was sound and definitely pointed towards an area where there may be a relatively small change that could have a large effect. This laid the groundwork for us to dig deeper and debug what was really happening here. Being a fan of computer graphics and video games, frame delivery/cadence is quite a hot topic in that area. I had the idea that something was unusual, but the code that gets a frame to the screen in WebKit is spread across many classes (and multiple threads) and isn’t so easy to reason about. On the other hand, it isn’t so hard to write a tool to analyse it from the client side and this would also give us a handy comparison with other browsers too.

So I went ahead and wrote a tool that would help us analyse exactly how frames are delivered from the client side, including under load, and the results were illuminating. The tool visualises the time between frames, the time between requesting a frame and receiving the corresponding callback and the time passed between performance.now() called at the start of a callback and the timestamp received as the callback parameter. To simulate ‘load’, it uses performance.now() and waits until the given amount of time has elapsed since the start of the callback before returning control to the main loop. If you want to sustain a 60fps update, you have about 16ms to finish whatever you’re doing in your requestAnimationFrame callback. If you exceed that, you won’t be able to maintain 60fps. Here’s Firefox under a 20ms load:

Firefox frame-times under a 20ms rendering load

And here’s Chrome under the same parameters:

Firefox frame-times under a 20ms rendering load

Now here’s WPE WebKit, using the Wayland backend, without any patches:

WebKit/WPE Wayland under a 20ms rendering load

One of these graphs does not look like the others, right? We can immediately see that when a frame exceeds the 16.67ms/60fps rendering budget in WebKit/WPE under the Wayland backend, that it hard drops to 30fps. Other browsers don’t wait for a vsync to kick off rendering work and so are able to achieve frame-rates between 30fps and 60fps, when measured over multiple frames (all of these browsers are locked to the screen refresh, there is no screen tearing present). The other noticeable thing in these is that the green line is missing on the WebKit test – this shows that the timestamp delivered to the frame callback is exactly the same as performance.now(), where as the timestamps in both Chrome and Firefox appear to be the time of the last vsync. This makes sense from an animation point of view and would mean that animations that are timed using this timestamp would move at a rate that’s more consistent with the screen refresh when under load.

What I’m omitting from this write-up is the gradual development of this tool, the subtle and different behaviour of different WebKit backends and the huge amount of testing and instrumentation that was required to reach these conclusions. I also wrote another tool to visualise the WebKit render pipeline, which greatly aided in writing potential patches to fix these issues, but perhaps that’s the topic of another blog post for another day.

In summary, there are two identified bugs, or at least, major differences in behaviour with other browsers here, both of which affect both fluidity and synthetic test performance. I’m not too concerned about the latter, but that’s a hard sell to a potential client that’s pointing at concrete numbers that say WebKit is significantly worse than some competing option. The first bug is that if a frame goes over budget and we miss a screen refresh (a vsync signal), we wait for the next one before kicking off rendering again. This is what causes the hard drop from 60fps to 30fps. As it concerns Linux, this only affects the Wayland WPE backend because that’s the only backend that implements vsync signals fully, so this doesn’t affect GTK or other WPE backends. The second bug, which is less of a bug, as a reading of the spec (steps 9 and 11.10) would certainly indicate that WebKit is doing the correct thing here, is that the timestamp given to requestAnimationFrame callbacks is the current time and not the vsync time as it is in other browsers (which makes more sense for timing animations). I have patches for both of these issues and they’re tracked in bug 233312 and bug 238999.

With these two issues fixed, this is what the graph looks like.

Patched WebKit/WPE with a 20ms rendering load

And another nice consequence is that MotionMark 1.2 has gone from:

WebKit/WPE MotionMark 1.2 results

to:

Patched WebKit/WPE MotionMark 1.2 results

Much better 🙂

No ETA on these patches landing; perhaps I’ve drawn some incorrect conclusions, or done something in a way that won’t work long term, or is wrong in some fashion that I’m not aware of yet. Also, this will most affect users of WPE/Wayland, so don’t get too excited if you’re using GNOME Web or similar. Fingers crossed though! A huge thanks to Alejandro GarcĂ­a who worked with me on this and did an awful lot of debugging and testing (as well as the original patch that inspired this work).