OffscreenCanvas, jobs, life

Hoo boy, it’s been a long time since I last blogged… About 2 and a half years! So, what’s been happening in that time? This will be a long one, so if you’re only interested in a part of it (and who could blame you), I’ve titled each section.

Leaving Impossible

Well, unfortunately my work with Impossible ended, as we essentially ran out of funding. That’s a real shame; we worked on some really cool, open-source stuff, and we’ve definitely seen similar innovations in the field since we stopped working on it. We took a short break (during which we also, unsuccessfully, searched for further funding), after which Rob started working on a cool, related project of his own that you should check out, and I, being a bit less brave, started seeking out a new job. I did consider becoming a full-time musician, but business wasn’t picking up as quickly as I’d hoped it might in that down-time, and with hindsight, I’m glad I didn’t (Covid-19 and all).

I interviewed with a few places, which was certainly an eye-opening experience. The last ‘real’ job interview I did was for Mozilla in 2011, which consisted mainly of talking with engineers that worked there, and working through a few whiteboard problems. Being a young, eager coder at the time, this didn’t really faze me back then. Turns out either the questions have evolved or I’m just not quite as sharp as I used to be in that very particular environment. The one interview I had that involved whiteboard coding was a very mixed bag. It seemed a mix of two types of questions: those that are easy to answer (but, unless you’re in the habit of writing very quickly on a whiteboard, slow to write down) and those that are pretty much impossible to answer without specific preparation. Perhaps this was the fault of recruiters, but you might hope that interviews would be tailored somewhat to the person you’re interviewing, or the work they might actually be doing; neither seemed to be the case. Unsurprisingly, I didn’t get past that interview, but in retrospect I’m also glad I didn’t. Igalia’s interview process was much more humane, and involved mostly discussions about actual work I’ve done, hypothetical situations and ethics. They were very long discussions, mind, but I’m very glad that they were happy to hire me, and that I didn’t entertain different possibilities. If you aren’t already familiar with Igalia, I’d highly recommend having a read about them/us. I’ve been there a year now, and the feeling is quite similar to when I first joined Mozilla, but I believe with Igalia’s structure, this is likely to stay a happier and safer environment. Not that I mean to knock Mozilla, especially now, but anyone that has worked there will likely admit that along with the giddy highs, there are also some unfortunate lows.

Igalia

I joined Igalia as part of the team that works on WebKit, and that’s what I’ve been doing since. It almost makes perfect sense, in a way. Surprisingly, although I’ve spent overwhelmingly more time on Gecko, I actually worked with WebKit first, while at OpenedHand and for a short period at Intel. While celebrating my first commit to WebKit, I discovered it wasn’t my first commit at all; I’d contributed a small embedding-related fix-up in 2008. So it’s nice to have come full circle! My first work at Igalia was fixing up some patches that Žan Doberšek had prototyped to allow direct display of YUV video data via pixel shaders. Later on, I was also pleased to extend that work somewhat by fixing some vc4 driver bugs and GStreamer bugs to allow for hardware decoding of YUV video on the Raspberry Pi 3b (this, I believe, is all upstream at this point). WebKitGTK and WPE WebKit may be the only Linux browser backends that leverage this pipeline, allowing for 1080p30 video playback on a Pi 3b. There are other issues making this less useful than you might think, but either way, it’s a nice first achievement.

OffscreenCanvas

After that introduction, I was pointed at what could fairly be described as my main project, OffscreenCanvas. This was also a continuation of Žan’s work (he’s prolific!), though there has been significant original work since. This might be the part of this post that people find most interesting or relevant, but having not blogged in over 2 years, I can’t be blamed for waffling just a little. OffscreenCanvas is a relatively new web standard that allows the canvas API to be used disconnected from the DOM, and within Workers. It also makes some provisions for asynchronously updated rendering, allowing canvas updates in Workers to bypass the main thread entirely and thus not be blocked by long-running processes on that thread. The most obvious use-case for this, and I think the most practical, is essentially non-blocking rendering of generated content. This is extremely handy for maps, for example. There are some other nice use-cases as well – you can, for example, show loading indicators that don’t stop animating while performing complex DOM manipulation, or procedurally generate textures for games, asynchronously. It’s useful in any situation where you might want to do some long-running image processing without blocking the main thread (image editing also springs to mind).
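To make that concrete, here’s a minimal sketch of the API in use (the file split is mine and I’m skipping feature detection; requestAnimationFrame is available in Workers where OffscreenCanvas rendering is supported):

```ts
// main.ts – hand control of a canvas over to a worker.
const canvas = document.querySelector('canvas')!;
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker('render-worker.js');
// OffscreenCanvas is transferable, so this moves it rather than copying.
worker.postMessage({ canvas: offscreen }, [offscreen]);

// render-worker.ts – the familiar 2d context API, off the main thread.
onmessage = (event: MessageEvent) => {
  const ctx = (event.data.canvas as OffscreenCanvas).getContext('2d')!;
  const draw = (time: number) => {
    ctx.fillStyle = '#fff';
    ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
    ctx.fillStyle = 'rebeccapurple';
    ctx.beginPath();
    ctx.arc(100 + 50 * Math.sin(time / 500), 75, 20, 0, 2 * Math.PI);
    ctx.fill();
    requestAnimationFrame(draw); // keeps animating even if the page janks
  };
  requestAnimationFrame(draw);
};
```

You can also construct one directly in a Worker with new OffscreenCanvas(width, height) and pull frames out with transferToImageBitmap(), which suits the long-running image processing use-case.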

Currently, the only complete implementation is in Blink. Gecko has a partial implementation that only supports WebGL contexts (and last time I tried, crashed the browser on creation…), but as far as I know, that’s it. I’ve been working on this, with encouragement and cooperation from Apple, on and off for the past year. In fact, as of August 12th, it’s even partially usable, though there is still a fair bit missing. I’ve been concentrating on the 2d context use-case, as I think it’s by far the most useful part of the standard. It’s at the point where it’s mostly usable, minus text rendering and minus some edge-case colour parsing. Asynchronous updates are also not yet supported, though I believe that’s fairly close for Linux. For those that want to try it out, OffscreenCanvas is available when experimental features are enabled.
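If you do want to try it, it’s worth feature-detecting first, given how much support varies per engine (as mentioned, Gecko will currently give you a WebGL context but not a 2d one):

```ts
// Feature-detect OffscreenCanvas 2d support before relying on it.
const canUseOffscreen2d =
  typeof OffscreenCanvas !== 'undefined' &&
  new OffscreenCanvas(1, 1).getContext('2d') !== null;
```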

My next goal, after asynchronous updates on Linux, is to enable WebGL context support. I believe these aren’t particularly tough goals, given where it is now, so hopefully they’ll happen by the end of the year. Text rendering is a much harder problem, but I hope that between us at Igalia and the excellent engineers at Apple, we can come up with a plan for it. The difficulty is that both styling and font loading/caching were written with the assumption that they’d run on just one thread, and that that thread would be the main thread. A very reasonable assumption in a pre-Worker and pre-many-core-CPU world of course, but increasingly less so now, and very awkward for this particular piece of work. Hopefully we’ll persevere though; this is a pretty cool technology, and I’d love to contribute to making it feasible to use widely, lessening the gap between native and the web.

And that’s it from me. Lots of non-work related stuff has happened in the time since I last posted, but I’m keeping this post tech-related. If you want to hear more of my nonsense, I tend to post on Twitter a bit more often these days. See you in another couple of years 🙂

My Impossible Story

Keeping up my bi-yearly blogging cadence, I thought it might be fun to write about what I’ve been doing since I left Mozilla. It’s also a convenient time, as it coincides with our work being open-sourced and made public (and of course, developed in public, because otherwise what’s the point, right?). Somewhat ironically, I’ve been working on another machine-learning project, though I’m loath to call it that, as it uses no neural networks so far, and most people I’ve encountered consider the two to be synonymous. I did also go on a month’s holiday to the home of bluegrass music, but that’s a story for another post. I’m getting ahead of myself here.

Some time in March I met up with some old colleagues/friends and of course we all got to chatting about what we’re working on at the moment. As it happened, Rob had just started working at a company run by a friend of our shared former boss, Matthew Allum. What he was working on sounded like it would be a lot of fun, and I had to admit that I was a little jealous of the opportunity… But it so happened that they were looking to hire, and I was starting to get itchy feet, so I got to talk to Kwame Ferreira and one thing led to another.

I started working for Impossible Labs in July, on an R&D project called ‘glimpse’. The remit for this work hasn’t always been entirely clear, but the pitch was that we’d be working on augmented reality technology to aid social interaction. There was also this video:

How could I resist?

What this has meant in real terms is that we’ve been researching and implementing a skeletal tracking system (think motion capture without any special markers/suits/equipment). We’ve studied Microsoft’s freely-available research on the skeletal tracking system for the Kinect and, filling in some of the gaps, implemented something that is probably very similar. We’ve not had much time yet, but it does work and you can download it and try it out now if you’re an adventurous Linux user. You’ll have to wait a bit longer if you’re less adventurous or you want to see it running on a phone.

I’ve worked mainly on implementing the tools and code to train and use the model we use to interpret body images and infer joint positions. My prior experience on the DeepSpeech team at Mozilla was invaluable to this. It gave me the prerequisite knowledge and vocabulary to be able to understand the various papers around the topic, and to realistically implement them. Funnily enough, I initially tried using TensorFlow for training, with the idea that it’d make it easy for us to train on GPUs. It turns out that re-implementing it in native C was literally 1000x faster, and allowed us to realistically complete training on a single (powerful) machine in just a couple of days.

My take-away from this is that TensorFlow isn’t necessarily the right tool for every machine-learning task, and that it’s worth thoroughly analysing the graphs it produces to make sure there are no obvious bottlenecks. A lot of TensorFlow nodes do not have GPU implementations, for example, and it’s very easy to absolutely kill performance by requiring frequent data transfers between CPU and GPU. It’s also worth noting that a large graph has a huge amount of overhead unrelated to the actual operations you’re trying to run. I’m no TensorFlow expert, but it’s definitely a particular tool for a particular job, and it’s worth being careful. Experts can feel free to look at our repository history and tell me all the stupid mistakes I was making before we rewrote it 🙂

So what’s it like working at Impossible on a day-to-day basis? I think a picture says a thousand words, so here’s a picture of our studio:

Though I’ve taken this from the Impossible website, this is seriously what it looks like. There is actually a piano there, and it’s in tune and everything. There are guitars. We have a cat. There’s a tree. A kitchen. The roof is glass. As amazing as Mozilla’s (and many of the larger tech companies’) offices are, this is really something else. I can’t overstate how refreshing an environment this is to be in, and how that impacts both your state of mind and your work. Corporations take note: I’ll take sunlight and life over snacks and a ball-pit any day of the week.

I miss my 3-day work-week sometimes. I do have less time for music than I had, and it’s a little harder to fit everything in. But what I’ve gained in exchange is a passion for my work again. This is code I’m pretty proud of, and that I think is interesting. I’m excited to see where it goes, and to get it into people’s hands. I’m hoping that other people will see what I see in it, if not now, sometime in the near future. Wish us luck!

Goodbye Mozilla

Today is effectively my last day at Mozilla, before I start at Impossible on Monday. I’ve been here for 6 years and a bit, and it’s been quite an experience. I think it’s worth reflecting on, so here we go. Fair warning: if you have no interest in me or Mozilla, this is going to make pretty boring reading.

I started on June 6th 2011, several months before the (then new, since moved) London office opened. Although my skills lay (lie?) in user interface implementation, I was hired mainly for my graphics and systems knowledge. Mozilla was in the region of 500 employees then, I think, and it was an interesting time. I’d been working on the code-base for several years prior at Intel, on a headless backend that we used to build a Clutter-based browser for Moblin netbooks. I wasn’t completely unfamiliar with the code-base, but it still took a long time to get to grips with. We’re talking several million lines of code with several years of legacy, in a language I still consider myself to be pretty novice at (C++).

I started on the mobile platform team, and I would consider this to be my most enjoyable time at the company. The mobile platform team was a multi-discipline team that did general low-level platform work for the mobile (Android and MeeGo) browser. When we started, the browser was based on XUL and was multi-process. Mobile was often the breeding ground for new technologies that would later go on to desktop. It wasn’t long before we started developing a new browser based on a native Android UI, removing XUL and relegating Gecko to page rendering. At the time this felt like a disappointing move. The reason the XUL-based browser wasn’t quite satisfactory was mainly performance issues, and as a platform guy, I wanted to see those issues fixed rather than worked around. In retrospect, this was absolutely the right decision and led to what I’d still consider to be one of Android’s best browsers.

Despite performance issues being one of the major driving forces for making this move, we did a lot of platform work at the time too. As well as being multi-process, the XUL browser had a compositor system for rendering the page, but this wasn’t easily portable. We ended up rewriting this, first almost entirely in Java (which was interesting), then with the rendering part of the compositor in native code. The input handling remained in Java for several years (pretty much until FirefoxOS, where we rewrote that part in native code, then later, switched Android over).

Most of my work during this period was based around improving performance (both perceived and real) and fluidity of the browser. Benoit Girard had written an excellent tiled rendering framework that I polished and got working with mobile. On top of that, I worked on progressive rendering and low precision rendering, which combined are probably the largest body of original work I’ve contributed to the Mozilla code-base. Neither of them are really active in the code-base at the moment, which shows how good a job I didn’t do maintaining them, I suppose.

Although most of my work was graphics-focused on the platform team, I also got to do some layout work. I worked on some over-invalidation issues before Matt Woodrow’s DLBI work landed (which nullified that, but I think that work existed in at least one release). I also worked a lot on keeping fixed position elements fixed to the correct positions during scrolling and zooming, another piece of work I was quite proud of (and probably my second-biggest contribution). There was also the opportunity for some UI work, when it intersected with platform. I implemented Firefox for Android’s dynamic toolbar, and made sure it interacted well with fixed position elements (some of this work has unfortunately been undone with the move from the partially Java-based input manager to the native one). During this period, I was also regularly attending and presenting at FOSDEM.

I would consider my time on the mobile platform team a pretty happy and productive time. Unfortunately for me, those of us with graphics specialities on the mobile platform team were taken off that team and put on the graphics team. I think this was the start of a steady decline in my engagement with the company. At the time this move was made, Mozilla was apparently trying to consolidate teams around products, and this was the exact opposite happening. The move was never really explained to me, and I know I wasn’t the only one that wasn’t happy about it. The graphics team was very different to the mobile platform team, and I didn’t feel I fit in as well. It felt more boisterous and less democratic than the mobile platform team, and as someone that generally shies away from arguments and just wants to get work done, it was hard not to feel sidelined slightly. I was also quite disappointed that people didn’t seem particularly familiar with the graphics work I had already been doing, and that I was tasked, at least initially, with working on some very different (and very boring) desktop Linux work, rather than my speciality of mobile.

I think my time on the graphics team was pretty unproductive, with the exception of the work I did on b2g, improving tiled rendering and getting graphics memory-mapped tiles working. This was particularly hard as the interface was basically undocumented, and its implementation details could vary wildly depending on the graphics driver. Though I made a huge contribution to this work, you won’t see me credited in the tree unfortunately. I’m still a little bit sore about that. It wasn’t long after this that I requested to move to the FirefoxOS systems front-end team. I’d been doing some work there already and I’d long wanted to go back to doing UI. It felt like I either needed a dramatic change or I needed to leave. I’m glad I didn’t leave at this point.

Working on FirefoxOS was a blast. We had lots of new, very talented people, a clear and worthwhile mission, and a new code-base to work with. I worked mainly on the home-screen, first with performance improvements, then with added features (app-grouping being the major one), then with a hugely controversial and probably mismanaged rewrite (mismanaged on my part, not my manager’s – they were excellent). The rewrite was good and fixed many of the performance problems of what it was replacing, but unfortunately also removed features, at least initially. Turns out people really liked the app-grouping feature.

I really enjoyed my time working on FirefoxOS, and getting a nice clean break from platform work, but it was always bitter-sweet. Everyone working on the project was very enthusiastic to see it through and do a good job, but it never felt like upper management’s focus was in the correct place. We spent far too much time kowtowing to the desires of phone carriers and trying to copy Android and not nearly enough time on basic features and polish. Up until around v2.0 and maybe even 2.2, the experience of using FirefoxOS was very rough. Unfortunately, as soon as it started to show some promise and as soon as we had freedom from carriers to actually do what we set out to do in the first place, the project was cancelled, in favour of the whole Connected Devices IoT debacle.

If there was anything that killed morale for me more than my unfortunate time on the graphics team, and more than having FirefoxOS prematurely cancelled, it would have to be the Connected Devices experience. I appreciate it as an opportunity to work on random semi-interesting things for a year or so, and to get some entrepreneurship training, but the mismanagement of that whole situation was pretty epic. To take a group of hundreds of UI-focused engineers and tell them that, with very little help, they should organise themselves into small teams and create IoT products still strikes me as an idea so crazy that it was never going to work. Certainly not the way we did it, anyway. The idea, I think, was that we’d be running several internal start-ups and we’d hopefully get some marketable products out of it. What business a not-for-profit company, based primarily on doing open-source, web-based engineering, has making physical, commercial products is questionable, but it failed long before that could be considered.

The process involved coming up with an idea, presenting it and getting approval to run with it. You would then repeat this approval process at various stages during development. It was, however, very hard to get approval for enough resources (both time and people) to finesse an idea long enough to make it obviously a good or bad idea. That aside, I found it very demoralising to not have the opportunity to write code that people could use. I did manage it a few times, in spite of what was happening, but none of this work I would consider myself particularly proud of. Lots of very talented people left during this period, and then at the end of it, everyone else was laid off. Not a good time.

Luckily for me and the team I was on, we were moved under the umbrella of Emerging Technologies before the lay-offs happened, and this also allowed us to refocus away from trying to make an under-featured and pointless shopping-list assistant and back onto the underlying speech-recognition technology. This brings us almost to present day now.

The DeepSpeech speech recognition project is an extremely worthwhile project, with a clear mission, great promise and interesting underlying technology. So why would I leave? Well, I’ve practically ended up on this team by a series of accidents and random happenstance. It’s been very interesting so far, I’ve learnt a lot and I think I’ve made a reasonable contribution to the code-base. I also rewrote python_speech_features in C for a pretty large performance boost, which I’m pretty pleased with. But at the end of the day, it doesn’t feel like this team will miss me. I too often spend my time finding work to do, and to be honest, I’m just not interested enough in the subject matter to make that work long-term. Most of my time on this project has been spent pushing to open it up and make it more transparent to people outside of the company. I’ve added model exporting, better default behaviour, a client library, a native client, Python bindings (+ example client) and most recently, Node.js bindings (+ example client). We’re starting to get noticed and starting to get external contributions, but I worry that we still aren’t transparent enough and still aren’t truly treating this as the open-source project it is and should be. I hope the team can push further towards this direction without me. I think it’ll be one to watch.

Next week, I start working at a new job doing a new thing. It’s odd to say goodbye to Mozilla after 6 years. It’s not easy, but many of my peers and colleagues have already made the jump, so it feels like the right time. One of the big reasons I’m moving, and moving to Impossible specifically, is that I want to get back to doing impressive work again. This is the largest regret I have about my time at Mozilla. I used to blog regularly when I worked at OpenedHand and Intel, because I was excited about the work we were doing and I thought it was impressive. This wasn’t just youthful exuberance (he says, realising how ridiculous that sounds at 32), I still consider much of the work we did to be impressive, even now. I want to be doing things like that again, and it feels like Impossible is a great opportunity to make that happen. Wish me luck!

Free Ideas for UI Frameworks, or How To Achieve Polished UI

Ever since the original iPhone came out, I’ve had several ideas about how they managed to achieve such fluidity with relatively mediocre hardware. I mean, it was good at the time, but Android still struggles on hardware that makes that look like a 486… It’s absolutely my fault that none of these have been implemented in any open-source framework I’m aware of, so instead of sitting on these ideas and trotting them out at the pub every few months as we reminisce over what could have been, I’m writing about them here. I’m hoping that either someone takes them and runs with them, or that they get thoroughly debunked and I’m made to look like an idiot. The third option is of course that they’re ignored, which I think would be a shame, but given that I’ve not managed to get the opportunity to implement them over the last decade, that would hardly be surprising. I feel I should clarify that these aren’t all my ideas; they include a mix of observation of and conjecture about contemporary software. This somewhat follows on from the post I made 6 years ago(!). So let’s begin.

1. No main-thread UI

The UI should always be able to start drawing when necessary. As careful as you may be, it’s practically impossible to write software that will remain perfectly fluid when the UI can be blocked by arbitrary processing. This seems like an obvious one to me, but I suppose the problem is that legacy makes it very difficult to adopt this at a later date. That said, difficult but not impossible. All the major web browsers have adopted this policy, with caveats here and there. The trick is to switch from the idea of ‘painting’ to the idea of ‘assembling’, and then to use a compositor to do the painting. Easier said than done, of course; most frameworks include the ability to extend painting in a way that would make it impossible to switch to a different thread without breaking things. But as long as it’s possible to block UI, it will inevitably happen.
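To illustrate what I mean by ‘assembling’ (the types here are invented purely for the sketch): the UI thread records a cheap description of the frame, and the compositor is the only thing that ever produces pixels.

```ts
// Invented display-list types: the UI thread builds a description of
// the frame (cheap), the compositor thread turns it into pixels.
type DisplayItem =
  | { kind: 'rect'; x: number; y: number; w: number; h: number; color: string }
  | { kind: 'image'; x: number; y: number; bitmap: ImageBitmap };

// UI thread: never paints; just describes the frame and moves on.
function assembleFrame(items: DisplayItem[], compositor: Worker) {
  compositor.postMessage({ frame: items });
}
```

Because the compositor keeps the last description it was given, it can keep presenting (and scrolling – see the next point) even while the UI thread is busy.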

2. Contextually-aware compositor

This follows on from the first point; what’s the use of having non-blocking UI if it can’t respond? Input needs to be handled away from the main thread also, and the compositor (or whatever you want to call the thread that is handling painting) needs to have enough context available that the first response to user input doesn’t need to travel to the main thread. Things like hover states, active states, animations, pinch-to-zoom and scrolling all need to be initiated without interaction on the main thread. Of course, main thread interaction will likely eventually be required to update the view, but that initial response needs to be able to happen without it. This is another seemingly obvious one – how can you guarantee a response rate unless you have a thread dedicated to responding within that time? Most browsers are doing this, but not going far enough in my opinion. Scrolling and zooming are often catered for, but not hover/active states, or initialising animations (note: initialising animations – once they’ve been initialised, they are indeed run on the compositor, usually).
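As a sketch of what that context might look like (all names invented), the compositor owns a hit-test tree with cheap, pre-prepared responses attached, so the first reaction to input never has to cross threads:

```ts
// Invented compositor-side input handling: enough state lives on the
// compositor thread to respond immediately; the main thread catches up.
interface HitTestNode {
  bounds: { left: number; top: number; right: number; bottom: number };
  onHover?: () => void;       // e.g. swap in a pre-rasterised hover state
  onScrollStart?: () => void; // begin scrolling without the main thread
}

class Compositor {
  constructor(private tree: HitTestNode[],
              private notifyMainThread: (node: HitTestNode) => void) {}

  // Runs on the compositor thread.
  pointerMoved(x: number, y: number) {
    const hit = this.tree.find(n =>
      x >= n.bounds.left && x < n.bounds.right &&
      y >= n.bounds.top && y < n.bounds.bottom);
    if (!hit) return;
    hit.onHover?.();            // immediate response, no main thread
    this.notifyMainThread(hit); // the full view update follows later
  }
}
```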

3. Memory bandwidth budget

This is one of the less obvious ideas and something I’ve really wanted to have a go at implementing, but never had the opportunity. A problem I saw a lot while working on the platform for both Firefox for Android and FirefoxOS is that given the work-load of a web browser (which is not entirely dissimilar to the work-load of any information-heavy UI), it was very easy to saturate memory bandwidth. And once you saturate memory bandwidth, you end up having to block somewhere, and painting gets delayed. We’re assuming UI updates are asynchronous (because of course – otherwise we’re blocking on the main thread). I suggest that it’s worth tracking frame time, and only allowing large asynchronous transfers (e.g. texture upload, scaling, format transforms) to take a certain amount of time. After that time has expired, it should wait on the next frame to be composited before resuming (assuming there is a composite scheduled). If the composited frame was delayed to the point that it skipped a frame compared to the last unladen composite, the amount of time dedicated to transfers should be reduced, or the transfer should be delayed until some arbitrary time (i.e. it should only be considered ok to skip a frame every X ms).
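In invented code (the numbers are placeholders, not recommendations), the book-keeping might look something like this:

```ts
// Invented sketch of a per-frame budget for large asynchronous
// transfers (texture uploads, scaling, format conversions).
class TransferBudget {
  private budgetMs = 4;             // transfer time allowed per frame
  private readonly targetMs = 16.7; // 60Hz frame interval

  // Called by the compositor after each frame is presented.
  frameComposited(frameDurationMs: number) {
    if (frameDurationMs > this.targetMs) {
      // We skipped a frame: transfers are probably saturating memory
      // bandwidth, so halve the budget before resuming.
      this.budgetMs = Math.max(1, this.budgetMs / 2);
    } else {
      this.budgetMs = Math.min(8, this.budgetMs + 0.5); // recover slowly
    }
  }

  // Drain queued transfers until this frame's budget is spent, then
  // park the rest until after the next composite.
  runTransfers(queue: Array<() => void>) {
    const start = performance.now();
    while (queue.length > 0 && performance.now() - start < this.budgetMs) {
      queue.shift()!();
    }
  }
}
```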

It’s interesting that you can see something very similar to this happening in early versions of iOS (I don’t know if it still happens or not) – when scrolling long lists with images that load in dynamically, none of the images will load while the list is animating. The user response was paramount, to the point that it was considered more important to present consistent response than it was to present complete UI. This priority, I think, is a lot of the reason the iPhone feels ‘magic’ and Android phones felt like junk up until around 4.0 (where it’s better, but still not as good as iOS).

4. Level-of-detail

This is something that I did get to partially implement while working on Firefox for Android, though I didn’t do such a great job of it, so its current implementation is heavily compromised compared to how I wanted it to work. This is another idea stolen from game development. There will be times, during certain interactions, where processing time will be necessarily limited. Quite often though, during these times, a user’s view of the UI will be compromised in some fashion. It’s important to understand that you don’t always need to present the full-detail view of a UI. In Firefox for Android, this took the form that when scrolling fast enough that rendering couldn’t keep up, we would render at half the resolution. This let us render more, and faster, giving the impression of a consistent UI even when the hardware wasn’t quite capable of it. I notice Microsoft doing similar things since Windows 8; notice how the quality of image scaling reduces markedly while scrolling or animations are in progress. This idea is very implementation-specific. What can be dropped, and what you want to drop, will differ between platforms, form-factors, hardware, etc. Generally though, some things you can consider dropping: sub-pixel anti-aliasing, high-quality image scaling, render resolution, colour-depth, animations. You may also want to consider showing partial UI if you know that it will very quickly be updated. The Android web-browser during the Honeycomb years did this, and I attempted (with limited success, because it’s hard…) to do this with Firefox for Android many years ago.
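As a trivial sketch of such a policy (the thresholds are invented; in reality you’d tune them per device and form-factor, and measure whether rendering actually keeps up):

```ts
// Invented level-of-detail policy: trade resolution for fluidity
// while the view is moving too fast for full-detail rendering.
function renderScaleFor(scrollVelocityPxPerS: number): number {
  if (scrollVelocityPxPerS > 3000) return 0.5;  // half resolution
  if (scrollVelocityPxPerS > 1500) return 0.75; // mild compromise
  return 1.0;                                   // settled: full detail
}
```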

Pitfalls

I think it’s easy to read ideas like this and think it boils down to “do everything asynchronously”. Unfortunately, if you take a naïve approach to that, you just end up with something that can be inexplicably slow sometimes and the only way to fix it is via profiling and micro-optimisations. It’s very hard to guarantee a consistent experience if you don’t manage when things happen. Yes, do everything asynchronously, but make sure you do your book-keeping and you manage when it’s done. It’s not only about splitting work up, it’s about making sure it’s done when it’s smart to do so.

You also need to be careful about how you measure these improvements, and be aware that sometimes results in synthetic tests will correlate with the exact opposite of the experience you want. A great example of this, in my opinion, is page-load speed on desktop browsers. All the major desktop browsers concentrate on prioritising the I/O and computation required to get the page to 100%. For heavy desktop sites, however, this means the browser is often very clunky to use while pages are loading (yes, even with out-of-process tabs – see the point about bandwidth above). I highlight this specifically on desktop because you’re quite likely not only to be browsing much heavier sites that trigger this behaviour, but also to have multiple tabs open. So as soon as you load a couple of heavy sites, your entire browsing experience is compromised. I wouldn’t mind the site taking a little longer to load if it didn’t make the whole browser chug while doing so.

Don’t lose sight of your goals. Don’t compromise. Things might take longer to complete, deadlines might be missed… But polish can’t be overrated. Polish is what people feel and what they remember, and the lack of it can have a devastating effect on someone’s perception. It’s not always conscious or obvious either, even when you’re the developer. Ask yourself “Am I fully satisfied with this?” before marking something as complete. You might still be able to ship if the answer is “No”, but make sure you don’t lose sight of that and make sure it gets the priority it deserves.

One last point I’ll make: I think that to really execute on all of this, it requires buy-in from everyone. Not just engineers, not just engineers and managers, but visual designers, user experience, leadership… Everyone. It’s too easy to do a job that’s good enough, and it’s too much responsibility to put it all on one person’s shoulders. You really need to be on the ball to produce the kind of software that Apple does almost routinely, but as much as they’d say otherwise, it isn’t magic.