State of Firefox Mobile platform

It’s been way too long since I last blogged, so here’s something to try and get into the mood again. I have a few other things I’d like to write about, but work here at Mozilla is probably the most pressing one at the moment, so the others will have to wait.

Those who have been following Nightly, and/or who attended our talk at FOSDEM, will be aware that we’re currently rewriting the Android version of Firefox Mobile. The major part of this change is that we’re now a ‘native’ Android app, rather than a XUL app. Consequently, we fit in and behave much better than the older version of Firefox Mobile, and we get a few perks too, most notably start-up performance.

Alongside this new version of Firefox Mobile, we’ve also taken the opportunity to overhaul the platform side of things. Whereas the old version was multi-process, we’ve now switched to a multi-threaded application model, with input, content processing/rendering and UI rendering each running on its own thread. Just recently, we switched from a Java-based view compositor to a native-code off-main-thread compositor (or OMTC, for short). While previously we drew the entire page into a buffer (with extra bits around the edges) and composited that, we now directly composite the layers that make up the page.
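To make that a little more concrete, here’s a minimal sketch of the off-main-thread-compositor idea, assuming nothing about Gecko’s actual classes (all names below are made up for illustration): one side posts layer-tree updates, and a dedicated compositor thread picks up the latest one and draws each frame without ever blocking the producer.

```cpp
// Illustrative only – not Gecko's implementation.
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <utility>

struct LayerTreeSnapshot { /* layer buffers, transforms, etc. */ };

class Compositor {
public:
  // Called from the content/UI side whenever the layer tree changes.
  void PostTransaction(LayerTreeSnapshot snapshot) {
    std::lock_guard<std::mutex> lock(mMutex);
    mPending = std::move(snapshot);
    mHavePending = true;
  }

  // Runs on its own thread: picks up the latest transaction (if any) and
  // composites the retained layers, so a slow content update never stalls
  // screen updates such as scrolling or zooming.
  void Run(const std::atomic<bool>& quit) {
    while (!quit) {
      {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mHavePending) {
          mCurrent = std::move(mPending);
          mHavePending = false;
        }
      }
      Composite(mCurrent);
      std::this_thread::sleep_for(std::chrono::milliseconds(16));  // ~60fps
    }
  }

private:
  void Composite(const LayerTreeSnapshot&) { /* GL draw calls go here */ }

  std::mutex mMutex;
  LayerTreeSnapshot mPending, mCurrent;
  bool mHavePending = false;
};
```

The real compositor is driven rather more carefully than a 16ms sleep and shares buffers instead of copying snapshots, but the shape of the interaction is the same.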

This model gives us the speed and ease of using a single process (to some extent), but with the power afforded to us by an asynchronous rendering process. Most of what I said previously about shadow layers still applies – just replace ‘process’ with ‘thread’. Most of the problems still apply too, but the graphics team has put in some phenomenal work fixing a lot of them.

So we’re now working our figurative nuts off fixing all the bugs that have cropped up, but I do believe we’re on the road to success. This rendering model gives us the power to retain a lot more content than before, and any saved drawing tends to translate into massive benefits on mobile – we appear to be mostly memory-bandwidth limited, so every little helps. What this translates to for users is smooth, 60fps updates, excellent interactive performance, excellent web standards support and a polished, ‘native’-feeling application. You can sample some of this work by downloading a Nightly build – it’s *almost* at the point of being a daily driver (not quite, but almost).

Maybe you want to help? Running a nightly build and providing feedback (either via the built-in methods, or by filing bugs) is a great way to start, and all the mobile developers hang out on IRC too, in #mobile. We’ve also just opened a new office in London, and I do believe we’re hiring, so if you think you’d like to get in on some of this, do contact us!

Update: The built-in method for feedback is far less obvious than I thought… You can access it via the ‘About’ page (accessible via the settings menu), then following the links ‘Support’, ‘Ask a Question’, and finally ‘Give us feedback’. I think we should make this better, so I’ve filed a bug.


Accelerated layer-rendering, and learning by (some) success

Perhaps the title of my last blog post seemed a little negative, so I wanted to write on this topic again – on some of the things I’ve learnt since then, and some of the successes I’ve had too. Failure was probably too strong a word, but better to be too negative than too positive about these things, especially when surrounded by the amazing talent there is at Mozilla…

I finished off previously by saying there are other, easier problems to solve, and I think I’m making some decent progress in those areas. I described before how shadow layers work, and how the chrome process can use GL-accelerated layer compositing, but the content process is always restricted to basic (unaccelerated) layers. This introduces the bottleneck of getting the image data from system memory to video memory. I was probably over-zealous in my previous approach. While asynchronous updates would be great, we could try to minimise those updates first. This is almost certainly something we should be doing anyway.

One of the ways we do this is with something that (confusingly) gets called ‘rotation’ in the source code. I mentioned scrolling before. To reiterate, we render to a larger buffer than is visible on the screen, and when panning, we move that buffer and ask the content process to re-render the newly exposed area. We then update again when that’s finished. Hopefully, that happens quickly, but when it doesn’t, you may see some checker-boarding. When the content process re-renders, theoretically it only needs to re-render the newly exposed pixels, as it already has the rest of the page rendered. This could involve copying all the existing pixels upwards (assuming we’re scrolling downwards) and then rendering in the newly exposed area, but instead of doing this, we say ‘the origin of this buffer is now at these coordinates’, and we treat the buffer as if it wrapped around (hence ‘rotation’).
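A rough sketch of what that bookkeeping looks like follows; the names are mine, not those in the real ThebesLayerBuffer code, but the idea is the same: reads and writes wrap around the buffer edges instead of the pixels ever being moved.

```cpp
// Illustrative only.
struct IntPoint { int x, y; };
struct IntSize  { int width, height; };

// Map a point in buffer-local coordinates (0..size) to the physical pixel it
// lives at, given the current rotation offset. Scrolling just changes the
// rotation; the existing content stays exactly where it is in memory.
IntPoint WrapIntoBuffer(IntPoint p, IntPoint rotation, IntSize size) {
  int x = (p.x + rotation.x) % size.width;
  int y = (p.y + rotation.y) % size.height;
  if (x < 0) x += size.width;   // keep results positive for upward/leftward pans
  if (y < 0) y += size.height;
  return { x, y };
}
```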

There are problems with this approach, however. For example, if you were to zoom into a rotated buffer whose rotation coordinates are visible on the screen, you may see a ‘seam’ at that position. Similarly, when re-using the existing pixels in the buffer, if the new scroll coordinates mean that the sample grid is no longer aligned with the previous sample grid, you may see odd artifacts on scaled images and text that was cut off in a previous render. The following example demonstrates this:


The results of a misaligned sample grid

On the left is the original image (a checkerboard, purposefully chosen as it’s sort-of a worst-case scenario), and on the right, the same image with a 1-pixel border added on the left and upper edges. They both have the same bilinear scale applied to them, and the border is then cropped on the right image. You can immediately see that the result is not the same image, and putting them together draws extra attention to this. This is what happens when you try to combine the results of two image sampling operations that have misaligned sample grids.

The code makes some attempt at separating out situations where this will happen and marking them, so that in those situations rotation doesn’t occur and the entire buffer is re-rendered. I don’t know what assumptions you can make about cairo’s sampling, or indeed how we drive it to draw pages, but certainly this code is over-zealous with marking when resampling will occur. For example, we zoom pages to fit the width of the screen by default, and any zoom operation marks the surface to say it will be resampled. We also update the content process’s scroll co-ordinates every 20 pixels. So, for the overwhelmingly common case, we re-render the entire buffer every 20 pixels. On a dual-core (or more) machine, assuming your cores aren’t saturated, this doesn’t matter so much without hardware acceleration, as the chrome process oughtn’t be affected by what’s happening in the content process, and when it finishes, it just does a simple page-flip anyway. Unfortunately, this isn’t the case in practice – I guess due to the memory bandwidth required to re-render such a large surface, and perhaps due to non-ideal scheduling (remember, these are guesses; I’ve been terribly lazy when it comes to testing these theories).
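For illustration, the kind of check involved looks something like the sketch below (these names are hypothetical, not the actual Gecko code): existing pixels can only be reused when the new device-space sample grid lines up exactly with the old one.

```cpp
// Illustrative only.
#include <cmath>

struct Transform2D { double scaleX, scaleY, offsetX, offsetY; };

static bool IsWholePixel(double v) {
  return std::abs(v - std::round(v)) < 1e-6;
}

// True if content painted under `oldTx` can be shifted and reused under
// `newTx` without resampling artifacts like the checkerboard example above.
bool SampleGridsAligned(const Transform2D& oldTx, const Transform2D& newTx) {
  if (oldTx.scaleX != newTx.scaleX || oldTx.scaleY != newTx.scaleY)
    return false;                 // any zoom change means a full repaint
  // With equal scales, reuse is only safe if the pan is a whole number of
  // device pixels, so every old sample centre lands on a new sample centre.
  return IsWholePixel(newTx.offsetX - oldTx.offsetX) &&
         IsWholePixel(newTx.offsetY - oldTx.offsetY);
}
```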

Even more unfortunately, this full-buffer re-render is a terrible hit for GL-accelerated layers, as we don’t do page-flipping there – we do synchronous buffer uploads. Also, the default shadow layer size is 200%×300% of the visible area. So let’s say you have 1280×752 pixels visible (as is the case on a 1280×800 Honeycomb tablet): every 20 pixels you scroll, you’re doing a synchronous 9.4MB upload from system memory to ‘video memory’ (I put this in quotes, as I don’t want to go down the path of explaining shared memory architecture and how it ends up working on Android. It would be long and I’d probably be wrong). Even worse, most Android devices have a maximum texture size of 2048×2048, so we have to tile these textures – so you’re then splitting up these uploads, with texture binds in between, making it even slower.
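To give a feel for why this hurts, here’s a hedged sketch of that upload path (not the real code): each tile needs its own texture bind, and because core GLES2 has no GL_UNPACK_ROW_LENGTH, sub-rectangles of the big shadow buffer end up being uploaded row by row.

```cpp
// Illustrative only.
#include <GLES2/gl2.h>
#include <vector>

struct Tile { GLuint texture; int x, y, width, height; };

// Synchronously push a full RGBA shadow buffer into a set of tiles.
void UploadBuffer(const unsigned char* pixels, int bufWidth,
                  const std::vector<Tile>& tiles) {
  glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
  for (const Tile& tile : tiles) {
    glBindTexture(GL_TEXTURE_2D, tile.texture);       // one bind per tile...
    for (int row = 0; row < tile.height; ++row) {
      // ...and one copy per row: GLES2 can't be told the source stride, so we
      // can't hand glTexSubImage2D a sub-rectangle of the larger buffer.
      const unsigned char* src =
          pixels + ((tile.y + row) * bufWidth + tile.x) * 4;
      glTexSubImage2D(GL_TEXTURE_2D, 0, 0, row, tile.width, 1,
                      GL_RGBA, GL_UNSIGNED_BYTE, src);
    }
  }
}
```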

You might then say, “Well, at least in some cases you’ll still get the benefit of rotation, right?” Unfortunately, you’d be wrong: we disable buffer rotation entirely on shadow layers. So we have a number of problems here. I discovered this when I noticed how frequently we were doing whole-buffer updates, both on GL and software. The first thing I thought was to just disable the marking of possibly-resampling surfaces (you can do this by either not setting PAINT_WILL_RESAMPLE in BasicLayers.cpp, or ignoring it in ThebesLayerBuffer.cpp – you’ll notice that it checks MustRetainContent, which returns true for shadow layers). This ought to get you the benefit of rotation, at the expense of some visible artifacting. The bug for enabling buffer rotation is here. But then I ran into this bug, which I fixed. This gets you buffer rotation being used more frequently with software rendering, but when using hardware acceleration, things now appear very broken – doubly so if you use tiles.

So next, I investigated why things were broken when using hardware acceleration. The first step was to alter the desktop build to use tiles. After doing this, and picking a small tile size, I noticed that a lot of drawing was then broken. This ended up being this bug, which I fixed. Now more of the screen is visible, but rotation is still broken. This ended up being a two-fold problem: the first being that we don’t handle uploads to GL layers correctly when there’s rotation, and the second being that we don’t handle rendering of rotated GL layers when we have tiles. I fixed both of these in this bug.

So after hacking out the resampling check and fixing the various rendering bugs that rotation then exposes, you can see the benefit it would get you. Unfortunately, there’s still a lot of work to be done, and even when this works perfectly, it isn’t going to benefit all situations (we could still do with a fast-path for texture upload on Android, and asynchronous updates or page-flipping). But on some sites (my own, for example, and my favourite test site, engadget.com), the difference is pretty big. So, with four bugs fixed and a deeper knowledge of how layers are put together, I count this one as a success 🙂


Desktop Summit 2011 Thoughts

Another year, another great desktop summit. This year I went courtesy of Mozilla, and I’m very grateful they deemed it worthwhile. Having been, though, I think attending events like this is invaluable for open-source hackers. Not only for the chance to present your work and attend talks, not only for the numerous networking opportunities, but purely for the inspiration. Every time I attend Desktop Summit/Guadec/FOSDEM, I never fail to come away with new ideas and fresh inspiration to hopefully do more and better work in the future.

There were some great talks this year, though I won’t go into naming them, as the list would be too long and I’d likely leave some out. One of the things that really left an impression, though, was something I think was perhaps missing slightly. On the way to the beach party, I met some Spanish KDE users who were also on their way (and props to the KDE community by the way, you guys know how to party!). They said it was their first conference like this and they just came to see what it was like. They’d noticed that the summit was very developer-centric, though. This got me thinking: why is that?

Certainly, I wouldn’t argue for a complete change of focus – as a developer, as I mentioned, I find these events invaluable. On the other hand, perhaps we ought to do more to include our users? Guadec does stand for the Gnome *Users* and Developers European Conference, after all. I think we’ve done a lot more to be inclusive of the non-programming parts of Gnome development (UX/visual design, documentation, community management, distribution) over the years, but maybe we need to extend that effort and start targeting users who haven’t yet begun contributing.

With that in mind, I have a few ideas to help include users more in the future:

  • High-level feature talks – We could have talks that deal with new features of applications, the desktop, maybe even libraries, but at a high level. Less jargon, more screenshots, videos and demonstrations. It’s easy for a developer to see what the latest features of Gnome are, as they can just check it out, build it, fix the inevitable problems with that build and try it out. I think it might be interesting and fun to prepare talks that are purely high-level presentations and demonstrations. Off the back of that, you’d perhaps get more people interested in the project.
  • Beginners’ tutorials – We could run beginner classes on using, and perhaps developing, the Gnome desktop environment, but aimed at people with little to no experience. This is pretty difficult of course, but then I’ve never seen the Gnome community fail to rise to and conquer a difficult challenge. Maybe a beginners’ guide to writing a Gnome Shell extension in JavaScript, or setting up a JavaScript development environment. Perhaps a beginners’ guide to establishing a useful work-flow in Gnome 3, for common tasks like document editing or web-browsing. Even more useful, perhaps, a beginners’ guide to filing useful bugs?
  • Install-fests – I ran this idea by Emmanuele Bassi, and he brought up the very good point that it’s hard to find the resources to run things like this. I also get the feeling that until we have more people interested, more general members of the public and novice users, this may be quite poorly attended. Still an idea to think about though.
While these ideas may be of limited use and I might be completely wrong, I do think getting more involved with our users at events like this could benefit us. Taking the main theme of Dirk’s keynote this year, we should probably be making a greater effort to listen to our users.


Desktop Summit 2011

I’ll be in Berlin tomorrow, for the Desktop Summit. I’ll be presenting a talk, Clutter Everywhere, with Damien Lespiau and Neil Roberts. It’s right after Emmanuele Bassi’s talk, Heart of Blingness: Clutter and GNOME. I highly recommend you attend both talks!

This will be my first Guadec/Desktop Summit as part of Mozilla, so if you have any questions about Firefox Mobile, I’ll do my best to answer them. I hope that I’ll see some of my new colleagues too – do come and say hello, you can’t miss me (I’m the one with the ridiculous hair)!


Shadow Layers, and learning by failing

A hot topic for Firefox at the moment is the new out-of-process rendering, but is it common knowledge that this has already been in Firefox Mobile for a long time? For mobile, there’s what we call a ‘chrome’ process (this handles and renders the main UI) and then ‘content’ processes, which handle the rendering of the pages in your tabs. There are lots of fun and tricky issues when you choose to do things like this, mostly centering around synchronisation – and recently, I was trying to add a feature that’s led me to writing this post.

You may have already heard about how Firefox accelerates the rendering of web content. In a nutshell, a page is separated into a series of layers (say, background, content, canvases, plug-ins, etc.). These layers are then pasted onto each other, in what tends to get called composition. If you’re lucky and have decent drivers, or you run on Windows, this process of composition is accelerated by your video card. It turns out video cards are very good at composition, so this is often a nice bonus. We also try to accelerate the rendering of these layers too, but that’s another topic…

These layers are arranged in what’s known as a layer-tree – when something on the screen needs to update, this tree is traversed, and painted to the screen. But how is this affected by out-of-process rendering? You can’t have both processes painting to the screen simultaneously without some kind of coordination, and often there are various rules on memory sharing/protection that limit how sharing happens too. We choose to let the chrome process handle getting things to the screen. It’s important, however, that the content process not be able to hold up the chrome process too readily. But if we want the page to render correctly and respond to user input, we need the page’s layer tree… So how do we go about solving this?

We use what we’ve called ‘shadow’ layers – the chrome process has a mirror-copy of the content process’s layer tree, and the content process can update it when it’s ready. In the meantime, we have something we can paint and the page continues to be reactive, to the extent at least that you can read it, you can scroll it and you can zoom it. We render a larger area of the page than is visible so that while the content process is busy rendering, we don’t appear to ‘fall behind’ (when we do, you see the checker-board background, similar to the iPhone).

We have various implementations of these layers for different platforms, so we can take advantage of platform-specific features. There’s a GL implementation (GL[X] and EGL), a Direct3D implementation (9 and 10) and a ‘basic’ implementation that uses cairo and runs in software. When the content process changes its layer tree, it sends a transaction representing that change over to the chrome process. Part of this transaction is likely to involve updating visible buffers. If both processes use basic layers (the default case, on Android at least), we use shared memory and page-flipping. That is, the content process renders into one buffer while the chrome process renders out of another buffer, and when the content process updates, they swap around.
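As a sketch of that page-flipping arrangement (again with made-up names), a completed transaction never copies pixels – it just swaps which shared buffer each side is using:

```cpp
// Illustrative only.
#include <cstddef>
#include <utility>

struct SharedBuffer { unsigned char* pixels; std::size_t length; };

class FlippingBuffer {
public:
  FlippingBuffer(SharedBuffer a, SharedBuffer b) : mFront(a), mBack(b) {}

  SharedBuffer& ContentSide() { return mBack; }   // content paints into this
  SharedBuffer& ChromeSide()  { return mFront; }  // chrome composites from this

  // Called when a content transaction completes: no copy, just a role swap.
  void Flip() { std::swap(mFront, mBack); }

private:
  SharedBuffer mFront, mBack;
};
```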

For accelerated layers, this is a slightly different and more complicated story. As we can’t share textures across processes and we don’t currently have a remote cairo implementation, the content process always uses basic layers and renders into memory (though there is work going on to allow remote access to acceleration). The chrome process is free to use whatever implementation it likes though, and not all of these implementations allow for page-flipping. The GL layers implementation only uses a single buffer on the content side, and when this is updated, it is synchronously uploaded to the GPU on the chrome side (and the content has to wait). Thankfully, on Maemo and X11, there are extensions that make this very fast (EglLockSurface on Maemo, texture-from-pixmap on GL/X11), though it’s still quite a large, synchronous copy. On Android, this copy is very slow – we have no fast-path due to the API we need not currently being advertised (and possibly not implemented yet).

There are things that we could do to avoid this speed hit, though. I thought, for example, we could use EGLImage (which, thankfully, is available on Android) and asynchronously update textures in a thread (or even in chunks in the main loop). I still think this is a sound idea, but there are caveats. This would require, for example, that either we double-buffer, or we make the content process wait for the asynchronous update to complete. The latter would involve adding asynchronous shadow layer transactions – not an easy task. If we double-buffer, we then double the system memory cost of storing a layer (and bear in mind, that layer is mirrored in graphics memory, so we’re talking 1.5 times the cost vs. basic layers). We also have to synchronise the updating of the layer coordinates with the asynchronous update to avoid what would otherwise be a huge and visible rendering glitch, and if we want the update to not be viewable while it’s happening, we have to double-buffer the layer’s texture too. We now have twice the memory cost we had before, and these tend to be quite large buffers!

Altogether, not an easy problem to solve. So I’ve given up for now. There are other, easier and less disruptive changes that can be made, which I’ll be trying out next. I’m disappointed that this didn’t pan out as I thought it would, but I’m pleased to have learnt something. I hope this is useful/interesting to someone.


My First Firefox Mobile Bug

One thing I regret over the past couple of years is reducing my blog output. I think I used to blog fairly regularly in the past, and that seemed to stop when I joined Intel (though I’d love to blame someone other than myself, unfortunately it is entirely my fault). So I’d like to get back into the habit of writing again, by writing about what I’m doing here at Mozilla.

Always good to start with an easy one, so I’m starting with the first bug I fixed. Bug #661843, “GeckoSurfaceView may double memory requirement for painting”. Doug Turner assigned this to me when I joined, and I’m still very grateful, as it turned out to be pretty easy to fix and a massive win for Firefox Mobile on Android.

Getting stuff to the screen from a native app on Android was quite difficult up until Gingerbread, and we target Android 2.0 and up, so moving to the new native app SDK isn’t currently an option. It’s a lot easier if you cheat (by using undocumented interfaces), but winners don’t cheat. Or at least they don’t get caught. Or something. To get around the lack of ‘native’ interfaces to the Android app components, Firefox Mobile on Android consists of a small Java shim and the main application. This shim acts as our input/output to the device and interfaces via JNI to the various internal services.

For drawing to the screen, our Java shim builds up a simple Android application and provides a buffer for the native code to draw into. When the native code wants to draw, it calls the Java methods to get the buffer, does its thing and sends a signal back to the Java code to let it know that it’s finished drawing – this can be seen mostly in nsWindow.cpp, in the OnDraw method. Prior to Android 2.2, there was no way for native code to draw straight into an Android Bitmap, and no way to copy a raw data buffer onto the application’s Canvas. The only option in this case is to create a Bitmap based on the data buffer (which ends up copying that buffer), then blit that Bitmap onto the Canvas.

Android 2.2 added native access to the Bitmap class, allowing native code to directly manipulate the memory backing it – this is exactly what we needed. Unfortunately, requiring it would push our minimum Android version up, which isn’t something we want to do just yet. My fix for this bug involved loading the new native graphics access library at runtime and using it if it’s available. To make things easier, I reshuffled the code on the Java side (which can be seen in GeckoSurfaceView.java) so that the two paths share most of the code. The slow path backs the browser canvas with a ByteBuffer (which allows direct access via JNI, but can’t be copied directly to the Canvas), while the fast path uses a Bitmap and Android’s libjnigraphics. This halved the memory usage required for updates to the screen and reduced the amount of allocation/copying going on, providing a nice speed boost.
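For illustration, the runtime-detection part of that fast path looks roughly like the sketch below. The AndroidBitmap_lockPixels/AndroidBitmap_unlockPixels entry points and libjnigraphics.so are real (they shipped with Android 2.2), but the surrounding structure and names here are mine, not the actual patch.

```cpp
// Illustrative only.
#include <dlfcn.h>
#include <jni.h>

typedef int (*LockPixelsFn)(JNIEnv*, jobject, void**);
typedef int (*UnlockPixelsFn)(JNIEnv*, jobject);

static LockPixelsFn   sLockPixels   = nullptr;
static UnlockPixelsFn sUnlockPixels = nullptr;

// Returns true if the device has libjnigraphics (Android 2.2+); otherwise we
// stay on the ByteBuffer slow path.
bool InitFastBitmapPath() {
  void* lib = dlopen("libjnigraphics.so", RTLD_LAZY);
  if (!lib)
    return false;
  sLockPixels   = (LockPixelsFn)dlsym(lib, "AndroidBitmap_lockPixels");
  sUnlockPixels = (UnlockPixelsFn)dlsym(lib, "AndroidBitmap_unlockPixels");
  return sLockPixels && sUnlockPixels;
}

// Draw directly into the Java Bitmap's backing memory – no intermediate copy.
void DrawIntoBitmap(JNIEnv* env, jobject bitmap) {
  void* pixels = nullptr;
  if (sLockPixels(env, bitmap, &pixels) < 0)
    return;
  // ... render into `pixels` here ...
  sUnlockPixels(env, bitmap);
}
```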

I believe you should see this if you’re running Firefox Mobile Beta, available on the Android Market, and it’ll be incorporated in Firefox Mobile 6.


Time for a change

As some may have already heard through the grape(/twitter)vine, I’ve left my job at Intel. I very much enjoyed my time there, and of course at OpenedHand before it – but after what is pretty much 6 years, I think it’s time for me to take on some new challenges and meet some new people.

So, it’s my great pleasure to announce that I am now working for Mozilla, on the mobile team. I think this is a better fit for me now, and I appreciate working for a foundation after a couple of years of corporate life. Although I’ve worked on Mozilla code before (mozilla-headless), this job does take me a fair bit outside of my comfort zone, and I’m very much looking forward to the challenge! Hopefully I’ll be up to it, and I’ll be able to blog about the cool things I’ll be helping out with 🙂

I am very sad to leave such a great team, however, and I’ll be watching with great anticipation to see what they produce now that I no longer have a front-row seat! So thanks and goodbye to all my friends at Intel, and hello to my new friends at Mozilla; I’m sure we’ll get on great.


Presenting, a media explorer!

Finally, after over a year of closed development (much to our chagrin), we’ve released what we’ve been working on! Unfortunately, this release has caveats – in the form of not having a name (some unresolved legal gubbins). We’ve referred to it internally as Sofatron, and though this is not a name endorsed by my employer, I’ll call it that for convenience.

So, what is Sofatron? In short, it’s a local and network video and picture browser. It also has some search ability. The whole thing is backed by Clutter, Mx, Grilo and Tracker, and it requires bleeding-edge versions of the latter three (you’ll also need Tumbler if you want thumbnails). A picture speaks a thousand words, so a video must be even more valuable. A couple follow, along with the repository address:

Download video
Watch on YouTube
Alternative video
Download the source!

While there’s still a lot missing and there are still bugs (at all levels of the stack, I’m afraid to say), I think it’s a pretty good example of what you can do by harnessing some of the best open-source libraries around. It’s released under the LGPL 2.1, and we don’t have copyright assignment, so please do feel free to contribute back, to fork it, and to do what you like, with respect to the licences involved. I notice that Gnome doesn’t really have an app equivalent to this – maybe it could be a good candidate for a later version of Gnome?

If you’d like to contribute, there are several good projects to take on. Of course, there’s general bug-fixing, polishing and optimisation work. We could also use a music plugin. If you’re looking for something even bigger, maybe a TV-centric web browser? (I hear there’s even a Clutter back-end for WebKit these days!) If you’d like to help and you don’t know where to start, my e-mail address is on the side of this page (click through if you’re reading this via RSS/an aggregator). I’d love to hear what you think in the comments!


More lessons to learn from games development

I’ve been increasingly thinking, as our applications become more and more animated and interactive, that we’re going about certain things in a fundamentally wrong way. Often, with the (unfortunately not quite public, but hopefully coming soon!) project I’m working on, I find that after I’ve crafted a nifty new animation or feature, it works great in a limited test case but has some worst-case performance scenarios that render it pointless. Of course, you can always go back and optimise things, and it’s an oft-cited mistake to optimise early, or to micro-optimise, but surely there’s a way of going about things, or a way of thinking, that would limit these situations in the first place?

This post follows on somewhat from my section of our talk at last year’s Guadec.

As I highlighted back then, I think that there are very different types of developers, depending on what you’re making. Going on how often I hear people talking about kernel developers, I think I’m safe in saying it’s widely believed. I identify as an application developer. I occasionally dabble in middleware/infrastructure, and I’d like to think that my background prior to working in open-source was more graphics/games development, but it’s ‘app’ development where I think my heart lies. And there’s something wrong with how I (and perhaps others) go about this.

It usually starts with some kind of basic prototype, quickly hacked together (of course, everything starts with an idea, but I’m taking that as given). This prototype may evolve into a final product, or in some cases, it leads to thinking and re-architecting things and going from the ground up in a new, clean base. I’ve found that myself, and others, usually break the app into objects and scenes, where an object is usually either an interactive widget, the interface to some data or a container for either, and scenes usually represent the interface to a particular task in our application.

We tend to be quite good at abstracting data access so as not to expose the nasty internals of the various backing stores. Sometimes we go a step further, and we make some of these objects or scenes pluggable, and we separate them into interfaces and implementations, allowing them to be easily replaced when the code gets out of control, or becomes obsolete. We tend to be quite good at breaking high-level tasks into logical blocks that can be implemented in bite-size chunks, without things becoming overwhelming.

Unfortunately, we tend to stop at that level of abstraction, and now that computing is becoming more pervasive, good hardware is becoming more commodity and there are many more of us, that isn’t enough anymore. The things I’ve mentioned above can lead to very functional and logical applications, but they don’t guarantee performance. Not being able to guarantee performance means we can’t guarantee consistency or interactivity, which harms usability and the perception of beauty.

This is where games development comes in. Beyond the gameplay and fun elements, games are all about performance. If you don’t guarantee consistency of performance in a game, it becomes frustrating. No one wants to make a frustrating game. This focus on performance seems to reflect on every aspect of games development, and I think now that applications are becoming more dynamic, it’s something we need to learn from.

Where app developers would break things into logical blocks from the point of view of components of a task, games developers tend to break things into blocks that are not interdependent. They also go much further in breaking things up than app developers tend to go. This makes things much easier to parallelise, an important feature now that even phones are becoming dual-core.

Games developers also take this separation a step further, by splitting large tasks into component parts. A common problem you see in app development is not spreading load enough. An app developer will think ‘task B requires data A to execute, so create data A’. A games developer may think ‘task B needs to execute in X time, so ensure data A before X time’. This sort of thinking much more commonly leads to breaking up tasks over time, and minimises blocking.

App developers like to do everything on the fly. JIT is the name of the game. We tend to only think about dependencies when we need them. Games developers aren’t afraid to have loading screens if it means they can guarantee the performance of what comes ahead. When they don’t want loading screens, they think ahead and set things up so that assets get streamed in the background and are ready before they’re needed. Being prepared like this minimises the time it takes for a task to respond and complete.
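A small sketch of that ‘be ready before it’s needed’ thinking, using a background load kicked off ahead of time rather than on first use (purely illustrative, with made-up names):

```cpp
// Illustrative only.
#include <future>
#include <string>
#include <vector>

std::vector<unsigned char> LoadAssetFromDisk(const std::string& path) {
  // Stand-in for the real (slow, blocking) load.
  return std::vector<unsigned char>(1024, 0);
}

class PrefetchedAsset {
public:
  // Call as soon as you know the asset will be needed (e.g. when the user
  // enters the screen that leads to it), not when it's first drawn.
  void Prefetch(const std::string& path) {
    mPending = std::async(std::launch::async, LoadAssetFromDisk, path);
  }

  // Call at the point of use; ideally the load has long since finished and
  // this never blocks.
  std::vector<unsigned char> Take() { return mPending.get(); }

private:
  std::future<std::vector<unsigned char>> mPending;
};
```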

As well as guaranteeing performance, pre-loading also guarantees memory consumption. Memory consumption is so often an afterthought for app developers, but often becomes an issue in a desktop environment. By pre-loading (and pre-allocating), you can not only guarantee memory consumption, but you can think of cache coherency and memory fragmentation, both of which can have a huge effect on fluidity (and therefore consistency).

Another technique I seem to read about a lot in modern games development is the idea of a budget. Once you have broken down your game/application into blocks, you take the high-level blocks (for games, say: graphics, AI, physics, audio, input) and you allocate them a time budget. This comes down again to consistency of performance. If you’re aiming for 60fps, you have roughly 16.7ms to have your screen ready. Spend any longer and you either have to sacrifice a frame, or you have to allow visible artifacts (tearing) to appear on the screen (and then you have even less time to prepare the next frame). This is the major area where I think application development is lacking. I see very few applications that try to guarantee a particular refresh rate with techniques like this. In fairness, games developers usually have target hardware too, but it’s still a different way of thinking, and that doesn’t stop an app developer from targeting their own machine.
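As a toy example of budgeting (nothing to do with any particular toolkit; all names are made up), a 60fps loop might hand queued work only as much of each frame as it can afford, pushing the rest to the next frame:

```cpp
// Illustrative only.
#include <chrono>
#include <deque>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;
constexpr std::chrono::microseconds kFrameBudget(16667);  // ~1/60th of a second

void RunFrames(std::deque<std::function<void()>>& work, const bool& quit) {
  while (!quit) {
    const auto frameStart = Clock::now();

    // Spend at most half the budget on queued work, leaving time to draw.
    while (!work.empty() && Clock::now() - frameStart < kFrameBudget / 2) {
      work.front()();       // one small, pre-broken-up chunk of work
      work.pop_front();
    }

    // Draw();              // render whatever is ready; the rest waits a frame

    // Sleep out the remainder (a real loop would block on vsync instead).
    std::this_thread::sleep_until(frameStart + kFrameBudget);
  }
}
```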

I can’t say that I’ve followed too much of this advice myself, but I hope to in the future. Some of the recent performance improvements I’ve made in our app aren’t really optimisations, but just spreading the load over time. For those writing Clutter-based animations, I’ve also written a new component for Mx to help, MxActorManager. This allows batches of actor creation/addition/removal to be spread over a set time-slice and I’ve used it in our current project with a reasonable amount of success.

Now that Gnome 3.0 is out (congratulations btw, it’s awesome!), I can see that other applications may want to raise the visual bar a bit. I hope this post serves as a reminder that if you want a highly interactive and animated application, it may take more than just optimising our old applications and refining our old techniques to get there.


Why ‘gestures’ suck

I’ve not blogged in a while, and though I’ve said I’d try to make my blog less of a platform for public bitching and whining, I figure it’s Christmas, so I should get to do what I want. So this is a blog post on why all ‘gestures’ in applications suck, why ‘gestures’ are always a bad idea, and why, if you’re implementing ‘gestures’ in your application, you’re doing it wrong. Of course, this is all my personal opinion and I’ve done only the most cursory amount of HCI study, so take it with a pitcher of salt.

Great user interfaces are made great by building on a user’s familiarities. This makes a lot of sense. If someone designs an icon to represent an action, they find the nearest everyday analogy that has a clear and identifiable visual, and base it on that. Mail icons involve envelopes, print icons involve printers, search icons involve magnifying glasses (ok, that last one relies pretty heavily on cultural knowledge that is probably questionable nowadays, but bear with me). This should follow on to all aspects of HCI. People will find things easier if they can apply a skill they already have, or they can relate it to something they’re already familiar with.

Touch-screens are becoming a much more common input device these days, and they’re one I’ve been interested in for a very, very long time. Now that they’re becoming more common, more people are trying to retro-fit their applications to work better with this new interface. And this seems to be where ‘gestures’ come in. People see pinch-to-zoom, or dragging, on the iPhone/Pad/Pod (and I’m just going to reference those, as, as far as I’m concerned, they’re the only devices that have gotten touch interaction close to being right), and they seem to think “Hey, that’s cool, I should put those actions in my application!” STOP.

I have a newsflash – and I’m sure this is just pointless ranting for a lot of people, but I’ll say it anyway – pinch-to-zoom and dragging are not ‘gestures’. They are physical manipulations that have a logical result. You don’t ‘execute a pinch-to-zoom gesture’ when you zoom in on a web-page or photo on an iPad. You put two fingers on the screen and you move them closer or further apart, because it makes physical sense. When you put your finger on the surface, it responds instantly and with minimal latency – it immediately establishes that placing your finger on this surface attaches your finger to that point on the surface. From there, pinch-to-zoom makes perfect sense and follows logically. These aren’t ‘gestures’, these are direct and logical manipulations of a surface. And that works. Having instant and reliable response to an action is a very powerful device.

If you’re a gestures fan, you may now be thinking “Well, the difference is academic, surely?” and I would disagree very strongly with that. A gesture, by definition, is when you make a movement to express an idea. With a gesture, it’s ok that you would do one thing, and then, afterwards, something happens. With a gesture, it’s ok that whatever gesture you make, what follows may not be directly linked with that gesture. And this is often the feeling you get when you use an application that has ‘gestures’. You make a gesture, and then, after the application has considered things, it does something. There is no guarantee that what you do will have an instant and well-defined reaction. And as long as we continue to call these actions ‘gestures’, this will always be ok, because this is the definition of a gesture. A gesture does not imply any kind of reaction, or make any implications about latency or reliability.

I bring this up now because my Android phone (see, I’m not an Apple fanboy!) recently updated to the latest Android Market, and this is a damn good example of bad HCI (and bad in several other ways too, but I want to focus my bitching). For those that have the application, open it up and check this out – there’s a carousel at the top of the application. You can drag this to scroll it, and when you release, it sort-of maintains your momentum and sets it spinning. Except there’s a problem (which is why I said sort-of) – when I drag it, there’s no relation between where my finger is and what’s under my finger. I’m not physically dragging the carousel, I’m performing a ‘drag gesture’. Similarly, when I perform a quick drag gesture and I let go, there’s a small pause, and then the carousel starts spinning with the momentum I gave it – except it isn’t the momentum I gave it, it’s a similar, but not quite right, momentum. The list at the bottom of the application is better (due to it being a stock scrolling widget, I imagine), though not much, because they seem to do blocking I/O while you’re dragging, breaking the direct relation between your physical interaction and the on-screen response.

I don’t mean to pick on Android Market especially, as it’s something you can see in touch-based interfaces all over the place (Android is bad, but feature-phones are often far worse). But in my eyes, this sort of thing shouldn’t be acceptable. Apple proved that it isn’t that hard several years ago now – it’s not an innovation anymore, someone’s gone and done it – we can just copy them!

So, if you have an application that you expect to work on a touch-screen, or you’re planning on writing one, think first: “What physical analogy am I making here?” What common familiarity are you taking advantage of? And if your application involves taking advantage of the fact that most people are used to manipulating things with their hands, then do try to realise just how important it is to make the feedback instant, reliable and logical. Then realise that you must NOT call these physical interactions ‘gestures’.
