The product design challenges of AR on smartphones

With the launch of ARKit, we are going to see augmented reality apps become available for about 500 million iPhones in the next 12 months, and at least triple that number of devices in the following 12 months, once we add in the ARCore-capable devices from Google.

This has attracted the broader developer community to AR, and we’ll see many, many, many experiments as developers figure out that AR is an entirely new medium. In fact, it could be even more profound than that, as through all of history we have consumed visual content through a rectangle (from stone tablets, to cinema, to smartphones, etc.) and AR is the first medium that is completely unbound.

It’s a new medium the way the web was different from print, different in kind, not in scale. In the same way that most of the first commercial websites were “brochure-ware,” where an existing print brochure was uploaded onto a website, the first AR apps will be mobile apps copied over to AR. They’ll be just as terrible as brochure-ware was, though they will be novel!

The initial products in a new medium are successful products from the prior medium, copied and forced into the new one. This happens because it’s easier for people to fit the new medium, which is still developing, into a mental model they already have. I often hear that AR will be great “to see my Uber approaching” (even though a 2D map does a great job of this). The winning AR apps will take advantage of the new capabilities of the new medium and do things that weren’t even possible in the old one.

When I worked at Openwave (which invented the mobile phone web browser) in the early 2000s, we used to talk about “mobile native” experiences, as distinct from “small screen websites.” In 2010, Fred Wilson described this concept far better than I could in his widely cited mobile-first post. Hopefully this post will be an echo of his, but for AR.

Apple also recently published some Human Interface Guidelines for ARKit, which look great. I couldn’t find anything similar from Google for ARCore, but will happily add a link here if someone can point me to one. This post goes into a fair bit more detail on the same types of topics based on our real-world learnings.

So what makes an app “AR native”?

I’m going to dig into a bunch of factors that are “special” to smartphone AR, like you’d build with ARKit or ARCore. I am specifically avoiding AR apps for head-mounted displays, as HMDs open a whole new range of possibilities for products that aren’t possible on a smartphone (hands-free apps, longer session times and much deeper immersion potential, for starters).

So here’s a list in no particular order of things to think about as you consider concepts for your new ARKit app. I’m sure there’s more to be discovered; my friend Helen Papagiannis’ book Augmented Human goes even broader and deeper than I do here, but this list should be pretty comprehensive for now, considering it’s the result of more than 50 years of combined AR smartphone app development experience.

When we are pitched at Super Ventures on an AR app (which is pretty much constantly), it’s easy to tell if the developer hasn’t built an AR-native product, as the simple question “Why do this in AR, wouldn’t a regular app be better for the user?” is often enough to cause a rethink of the entire premise.

Think outside the phone

One of the biggest mental leaps to make is that as a designer/developer, you now need to start thinking outside the phone. You also need to rethink the basics of how a user exercises intent with smart devices in general. The structure of the real world, the types of motion that the phone might make, the other people nearby, the types of objects or sounds nearby, all now play a part in AR product design. Everything happens in 3D (even if your content is only 2D).

It’s a big conceptual leap to design content that “lives” outside the phone in the real world, even though the user can only see it and interact with it via the smartphone glass.

Various interactions/transitions/animations/updates that happen within the app are only one context to consider, as there could be a lot going on “outside the app” to take into account. Once you get your head around this, the rest becomes easier to grasp.

Biomechanics of wrist versus head (versus torso)

The fact that we hold our phones in our hands is a huge design constraint and opportunity. Our wrists and arms can move in ways that our heads just can’t. An app you can use with one hand allows movements that a two-handed app can’t (with two hands you have to rotate your torso, and you have very limited translation without walking).

You aren’t going to slide your head along the table to get a measurement! This type of use case should always make sense for smartphone AR.

This means AR apps that don’t require you to be always looking through the display (as your wrist might be pointing away from your eyes) can work well. Two great examples are the “Tape Measure” app and AR apps that let you 3D-scan a small object.
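To make the tape-measure example concrete, here is a minimal sketch (my illustration, not from any shipping app) of how ARKit’s hit-testing can turn two taps on a detected surface into a distance. The TapeMeasure class and its handleTap method are assumptions for illustration.

```swift
import ARKit
import simd

final class TapeMeasure {
    private var firstPoint: simd_float3?

    /// Call with the screen location of a tap on an ARSCNView.
    /// Returns the distance in meters once two points have been picked.
    func handleTap(at point: CGPoint, in sceneView: ARSCNView) -> Float? {
        // Hit-test against planes ARKit has already detected.
        guard let result = sceneView.hitTest(point, types: .existingPlaneUsingExtent).first else {
            return nil
        }
        let t = result.worldTransform
        let worldPosition = simd_float3(t.columns.3.x, t.columns.3.y, t.columns.3.z)

        if let start = firstPoint {
            firstPoint = nil
            return simd_distance(start, worldPosition) // meters between the two taps
        } else {
            firstPoint = worldPosition
            return nil
        }
    }
}
```

Note that nothing in this flow requires the screen to be facing your eyes while the phone moves, which is exactly why the use case suits a handheld device.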

A real-time, fast-twitch shooter game could be difficult to make work with mobile AR: you have to move the phone around quickly (if not, why even do it in AR?), which makes it hard to see what’s going on.

Why get your phone out of your pocket?

AR developers often forget that we usually have our phones in our pocket/bag when we are moving about. Our imagination of the “user experience” of our app begins when the user taps our app icon (or even only once they are in the AR view looking through the camera). It should start much earlier, as one of the big unique benefits of AR is that it can provide real-time context around our location. If the user starts your app 30 seconds too late, there may be no benefit, as they’ve already walked past the location the app needs. Facebook or SMS notifications don’t have this problem of being place-sensitive.

How would I even think to get my phone out right now? This is a huge design problem for smartphone AR (and beacons before this). (Image: Estimote)

So there’s a very real and difficult problem in getting a user to get their phone out while they are in the best place to use your app. Notifications could come via traditional push messages, or the user might think to use the app by seeing something in the real world that they want more information on, and they already know your app can help with this (e.g. Google Word Lens to translate languages on menus).
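One concrete way to nudge the user at the right place is a location-triggered local notification. Here’s a rough sketch using CoreLocation and UserNotifications; the region, radius and copy are placeholders, and a real app would also need to request notification and location permissions first.

```swift
import CoreLocation
import UserNotifications

/// Schedule a local notification that fires when the user arrives somewhere
/// the AR content is actually useful, prompting them to take the phone out.
func scheduleARNudge(center: CLLocationCoordinate2D, radiusMeters: Double) {
    let region = CLCircularRegion(center: center,
                                  radius: radiusMeters,
                                  identifier: "ar-content-nearby") // placeholder identifier
    region.notifyOnEntry = true
    region.notifyOnExit = false

    let content = UNMutableNotificationContent()
    content.title = "There's AR content here"
    content.body = "Open the app and point your camera around you."

    // Fires the next time the user enters the region.
    let trigger = UNLocationNotificationTrigger(region: region, repeats: false)
    let request = UNNotificationRequest(identifier: "ar-nudge",
                                        content: content,
                                        trigger: trigger)
    UNUserNotificationCenter.current().add(request)
}
```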

Otherwise, your app just needs to work anywhere, either by using unstructured content or by tapping into content that is very, very common. This problem is the No. 1 challenge for all the “AR graffiti” type apps that let people drop notes for others to find: it’s almost impossible for users to be aware that there’s content to find. FYI, this is just another version of the same problem all the “beacon” hardware companies have: getting the shopper to pull out their phone to discover beneficial content.

This challenge also manifests in a different way when AR app developers think about “competition.” As AR is so wide open, it’s easy to think that the opportunity is there to win. However, AR is always competing with the minutes in our day. We already spend that time doing something, so how is my AR app going to compete against that?

AR as a feature, e.g. Yelp with Monocle, Google Lens image search

When we start thinking about ARKit apps, we often immediately imagine the entire app experience taking place in “AR mode” while looking at the real world. This just doesn’t make sense, as many actions work better as a regular smartphone app, e.g. entering a password. Indeed, a 2D map is often a better way to show location data, as the user can zoom in/out and see in every direction at once. Think hard about which parts of the app work natively for AR, and consider building a hybrid app where AR mode is a feature. Yelp did this with their Monocle feature, as an example.

Going into “AR mode” is a feature of the Yelp app (Monocle), not the purpose of the app. There’s wisdom here.
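As a sketch of the hybrid-app pattern (my own illustration, not Yelp’s implementation), “AR mode” can live in its own screen that the rest of a conventional app presents on demand, so the camera and tracking only run while the user actually needs them:

```swift
import UIKit
import ARKit

/// "AR mode" is one screen of an otherwise normal app.
final class ARModeViewController: UIViewController {
    private let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        view.addSubview(sceneView)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal]
        sceneView.session.run(config)   // camera and tracking run only while visible
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        sceneView.session.pause()       // back to being a regular app
    }
}

// From any normal (non-AR) screen, e.g. a button action:
// present(ARModeViewController(), animated: true)
```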

Interact with the world

At the simplest level, what makes a native AR app is that it enables a degree of interaction between digital content and the physical world (or a physical person). If there’s no digital/physical interaction, then it should be a regular app. Putting this even more strongly: since smartphone apps are our “default” UX, your AR app should only exist if it does something that can only be done in AR. Interaction with the world can come in three broad ways:

3D geometry: This means the digital content can be occluded by, and physically react to, the real-world structure. There are lots of good Tango demos of this; our Dekko R/C car jumping off physical ramps and the Magic Leap little robot under the desk are other examples.

The structure of the physical space becomes a key element of the app.

Other people sharing your view and physical space: If your app needs other people physically near you in order to give benefit, then it’s probably AR-native. Good examples of this are an architect showing a client a 3D model on the coffee table, or Star Wars Holochess.

Being able to “share a hallucination” with a friend or colleague in the same physical space is a great native use that AR enables.

Controller motion: This means the content can be static, but you physically move your phone around, or the phone can act as a “controller,” like a Wii remote. This one is borderline, IMO. If the content just sits on a tabletop and you view it from different points of view by moving your phone around, then personally I don’t see this as AR-native, as the only benefit is novelty value. A regular smartphone app using pinch/zoom, etc. will probably get better use.

Some people argue that the sense of physical scale the user gets by moving around makes it AR-native… The question you need to ask is this: is there any benefit to this interaction beyond novelty value? One way the phone can be used natively as a controller is like a Wii remote, where specific movements of the controller cause action in the content. One example would be a virtual balloon hovering over a desk that you can “push” around by physically pushing the phone against it.
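Here’s a rough sketch of that balloon idea, assuming the balloon is a SceneKit node with a dynamic physics body: each frame we compare the camera (i.e. the phone) position to the balloon and give it a nudge when they get close. The 15 cm threshold and the force values are made up for illustration.

```swift
import ARKit
import SceneKit

final class BalloonPusher: NSObject, ARSCNViewDelegate {
    let balloonNode: SCNNode   // assumed to carry a dynamic SCNPhysicsBody

    init(balloonNode: SCNNode) {
        self.balloonNode = balloonNode
        super.init()
    }

    // Set this object as the ARSCNView's delegate; this runs every frame.
    func renderer(_ renderer: SCNSceneRenderer, updateAtTime time: TimeInterval) {
        guard let cameraNode = (renderer as? ARSCNView)?.pointOfView else { return }

        let offset = balloonNode.simdWorldPosition - cameraNode.simdWorldPosition
        let distance = simd_length(offset)

        // If the phone is "touching" the balloon, push it away from the phone.
        if distance < 0.15 {                          // 15 cm, illustrative
            let direction = simd_normalize(offset)
            let push = SCNVector3(direction.x * 0.05, // gentle, illustrative impulse
                                  direction.y * 0.05,
                                  direction.z * 0.05)
            balloonNode.physicsBody?.applyForce(push, asImpulse: true)
        }
    }
}
```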

Think about whether your use case needs these types of interactions to give the user the benefit you want. If not, then accept this is a novelty (not a bad thing in itself, just set expectations of re-use appropriately around zero) or build a regular app.

Location re-enables scarcity

Back in 2011 at the AWE event organized by my Super Ventures partners (Ori & Tom), Jaron Lanier gave a talk in which he raised one of the most fascinating ideas I’d heard in years. Just as the internet “broke” the concept of scarcity (infinite copies of digital goods are easily made, it doesn’t matter where I physically work from, etc.), AR has the ability to re-enable scarcity. Many, many AR experiences will be tied to the location in which you experience them. Even if an experience can be replayed in a different location, it should always be best in the place for which it was designed. It’s the digital equivalent of a live concert versus an MP3.

This has huge potential for new business models for digital businesses, especially when you start mixing in the potential of blockchain and decentralized businesses. Creative businesses, in particular, could benefit from this as they have been most disrupted digitally by the lack of scarcity, and at least in the music industry, it’s the live events that are generating the most income.

Digital content & experiences can become “scarce” or exclusive again, unable to be infinitely copied and possibly more valuable as a result, as they can only be experienced in a physical location (and maybe at a single time), similar to the way live concerts are more valuable than recordings. (Image: Gábor Balogh)

Pokémon GO is starting to take advantage of this concept with the “Raids” feature. For example, a Pokémon GO raid was set up in Japan that was the only location and time in the world at which you could have acquired the most valuable Pokémon, Mewtwo.

Another example could be an “AR MMO,” where you have to travel to Africa to visit a specific smith. I expect some very interesting companies will be built around the recreation of digital scarcity.

Don’t get seduced by your (genuinely good) idea

AR is seductive. The more you think about it, the bigger you realize it is going to be, and the more great ideas you have that are completely new. We regularly see startups pitch us a great idea: they have figured out a problem that needs solving, and it could truly become a huge platform, the next Google, etc. The problem is that the tech or UX probably doesn’t support it quite yet (or won’t for the next couple of years). The entrepreneur believes in the idea and wants to get started; sadly, the market or tech doesn’t arrive before they run out of their 18 months of runway.

It can be very hard to put a good idea you believe in on the shelf. But you need to, unless you know for a fact (not just really, really believe) that (a) your idea can be built as you imagine it today; and (b) someone real will buy it (i.e. you can give me their name and phone number).

If you can’t do the two things above, you should ask yourself whether your great idea is blinding you to today’s reality.

As a footnote here… a “good” idea doesn’t just mean a cool concept. It means essential. It means an experience that provides a compelling reason to displace a more efficient, socially understood and natural version of that feature in non-AR space.

Field of view is a big constraint

One obvious difference between a smartphone AR app and an HMD AR app is the field of view. HMDs with a large field of view (when they arrive) will feel very immersive; however, smartphone AR will always have two challenges:

  • You only have a small “window” into the universe of your AR content (a “magic mirror”). Your friends have an even smaller window, should they want to also see what you do. This can be mitigated somewhat with good tracking.
  • The screen on the phone displays what the camera sees, not what you see. The camera’s orientation, its offset from the display, and its field of view are all different from your eye’s. If you drew a line from the center of your eyeball out into the world, then held the camera exactly on that line pointing the same way, and the camera’s field of view was close to your eye’s, then the camera could appear kind of see-through, like a piece of glass. In reality, it never does, but people don’t mind; they just look through the screen and ignore what’s not on it. (As a side note: There has been some research that does a pretty good job of adjusting the video feed to look “clear,” but that probably won’t end up in mass-market phones. It’s worth a look, and maybe some experiments, for the curious.)

So what can be done to help with immersion when you’re looking through a tiny window at content that isn’t where your eye would naturally expect to see it?

First, accept that users will generally suspend disbelief and accept the AR experience as digital content living in the world.

Think about scale. If your content is physically small, it can fit entirely inside the smartphone glass and camera field of view and avoid clipping at the edge of the screen. For example, a life-sized digital cat will probably give a better experience than a life-size T-Rex. We can apply some learnings from VR here in that playing with scale inversions can create a sense of immersion and feel magical, e.g. a tiny elf that lives behind a potted plant.

If your content is going to be larger than the field of view, then think about some affordances to help with the fact that users can’t see everything. One idea we found worked was to use some fog to cover the digital scene — only what you were looking at would be clear and the edges of the screen appeared foggy. This hinted to the user that there wasn’t anything interesting outside the screen area anyway, but they could pan around and disperse the fog if they wanted to explore a larger area. Anything that’s off-screen should really have some sort of indicator as to where it physically is. The user shouldn’t have to pan around wildly searching for some asset by guessing.
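As a sketch of the fog idea, assuming SceneKit rendering and made-up distances: SCNScene’s built-in fog fades out whatever is far from the camera, and projecting a node into screen space tells you when it’s off-screen so you can draw a directional hint instead of making the user hunt for it.

```swift
import UIKit
import SceneKit
import ARKit

/// Fade distant content so the edge of the visible "window" feels intentional.
func applyFog(to scene: SCNScene) {
    scene.fogStartDistance = 1.5          // meters; illustrative values
    scene.fogEndDistance = 4.0
    scene.fogDensityExponent = 2.0
    scene.fogColor = UIColor(white: 0.9, alpha: 1.0)
}

/// Returns true when a node falls outside the screen, so the app can
/// draw an arrow (or similar hint) pointing toward where it physically is.
func isOffScreen(_ node: SCNNode, in sceneView: ARSCNView) -> Bool {
    let projected = sceneView.projectPoint(node.worldPosition)
    let screenPoint = CGPoint(x: CGFloat(projected.x), y: CGFloat(projected.y))
    // Note: a fuller check would also handle points behind the camera.
    return !sceneView.bounds.contains(screenPoint)
}
```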

This sense of immersion can be quite difficult to achieve in practice. Getting your product used beyond a one-time novelty experiment is hard and means re-thinking many UX aspects. User testing becomes so much more important as you are starting from the basics. In AR, you have to repeatedly test with the same users over time to get rid of the novelty bias and get real data. It’s a big jump from something that looks impressive on YouTube to something that actually feels good to use.

Accept that you have no control over the scene

One of the biggest differences between an AR app and any other kind of software is that, as the developer, you have no control over where in the real world the user decides to start your app. Your tabletop game might need a 2×2-foot playing area, but the user opens the app on a bus, or points it at a tiny table, or at a table covered with plates and glasses. The app is effectively unplayable (or you just use it as if it’s not an AR app). You end up with two viable options, and a third that is still a ways off.

First, give the user specific instructions about where to use the app, and expect them to go to another room, clear the table, or just wait until they get home. Expect them to have a poor experience the first time they use it in the wrong place. It doesn’t take much to imagine what this will do to your usage metrics. Another way to solve this is to make an app for a very specific use case, so users won’t even think of opening it somewhere it won’t work.

The other viable approach is to design the app with content that is “unstructured,” i.e. it can work anywhere. We eventually figured this out at Dekko, where we started off trying to build a game with structured levels and difficulty, but ended up with a toy car that could just drive around anywhere. If you are thinking of building a game for ARKit, my advice is to build digital toys that can be played with anywhere (a virtual remote-control car, virtual skateboard, virtual ball and stick, etc.).
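A minimal sketch of the “works anywhere” approach with ARKit plane detection: whenever a horizontal surface shows up, drop the toy (here a placeholder box) onto it, making no assumptions about how big the surface is or where it sits.

```swift
import ARKit
import SceneKit

final class AnywhereToyDelegate: NSObject, ARSCNViewDelegate {
    // Placeholder "toy": a 10 cm box standing in for the car/skateboard/ball.
    private func makeToyNode() -> SCNNode {
        let box = SCNBox(width: 0.1, height: 0.1, length: 0.1, chamferRadius: 0.01)
        return SCNNode(geometry: box)
    }

    // Called whenever ARKit detects a new plane in whatever scene the user happens to be in.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let plane = anchor as? ARPlaneAnchor else { return }

        let toy = makeToyNode()
        // Sit the toy at the plane's center; a desk, the floor or the top of a
        // laundry pile all work, because no particular layout is assumed.
        toy.simdPosition = simd_float3(plane.center.x, 0.05, plane.center.z)
        node.addChildNode(toy)
    }
}

// Session setup (e.g. in viewWillAppear); keep a strong reference to the delegate.
// let config = ARWorldTrackingConfiguration()
// config.planeDetection = [.horizontal]
// sceneView.delegate = toyDelegate
// sceneView.session.run(config)
```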

Finally, the third and best solution, which isn’t viable yet because it’s not technically possible, is where things will end up: have a 3D understanding of the scene, then automatically and intelligently place your content in context with what’s already there. This needs two hard problems solved: real-time 3D scene reconstruction with semantics (the reconstruction part is almost solved), and procedural content layout tools.

We’re still a few years away from having all of this solved for developers, so you’ll need to use one of the two viable approaches above.

Moatboat has designed their AR app to let the user place content anywhere, without requiring structure. This type of “free play” is a particularly good fit for AR as the developer doesn’t know what objects the physical scene may contain. The UX works just as well on a flat surface as a messy room, as a barn or some cows can now just be placed on top of the pile of laundry, and fun can be had making them fall off the “mountain.”

The good news is that having no control and giving maximum agency to the user is where the magic happens! It’s the defining aspect of the medium. Charlie did a great talk on this topic.

In addition to the above, you are given an arbitrary coordinate system for your app that is unique to this session. Your app only knows its relative position within this coordinate system; the app starts at XYZ 0,0,0. There’s no persistence, no correlation with “absolute coordinates” (latitude/longitude/altitude) and no ability to share your “space” with someone else. This will certainly change in future (12-18 months) updates to ARCore and ARKit, and there are a few startups working on overlay APIs, but for now, you need to take this into account in your product design.
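To make the coordinate-system point concrete, here’s a small sketch (names are my own): everything ARKit hands you is expressed relative to wherever the session happened to start, so an anchor’s transform means nothing outside this session and can’t be handed to another device or saved for later as-is.

```swift
import ARKit

/// Drop an anchor one meter in front of wherever the camera currently is.
/// The resulting transform only has meaning inside this session's arbitrary
/// coordinate frame, which began at (0, 0, 0) when the session started.
func placeAnchorInFrontOfCamera(session: ARSession) -> ARAnchor? {
    guard let frame = session.currentFrame else { return nil }

    var offset = matrix_identity_float4x4
    offset.columns.3.z = -1.0                        // 1 m forward in camera space

    let transform = frame.camera.transform * offset  // session-local, not lat/long
    let anchor = ARAnchor(transform: transform)
    session.add(anchor: anchor)
    return anchor
}
```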

Before our content can interact with the world, the AR devices need to be able to capture or download a digital model of the real world that accurately matches the real world you are seeing. (Source: Roland Smeenk)

How we naturally hold our phones

In the early days of mobile AR, we optimistically thought people would happily hold their phones up in their line of sight to get the experience of an AR app (we called it the Hands Up Display). We vastly overestimated how willing people would be to change behavior. We vastly overestimated how accepting other people would be of you doing this behavior (remember Glassholes?). Your app will not be the app that changes this social contract. You’ll need your app to work based on how people already hold their phones:

  • Hands up for very short periods of time, about as long as it takes to take a photo. This means AR becomes a way to visually answer a question, rather than a way to pursue long-session engagement.

If you expect your users to hold a phone or tablet like this, then your app better only need them to do it for a second or two. (Image: GuidiGo)

  • At a 45-degree angle pointing at the ground, like we already do most of the time. This points to apps with smaller-scale content (e.g. 1- or 2-foot-high assets) where the action takes place right in front of you.

This position is much more likely to result in a user using your app for a while. Note, though, that the real-world scene becomes much, much less interesting and more constrained.

There are a number of ways that the phone sensors can be used to switch modes depending on how the user holds the phone, e.g. your app is a regular 2D map when held horizontally, but switches to AR mode and triggers a visual search instantly when it’s held up. Nokia’s City Lens app managed this UI in a nice way back in 2012. These are good ideas to explore.
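A rough sketch of that sensor-driven mode switch using CoreMotion (thresholds are illustrative): when the phone’s pitch approaches vertical, switch into AR mode; when it drops back toward flat, return to the 2D map.

```swift
import CoreMotion

/// Watches device attitude and reports when the phone is raised or lowered,
/// so the app can flip between a 2D map and an AR view.
final class HoldModeDetector {
    enum Mode { case map, ar }

    private let motion = CMMotionManager()

    func start(onChange: @escaping (Mode) -> Void) {
        guard motion.isDeviceMotionAvailable else { return }
        motion.deviceMotionUpdateInterval = 0.1
        motion.startDeviceMotionUpdates(to: .main) { data, _ in
            guard let attitude = data?.attitude else { return }
            // Pitch is ~0 when the phone lies flat and ~pi/2 when held upright.
            let mode: Mode = attitude.pitch > 1.0 ? .ar : .map
            onChange(mode)
        }
    }

    func stop() {
        motion.stopDeviceMotionUpdates()
    }
}
```

In practice you’d add some hysteresis (and only trigger the visual search once the pitch has been stable for a moment) so the mode doesn’t flicker around the threshold.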

Input is disintermediated by the glass

As you test your app, you’ll realize that one of the first things users do when they see your virtual content appear in the real world is reach around the phone and try to touch it, then look confused when they can’t! This is a good sign that you’re building something cool, but it shows the challenge of input and interactions being mediated by a piece of glass that is physically in your hands, while the content may be several feet away.