How to design for voice UI

Profile picture of Michael Craig





The rapidly growing field of voice UI requires us to design for interfaces that are much more auditory than they are visual.


A 3D visualization of a voice UI interface, reading “How can I help you?”


"Hey Siri, do I need an umbrella today?"

"Alexa, remind me to call mom tomorrow at 9 AM."

"Hey Google, what's the fastest way to get to the airport from here?"

Voice User Interfaces (VUIs) have vastly improved the ways in which humans interact with computers. While voice UI has been around for some time, people have only recently begun to see how much a voice user interface can do.

Often, when we think of “user interface,” we think visually. We’ve grown familiar with screens we can swipe and buttons we can press. However, voice UI is not primarily visual. Instead, it lets us interact with a machine using our voice.

Well-known companies like Apple, Amazon, and Google have made everyday tasks easier with simple voice commands like the ones above. Other companies have started incorporating these existing virtual assistants into their own products, or are creating their own cutting-edge voice UI.

When designing interfaces for web and mobile, there are many factors to keep in mind, such as who you’re designing for and how validating your design will improve the user experience. Similarly, those designing voice user interfaces have to consider various factors, and how to overcome obstacles, in order to design a voice UI that feels like second nature.

Designing for humans

When designing a voice UI, it’s important to think about who will be interacting with your system.

How do they think?

How do they communicate on a day-to-day basis?

You may well have to design your interface with multiple audiences in mind.

For example, what if we were building a system that allowed people to book flights using speech? We would want to consider the steps required to accomplish this task on any platform. Once we understand the process, we can then apply it to a voice interface. Let’s book a flight from Atlanta, GA, to New York City.

That would involve the following steps:

  1. Choose dates to fly.

  2. Search for flights within the specified date range.

  3. Choose either one-way or round-trip.

  4. Choose the departing flight based on price and/or flight time.

  5. Choose the returning flight based on price and/or flight time.

  6. Choose flight or fare upgrades.

  7. Select trip protection.

  8. Confirm and pay.
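To see how those steps might translate to a voice interface, here is a minimal sketch in Python that asks for one piece of information at a time. It is only an illustration, not a real booking system: the slot names, the prompt wording, and the `next_prompt` helper are all hypothetical. It also assumes that searching for flights (step 2) is something the system does on its own rather than a question it asks, and that a one-way trip has no returning flight to choose.

```python
from typing import Optional

# Each entry pairs a slot name with the spoken prompt that fills it.
# Slot names and prompt wording are hypothetical, for illustration only.
STEPS = [
    ("dates", "What dates would you like to fly?"),
    ("trip_type", "Is this one-way or round-trip?"),
    ("departing_flight", "Which departing flight would you like?"),
    ("returning_flight", "Which returning flight would you like?"),
    ("upgrades", "Would you like any flight or fare upgrades?"),
    ("trip_protection", "Would you like to add trip protection?"),
    ("payment", "Shall I confirm and pay for this trip?"),
]


def next_prompt(answers: dict) -> Optional[str]:
    """Return the prompt for the first unanswered step, or None when done."""
    for slot, prompt in STEPS:
        if slot in answers:
            continue  # The user has already answered this step.
        # Assumption: a one-way trip has no returning flight to choose.
        if slot == "returning_flight" and answers.get("trip_type") == "one-way":
            continue
        return prompt
    return None


# Example: a one-way traveler who has already picked a departing flight.
answers = {
    "dates": "next Friday",
    "trip_type": "one-way",
    "departing_flight": "the 8 AM flight",
}
print(next_prompt(answers))
```

Keeping the steps in a single ordered list means the conversation can pick up wherever the user left off, instead of forcing them back to the beginning.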

A personal voice assistant in a home setting

Voice interfaces are by no means a replacement for visual interfaces. The two can complement each other rather than compete, leading to a better product.

Use natural language

Natural language is the ordinary speech that we use every day in conversation. It doesn’t involve any planning or premeditation. It comes to us naturally, and adopting it in voice interfaces allows for a more intuitive experience.

Since mastering natural language requires advanced computational linguistics and semantics, there are still many not-so-great examples out there. In many voicemail assistants, for instance, after you listen to a new voice message for the first time, you’re prompted:

“To hear the message again, say ‘Repeat’. To reply to the message, say ‘Reply’. To delete the message, say ‘Delete’.”

This interaction doesn’t use intuitive speech and can be confusing. Because the system is teaching you commands on the fly, you have to stop and think about what you’re doing, even if you already knew what action you wanted to perform.
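One way to picture the alternative is a system that accepts several everyday phrasings for the same action, rather than a single taught command. The sketch below is a deliberately simple illustration in Python. The phrase lists and the `detect_intent` helper are hypothetical, and plain phrase matching like this is only a stand-in for the far more capable language understanding a production voice assistant would need.

```python
import re
from typing import Optional

# Hypothetical phrasings a caller might naturally use for each action.
INTENT_PHRASES = {
    "repeat": ["repeat", "again", "one more time", "replay"],
    "reply": ["reply", "respond", "answer", "call back"],
    "delete": ["delete", "erase", "remove", "get rid of"],
}


def detect_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose phrase appears in the utterance."""
    text = utterance.lower()
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            # Match whole words only, so "answer" does not match "answered".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return intent
    return None  # Nothing recognized; the system should ask a follow-up.


print(detect_intent("Play that one more time"))
print(detect_intent("Can you get rid of it?"))
```

The taught command still works here, since “Repeat” is one of the accepted phrasings, but the user no longer has to memorize it.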

In our