Voice assistants conquer the world

Angelika Zerbe

19.2.2018

Voice assistants have become extremely popular in recent years. Almost all the major technology companies have developed an assistant or are in the process of doing so. Among the best known are Siri, which was launched in 2011, Google's Ok Google, Cortana from Microsoft, Amazon's Alexa and Bixby from Samsung.

The providers pursue different specialisations. Ok Google, for example, aims to present situation-specific information to the user in as personalised a manner as possible. Alexa, on the other hand, leads in the smart home field and in extending voice commands (skills).

Voice control has established itself as a new interface, the Voice User Interface (VUI). This also brings new challenges for the user experience. At the moment you quickly notice that voice assistants are still in their infancy - for example, when we hear the sentence "Sorry, I didn't quite understand you" for what feels like the tenth time, or wonder why the music stops as soon as we use Alexa as a night light.

A good reason to look into what matters when developing a voice assistant and how it differs from conventional interaction via mouse, keyboard, touch screen or buttons.

Low barriers to entry

I can talk to that?

We are used to writing on our PC with a keyboard. Typing on the virtual keyboard of our smartphones has also become second nature to us. What we are not yet familiar with is talking to our devices. What speed and volume are best? It feels strange.

Voice assistants are increasingly conquering our everyday life. Nevertheless, many people feel a certain shyness when dealing with them. Of course it takes some time to get used to these eloquent assistants. As concept developers, we can certainly help there, for example with good onboarding that takes away the user's hesitation and provides an early sense of success.

Does everyone know what a hamburger icon is?

This and similar questions are familiar to us as user experience designers in connection with screen interfaces. With a VUI there are no such problems. Every user knows how to speak. The user therefore does not have to acquire new knowledge or become familiar with an unknown platform. At least, that is the best case.

But we cannot assume this ideal situation!

If you install an ordinary iOS app and start it for the first time, a start screen appears. You usually find some buttons, a menu and some text to help you find your way around. With a voice assistant without a screen, like the Echo Dot, you see nothing at all. Matty Mariansky and his team created a voice assistant that can manage calendar entries. He had it tested by users who did not know what it was. The users' reactions were not long in coming:

"This thing can do whatever I ask him, so I'm going to ask him to make me a sandwich."

"I have no idea what I'm supposed to do now, so I'm just going to freeze and stare at the screen."

Users who do not know what the voice assistant is supposed to be able to do will be disappointed at first. The solution to the problem is simple. The assistant first introduces itself with one or two sentences. After that, the user should be asked to speak a command so that they can get started immediately. It is a success for the user if a voice command immediately leads to the desired reaction. It is generally advisable to test the onboarding several times and to keep the hurdles as low as possible.

A study by Comscore came to surprising results. It found that users of voice assistants mostly perform very simple tasks. For example, 57 percent of users ask about the weather. It is therefore important that users learn how to expand and customise their commands if they are to remain satisfied and get the full benefit of their assistant.

Offer One-Shots

It becomes problematic when voice assistants read out long lists, for example timetables or recipes. A common solution is to read out the first few entries and then ask whether the user wants to hear more. If you just want a random recipe, this can work well. If you are looking for something more specific, it quickly becomes tedious. On a screen, you can scan the options within seconds and also get a much more detailed impression. Every display, even a small one, has a clear advantage here.

Example: a recipe search, once with a screen and once by voice:

Me: Alexa, open up Chef.
Alexa: Welcome to Chef, what are you up for today?
Me: Alexa, looking for bread dumplings.
Alexa: I didn't find anything to go with high-flour dumplings.
Me: (louder) Alexa, looking for bread dumplings.
Alexa: Here is the search result. I found Sivi's bread dumplings, bread dumplings excellent, (..). For more details say, for example, "open recipe 1".
Me: Open recipe 1.
Alexa: The chosen recipe takes 20 minutes and has 4.7 out of 5 stars.

Then I am asked whether Alexa should send me the recipe. It appears immediately in my Alexa app. However, I don't see a picture of the dish there either.

For recipes I prefer to go directly to the website. What does work perfectly, though, are one-shots. These are tasks that are completed with a single instruction: "Alexa, timer five minutes." Just as good are commands that start a task or ask a single question: "Alexa, play music" or "Alexa, how high is the Zugspitze?"

The less the user has to choose, the better.

All in all, experience with Alexa shows that speech alone is not sufficient as a means of interaction. That's why products such as the Echo Show - a combination of a voice assistant and a touch screen - are increasingly being developed. For example, if you want to buy handkerchiefs, you say: "Alexa, I need handkerchiefs". You then see a selection on the screen and can choose a product. In product development, I think it's essential to decide early on whether the assistant will have screen support or not.

Follow the course of the conversation

A good Conversational User Interface (CUI) should be able to conduct an understandable and continuous conversation. For example, if you ask who the 16th President of the United States was, Ok Google will reliably answer "Abraham Lincoln". I want to know more about him and ask "How old was he?" and then "Where was he born?" The answers are correct and refer to the 16th President. It's great not to have to start all over again from scratch.

But if there is an abrupt change of topic, the assistant may not be sure whether the question refers to a new topic. Rather than give a wrong answer, it is more pleasant if the assistant politely asks what is meant.
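How an assistant might carry a topic across follow-up questions can be illustrated with a small Python sketch. It is only an illustration under simple assumptions, not how Ok Google or any other assistant actually works: the `Conversation` class, its method names and the pronoun list are hypothetical, and real systems use far more robust language understanding. The sketch replaces pronouns with the remembered topic and asks back when there is nothing to refer to.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny list of words that refer back to a topic.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


@dataclass
class Conversation:
    """Remembers what the conversation is currently about."""
    topic: Optional[str] = None

    def remember(self, topic: str) -> None:
        """Store the subject of the last answer, e.g. 'Abraham Lincoln'."""
        self.topic = topic

    def resolve(self, question: str) -> str:
        """Replace pronouns with the remembered topic, or ask back."""
        words = question.rstrip("?").split()
        if not any(word.lower() in PRONOUNS for word in words):
            return question  # nothing to resolve, treat as a new question
        if self.topic is None:
            return "Sorry, who or what do you mean?"
        resolved = [self.topic if word.lower() in PRONOUNS else word
                    for word in words]
        return " ".join(resolved) + "?"


conversation = Conversation()
conversation.remember("Abraham Lincoln")
print(conversation.resolve("How old was he?"))
print(conversation.resolve("Where was he born?"))
```

The important design choice is the fallback: without a remembered topic, the sketch asks a clarifying question instead of guessing.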

Asking questions is especially important for sensitive commands such as "Delete this". The assistant must understand the user's intention or ask what is to be deleted. Is it an e-mail or your own Facebook profile?
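The rule for sensitive commands can be written down as a small decision function. The following Python sketch is a simplified assumption, not the API of any real assistant: the function `respond`, the set `DESTRUCTIVE_ACTIONS` and the response texts are made up for illustration. It assumes that a separate speech-understanding step has already extracted an action and, if present, a target.

```python
from typing import Optional

# Hypothetical list of actions that should never run without confirmation.
DESTRUCTIVE_ACTIONS = {"delete", "remove", "cancel"}


def respond(action: str, target: Optional[str]) -> str:
    """Decide whether to ask back, ask for confirmation, or just act."""
    if target is None:
        # "Delete this" without a clear object: ask what is meant.
        return f"What would you like me to {action}?"
    if action in DESTRUCTIVE_ACTIONS:
        # The target is clear, but the action cannot easily be undone.
        return f"Do you really want to {action} {target}? Say yes to confirm."
    return f"OK, I will {action} {target}."


print(respond("delete", None))
print(respond("delete", "this e-mail"))
print(respond("play", "music"))
```

Harmless commands stay one-shots, while anything destructive costs the user one extra "yes".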

Care for personality

Chatbots need a personality - a tip from Bettina in her blog article Gute UX für Chatbots. The same applies to voice assistants. A detailed persona with a name and behaviour must be defined. The two can also be related. Cortana from Microsoft got its name and personality from the futuristic video game "Halo". Alexa's name, on the other hand, is historical: it is a tribute to the Library of Alexandria and therefore stands for large amounts of knowledge.

One point that stands out: at present, voice assistants are predominantly female. Even if there are reasons for this, it is by no means set in stone. The only thing that matters is that the voice is likeable and easy to understand. Another interesting consideration concerns the behaviour of the digital assistant: should our assistant react differently when a 20-year-old orders a pizza than when an older woman has questions about a medication? In principle, these two people expect a different tone in the conversation. The first conversation could be funny and casual, while the older woman is addressed more calmly and precisely. Finding suitable behaviour takes time, and plenty of it should be set aside for this task.

This topic also influences the professional profile of UX designers. In the future, we will design fewer visual elements and instead devise many more artificial personalities. Exciting, I think!

Our language is complex

Scrolling through the Alexa Skills is still quite sobering. There is an unbelievable amount on offer, but most of the applications do not find much favour. Ratings of four or five stars are rare. Not really surprising - after all, the first apps for the smartphone were not a hit either. It took time for the technology to mature and for developers to become familiar with the possibilities.

It is quite similar with voice assistants. Here too, UX designers and developers have to approach a completely new use case. There is also another challenge: understanding human speech.

We are used to clicking on buttons or images and having texts displayed. If you look at well-rated skills, you will notice that the ones that work are mainly those that do not really need natural speech and fall into the smart home category:

"Alexa, start sleep sounds."
"Alexa, lower the temperature five degrees."

Even though these commands work, there is room for improvement. The command is like a special syntax that has to be remembered. It's a long way from free speech. These commands could be improved, for example, like this:

"Alexa, please adjust the heat so that I'm no longer cold."
"Alexa, turn the heat off before the milk boils over."

We need to learn that spoken language works differently from a website or a voice search.

In a skill description you will find commands like "timer to 5 minutes". Even if it seems trivial, there should always be instructions on how to turn the timer off or change it. In the best case the command is as simple as possible so that the user can just talk. The timer then gives a short confirmation and changes as desired.
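What "set, change and turn off, each with a short confirmation" could look like is sketched below in Python. This is a toy model and not code for the Alexa Skills Kit: the `Timer` class, the keyword patterns and the confirmation texts are assumptions for illustration, and the sketch only understands durations given as digits in minutes.

```python
import re
from typing import Optional


class Timer:
    """Toy timer that understands a few loosely phrased commands."""

    def __init__(self) -> None:
        self.minutes: Optional[int] = None  # None means no timer is running

    def handle(self, utterance: str) -> str:
        """Interpret the utterance and return a short spoken confirmation."""
        text = utterance.lower()
        # Turning the timer off should work with several phrasings.
        if re.search(r"\b(stop|cancel|turn off)\b", text):
            if self.minutes is None:
                return "There is no timer running."
            self.minutes = None
            return "Timer cancelled."
        # Any duration in minutes either sets a new timer or changes it.
        match = re.search(r"(\d+)\s*minutes?", text)
        if match:
            already_running = self.minutes is not None
            self.minutes = int(match.group(1))
            verb = "changed to" if already_running else "set for"
            unit = "minute" if self.minutes == 1 else "minutes"
            return f"Timer {verb} {self.minutes} {unit}."
        return "Sorry, should I set, change or stop the timer?"


timer = Timer()
print(timer.handle("timer to 5 minutes"))
print(timer.handle("make it 10 minutes"))
print(timer.handle("turn off the timer"))
```

The user never has to learn a separate "change" command: saying a new duration while a timer is running is enough, and every action is acknowledged in one short sentence.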

Conclusion: Integration into everyday life?

How voice assistants will integrate into our lives will become apparent in the coming years. Many questions are still open, for example how to deal with difficult requests. In many cases, this is not being solved well today:

Me: "I want to quit smoking."
Siri: "Ok, here is what I found [Search results for nearby tobacco shops]".

One thing is clear here: rules such as ethical design guidelines and ethical criteria on health must be integrated much more closely.

In any case, I am looking forward to exciting projects involving voice assistants and to the opportunity to delve deeper into the subject.