Is a mobile phone you talk to easier to use than one you just talk through? Saying “call my nephew” has got to be easier than typing in a ten-digit number. However large the buttons on your big-button mobile phone, talking to it seems a lot less hit and miss than typing in the right one of ten billion combinations of numbers.

Unfortunately there is a difference between talking and listening.

A quick question for you. What’s the square root of 4356? Not an easy one? It’s something the computer you are reading this on could do in fractions of a second. The simplest of simple tasks.

Yet ask a computer to identify someone by voice and it really struggles. Computers that have faultless voice recognition only exist in Tom Cruise movies. A person has no problem with this at all: we can identify a voice in a tenth of a second and understand it with very little trouble. We put not hearing something down to an auditory problem, not one of cognition. Those people who’ve told you that a “computer is like a brain” are lying: computers are bad at talking and worse at listening. For all the amazing advances in technology, we are past the twenty-fifth anniversary of 1984, and Winston Smith’s speakwrite being in daily use is as much science fiction as it was when George Orwell published the book in 1949.

It’s something we find hard to grasp. As human beings we find voice recognition simple. We think of computers as being so ‘clever’ that we find it hard to credit them with being bad at understanding speech. It’s not the audio side which lets them down but the lack of experience. When we hear something we add a huge amount of interpretation to what we hear. This is best demonstrated by the Two Ronnies’ “Four Candles” sketch, which is funny precisely because the interpretation is confused.

Voted the funniest sketch ever: the Two Ronnies' "Four Candles"

Do you mean Four Candles or Fork Handles? Speech recognition is tough and relies on context.

Asking your easy-to-use mobile phone to “call my nephew” means it needs to know explicitly who your nephew is. This isn’t the same as knowing what a nephew is, who your siblings are and what their children are called.
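The difference can be sketched in a few lines of Python. This is only an illustration, not how any real phone works: the contact names, the phone numbers, the family facts and the function names are all made up for the example. It contrasts a phone that has been told who “my nephew” is with one that has to work it out from general knowledge about families.

```python
# Illustrative only: all names, numbers and structures here are made up.
CONTACTS = {"Mike": "07700 900123", "Sarah": "07700 900456"}

# Explicit knowledge: the user has told the phone who "my nephew" is.
EXPLICIT_ALIASES = {"my nephew": "Mike"}

# General knowledge: a nephew is a son of one of your siblings.
SIBLINGS = ["Joyce"]
SONS = {"Joyce": ["Mike"]}


def lookup_explicit(phrase, aliases=EXPLICIT_ALIASES):
    """Return the contact the user has explicitly tied to this phrase, if any."""
    return aliases.get(phrase.strip().lower())


def infer_nephews(siblings=SIBLINGS, sons=SONS):
    """Work out nephews from family facts instead of a stored alias."""
    return [son for sibling in siblings for son in sons.get(sibling, [])]


def number_for(phrase):
    """Resolve a spoken phrase to a number, or None if the phone cannot tell."""
    name = lookup_explicit(phrase)
    if name is None and phrase.strip().lower() == "my nephew":
        candidates = infer_nephews()
        # Only dial if the inference is unambiguous.
        name = candidates[0] if len(candidates) == 1 else None
    return CONTACTS.get(name)
```

With the stored alias, `number_for("my nephew")` finds Mike's number straight away. Take the alias away and the phone is left reasoning about siblings and sons, which only works while there is exactly one candidate. And all of this assumes the words were heard correctly in the first place.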

The whole area of computer voice recognition has a long and inglorious history. Many companies have come and gone, often with scandal and corruption, promising and failing to deliver. For the last decade I’ve been told “yes, we know that we’ve failed to deliver in the past, but the technology has caught up now”. Of course the phone companies have been very interested. In the early 1990s I had an analogue NEC carphone which, if you said ONE, TWO, THREE to it loudly and slowly, would ask you to say a phone number and then dial it. Within its limitations it worked pretty well, but it only had to recognise ten digits and a few words like “delete”, “cancel” and “dial”.

 

The Wildfire swoosh logo

This logo is from the company Wildfire, which was later bought by Orange. One of the founders, Rich Miner, went on to co-found a company called "Android", which was bought by Google.

Perhaps the high point in mobile voice recognition was Wildfire, a service bought by Orange and promoted with great gusto until they got bored and shut it down. It’s lamented by the blind community and by me. Wildfire was wonderful. It did a tremendous job of managing your calls and voicemail. If you had a bunch of voicemails waiting you could turn a long traffic jam into solidly productive time. The original vision Orange had was to combine Wildfire with the Ananova news service it bought and provide voice recognition news and information services. Indeed, in the beta of Wildfire you could say “tell me a joke”. A bit of this survived into service as Easter eggs: you could say “what does a cow say” and get “moo”, or say “I’m depressed” and Wildfire would offer one of a number of conciliatory replies. But she never lived up to her full potential.

I loved Wildfire and was very successful using her to make calls, manage my messages and add callers to the phonebook. It was as easy for me to use as speaking to a receptionist and asking them to place my calls, but it wasn’t 100% Wildfire doing this: I was very practised. I had learnt over hours of using Wildfire to modulate my speed and tone to achieve that success. I could demonstrate Wildfire in a way that would leave the person watching in no doubt that this was the holy grail of voice recognition, and I knew I was a fraud.

Siri might be the holy grail of user interfaces

Apple has made many of the promises for Siri that Orange made for Wildfire. It's more than a decade on, so maybe it can deliver.

Today we have Siri, the Apple technology which promises everything that Hans Snook, the managing director of Orange, did when he stood on a stage in Vinopolis and announced Wildfire more than a decade ago. There are other similarities: the charismatic leader in a black outfit who is no longer there, the company named after a fruit. Orange then was as Apple is today: visionary, sometimes overstretching itself, sometimes changing the world.

It’s very possible that Siri is the technology that will make voice recognition possible for seniors, but it will need to cope with people who can only think like people, not like computers: “Call my nephew, Mike”, “call Michael”, “Call Joyce’s Mike”. It will need to cope with faint voices, because many people, particularly those with Parkinson’s, can speak very faintly. And if it requires either the system to be trained or the user to pick up the knack, it has failed.

I’d like to see Siri trialled with seniors, but until it has been, I see the view that it is “just the technology for my Grandma” as underestimating the problem. Voice recognition remains a much harder task than working out that the square root of 4356 is 66.

 

Simon Rockman