These days it seems impossible to go anywhere without encountering technology with voice control built in. It’s in our phones and tablets, on our laptops, in our cars and now in our speakers. Banks are allowing you to make payments by voice, and law enforcement agencies are even using recordings made by these devices to help identify crimes that have taken place.
More and more voice-controlled speakers are coming onto the market from big names such as Amazon (Echo), Google (Home) and Apple (HomePod). These speakers can connect to your cloud music accounts or stream music from your phone or NAS drive, as well as tell you the weather, organise your day and your shopping, or entertain you with games. You can even control your central heating and your lights through them using a number of connected ‘SmartHome’ platforms.
So how does all this actually work?
One or more microphones are embedded in the speaker and are always listening out for a trigger word. Say ‘Alexa’ or ‘Hey Google’ (depending on which speaker you have) and your speaker will wake up and record the next words that you say: ‘turn on the lights’ or ‘play my workout playlist’, for example.
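To make the wake-up step concrete, here is a minimal sketch in Python. It is only an illustration: a real speaker spots the trigger word in the audio itself, whereas this toy version assumes the words have already been turned into text. The function name `capture_command` and the `WAKE_WORDS` set are made up for this example.

```python
# Hypothetical single-word triggers, kept simple for the sketch.
WAKE_WORDS = {"alexa", "google"}

def capture_command(heard_words, wake_words=WAKE_WORDS):
    """Return the words spoken after the first trigger word, or None if none was heard."""
    for i, word in enumerate(heard_words):
        if word.lower() in wake_words:
            # Everything after the trigger word is treated as the command.
            return " ".join(heard_words[i + 1:])
    return None  # the speaker stays asleep

print(capture_command("Alexa turn on the lights".split()))
```

Anything said before the trigger word is ignored, which mirrors the way the speaker only starts recording once it has been woken.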
This recording is sent to a server on the internet (in the ‘cloud’), where ambient background noise is filtered out and the audio is normalised and divided into small segments. These segments are then matched to known phonemes, the smallest units of sound that distinguish one word from another. English uses in the region of 40 phonemes to make all of its words.
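The normalising and segmenting steps can be sketched in a few lines of Python. This is a simplified illustration, not how any particular speaker does it: it assumes the recording is already a plain list of numbers, uses simple peak normalisation, and the function names, segment length and step size are chosen purely for the example.

```python
def normalise(samples):
    """Scale the recording so its loudest sample has magnitude 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence (or nothing at all): leave unchanged
    return [s / peak for s in samples]

def split_into_segments(samples, segment_len, step):
    """Cut the recording into overlapping, fixed-length segments."""
    last_start = len(samples) - segment_len
    return [samples[i:i + segment_len] for i in range(0, last_start + 1, step)]

recording = [0.5, -0.25, 0.1, 0.0, 0.3, -0.5, 0.2, 0.1]
segments = split_into_segments(normalise(recording), segment_len=4, step=2)
print(len(segments))
```

Each segment overlaps its neighbour so that a sound falling on a boundary is not lost; it is these short segments that would then be compared against the known phonemes.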
The next part is the most difficult: recognising each phoneme in the context of the phonemes on either side of it, and comparing the results to an enormous list of possibilities in order to decipher which words have actually been spoken!
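As a toy illustration of turning phonemes into words, the Python sketch below looks up runs of phonemes in a tiny pronunciation list, trying longer words first and backing up if it hits a dead end. Real systems rely on statistical models and vastly larger vocabularies; the five-word `PRONUNCIATIONS` table, its ARPAbet-style symbols and the `decode` function are all assumptions made for this example.

```python
# Hypothetical mini pronunciation list: phoneme sequence -> word.
PRONUNCIATIONS = {
    ("T", "ER", "N"): "turn",
    ("AA", "N"): "on",
    ("DH", "AH"): "the",
    ("L", "AY", "T"): "light",
    ("L", "AY", "T", "S"): "lights",
}

def decode(phonemes):
    """Return a list of words that uses up every phoneme, or None if there is none."""
    if not phonemes:
        return []
    # Try the longest candidate word first, then shorter ones.
    for end in range(len(phonemes), 0, -1):
        word = PRONUNCIATIONS.get(tuple(phonemes[:end]))
        if word is not None:
            rest = decode(phonemes[end:])
            if rest is not None:
                return [word] + rest
    return None  # no combination of known words fits

print(decode("T ER N AA N DH AH L AY T S".split()))
```

Even here the context matters: ‘light’ matches the first three phonemes of the last word, but only ‘lights’ accounts for the final sound.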
Once the command has been deciphered, it is passed to the relevant ‘cloud service’. In the example of ‘turn on the lights’, this could mean connecting to a third-party web portal and passing it your user information and the command. The portal in turn sends a message to the particular piece of equipment in your home that your lights are plugged into, telling it to turn them on.
In the example of ‘play my workout playlist’, the command is sent along with your user information to whichever cloud service you use for your music (Spotify, for example), which then connects directly to your speaker and starts playing the music.
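The hand-off to the right cloud service might look something like the Python sketch below. It is a simplified stand-in rather than any vendor's real interface: the two service functions just return a message instead of contacting a smart-home portal or a music service, and every name in it is hypothetical.

```python
def turn_on_lights(user_id):
    # Stand-in for the third-party smart-home portal.
    return f"smart-home portal: lights on for {user_id}"

def play_playlist(user_id, name):
    # Stand-in for a music service such as Spotify.
    return f"music service: playing '{name}' for {user_id}"

def route_command(user_id, command):
    """Pass the deciphered command and the user's details to the matching service."""
    text = command.lower().strip()
    if text == "turn on the lights":
        return turn_on_lights(user_id)
    if text.startswith("play my ") and text.endswith(" playlist"):
        # Keep only the playlist name between "play my " and " playlist".
        name = text[len("play my "):-len(" playlist")]
        return play_playlist(user_id, name)
    return "sorry, I didn't understand that"

print(route_command("user-1", "Play my workout playlist"))
```

Note that the user information travels with every command, so the service at the other end knows whose lights or whose playlist is meant.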
That any of this works at all is a wonder. That it works in a matter of milliseconds is truly amazing.
The next time you speak to Alexa, or Siri, or Google, think about the complex chain of reactions you have set off, and marvel at how much has gone on behind the scenes to bring about your desired result!