How Amazon brought Big B’s voice to Alexa – Times of India

BENGALURU: Bringing Amitabh Bachchan’s voice to Alexa involved two huge technological challenges for Amazon.
The first was that the voice had to sound exactly like Bachchan’s, since it is a voice Indians recognise only too well.
As Manoj Sindhwani, vice president of Alexa Speech at Amazon, says, “My mom’s a big fan of Mr Bachchan. I was worried that if there was even a single flaw, I would not hear the end of it.”
This was further complicated, Sindhwani says, by the way Bachchan speaks: his voice is extremely rich, and he speaks with a great deal of emotion and intonation, which is hard for text-to-speech systems to get right.
The second big challenge was using ‘Amit ji’ as the wake word, the word you say to activate Alexa, which by default is ‘Alexa’.
The Amazon team considered other wake words, such as ‘Mr Bachchan’, ‘Bachchan ji’, ‘Amitabh Bachchan ji’ and ‘Amitabh ji’, but none sounded as exciting as ‘Amit ji’. The catch is that ‘Amit ji’ is so short, just three syllables, that many words used in everyday speech sound similar to it.
There could even be an older person in your home named Amit or Ajit. It would be irritating to have Alexa frequently waking up when it shouldn’t.
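Why a short, name-like wake word invites false wake-ups can be illustrated with a toy detector. This is a minimal sketch under loud assumptions: it compares spelled-out text using Levenshtein edit distance, a crude stand-in for how similar two phrases sound, whereas real wake-word engines score audio. The `naive_trigger` function and its one-edit threshold are hypothetical and not Amazon’s method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the letters match
            ))
        prev = curr
    return prev[-1]


def naive_trigger(heard: str, wake_word: str = "amit ji", max_edits: int = 1) -> bool:
    """Hypothetical detector: wake up if the heard phrase is within max_edits of the wake word."""
    return edit_distance(heard.lower(), wake_word) <= max_edits


phrases = ["Amit ji", "Ajit ji", "Alexa", "good night"]
print([p for p in phrases if naive_trigger(p)])  # ['Amit ji', 'Ajit ji']
```

Addressing a grandfather as “Amit ji” is an exact match, and “Ajit ji” is a single letter away, so any detector that only measures closeness to the wake word fires on both. That is why such a system needs to learn far more than surface similarity.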
We will know how well Amazon has solved these issues only when people start using ‘Amit ji’ on a large scale. Inside Amazon, though, the team is delighted with what it has accomplished.
Bachchan is only the fourth celebrity, and the first outside the US, to lend his voice to Alexa’s celebrity voice feature. The first was American actor Samuel L Jackson, whose voice launched in December 2019.
The work with Bachchan involved tech teams in India, Poland, the UK and the US, and required the actor to record his voice over many sessions to give the artificial intelligence (AI) systems material to work with.
Puneesh Kumar, country leader for Alexa at Amazon India, says, to the amusement of all of us, that a sound engineer in Poland is now on a first-name basis with Bachchan after all their interactions. Bachchan, he says, is a stickler for standards.
“There were so many occasions where we felt, ‘Oh, this sounds pretty good, it’s pretty close to your voice.’ And he was like, ‘No, let’s give it another try. I want to get it perfect.’”
The main technology used to perfect Bachchan’s speech is a neural text-to-speech (TTS) system. When you ask a question, Alexa first converts your speech into text and finds the answer; the neural TTS system then converts that answer from text into Bachchan’s voice.
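The three-stage flow described above can be sketched as a simple pipeline. Everything here is a hypothetical placeholder: `speech_to_text`, `find_answer`, `text_to_speech`, the canned `ANSWERS` table and the `"amitabh_bachchan"` voice name are assumptions for illustration, not Alexa’s actual components. Each stub only marks where a real speech recogniser, answer engine or neural TTS model would sit.

```python
from dataclasses import dataclass

# Hypothetical canned answers standing in for Alexa's real question answering.
ANSWERS = {"what is the time": "It is nine o'clock."}


@dataclass
class SpeechRequest:
    """What would be handed to a neural TTS engine: the text and the target voice."""
    text: str
    voice: str


def speech_to_text(audio: bytes) -> str:
    # Stub for speech recognition: here the "audio" is simply UTF-8 text.
    return audio.decode("utf-8").strip().lower().rstrip("?")


def find_answer(question: str) -> str:
    # Stub for answer lookup: a dictionary instead of a real knowledge service.
    return ANSWERS.get(question, "Sorry, I don't know that one.")


def text_to_speech(text: str, voice: str) -> SpeechRequest:
    # Stub for neural TTS: a real system would return synthesised audio.
    return SpeechRequest(text=text, voice=voice)


def handle_utterance(audio: bytes, voice: str = "amitabh_bachchan") -> SpeechRequest:
    question = speech_to_text(audio)      # 1. speech -> text
    answer = find_answer(question)        # 2. text -> answer
    return text_to_speech(answer, voice)  # 3. answer -> chosen voice


print(handle_utterance(b"What is the time?"))
```

The point of the structure is that only the last stage needs to know about the celebrity voice; recognition and answering are shared with the standard Alexa voice.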
“There are multiple ways to do text-to-speech, but the latest and the greatest is based on deep neural networks, or deep learning,” says Sindhwani. Deep learning is one of the most advanced forms of machine learning, itself a branch of AI.
“These training methods are able to produce models that not only reproduce Mr Bachchan’s voice, but also his style of speaking – the way he may stress certain words, go fast on certain occasions, slow on some others. A lot of innovation and thought went into this,” he says.
Another complication was ensuring that Alexa would recognise both ‘Alexa’ and ‘Amit ji’ as wake words. Normally, listening for two wake words would take too much memory and computation.
“So, we used what we call multi-target learning, where you have one input, and you try to predict multiple outputs. It’s super complex, requires a lot of thought into how we build the models. And on top of that, it’s Covid, I cannot collect a lot of data, yet it has to work for unique environments in India, with all the noise that is typically around,” Sindhwani says.
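Amazon has not published its wake-word model, so the following is only a minimal sketch of the general multi-target idea Sindhwani describes: one input (a frame of acoustic features) goes through a single shared layer, and a small output head per wake word predicts each target. The layer sizes (40 features, 64 hidden units), the class `MultiTargetWakeWordModel` and the summed cross-entropy loss are all assumptions chosen for illustration.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: returns W·x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MultiTargetWakeWordModel:
    """Hypothetical model: one shared encoder feeding one small head per wake word."""

    def __init__(self, n_features, n_hidden, wake_words, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.shared_w, self.shared_b = init(n_hidden, n_features), [0.0] * n_hidden
        self.heads = {ww: (init(1, n_hidden), [0.0]) for ww in wake_words}

    def predict(self, features):
        # The larger shared layer runs once per frame...
        hidden = [max(0.0, v) for v in dense(features, self.shared_w, self.shared_b)]
        # ...and each small head turns it into one wake-word probability.
        return {ww: sigmoid(dense(hidden, w, b)[0]) for ww, (w, b) in self.heads.items()}

    def param_count(self):
        shared = len(self.shared_w) * len(self.shared_w[0]) + len(self.shared_b)
        return shared + sum(len(w[0]) + len(b) for w, b in self.heads.values())


def multi_target_loss(scores, labels):
    """Sum of per-target binary cross-entropies: one input, several targets."""
    eps = 1e-12
    return -sum(labels[t] * math.log(scores[t] + eps)
                + (1 - labels[t]) * math.log(1 - scores[t] + eps) for t in scores)


joint = MultiTargetWakeWordModel(n_features=40, n_hidden=64, wake_words=["alexa", "amit ji"])
single = MultiTargetWakeWordModel(n_features=40, n_hidden=64, wake_words=["alexa"])
frame = [0.5] * 40  # hypothetical acoustic features for one audio frame
scores = joint.predict(frame)
loss = multi_target_loss(scores, {"alexa": 0, "amit ji": 1})
print(joint.param_count(), 2 * single.param_count())  # 2754 5378
```

In this toy setting, serving both wake words from one shared encoder needs roughly half the parameters, and one shared forward pass instead of two, compared with running two separate single-word models. That is the memory and computation saving the multi-target approach is after.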
To overcome the paucity of data, Amazon used what’s called transfer learning, where you take the skills learnt from one domain and transfer that learning to a different domain. “We transferred learning from something that works for medium vocabulary recognition, to very specific recognition, which is amazing,” Sindhwani says.
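Transfer learning in its simplest form reuses an encoder trained on one task and fits only a new, small output layer on scarce data for the new task. The sketch below assumes exactly that: `PRETRAINED` is a hypothetical, hand-set stand-in for weights learnt on a broader recognition task, the 20 synthetic examples mimic limited target-domain data, and only a logistic-regression head is trained. It illustrates the general technique, not Amazon’s pipeline.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(x, weights):
    """Frozen encoder: a ReLU layer whose weights are reused as-is and never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def train_head(examples, encoder_weights, epochs=100, lr=0.1):
    """Fit only a new logistic-regression head on top of the frozen features."""
    feats = [(encode(x, encoder_weights), y) for x, y in examples]  # encode once
    w, b = [0.0] * len(encoder_weights), 0.0
    for _ in range(epochs):
        for h, y in feats:
            # Gradient of cross-entropy with respect to the logit is (p - y).
            g = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) - y
            w = [wi - lr * g * hi for wi, hi in zip(w, h)]
            b -= lr * g
    return w, b


def predict(x, encoder_weights, head):
    w, b = head
    h = encode(x, encoder_weights)
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) > 0.5


# Hypothetical "pretrained" weights, hand-set here; in practice they would come
# from a model already trained on a broader recognition task.
PRETRAINED = [[1, 1, 0, 0], [0, 0, 1, 1], [-1, -1, 0, 0], [0, 0, -1, -1]]

rng = random.Random(42)


def sample(center, label):
    return ([c + rng.gauss(0, 0.3) for c in center], label)


# A deliberately tiny labelled set, mimicking scarce target-domain data.
data = ([sample([1, 1, 1, 1], 1) for _ in range(10)]
        + [sample([-1, -1, -1, -1], 0) for _ in range(10)])

frozen_copy = [row[:] for row in PRETRAINED]
head = train_head(data, PRETRAINED)
accuracy = sum(predict(x, PRETRAINED, head) == bool(y) for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Because the encoder is frozen, the only parameters learnt from the small dataset are the four head weights and a bias, which is what makes training on limited data feasible.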