Thursday, April 18

The AI that is capable of creating a portrait of a person just by listening to their voice



Artificial intelligence can now estimate what your face looks like simply by listening to you speak for a few seconds.

Today we have technological advances that not only improve our lives but can also accomplish feats we once thought possible only in science fiction movies, such as reconstructing a face just by listening to the subject’s voice.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have shown the latest advances in their artificial intelligence algorithm, Speech2Face, which was originally presented in 2019.

This algorithm can reconstruct a person’s face using only a short audio recording of that person speaking, although, as we noted, it is not infallible.

To develop this technology, the researchers first designed and trained a deep neural network on millions of YouTube videos of people talking. In this first phase of training, the artificial intelligence learned the correlations between the sound of a voice and the appearance of the speaker.

These correlations allowed the model to make its best guesses about a speaker’s age, gender, and ethnicity.

It is worth clarifying that there was no human involvement in this first process, as the researchers did not need to manually label any subset of the data. The AI simply received a large number of videos and discovered the correlations between voice features and facial features on its own.
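As a rough illustration of how such a self-supervised setup can work (this is a minimal sketch, not CSAIL’s actual code; the network sizes, feature dimensions, and loss are assumptions), a voice encoder can be trained to predict the face embedding that a frozen, pretrained face network extracts from a frame of the same video, so the video itself provides the supervision and no manual labels are needed:

```python
import torch
import torch.nn as nn

# Hypothetical voice encoder: maps a spectrogram of the speaker's voice
# to a face-feature vector (dimensions are assumptions, not the paper's).
class VoiceEncoder(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        x = self.conv(spectrogram).flatten(1)
        return self.fc(x)

# Stand-in for a pretrained, frozen face network that turns a video frame
# of the speaker into a face embedding (the only "label" used here).
face_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512)).eval()
for p in face_net.parameters():
    p.requires_grad_(False)

voice_encoder = VoiceEncoder()
optimizer = torch.optim.Adam(voice_encoder.parameters(), lr=1e-4)

# One self-supervised training step on a dummy batch: the supervision comes
# from face embeddings computed on frames of the same videos, not from humans.
spectrograms = torch.randn(8, 1, 128, 300)   # batch of voice spectrograms
frames = torch.randn(8, 3, 64, 64)           # matching face frames

with torch.no_grad():
    target_face_feats = face_net(frames)

pred_face_feats = voice_encoder(spectrograms)
loss = nn.functional.mse_loss(pred_face_feats, target_face_feats)

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```

The real system uses far larger networks and richer features, but the key idea the article describes, letting faces from the same videos act as the training signal, is what the sketch tries to show.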

To further explore the precision of these face reconstructions, they created a face decoder capable of producing a standardized reconstruction of a person’s face from a still frame while ignoring irrelevant variations such as pose and lighting.
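Such a face decoder can be pictured as a small generative network that takes a face embedding and outputs a canonical, front-facing image, so pose and lighting are factored out before any comparison is made. The following is a minimal sketch under assumed dimensions, not the actual CSAIL decoder:

```python
import torch
import torch.nn as nn

# Hypothetical face decoder: turns a 512-dimensional face embedding into a
# small frontal, evenly lit "canonical" face image (sizes are assumptions).
class FaceDecoder(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, face_feats: torch.Tensor) -> torch.Tensor:
        x = self.fc(face_feats).view(-1, 128, 8, 8)
        return self.deconv(x)  # (batch, 3, 64, 64) canonical face image

decoder = FaceDecoder()
# The same decoder can render either the embedding predicted from a voice or
# the embedding of a real photo, which makes the two directly comparable.
canonical_face = decoder(torch.randn(1, 512))
print(canonical_face.shape)  # torch.Size([1, 3, 64, 64])
```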

This allowed the scientists to compare the voice-based reconstructions more easily with the speaker’s real features. As in the first phase, the AI’s results in this second phase came surprisingly close to the real speaker.
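One simple way to picture that comparison, assuming the embeddings from the sketches above, is to measure the cosine similarity between the face features predicted from the voice and those extracted from a real photo of the speaker (again a sketch, not the study’s evaluation protocol):

```python
import torch
import torch.nn.functional as F

# Hypothetical comparison: how close is the face predicted from the voice
# to the face extracted from a real photo of the same speaker?
face_from_voice = torch.randn(512)   # embedding predicted by the voice encoder
face_from_photo = torch.randn(512)   # embedding from a real frame of the speaker

similarity = F.cosine_similarity(face_from_voice, face_from_photo, dim=0)
print(f"voice-vs-photo similarity: {similarity.item():.3f}")  # 1.0 = identical
```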

However, the method is not infallible: there were other cases where the AI had a hard time figuring out what the speaker actually looked like. Factors such as accent, language, and tone of voice caused discrepancies between speech and face in which gender, age, and ethnicity were completely wrong.

“Our model is designed to reveal the statistical correlations that exist between facial features and speakers’ voices in the training data. The training data we use is a collection of educational YouTube videos and does not represent the entire world population equally,” the researchers state.

“So the model, as is the case with any machine learning model, is affected by this uneven distribution of data,” the study reads.

As for real-world applications of this new algorithm, the AI could generate a cartoon representation of a person during a phone call or video conference when their identity is unknown, a feature that could be built into many applications.

It could also make voice assistants far more personalized, even giving them the facial image of the person who owns the device.

Perhaps most controversially, law enforcement and security agencies could use this artificial intelligence to create a portrait of what a suspect likely looks like when the only evidence they have is a voice recording.
