Microsoft researchers have created a new model of artificial intelligence, Vall-E, capable of reproducing a voice identical to a human. Vall-E is said to be trained on “discrete codes derived from a standard neural audio codec model” as well as 60,000 hours of conversation (100 times more than existing systems) from more than 7,000 speakers. Most of the dialogue is taken from the public LibriVox audiobook sites.
Vall-E is based on the EnCodec technology that Meta announced in October 2022. It analyzes a person’s voice, breaks down information into components and synthesizes variations of its sound in different phrases. Even after listening to just a three-second sample, Vall-E can reproduce the timbre and emotional tone of the speaker.
“The results of the experiment show that Vall-E is significantly superior to the current TTS system [ИИ, воспроизводящий голоса, которых он никогда не слышал] from the point of view of the naturalness of speech and similarity to the speaker,” the researchers’ article states.
You can listen to examples of Vall-E voices playing on GitHub. Most sound identical to the recordings, despite the fact that only short fragments are used. Several voices sound more robotic and reminiscent of traditional text-to-speech voices.
Microsoft researchers believe that Vall-E could be used in the future as a text-to-speech tool, speech editing method, and audio generation system by combining it with other generative AI, such as GPT-3.
Course
FINANCIAL MANAGER
Become a professional financial manager and earn from $500 in 2 months.
REGISTER!
As with other AI models, there are concerns about Vall-E being misused—for example, to mimic the voices of public figures, politicians, or celebrities (especially when used in conjunction with deepfakes). Criminals will also be able to obtain sensitive data if they make a person believe that they are talking to family, friends or officials. Some security systems also use voice recognition. As for its impact on jobs, Vall-E is likely to be a cheaper alternative for dubbing actors.
But Vall-E researchers say all these risks can be reduced:
“A model can be built that will determine whether the audio was synthesized by Vall-E.”
Microsoft, it seems, has decidedly taken up the development of AI technologies and their implementation in its own products. OpenAI’s GPT language model will try to be integrated with Word, Outlook and PowerPoint, and ChatGPT – a chatbot that generates human-like texts and gives detailed answers to questions – will be added to the version of the Bing search engine from March.
According to media reports, Microsoft is also in talks to invest $10 billion in OpenAI. The agreement stipulates that the company will receive 75% of II-lab’s profits until it recoups its investment. After reaching that goal, Microsoft will receive 49% of the startup’s shares, other investors will receive another 49%, and OpenAI’s non-profit parent organization will receive 2%.
Media: Microsoft invests $10 billion in OpenAI – the developer of the chatbot ChatGPT, which generates frighteningly human texts
Source: Techspot