Vall-E is Microsoft’s new AI technology that very accurately imitates a person’s voice based on a 3-second sample

Vall-E is Microsoft's new AI technology that very accurately imitates a person's voice based on a 3-second sample

Microsoft researchers have created a new model of artificial intelligence, Vall-E, capable of reproducing a voice identical to a human. Vall-E is said to be trained on “discrete codes derived from a standard neural audio codec model” as well as 60,000 hours of conversation (100 times more than existing systems) from more than 7,000 speakers. Most of the dialogue is taken from the public LibriVox audiobook sites.

We help

How fighters of the Kherson direction choose drones and how you can do it

Vall-E is based on the EnCodec technology that Meta announced in October 2022. It analyzes a person’s voice, breaks down information into components and synthesizes variations of its sound in different phrases. Even after listening to just a three-second sample, Vall-E can reproduce the timbre and emotional tone of the speaker.

“The results of the experiment show that Vall-E is significantly superior to the current TTS system [ИИ, воспроизводящий голоса, которых он никогда не слышал] from the point of view of the naturalness of speech and similarity to the speaker,” the researchers’ article states.

You can listen to examples of Vall-E voices playing on GitHub. Most sound identical to the recordings, despite the fact that only short fragments are used. Several voices sound more robotic and reminiscent of traditional text-to-speech voices.

Microsoft researchers believe that Vall-E could be used in the future as a text-to-speech tool, speech editing method, and audio generation system by combining it with other generative AI, such as GPT-3.

Course

FINANCIAL MANAGER

Become a professional financial manager and earn from $500 in 2 months.

REGISTER!finmanager

Vall-E is Microsoft's new AI technology that extremely accurately imitates the human voice based on a 3-second sample.

As with other AI models, there are concerns about Vall-E being misused—for example, to mimic the voices of public figures, politicians, or celebrities (especially when used in conjunction with deepfakes). Criminals will also be able to obtain sensitive data if they make a person believe that they are talking to family, friends or officials. Some security systems also use voice recognition. As for its impact on jobs, Vall-E is likely to be a cheaper alternative for dubbing actors.

But Vall-E researchers say all these risks can be reduced:

“A model can be built that will determine whether the audio was synthesized by Vall-E.”

Microsoft, it seems, has decidedly taken up the development of AI technologies and their implementation in its own products. OpenAI’s GPT language model will try to be integrated with Word, Outlook and PowerPoint, and ChatGPT – a chatbot that generates human-like texts and gives detailed answers to questions – will be added to the version of the Bing search engine from March.

According to media reports, Microsoft is also in talks to invest $10 billion in OpenAI. The agreement stipulates that the company will receive 75% of II-lab’s profits until it recoups its investment. After reaching that goal, Microsoft will receive 49% of the startup’s shares, other investors will receive another 49%, and OpenAI’s non-profit parent organization will receive 2%.

Media: Microsoft invests $10 billion in OpenAI – the developer of the chatbot ChatGPT, which generates frighteningly human texts

Source: Techspot

Related Posts

UK to regulate cryptocurrency memes: illegal advertising

Britain’s financial services regulator has issued guidance to financial services companies and social media influencers who create memes about cryptocurrencies and other investments to regulate them amid…

unofficial renders of the Google Pixel 9 and information about the Pixel 9 Pro XL

The whistleblower @OnLeaks and the site 91mobiles presented the renders of the Google Pixel 9 phone. Four images and a 360° video show a black smartphone with…

Embracer to sell Gearbox (Borderlands) to Take-Two (Rockstar and 2K) for $460 million

Embracer continues to sell off assets – the Swedish gaming holding has just confirmed the sale of The Gearbox Entertainment studio to Take-Two Interactive. The sum is…

photo of the new Xbox X console

The eXputer site managed to get a photo of a new modification of the Microsoft Xbox game console. The source reports that it is a white Xbox…

Israel Deploys Massive Facial Recognition Program in Gaza, – The New York Times

The Technology section is powered by Favbet Tech The images are matched against a database of Palestinians with ties to Hamas. According to The New York Times,…

Twitch has banned chest and buttock broadcasts of gameplay

Twitch has updated its community rules and banned the focus of streams on breasts and buttocks. According to the update, starting March 29, “content that focuses on…

Leave a Reply

Your email address will not be published. Required fields are marked *