With artificial intelligence (AI) advancing rapidly, Microsoft has developed an AI model that can mimic human voices. The model, reportedly named VALL-E 2, is a text-to-speech generator that can imitate a voice based on just a few seconds of audio.
VALL-E 2 is designed for zero-shot learning, a scenario in which a model is expected to recognize concepts without being given any examples of those concepts beforehand.
Decoding VALL-E 2
According to developers at Microsoft Research, VALL-E 2 can produce ‘accurate, natural speech in the exact voice of the original speaker, comparable to human performance.’ They also claimed that the AI model can synthesize complex sentences in addition to short phrases.
The tool takes advantage of two features called Repetition Aware Sampling and Grouped Code Modeling, as explained by the Microsoft researchers.
But how do these features make VALL-E 2 more convincing, and therefore potentially riskier for humans? Repetition Aware Sampling addresses the pitfalls of repetitive tokens. Tokens are the smallest units of data a language model can process, typically words or parts of words, and they are how an AI model makes sense of human language. The technique also prevents sounds or phrases from recurring during the decoding process.
This helps vary the system’s speech and makes it sound more natural. Grouped Code Modeling, on the other hand, limits the number of tokens the model processes at once, which produces results faster.
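The two ideas described above can be illustrated with a toy sketch. This is not Microsoft's code. The function names, window size, repetition threshold and group size are hypothetical values chosen for illustration, and the logic only loosely follows the behaviour described in this article: resample when a token has been repeating too often, and bundle tokens into groups so fewer steps are needed.

```python
import random
from typing import List, Sequence, Tuple


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample from the smallest set of most-likely tokens whose total mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: List[int],
    rng: random.Random,
    top_p: float = 0.8,      # hypothetical nucleus threshold
    window: int = 10,        # hypothetical look-back window
    max_ratio: float = 0.5,  # hypothetical repetition threshold
) -> int:
    """Pick the next token, falling back to a broader draw if it keeps repeating."""
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent and recent.count(token) / len(recent) > max_ratio:
        # The token dominates the recent history, so draw again from the
        # full distribution to give other tokens a chance.
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token


def group_codes(codes: Sequence[int], group_size: int) -> List[Tuple[int, ...]]:
    """Bundle consecutive tokens so a model can take one step per group."""
    return [tuple(codes[i:i + group_size]) for i in range(0, len(codes), group_size)]
```

With a group size of 2, for instance, `group_codes([1, 2, 3, 4, 5, 6], 2)` turns six tokens into three groups, so a model working group by group would take three steps instead of six.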
AI mimics Humans: Boon or Bane
The tech giant claims that VALL-E 2 is the first system of its kind to achieve human parity, meaning its speech is on par with a human voice. However, reports suggest a rise in “vishing,” a portmanteau of “voice” and “phishing,” a type of attack in which scammers pose as friends, family, or other trusted parties on the phone. Voice spoofing could even pose a national security risk.
For example, in January, a robocall imitating President Joe Biden’s voice urged Democrats not to vote in the New Hampshire primary. The man behind the plot was reportedly later indicted on charges of voter suppression and impersonation of a candidate.
In response to such concerns, Microsoft says that VALL-E 2 will not be released to the public anytime soon, as it is purely a research project. “Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public,” the company explains on its official website.
Microsoft has already come under increased scrutiny over its implementation of AI, on both the antitrust and data privacy fronts. According to sources, regulators have voiced concern about the tech giant’s $13 billion partnership with OpenAI and its resulting control over the startup. Notably, to avoid further regulatory trouble, Microsoft plans to keep VALL-E 2 strictly as a research project.