Jun 26, 2023
AudioPaLM is a large language model for speech generation and understanding.
(Photo Credits: Reuters)
AudioPaLM combines a text-based language model, PaLM-2, and a speech-based language model, AudioLM, into a single multimodal architecture.
This multimodal architecture can process and generate both text and speech for use in speech recognition and speech-to-speech translation applications.
AudioPaLM inherits the linguistic knowledge found only in text-based large language models such as PaLM-2.
AudioPaLM also inherits from AudioLM the ability to preserve paralinguistic information such as speaker identity and intonation.
The model performs substantially better than existing systems on speech translation tasks, and it can carry out zero-shot speech-to-text translation for many languages.
AudioPaLM also demonstrates capabilities typical of audio language models, such as transferring a voice from one language to another based on a brief spoken prompt.