Oxford scientists have developed a new artificial intelligence program that can lip-read more accurately than people, an advance that could help those who suffer from hearing loss. The Watch, Attend and Spell (WAS) software system uses computer vision and machine learning methods to learn how to lip-read from a dataset of more than 5,000 hours of TV footage, gathered from six different programmes. The videos contained more than 118,000 sentences in total and a vocabulary of 17,500 words. Researchers from Oxford University in the UK compared the ability of the machine and a human expert to work out what was being said in silent video by focusing solely on each speaker’s lip movements.
They found that the software system was more accurate than the professional. The human lip-reader correctly read 12 per cent of words, while the WAS software transcribed 50 per cent of the words in the dataset without error. The machine’s mistakes were small, such as a missing “s” at the end of a word or a single-letter misspelling. The software could support a number of developments, including helping people who are hard of hearing to navigate the world around them. “Lip-reading is an impressive and challenging skill, so WAS can hopefully offer support to this task – for example, suggesting hypotheses for professional lip readers to verify using their expertise,” said Joon Son Chung, a graduate student at Oxford University.
“There are also a host of other applications, such as dictating instructions to a phone in a noisy environment, dubbing archival silent films, resolving multi-talker simultaneous speech and improving the performance of automated speech recognition in general,” said Chung.