Gmail has announced a major upgrade to its spam detection system, one of the platform’s biggest security enhancements to date. The new technology, named RETVec (Resilient & Efficient Text Vectorizer), is intended to change how Gmail combats spam and phishing attempts.
Google says that systems such as Gmail, YouTube, and Google Play use text classification models to identify harmful content, such as phishing attacks, inappropriate comments, and scams. However, such text is difficult for machine learning models to classify because bad actors actively manipulate it to evade detection, using techniques like homoglyphs, invisible characters, and keyword stuffing.
To address this challenge, Google has introduced RETVec, a new multilingual text vectorizer, in Gmail. RETVec targets “adversarial text manipulations,” meaning text that has been altered with special characters, emojis, typos, and similar techniques to disguise its content. These tactics often dodge traditional spam filters, allowing malicious emails to slip through and potentially harm users.
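To see why such manipulations defeat simple filters, consider the following minimal sketch. The blocklist, the `naive_filter` function, and the sample messages are all hypothetical and exist only for illustration; this is not how Gmail's classifier works, only a toy exact-match filter of the kind these tricks are designed to evade.

```python
# Hypothetical keyword blocklist, for illustration only.
BLOCKLIST = {"password", "verify"}


def naive_filter(text: str) -> bool:
    """Return True if any blocklisted keyword appears as an exact substring."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


# The same message written three ways.
plain = "Please verify your password"
# Homoglyphs: Cyrillic 'е' (U+0435) and 'а' (U+0430) replace Latin 'e' and 'a'.
homoglyph = "Please v\u0435rify your p\u0430ssword"
# Invisible characters: zero-width spaces (U+200B) split the keywords.
invisible = "Please ver\u200bify your pass\u200bword"

for label, message in [("plain", plain), ("homoglyph", homoglyph), ("invisible", invisible)]:
    print(label, naive_filter(message))
```

All three messages look the same to a human reader, but only the first contains the exact character sequences the toy filter searches for.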
Google says that RETVec has undergone extensive testing within the company over the past year and has proven highly effective in security and anti-abuse applications.
It has shown a 38% improvement in spam detection rate and a 19.4% reduction in false positive rate compared to the previous text vectorizer in Gmail’s spam classifier. Further, deploying RETVec led to an 83% reduction in TPU (Tensor Processing Unit) usage for the model, making it a substantial upgrade in both defense capabilities and efficiency.
“RETVec is a novel open-source text vectorizer that allows you to build more resilient and efficient server-side and on-device text classifiers. The Gmail spam filter uses it to help protect Gmail inboxes against malicious emails,” says Google.
RETVec achieves significant improvements in text classification by employing a novel architecture that includes a highly compact character encoder, an augmentation-driven training approach, and the incorporation of metric learning.
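As a rough illustration of what augmentation-driven training involves, the sketch below generates typo-style variants of a word. The `augment` function and its three edit operations are assumptions made for this example, not RETVec's actual augmentation pipeline; the general idea is that a model exposed to many perturbed versions of the same word during training can learn to treat them alike.

```python
import random


def augment(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit: swap, delete, or insert.

    Hypothetical helper for illustration; not part of RETVec.
    """
    if len(word) < 2:
        return word
    op = rng.choice(["swap", "delete", "insert"])
    i = rng.randrange(len(word) - 1)
    if op == "swap":
        # Exchange two adjacent characters.
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":
        # Drop one character.
        return word[:i] + word[i + 1:]
    # Insert a random lowercase letter.
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i:]


rng = random.Random(0)  # Seeded so repeated runs give the same variants.
variants = [augment("password", rng) for _ in range(5)]
print(variants)
```

Each variant differs from the original by a single edit, which is roughly the kind of noise a spammer introduces on purpose.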
It is designed to work seamlessly across all languages and UTF-8 characters without requiring extensive text preprocessing.
The out-of-the-box compatibility of RETVec makes it suitable for on-device, web, and large-scale text classification deployments. Models trained with RETVec are smaller thanks to its compact representation, which leads to faster inference, lower computational costs, and reduced latency.