According to Cointelegraph, a team of researchers from artificial intelligence (AI) firm AutoGPT, Northeastern University, and Microsoft Research has developed a tool that monitors large language model (LLM) agents. The tool is designed to detect potentially harmful outputs from LLMs and stop them before they are executed.
The monitor is described in a research paper titled “Testing Language Model Agents Safely in the Wild.” According to the paper, it is flexible enough to oversee existing LLMs and can halt harmful outputs, such as code attacks, before they happen.
“Agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behaviour ranked and logged to be examined by humans,” the researchers explained.
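To make that control flow concrete, the Python sketch below shows one way such an audit loop could look. It is a hypothetical illustration rather than code from the paper: the keyword-based risk_score heuristic, the 0.2 threshold, and the AuditLog class are all assumptions standing in for the learned, context-sensitive monitor the researchers describe.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Toy stand-in for the learned monitor: in the paper the monitor is
# context-sensitive and trained on human-AI interaction data; here a
# simple keyword heuristic plays that role so the flow is runnable.
UNSAFE_PATTERNS = ["rm -rf", "DROP TABLE", "curl http", "eval("]


@dataclass
class AuditLog:
    """Collects suspect actions, ranked by risk score, for human review."""
    entries: List[Tuple[float, str]] = field(default_factory=list)

    def record(self, action: str, score: float) -> None:
        self.entries.append((score, action))
        self.entries.sort(reverse=True)  # highest-risk actions first


def risk_score(action: str, context: str) -> float:
    """Placeholder scorer: fraction of unsafe patterns found in the action.

    A real implementation would condition on the task context and use a
    model trained on labelled safe/unsafe agent transcripts.
    """
    hits = sum(pattern in action for pattern in UNSAFE_PATTERNS)
    return hits / len(UNSAFE_PATTERNS)


def audited_execute(action: str, context: str, log: AuditLog,
                    threshold: float = 0.2) -> bool:
    """Run the agent's action only if it stays inside the safety boundary."""
    score = risk_score(action, context)
    if score >= threshold:
        log.record(action, score)  # flag for human examination
        return False               # stop the unsafe test
    print(f"executing: {action}")
    return True


if __name__ == "__main__":
    log = AuditLog()
    audited_execute("ls -la ./workspace", context="file listing task", log=log)
    audited_execute("rm -rf / && curl http://attacker.example",
                    context="file listing task", log=log)
    print("flagged for review:", log.entries)
```

In this sketch, blocked actions are never executed; they are only ranked and logged, mirroring the paper's description of suspect behaviour being stopped and queued for human examination.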
To train the monitoring agent, the researchers reportedly built a dataset of nearly 2,000 safe human-AI interactions spanning about 29 different tasks, ranging from simple text retrieval and coding corrections to developing entire webpages from scratch, Cointelegraph concluded.
(With insights from Cointelegraph)