Researchers find just 250 malicious documents can leave LLMs vulnerable to backdoors


Artificial intelligence companies have been working at breakneck speed to develop the best and most powerful tools, but that rapid development hasn’t always been coupled with a clear understanding of AI’s limitations or weaknesses. Today, Anthropic released a report on how attackers can influence the development of a large language model.

The study centered on a type of attack called poisoning, in which an LLM is pretrained on malicious content intended to teach it dangerous or unwanted behaviors. The key finding is that a bad actor doesn’t need to control a set percentage of the pretraining materials to poison an LLM. Instead, the researchers found that a small and fairly constant number of malicious documents can poison an LLM, regardless of the size of the model or its training materials. The study successfully backdoored LLMs using only 250 malicious documents in the pretraining data set, a much smaller number than expected, across models ranging from 600 million to 13 billion parameters.
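To see why a fixed document count matters more than a percentage, consider the rough arithmetic below. It is a minimal sketch, not something taken from the report: the only figure from the study is the 250-document count, while the corpus sizes and the `poisoned_fraction` helper are hypothetical and chosen purely for illustration.

```python
# Illustrative arithmetic only. The corpus sizes are made-up round numbers,
# NOT figures from the Anthropic study. Only the 250-document count is
# taken from the article.
POISONED_DOCS = 250


def poisoned_fraction(total_docs: int, poisoned_docs: int = POISONED_DOCS) -> float:
    """Return the share of a corpus that the poisoned documents make up."""
    if total_docs < poisoned_docs:
        raise ValueError("corpus cannot be smaller than the poisoned subset")
    return poisoned_docs / total_docs


# Hypothetical corpus sizes, in documents (assumption for illustration).
hypothetical_corpus_sizes = [1_000_000, 100_000_000, 10_000_000_000]

for size in hypothetical_corpus_sizes:
    share = poisoned_fraction(size)
    print(f"{size:>14,} documents -> poisoned share {share:.8%}")
```

If the number of documents needed stays roughly constant, the attacker's share of the data shrinks as the corpus grows, while the effort required to produce those documents does not.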

“We’re sharing these findings to show that data-poisoning attacks might be more practical than believed, and to encourage further research on data poisoning and potential defenses against it,” the company said. Anthropic collaborated with the UK AI Security Institute and the Alan Turing Institute on the research.


