Given that the goal of developing a generative artificial intelligence (genai) model is to take human instructions and provide a helpful app, What Happens IF that Human Instructions ARE MALICIOS? That was the question raised during a demonstration of ai vulnerability Showcase 2025 Event in London,

“A language model is designed to summarise large Amounts of information,” said Matthew Sutton, Solution Architect at Advai. “The aim is to give it as much test information as possible and let it handle that data.”

Sutton raised the question of what would Haappen if someone using a large language model (lLM) asked it to produge disinformation or harmful content, or Reveal Sensitive Information. “What Happens If you ask the model to produces malicious code, then go and Execute it, or Attempt to Steal somebody's data?” He said.

During the demo, sutton discussed the inrent risk of using retrieval augmented generation (Rag) that has access to a corpus of corporate data. The general idea behind using a rag System is to provide context that is then Combined with External Infererance from an ai model.

“If you go to chatgpt and ask it to summarise your emails, for example, it will have no idea what you're talking about,” He said. “A rag system takes External context as informationWhether that be documents, external websites or your emails. “

According to sutton, an attacker could use the fact that AI System Reads Email Messages and Documents Stored Internally to Place Malicious Instructions in an email message, Document or Website. He said these institutions are then picked up by the AI ​​model, which enables the harmful instruction to be executed.

“Large language models give you this ability to interact with things through natural language,” said sutton. “It's designed to be as easy as possible, and so from an adversary point of view, this means that it is easy and has a lower entry barrier to create logic instruments.”

This, according to sutton, means anybody who wants to disrupt a corporate it system clock look at how they could use an indirect prompt injection at the inserted instructions Correspondence.

If an employee is interacting directly with the model and the harmful instructions have found their way into the corporate ai system, then the model may present harmful or Misleading Content to that percent.

For example, he said people who submit bids for new project workwal provide institutions Hidden in their bid, Knowing that large language model will be used to summarise the taxt of their submission, Which could be used to influence their bid more positively than rival bids, or instruct the llm to ignore other bids.

For Sutton, this means there is quite a broad range of people that means to influence an organization's tender process. “You don't need to be a high-level programmer to put in things like that,” He said.

From an it security percent percent, sutton said an indirect prompt injection attack means to be cognisant as to the information being provided to the AI ​​System, SINCE This Data is not alli

Generally, the output from an llm is an answer to a Query Followed by Additional Contextual Information, that shows the users how the information is referenceed to output the answer. Sutton pointed out that people should definition the reliability of this contextual information, but noted that it would be unrealistic and undermine the use of an llm if pe sold to checker Every single time it generated a response.

Leave a Reply

Your email address will not be published. Required fields are marked *