In the year or so since large language models hit big, researchers have demonstrated numerous ways of tricking them into producing problematic output, including hateful jokes, malicious code, phishing emails, and users' personal information. It turns out that misbehavior can happen in the physical world, too: LLM-powered robots can easily be hacked so that they behave in potentially dangerous ways.

Researchers at the University of Pennsylvania were able to persuade a simulated self-driving car to ignore stop signs and even drive off a bridge, get a wheeled robot to find the best place to detonate a bomb, and force a four-legged robot to spy on people and enter restricted areas.

“We don't view our attack just as an attack on robots,” says George Pappas, head of a research laboratory at the University of Pennsylvania who helped unleash the rebellious robots. “Any time you connect LLMs and foundation models to the physical world, you can actually turn harmful text into harmful actions.”

Pappas and his collaborators devised their attack by building on previous research that explored ways to jailbreak LLMs by crafting inputs in clever ways that break their safety rules. They tested systems in which an LLM is used to turn naturally phrased commands into ones the robot can execute, and in which the LLM receives updates as the robot operates in its environment.
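To make that setup concrete, here is a minimal sketch of the pattern such systems follow: an LLM maps a free-form instruction onto a small set of executable robot actions. Every name here (the action whitelist, the query_llm callable) is a hypothetical stand-in for illustration, not the researchers' actual code.

```python
from typing import Callable

# Hypothetical whitelist of low-level actions the robot controller exposes.
ALLOWED_ACTIONS = {"move_forward", "turn_left", "turn_right", "stop"}

def command_to_action(user_command: str, query_llm: Callable[[str], str]) -> str:
    """Ask an LLM to map a natural-language command onto one whitelisted action."""
    prompt = (
        "Translate the user's request into exactly one of these actions: "
        f"{sorted(ALLOWED_ACTIONS)}.\n"
        f"Request: {user_command}\nAction:"
    )
    action = query_llm(prompt).strip()
    # Guardrail: execute only whitelisted actions; refuse anything else the model emits.
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Refusing unrecognized action: {action!r}")
    return action

# Usage with a stand-in model; a real deployment would call a hosted LLM here.
fake_llm = lambda prompt: "move_forward"
print(command_to_action("Please head toward the intersection", fake_llm))
```

The jailbreaks described in the article work at the prompt level, before any such whitelist check: they coax the model itself into proposing actions it is supposed to refuse.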

The team tested an open source self-driving simulator that incorporates an LLM developed by Nvidia called Dolphins; a four-wheeled outdoor research vehicle called Jackal, which uses OpenAI's LLM GPT-4o for planning; and a robotic dog called Go2, which uses a previous OpenAI model, GPT-3.5, to interpret commands.

The researchers used a technique called PAIR, developed at the University of Pennsylvania, to automate the process of generating jailbreak prompts. Their new program, RoboPAIR, systematically generates prompts specifically designed to get LLM-powered robots to break their own rules, trying different inputs and then refining them to nudge the system toward misbehavior. The researchers say the technique they devised could be used to automate the process of identifying potentially dangerous commands.
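The generate-test-refine loop that such a tool automates can be sketched roughly as follows. This is a conceptual sketch, not the actual RoboPAIR implementation: the attacker, target, and judge callables, and all parameter names, are assumptions made for illustration.

```python
from typing import Callable, Optional

def refine_jailbreak(
    attacker: Callable[[str], str],      # LLM that rewrites the candidate prompt
    target: Callable[[str], str],        # robot-controlling LLM under test
    judge: Callable[[str, str], float],  # scores how far the reply strays from the rules (0..1)
    goal: str,
    max_rounds: int = 10,
    threshold: float = 0.9,
) -> Optional[str]:
    """Iteratively rewrite a prompt until the target's output crosses the judge's threshold."""
    prompt = goal
    for _ in range(max_rounds):
        reply = target(prompt)
        if judge(goal, reply) >= threshold:
            return prompt  # candidate jailbreak found; flag it for human review
        # Feed the failed attempt back to the attacker model and ask for a stronger rewrite.
        prompt = attacker(
            f"Goal: {goal}\nLast prompt: {prompt}\nTarget reply: {reply}\nRewrite the prompt:"
        )
    return None  # no jailbreak found within the round budget
```

Run defensively, the same loop doubles as a red-teaming tool: any prompt it returns is a concrete example of a command the deployed system should have refused.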

“This is a striking example of LLM vulnerabilities in embodied systems,” says Yi Zeng, a PhD student at the University of Virginia who works on the security of AI systems. Zeng says the results are hardly surprising given the problems seen in LLMs themselves, but adds: “It clearly demonstrates why we cannot rely on LLMs as standalone control units in safety-critical applications without proper guardrails and moderation layers.”

The robot “jailbreaks” highlight a broader risk that is likely to grow as AI models are increasingly used as a way for humans to interact with physical systems, or to enable AI agents to act autonomously on computers, the researchers say.
