研究人员揭示谷歌Gemini AI的弱点

admin 2024年3月20日01:27:24评论5 views字数 4257阅读14分11秒阅读模式

研究人员揭示谷歌Gemini AI的弱点

Google's Gemini large language model (LLM) is susceptible to security threats that could cause it to divulge system prompts, generate harmful content, and carry out indirect injection attacks.

谷歌的Gemini大型语言模型(LLM)容易受到安全威胁,可能导致其泄露系统提示,生成有害内容,并执行间接注入攻击。

The findings come from HiddenLayer, which said the issues impact consumers using Gemini Advanced with Google Workspace as well as companies using the LLM API.

这些发现来自HiddenLayer,该公司表示这些问题影响了使用谷歌工作区Gemini Advanced的消费者以及使用LLM API的公司。

The first vulnerability involves getting around security guardrails to leak the system prompts (or a system message), which are designed to set conversation-wide instructions to the LLM to help it generate more useful responses, by asking the model to output its "foundational instructions" in a markdown block.

第一个漏洞涉及绕过安全防护栏,泄露系统提示(或系统消息),这些提示旨在为LLM设置全局对话指令,以帮助其生成更有用的响应,方法是要求模型输出其在markdown块中的“基础指令”。

"A system message can be used to inform the LLM about the context," Microsoft notes in its documentation about LLM prompt engineering.

微软在其有关LLM提示工程的文档中指出:“系统消息可用于通知LLM有关上下文。”

"The context may be the type of conversation it is engaging in, or the function it is supposed to perform. It helps the LLM generate more appropriate responses."

这是因为模型容易受到所谓的同义词攻击的影响,以规避安全防御和内容限制。

This is made possible due to the fact that models are susceptible to what's called a synonym attack to circumvent security defenses and content restrictions.

第二类漏洞涉及使用“狡猾的越狱”技术,使Gemini模型生成围绕选举等主题的错误信息,并输出可能非法和危险的信息(例如,热线车辆)使用提示要求其进入虚构状态。

A second class of vulnerabilities relates to using "crafty jailbreaking" techniques to make the Gemini models generate misinformation surrounding topics like elections as well as output potentially illegal and dangerous information (e.g., hot-wiring a car) using a prompt that asks it to enter into a fictional state.

HiddenLayer还发现的第三个缺陷可能导致LLM通过将重复的罕见令牌作为输入来泄露系统提示中的信息。

Also identified by HiddenLayer is a third shortcoming that could cause the LLM to leak information in the system prompt by passing repeated uncommon tokens as input.

安全研究员Kenneth Yeung在周二的一份报告中表示:“大多数LLM都受过训练,以清晰地区分用户的输入和系统提示之间的界限。”

"Most LLMs are trained to respond to queries with a clear delineation between the user's input and the system prompt," security researcher Kenneth Yeung said in a Tuesday report.

“通过创建一行无意义的令牌,我们可以欺骗LLM,使其相信是时候做出响应,并导致其输出一个确认消息,通常包括提示中的信息。”

"By creating a line of nonsensical tokens, we can fool the LLM into believing it is time for it to respond and cause it to output a confirmation message, usually including the information in the prompt."

Another test involves using Gemini Advanced and a specially crafted Google document, with the latter connected to the LLM via the Google Workspace extension.

另一项测试涉及使用Gemini Advanced和专门设计的谷歌文档,后者通过谷歌工作区扩展连接到LLM。

The instructions in the document could be designed to override the model's instructions and perform a set of malicious actions that enable an attacker to have full control of a victim's interactions with the model.

文档中的指令可以被设计为覆盖模型的指令,并执行一系列恶意操作,使攻击者能够完全控制受害者与模型的交互。

The disclosure comes as a group of academics from Google DeepMind, ETH Zurich, University of Washington, OpenAI, and the McGill University revealed a novel model-stealing attack that makes it possible to extract "precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2."

这一披露出现在来自谷歌DeepMind、苏黎世联邦理工学院、华盛顿大学、OpenAI和麦吉尔大学的一组学术人员公布了一种新型模型窃取攻击,可以从黑盒生产语言模型(如OpenAI的ChatGPT或谷歌的PaLM-2)中提取“精确的、非平凡的信息”。

That said, it's worth noting that these vulnerabilities are not novel and are present in other LLMs across the industry. The findings, if anything, emphasize the need for testing models for prompt attacks, training data extraction, model manipulation, adversarial examples, data poisoning and exfiltration.

值得注意的是,这些漏洞并非新颖,而是存在于行业中的其他LLM中。该发现强调了对模型进行提示攻击、训练数据提取、模型操纵、对抗性示例、数据污染和外泄的需求。

"To help protect our users from vulnerabilities, we consistently run red-teaming exercises and train our models to defend against adversarial behaviors like prompt injection, jailbreaking, and more complex attacks," a Google spokesperson told The Hacker News. "We've also built safeguards to prevent harmful or misleading responses, which we are continuously improving."

公司还表示,出于谨慎起见,它正在限制与选举相关的查询的回应。该政策预计将针对有关候选人、政党、选举结果、投票信息和知名官员的提示进行执行。

The company also said it's restricting responses to election-based queries out of an abundance of caution. The policy is expected to be enforced against prompts regarding candidates, political parties, election results, voting information, and notable office holders.

参考资料

[1]https://thehackernews.com/2024/03/researchers-highlight-googles-gemini-ai.html

关注我们

        欢迎来到我们的公众号!我们专注于全球网络安全和精选双语资讯,为您带来最新的资讯和深入的分析。在这里,您可以了解世界各地的网络安全事件,同时通过我们的双语新闻,获取更多的行业知识。感谢您选择关注我们,我们将继续努力,为您带来有价值的内容。

原文始发于微信公众号(知机安全):研究人员揭示谷歌Gemini AI的弱点

  • 左青龙
  • 微信扫一扫
  • weinxin
  • 右白虎
  • 微信扫一扫
  • weinxin
admin
  • 本文由 发表于 2024年3月20日01:27:24
  • 转载请保留本文链接(CN-SEC中文网:感谢原作者辛苦付出):
                   研究人员揭示谷歌Gemini AI的弱点https://cn-sec.com/archives/2580349.html

发表评论

匿名网友 填写信息