Bypassing DOMPurify with good old XML

April 22, 2024

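This article details bypasses discovered when DOMPurify is used to sanitize XML documents, and presents two new XML/HTML confusion bypasses found by the author. It explains how the parsing rules of HTML and XML differ and how that difference causes processing instructions to be handled differently, which creates an opportunity to bypass DOMPurify. The author also covers the patches that fixed these issues and shows how CDATA sections can be used to bypass DOMPurify as well.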

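Introduction

Hello, I'm RyotaK (@ryotkak), a security engineer at Flatt Security Inc.

Recently, @slonser_ discovered a bypass in DOMPurify when it is used to sanitize XML documents. After reviewing the patch, I found two more bypasses based on XML/HTML confusion, so I'm documenting them here.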

HTML != XML

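As @slonser_ wrote in his article, the parsing rules of HTML and XML differ slightly.

For example, an XML parser parses the following text as a single node, whereas an HTML parser recognizes the h1 tag.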

<?xml-stylesheet ><h1>Hello</h1>)"> ?>

This is because XML defines the structure of a processing instruction as follows:

https://www.w3.org/TR/xml/#sec-pi

'<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'

However, when the HTML parser encounters <?, it enters the bogus comment state:

https://html.spec.whatwg.org/#tag-open-state

U+003F QUESTION MARK (?)

This is an unexpected-question-mark-instead-of-tag-name parse error. Create a comment token whose data is the empty string. Reconsume in the bogus comment state.

Because the bogus comment state ends at > rather than ?>, the HTML parser and the XML parser disagree on where a processing instruction ends.

https://html.spec.whatwg.org/#bogus-comment-state

U+003E GREATER-THAN SIGN (>)
   Switch to the data state. Emit the current comment token.

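Because of this difference, if a sanitized XML document is later used inside an HTML document, an injected processing instruction can bypass the sanitizer.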
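The mismatch can be illustrated with a minimal sketch in plain JavaScript. It is a toy model, not a parser: the function names xmlPiEnd and htmlBogusCommentEnd are hypothetical, the sample string is a simplified variant of the example above, and each function only locates the terminator that the quoted rules describe.

```javascript
// Toy model only: real XML and HTML parsers do far more than locate a terminator.

// XML rule: a processing instruction runs from "<?" to the first "?>".
function xmlPiEnd(input) {
  if (!input.startsWith("<?")) return -1;
  const close = input.indexOf("?>", 2);
  return close === -1 ? -1 : close + 2;
}

// HTML rule: "<?" opens a bogus comment, which ends at the first ">"
// (or at the end of input if no ">" follows).
function htmlBogusCommentEnd(input) {
  if (!input.startsWith("<?")) return -1;
  const close = input.indexOf(">", 2);
  return close === -1 ? input.length : close + 1;
}

const piSample = "<?xml-stylesheet ><h1>Hello</h1> ?>";

// Whatever remains after the terminator is parsed as ordinary markup.
const xmlRest = piSample.slice(xmlPiEnd(piSample));
const htmlRest = piSample.slice(htmlBogusCommentEnd(piSample));

console.log(JSON.stringify(xmlRest));
console.log(JSON.stringify(htmlRest));
```

Under the XML rule nothing is left over, so the whole string is one node. Under the HTML rule the remainder begins with the h1 tag, which the HTML parser then treats as real markup.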

Since DOMPurify did not scan processing instructions, @slonser_ managed to bypass the sanitizer by inserting the following payload:

<?xml-stylesheet > <img src=x onerror="alert('DOMPurify bypassed!!!')"> ?>

A look at the patch

To handle processing instructions properly, DOMPurify applied the following patch:

diff --git a/src/purify.js b/src/purify.js
index 4594ba09..5b7bc2aa 100644
--- a/src/purify.js
+++ b/src/purify.js
@@ -909,7 +909,10 @@ function createDOMPurify(window = getGlobal()) {
       root.ownerDocument || root,
       root,
       // eslint-disable-next-line no-bitwise
-      NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT,
+      NodeFilter.SHOW_ELEMENT |
+        NodeFilter.SHOW_COMMENT |
+        NodeFilter.SHOW_TEXT |
+        NodeFilter.SHOW_PROCESSING_INSTRUCTION,
       null
     );
   };

With NodeFilter.SHOW_PROCESSING_INSTRUCTION specified, DOMPurify now scans processing instructions and removes them if they are not allowed. So, what could be wrong with this patch?

Confusing nodeName

It turns out that a processing instruction returns the target specified right after <? as its nodeName.

https://dom.spec.whatwg.org/#dom-node-nodename

The nodeName getter steps are to return the first matching statement, switching on the interface this implements:
[...]
ProcessingInstruction
    Its target.

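For example, accessing the nodeName property of the processing instruction written as <?tag ?> returns tag.

Because DOMPurify relies heavily on a node's nodeName to decide whether the node is allowed, this causes confusion when sanitizing nodes:

src/purify.js, lines 992-1013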

/* Now let's check the element's type and name */
    const tagName = transformCaseFunc(currentNode.nodeName);
    [...]
    /* Remove element if anything forbids its presence */
    if (!ALLOWED_TAGS[tagName] || FORBID_TAGS[tagName]) {
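To make the confusion concrete, here is a minimal sketch that models nodes as plain objects. ALLOWED_TAGS and FORBID_TAGS are reduced, hypothetical stand-ins for DOMPurify's real tables, passesTagCheck is a made-up helper that mirrors only the allowlist test in the excerpt, and the case transform is omitted.

```javascript
// Hypothetical, heavily reduced stand-ins for DOMPurify's internal tables.
const ALLOWED_TAGS = { img: true, h1: true, p: true };
const FORBID_TAGS = {};

// Mirrors only the allowlist test from the excerpt: the decision depends on
// nodeName alone and never looks at nodeType.
function passesTagCheck(node) {
  const tagName = node.nodeName;
  return Boolean(ALLOWED_TAGS[tagName]) && !FORBID_TAGS[tagName];
}

// Plain objects carrying the nodeType/nodeName pairs the DOM Standard defines.
const imgElement = { nodeType: 1, nodeName: "img" };              // <img/>
const stylesheetPi = { nodeType: 7, nodeName: "xml-stylesheet" }; // <?xml-stylesheet ?>
const imgPi = { nodeType: 7, nodeName: "img" };                   // <?img a ?>

console.log(passesTagCheck(imgElement));
console.log(passesTagCheck(stylesheetPi));
console.log(passesTagCheck(imgPi));
```

In this model a processing instruction whose target is an allowed tag name is indistinguishable from the element of the same name, because the check never consults nodeType.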

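Bypassing DOMPurify again with processing instructions

A processing instruction can carry an arbitrary node name, so all we have to do is create a processing instruction whose target is an allowed tag name.

For example, the following processing instruction passes through DOMPurify when sanitized as an XML document: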

<?img a ?>

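As we saw earlier, HTML and XML parse processing instructions inconsistently.

Therefore, with the following XML, we can bypass DOMPurify and execute alert(1) if the output is later used in an HTML document: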

<?img ><img src onerror=alert(1)>?>

You can confirm this with the following script on DOMPurify 3.0.10:

document.documentElement.innerHTML = DOMPurify.sanitize("<?img ><img src onerror=alert(1)>?>", {PARSER_MEDIA_TYPE: "application/xhtml+xml"})

Finding another bypass

To prevent the issue above, the following patch was applied to remove all processing instructions.

diff --git a/src/purify.js b/src/purify.js
index 061ba1a8..1d984685 100644
--- a/src/purify.js
+++ b/src/purify.js
@@ -1009,6 +1009,12 @@ function createDOMPurify(window = getGlobal()) {
       return true;
     }
+    /* Remove any ocurrence of processing instructions */
+    if (currentNode.nodeType === 7) {
+      _forceRemove(currentNode);
+      return true;
+    }
+
     /* Remove element if anything forbids its presence */
     if (!ALLOWED_TAGS[tagName] || FORBID_TAGS[tagName]) {
       /* Check if we have a custom element to handle */

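Because the patch removes processing instructions entirely, the parser inconsistency around processing instructions can no longer be exploited.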
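The effect of checking nodeType before the allowlist can be sketched as follows. This is a simplified model under stated assumptions: nodes are plain objects, the allowlist ALLOWED and the helper decide are hypothetical, and the real patch calls _forceRemove rather than returning a label.

```javascript
const PI_NODE_TYPE = 7; // Node.PROCESSING_INSTRUCTION_NODE in the DOM Standard
const ALLOWED = new Set(["img", "h1", "p"]); // hypothetical allowlist

// Simplified decision: the nodeType test runs before the name-based allowlist,
// so a processing instruction is dropped regardless of its target.
function decide(node) {
  if (node.nodeType === PI_NODE_TYPE) return "remove";
  return ALLOWED.has(node.nodeName) ? "keep" : "remove";
}

console.log(decide({ nodeType: 7, nodeName: "img" })); // <?img a ?>
console.log(decide({ nodeType: 1, nodeName: "img" })); // <img/>
```

The ordering is the point: once the type test comes first, borrowing an allowed tag name as the target no longer helps.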

But are there any other parsing inconsistencies?

After reading the XML specification, I noticed an interesting section:

https://www.w3.org/TR/xml/#sec-cdata-sect

CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters that would otherwise be recognized as markup. CDATA sections begin with the string " <![CDATA[ " and end with the string " ]]> "

Luckily for me, CDATA sections have their own NodeFilter option, and DOMPurify did not enable it.

https://dom.spec.whatwg.org/#callbackdef-nodefilter

const unsigned long SHOW_CDATA_SECTION = 0x8;
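The whatToShow bitmask explains why a node type can go unscanned. Below is a small sketch of the rule in the DOM Standard's filtering algorithm, which skips a node when bit (nodeType - 1) of whatToShow is unset. The constant values are the ones defined by the DOM Standard; isVisited and the mask variable names are hypothetical.

```javascript
// Bit values from the DOM Standard's NodeFilter interface.
const SHOW_ELEMENT = 0x1;
const SHOW_TEXT = 0x4;
const SHOW_CDATA_SECTION = 0x8;
const SHOW_PROCESSING_INSTRUCTION = 0x40;
const SHOW_COMMENT = 0x80;

// nodeType values from the DOM Standard's Node interface.
const CDATA_SECTION_NODE = 4;
const PROCESSING_INSTRUCTION_NODE = 7;

// A node is visited only if bit (nodeType - 1) of whatToShow is set.
function isVisited(whatToShow, nodeType) {
  return (whatToShow & (1 << (nodeType - 1))) !== 0;
}

// Masks before and after the first patch shown earlier.
const originalMask = SHOW_ELEMENT | SHOW_COMMENT | SHOW_TEXT;
const afterPiPatchMask = originalMask | SHOW_PROCESSING_INSTRUCTION;

console.log(isVisited(originalMask, PROCESSING_INSTRUCTION_NODE));
console.log(isVisited(afterPiPatchMask, PROCESSING_INSTRUCTION_NODE));
console.log(isVisited(afterPiPatchMask, CDATA_SECTION_NODE));
```

With the first patch applied, processing instructions are visited, but CDATA section nodes are still skipped because SHOW_CDATA_SECTION is a different bit from SHOW_TEXT.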

So, all I had to do was find an inconsistency between the XML and HTML parsers.

At first glance, the HTML parser appears to parse CDATA sections in an XML-compatible way:

https://html.spec.whatwg.org/#cdata-sections

CDATA sections must consist of the following components, in this order:
   1. The string "<![CDATA[".
   2. Optionally, text, with the additional restriction that the text must not contain the string "]]>".
   3. The string "]]>".

However, further investigation showed that HTML supports CDATA sections only inside the SVG and MathML namespaces, not in the HTML namespace.

https://html.spec.whatwg.org/#markup-declaration-open-state

The string "[CDATA[" (the five uppercase letters "CDATA" with a U+005B LEFT SQUARE BRACKET character before and after)

Consume those characters. If there is an adjusted current node and it is not an element in the HTML namespace, then switch to the CDATA section state. Otherwise, this is a cdata-in-html-content parse error. Create a comment token whose data is the "[CDATA[" string. Switch to the bogus comment state.

If a CDATA section appears in the HTML namespace, the parser switches to the bogus comment state, which ends at > instead of ]]>.

https://html.spec.whatwg.org/#bogus-comment-state

U+003E GREATER-THAN SIGN (>)
   Switch to the data state. Emit the current comment token.

So, as with processing instructions, the following XML creates an h1 tag when parsed with an HTML parser:

<![CDATA[ ><h1>Hello</h1> ]]>
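The namespace dependence can be sketched in the same toy style. The helper htmlCdataEnd is hypothetical and only models which terminator applies after "<![CDATA[", based on the namespace of the adjusted current node as described in the markup declaration open state quoted above. It is not an HTML tokenizer.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Toy model of what follows "<![CDATA[" in the HTML tokenizer.
// adjustedNodeNamespace: namespace of the adjusted current node, or null if none.
function htmlCdataEnd(input, adjustedNodeNamespace) {
  const open = "<![CDATA[";
  if (!input.startsWith(open)) return -1;
  // Foreign content (e.g. SVG, MathML): CDATA section state, terminated by "]]>".
  // Otherwise: bogus comment state, terminated by the first ">".
  const inForeign = adjustedNodeNamespace != null && adjustedNodeNamespace !== HTML_NS;
  const terminator = inForeign ? "]]>" : ">";
  const close = input.indexOf(terminator, open.length);
  return close === -1 ? input.length : close + terminator.length;
}

const cdataSample = "<![CDATA[ ><h1>Hello</h1> ]]>";

// Whatever remains after the terminator is parsed as ordinary markup.
const restInHtml = cdataSample.slice(htmlCdataEnd(cdataSample, HTML_NS));
const restInSvg = cdataSample.slice(htmlCdataEnd(cdataSample, SVG_NS));

console.log(JSON.stringify(restInHtml));
console.log(JSON.stringify(restInSvg));
```

In the HTML namespace the remainder starts with the h1 tag, while inside SVG the whole sample is consumed as a single CDATA section, which matches what an XML parser would do.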

As with processing instructions, this inconsistency allows DOMPurify to be bypassed with the following payload:

<![CDATA[ ><img src onerror=alert(1)> ]]>

You can confirm this with the following script on DOMPurify 3.0.11:

document.documentElement.innerHTML = DOMPurify.sanitize("<![CDATA[ ><img src onerror=alert(1)> ]]>", {PARSER_MEDIA_TYPE: "application/xhtml+xml"})

To fix this inconsistency, DOMPurify applied the following patch:

diff --git a/src/purify.js b/src/purify.js
index 1d984685..72c925a0 100644
--- a/src/purify.js
+++ b/src/purify.js
@@ -913,7 +913,8 @@ function createDOMPurify(window = getGlobal()) {
       NodeFilter.SHOW_ELEMENT |
         NodeFilter.SHOW_COMMENT |
         NodeFilter.SHOW_TEXT |
-        NodeFilter.SHOW_PROCESSING_INSTRUCTION,
+        NodeFilter.SHOW_PROCESSING_INSTRUCTION |
+        NodeFilter.SHOW_CDATA_SECTION,
       null
     );
   };

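Because a CDATA section has #cdata-section as its nodeName, this patch cannot be bypassed the way I bypassed the processing instruction patch, unless #cdata-section is explicitly allowed.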

https://dom.spec.whatwg.org/#dom-node-nodename

The nodeName getter steps are to return the first matching statement, switching on the interface this implements:
[...]
CDATASection
    "#cdata-section".

Original article:

https://flatt.tech/research/posts/bypassing-dompurify-with-good-old-xml/

This translation was first published on the WeChat official account Ots安全 under the title 使用古老的 XML 绕过 DOMPurify.
