[YA-15] Using Ripgrep, Written in Rust, Instead of grep

admin

October 7, 2024, 19:01:56

ripgrep is faster than {grep, ag, git grep, ucg, pt, sift}
Like other tools specialized to code search, ripgrep defaults to recursive directory search and won’t search files ignored by your .gitignore files. It also ignores hidden and binary files by default. ripgrep also implements full support for .gitignore, whereas other code search tools that claim the same functionality have many bugs in it.
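To make the default filtering concrete, here is a minimal Python sketch of a recursive walk that skips hidden entries and names matching a few ignore globs. It is an illustration only, not ripgrep's implementation: the pattern list and the function names are hypothetical, real .gitignore syntax (negation, anchoring, per-directory files) is much richer, and binary-file detection is left out.

```python
import fnmatch
import os
import re
import tempfile

# Hypothetical ignore patterns; real .gitignore rules are richer than plain globs.
IGNORE_GLOBS = ["*.log", "node_modules", "build"]


def compile_ignore(globs):
    """Combine several globs into one regex so a name is tested in a single pass."""
    return re.compile("|".join(f"(?:{fnmatch.translate(g)})" for g in globs))


def walk_filtered(root, ignore):
    """Yield file paths under root, skipping hidden and ignored entries."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames
                       if not d.startswith(".") and not ignore.match(d)]
        for name in filenames:
            if name.startswith(".") or ignore.match(name):
                continue
            yield os.path.join(dirpath, name)


def demo():
    """Build a small throwaway tree and return the files that survive filtering."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ["src/main.py", "app.log", ".secret", "node_modules/x.js"]:
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as fh:
                fh.write("foo\n")
        ignore = compile_ignore(IGNORE_GLOBS)
        return sorted(os.path.relpath(p, root).replace(os.sep, "/")
                      for p in walk_filtered(root, ignore))
```

In this sketch only src/main.py is returned: the log file, the hidden file, and everything under node_modules are skipped before any searching would happen.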
ripgrep can search specific types of files. For example, rg -tpy foo limits your search to Python files and rg -Tjs foo excludes JavaScript files from your search. ripgrep can be taught about new file types with custom matching rules.
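The type filter can be thought of as a table from type names to glob patterns. The Python sketch below mimics the include (-t) and exclude (-T) behavior on a list of file names. The table contents and function names are assumptions made for illustration; ripgrep ships its own, much larger list, which rg --type-list prints.

```python
import fnmatch

# Hypothetical type table for illustration; ripgrep's built-in definitions differ.
TYPE_GLOBS = {
    "py": ["*.py"],
    "js": ["*.js"],
}


def matches_type(filename, type_name):
    """Return True if filename matches any glob registered for type_name."""
    return any(fnmatch.fnmatchcase(filename, g) for g in TYPE_GLOBS[type_name])


def select(filenames, include=None, exclude=None):
    """Keep names of the included type (like -t) and drop the excluded type (like -T)."""
    kept = []
    for name in filenames:
        if include is not None and not matches_type(name, include):
            continue
        if exclude is not None and matches_type(name, exclude):
            continue
        kept.append(name)
    return kept
```

Teaching the sketch a new file type amounts to adding one more entry to the table, which is roughly what a custom matching rule does.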
ripgrep supports many features found in grep, such as showing the context of search results, searching multiple patterns, highlighting matches with color and full Unicode support. Unlike GNU grep, ripgrep stays fast while supporting Unicode (which is always on).
ripgrep has optional support for switching its regex engine to use PCRE2. Among other things, this makes it possible to use look-around and backreferences in your patterns, which are not supported in ripgrep’s default regex engine. PCRE2 support is enabled with -P.
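For readers unfamiliar with the two features, the sketch below shows a backreference and a look-ahead using Python's re module. Python's engine is a stand-in here, chosen only because it supports both constructs; it is neither ripgrep's default engine nor PCRE2, and the names are illustrative.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")

# Look-ahead: match "foo" only when "bar" follows, without consuming "bar".
FOO_BEFORE_BAR = re.compile(r"foo(?=bar)")


def doubled_words(text):
    """Return each word that appears twice in a row, separated by one space."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]
```

A pattern of this kind, for example (\w+) \1, is the sort of search that needs -P in ripgrep because the default engine rejects backreferences.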
ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specified explicitly with the -E/--encoding flag.)
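A rough Python sketch of this behavior: use the caller's encoding when one is given, otherwise detect UTF-16 from a byte order mark, otherwise fall back to UTF-8. That the detection works by looking at the byte order mark is an assumption of this sketch, and the function name is hypothetical.

```python
import codecs
from typing import Optional


def decode_for_search(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode raw file bytes into text before searching."""
    if encoding is not None:
        # Explicit choice, analogous to passing -E/--encoding.
        return raw.decode(encoding)
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it.
        return raw.decode("utf-16")
    # Default: treat the input as UTF-8 and replace invalid sequences.
    return raw.decode("utf-8", errors="replace")
```

A GBK file has no marker to detect, so in this sketch, as with ripgrep, it only decodes correctly when the encoding is named explicitly.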
ripgrep supports searching files compressed in a common format (gzip, xz, lzma, bzip2 or lz4) with the -z/--search-zip flag.
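The idea behind searching compressed files can be sketched in Python by picking a decompressing opener from the file extension and then scanning lines as usual. This is an assumption-laden illustration, not how ripgrep does it: the extension table and function names are hypothetical, and lz4 is omitted because the Python standard library has no module for it.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Map file extensions to standard-library openers that decompress on the fly.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open, ".lzma": lzma.open}


def open_maybe_compressed(path):
    """Open path as text, decompressing transparently when the extension is known."""
    opener = OPENERS.get(os.path.splitext(path)[1], open)
    return opener(path, "rt", encoding="utf-8", errors="replace")


def grep_file(path, needle):
    """Return (line_number, line) pairs for lines containing needle."""
    with open_maybe_compressed(path) as fh:
        return [(n, line.rstrip("\n"))
                for n, line in enumerate(fh, 1) if needle in line]


def demo():
    """Write a small gzip file and search it without unpacking it first."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "notes.txt.gz")
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.write("first line\nneedle here\nlast line\n")
        return grep_file(path, "needle")
```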
ripgrep supports arbitrary input preprocessing filters, which can handle PDF text extraction, decompression of less common formats, decryption, automatic encoding detection and so on.

In other words, use ripgrep if you like speed, filtering by default, fewer bugs and Unicode support.

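For example, to look only for HTML and JS files: I run ripgrep from the root directory without naming any files, and it searches everything beneath that directory. With multithreading the search is very fast, and it supports Unicode.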

It is also faster than other matching commands.

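Is it really faster than everything else?

Generally, yes. A large number of benchmarks, each with a detailed analysis, is available on the ripgrep author's blog.

In summary, ripgrep is fast because: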

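It is built on top of Rust's regex engine. Rust's regex engine uses finite automata, SIMD and aggressive literal optimizations to make searching very fast. (PCRE2 support can be opted into with the -P/--pcre2 flag.)
Rust's regex library maintains performance with full Unicode support by building UTF-8 decoding directly into its deterministic finite automaton engine.
It supports searching with either memory maps or by searching incrementally with an intermediate buffer. The former is better for single files and the latter is better for large directories. ripgrep chooses the best searching strategy for you automatically.
It applies the ignore patterns in your .gitignore files using a RegexSet. That means a single file path can be matched against multiple glob patterns simultaneously.
It uses a lock-free parallel recursive directory iterator, courtesy of crossbeam and ignore.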
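The two search strategies named above, memory maps and an incremental intermediate buffer, can be sketched in Python for a plain byte-string needle. This is a simplified illustration under stated assumptions: the function names and chunk size are hypothetical, and it does literal matching only, with no regex engine involved. The buffered variant carries over the last len(needle) - 1 bytes of each chunk so that a match straddling a chunk boundary is still found.

```python
import mmap
import os
import tempfile


def find_mmap(path, needle: bytes):
    """Return all match offsets by searching a memory map of the whole file."""
    if not needle:
        raise ValueError("needle must not be empty")
    offsets = []
    with open(path, "rb") as fh:
        if os.fstat(fh.fileno()).st_size == 0:
            return offsets  # an empty file cannot be memory-mapped
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                offsets.append(pos)
                pos = mm.find(needle, pos + 1)
    return offsets


def find_buffered(path, needle: bytes, chunk_size=64 * 1024):
    """Return all match offsets while reading the file in fixed-size chunks."""
    if not needle:
        raise ValueError("needle must not be empty")
    offsets = []
    keep = len(needle) - 1  # bytes carried over between chunks
    tail = b""              # carried-over bytes from the previous window
    base = 0                # file offset of the first byte of the current window
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            pos = window.find(needle)
            while pos != -1:
                offsets.append(base + pos)
                pos = window.find(needle, pos + 1)
            tail = window[-keep:] if keep else b""
            base += len(window) - len(tail)
    return offsets


def demo():
    """Run both strategies on the same file, using a tiny chunk to force overlaps."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "data.bin")
        with open(path, "wb") as fh:
            fh.write(b"abcneedlexxneedle")
        return find_mmap(path, b"needle"), find_buffered(path, b"needle", chunk_size=4)
```

Both functions report the same offsets. The practical difference is in cost: mapping has a per-file setup overhead that pays off on one large file, while a reusable buffer avoids that overhead when many small files are visited in a directory walk.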

Reference: https://gitcode.gitcode.host/docs-cn/ripgrep-docs-cn/index.html

Originally published on the WeChat official account Eonian Sharp: [YA-15] Using Ripgrep, Written in Rust, Instead of grep
