Regex Tips and Data Filtering

admin, January 6, 2022, 01:09:53

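Regex basics

First, the rules: with regex you only need to think about two things, What to match and How many times to match it. Nothing else needs considering.

Matching characters (What)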

  • .
  • [abcd] [a-zA-Z] [^abcd]
  • \d \s \t \w
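
To make the list above concrete, here is a minimal Python sketch of each character matcher. The sample string is made up for illustration and is not from the original post. Note that \t is a literal tab rather than a character class.

```python
import re

# Hypothetical sample string, chosen only to exercise each matcher
sample = "id=42 user_a\tok"

any_char = re.findall(r"i.", sample)          # . matches any one character except newline
lowercase = re.findall(r"[a-z]+", sample)     # [a-z] matches one character in the range
negated = re.findall(r"[^a-z\s]+", sample)    # [^...] matches anything NOT listed
digits = re.findall(r"\d+", sample)           # \d is a digit
spaces = re.findall(r"\s", sample)            # \s is any whitespace (space, tab, newline)
tabs = re.findall(r"\t", sample)              # \t is a literal tab
words = re.findall(r"\w+", sample)            # \w is a letter, digit or underscore

print(any_char, lowercase, negated, digits, spaces, tabs, words)
```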

Matching quantity (How)

  • *
  • +
  • ?
  • {n}
  • {n,}
  • {n,m}
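
A quick way to get a feel for the quantifiers is to test them with re.fullmatch, which only succeeds when the whole string matches. This is a small sketch; the helper name `matches` is my own and not part of the original post.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

results = {
    "ab* on 'a'": matches(r"ab*", "a"),              # * = zero or more
    "ab+ on 'a'": matches(r"ab+", "a"),              # + = one or more
    "ab+ on 'abbb'": matches(r"ab+", "abbb"),
    "ab? on 'ab'": matches(r"ab?", "ab"),            # ? = zero or one
    "ab? on 'abb'": matches(r"ab?", "abb"),
    "ab{2} on 'abb'": matches(r"ab{2}", "abb"),      # {n} = exactly n
    "ab{2,} on 'abbbb'": matches(r"ab{2,}", "abbbb"),    # {n,} = n or more
    "ab{2,3} on 'abbbb'": matches(r"ab{2,3}", "abbbb"),  # {n,m} = between n and m
}

for label, ok in results.items():
    print(label, "->", ok)
```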

Sub-matches

  • ()
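
Parentheses capture part of a match so it can be pulled out on its own. A minimal sketch, using one row in the same shape as the sample data further down:

```python
import re

row = "72 lihong 200 false false 791 true"

# Group 1 captures the leading ID, group 2 the username
m = re.search(r"(\d+)\s+(\w+)", row)

whole = m.group(0)      # the entire match
row_id = m.group(1)     # first pair of parentheses
username = m.group(2)   # second pair of parentheses

print(whole, row_id, username, m.groups())
```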

Other

  • a* Greedy
  • a*? Lazy
  • |
  • ^ $
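
The items above are easiest to see side by side. The strings below are made up for illustration. One detail worth knowing in Python: ^ and $ only match at each line when re.MULTILINE is passed, otherwise they match only at the start and end of the whole string.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as it can, so both tags end up in one match
greedy = re.findall(r"<b>.*</b>", html)
# Lazy: .*? stops at the first place the rest of the pattern can match
lazy = re.findall(r"<b>.*?</b>", html)

# | means "either side"
flags = re.findall(r"true|false", "200 false false 791 true")

rows = "2 wangwei\n8 wangjing"
first_only = re.findall(r"^\d+", rows)                 # ^ = start of the string
every_line = re.findall(r"^\d+", rows, re.MULTILINE)   # ^ = start of each line
line_ends = re.findall(r"\w+$", rows, re.MULTILINE)    # $ = end of each line

print(greedy, lazy, flags, first_only, every_line, line_ends)
```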

Usage 1

The sample data is as follows:

2 wangwei 200 false false 791 true
8 wangjing 200 false false 791 true
72 lihong 200 false false 791 true
94 wangxin 200 false false 791 true
107 liujuan 200 false false 791 true
119 zhangbo 200 false false 791 true
145 zhanghao 200 false false 791 true
169 zhangbin 200 false false 791 true
185 wangjing 200 false false 791 true
224 liuxin 200 false false 791 true
260 yanglin 200 false false 790 true
354 likai 200 false false 791 true
390 lixiang 200 false false 791 true
435 zhangbo 200 false false 791 true
436 wangxin 200 false false 791 true

A recommended site:

https://regex101.com/

Open it, paste in the data, and test the regex against it.

regex101 can generate a Python script directly from the pattern.

Copy the generated code:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(\d+\s+)(\w+)(\s\d+\s\w+\s\w+\s\d+\s\w+)"

test_str = ("2 wangwei 200 false false 791 true \n"
"8 wangjing 200 false false 791 true \n"
"72 lihong 200 false false 791 true \n"
"94 wangxin 200 false false 791 true \n"
"107 liujuan 200 false false 791 true \n"
"119 zhangbo 200 false false 791 true \n"
"145 zhanghao 200 false false 791 true \n"
"169 zhangbin 200 false false 791 true \n"
"185 wangjing 200 false false 791 true \n"
"224 liuxin 200 false false 791 true \n"
"260 yanglin 200 false false 790 true \n"
"354 likai 200 false false 791 true \n"
"390 lixiang 200 false false 791 true \n"
"435 zhangbo 200 false false 791 true \n"
"436 wangxin 200 false false 791 true ")

matches = re.finditer(regex, test_str, re.MULTILINE)

# Print every match, then each capture group inside it
for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Modify it so that it prints only group 2, the username:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(\d+\s+)(\w+)(\s\d+\s\w+\s\w+\s\d+\s\w+)"

test_str = ("2 wangwei 200 false false 791 true \n"
"8 wangjing 200 false false 791 true \n"
"72 lihong 200 false false 791 true \n"
"94 wangxin 200 false false 791 true \n"
"107 liujuan 200 false false 791 true \n"
"119 zhangbo 200 false false 791 true \n"
"145 zhanghao 200 false false 791 true \n"
"169 zhangbin 200 false false 791 true \n"
"185 wangjing 200 false false 791 true \n"
"224 liuxin 200 false false 791 true \n"
"260 yanglin 200 false false 790 true \n"
"354 likai 200 false false 791 true \n"
"390 lixiang 200 false false 791 true \n"
"435 zhangbo 200 false false 791 true \n"
"436 wangxin 200 false false 791 true ")

matches = re.finditer(regex, test_str, re.MULTILINE)

# Group 2 is the username column
for i in matches:
    print(i.group(2))
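
If only the usernames are needed, the generated script can be shortened quite a bit. The following is a sketch of one possible alternative, not what regex101 produces: re.findall with a single capture group returns just that group, and dict.fromkeys drops repeated names while keeping their order. Only a few rows of the data are included here to keep it short.

```python
import re

# A few rows in the same format as the sample data above
rows = """2 wangwei 200 false false 791 true
8 wangjing 200 false false 791 true
185 wangjing 200 false false 791 true
260 yanglin 200 false false 790 true"""

# ^\d+\s+ anchors on the leading ID of each line; the one group captures the username
usernames = re.findall(r"^\d+\s+(\w+)", rows, re.MULTILINE)

# Filter out duplicates while preserving first-seen order
unique_usernames = list(dict.fromkeys(usernames))

print(usernames)
print(unique_usernames)
```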


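Usage 2

Using vim for regex matching

:s/abc/def/g   replace abc with def on the current line; g means global, i.e. every occurrence on the line
:%s/200.*//g   on every line (%), delete everything from 200 to the end of the line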

:%s/\(\d\+\s\)//g   on every line, delete each run of digits together with the whitespace character after it
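
For comparison, the same two substitutions can be sketched in Python with re.sub. This is my own rough equivalent, not part of the original post. The main syntax difference is that Python needs no backslashes before ( ) and +. The two rows below are taken from the sample data, one tab-separated and one space-separated.

```python
import re

raw = "2\twangwei\t200\tfalse\tfalse\t791\ttrue\n8 wangjing 200 false false 791 true"

# Rough equivalent of :%s/200.*//g
# '.' does not match a newline, so each row is trimmed separately
without_tail = re.sub(r"200.*", "", raw)

# Rough equivalent of :%s/\(\d\+\s\)//g
without_id = re.sub(r"(\d+\s)", "", without_tail)

# Trailing whitespace is left behind by the first substitution, so strip it
usernames = [line.strip() for line in without_id.splitlines()]

print(usernames)
```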


References:

https://regex101.com/

https://www.runoob.com/python/python-reg-expressions.html

https://www.cnblogs.com/penseur/archive/2011/02/25/1964522.html

https://gist.github.com/JavaCS3/e36e494e78a02049950bfa7c7ebeb929

From: ol4three.com | Author: ol4three

  • Reposted on CN-SEC中文网 (please keep this link when reposting, with thanks to the original author): https://cn-sec.com/archives/721212.html