Regex Tips and Data Filtering

admin · 2022-01-06 01:09:53 · Security Blog

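Regex Basics

To read or write a regular expression you only need to ask two questions: what to match (What) and how many times to match it (How). Everything else is secondary.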

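Characters to match (What)

  • . (any single character except a newline)
  • [abcd] [a-zA-Z] [^abcd] (a set, a range, a negated set)
  • \d \s \t \w (digit, whitespace, tab, word character)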
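
The character classes above can be tried out with a minimal sketch using Python's re module. The sample string and variable names are made up for illustration.

```python
import re

sample = "a1 b2\tc3"

# "." matches any single character except a newline.
dot = re.findall(r".", "ab\n")

# Square brackets match one character from a set or range;
# a leading "^" inside the brackets negates the set.
in_set = re.findall(r"[abcd]", sample)
in_range = re.findall(r"[a-zA-Z]", sample)
not_in_set = re.findall(r"[^abcd]", "a1b2")

# Shorthand classes: digit, whitespace, tab, word character
# (letter, digit or underscore).
digits = re.findall(r"\d", sample)
spaces = re.findall(r"\s", sample)
tabs = re.findall(r"\t", sample)
word_chars = re.findall(r"\w", sample)

for label, value in [
    ("dot", dot), ("in_set", in_set), ("in_range", in_range),
    ("not_in_set", not_in_set), ("digits", digits),
    ("spaces", spaces), ("tabs", tabs), ("word_chars", word_chars),
]:
    print(label, value)
```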

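Number of repetitions (How)

  • * (zero or more)
  • + (one or more)
  • ? (zero or one)
  • {n} (exactly n)
  • {n,} (n or more)
  • {n,m} (between n and m)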
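
As a rough illustration of the quantifiers listed above, the sketch below checks whole-string matches with re.fullmatch. The helper name full and the test cases are hypothetical, chosen only to show where each quantifier accepts or rejects the input.

```python
import re

def full(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result)
cases = [
    (r"a*", "", True),          # zero or more: empty string is fine
    (r"a+", "", False),         # one or more: needs at least one "a"
    (r"a?", "aa", False),       # zero or one: two is too many
    (r"a{3}", "aaa", True),     # exactly three
    (r"a{2,}", "a", False),     # two or more
    (r"a{2,3}", "aaaa", False), # between two and three
]

for pattern, text, expected in cases:
    print(pattern, repr(text), full(pattern, text))
```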

Sub-matches (groups)

  • () (captures the enclosed part as a numbered group)
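
A small sketch of how capture groups behave in Python, using one row in the same shape as the sample data further down. Group 0 is the whole match, and groups 1 and up correspond to the parentheses from left to right.

```python
import re

row = "72 lihong 200 false false 791 true"

# Two groups: the leading number and the word that follows it.
m = re.match(r"(\d+)\s+(\w+)", row)

whole = m.group(0)    # everything the pattern matched
first = m.group(1)    # contents of the first pair of parentheses
second = m.group(2)   # contents of the second pair
all_groups = m.groups()

print(whole, first, second, all_groups)
```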

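Other

  • a* Greedy: matches as many as possible
  • a*? Lazy: matches as few as possible
  • | (alternation: either the left or the right side)
  • ^ $ (start and end of the string, or of each line in multiline mode)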
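
The difference between greedy and lazy matching, along with alternation and anchors, is easiest to see side by side. The following is a minimal sketch with made-up input strings.

```python
import re

# Greedy takes as much as it can, lazy as little as it can.
greedy_a = re.match(r"a*", "aaa").group()    # "aaa"
lazy_a = re.match(r"a*?", "aaa").group()     # "" (zero repetitions suffice)

html = "<b>one</b><b>two</b>"
greedy_tags = re.findall(r"<b>.*</b>", html)   # one match spanning both tags
lazy_tags = re.findall(r"<b>.*?</b>", html)    # one match per tag

# "|" matches either alternative.
flags = re.findall(r"true|false", "200 false false 791 true")

# "^" and "$" anchor to the string, or to each line with re.MULTILINE.
text = "2 wangwei\n8 wangjing"
first_only = re.findall(r"^\d+", text)
each_line = re.findall(r"^\d+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(greedy_tags, lazy_tags, flags, first_only, each_line, line_ends)
```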

Usage 1

The sample data is as follows:

2 wangwei 200 false false 791 true
8 wangjing 200 false false 791 true
72 lihong 200 false false 791 true
94 wangxin 200 false false 791 true
107 liujuan 200 false false 791 true
119 zhangbo 200 false false 791 true
145 zhanghao 200 false false 791 true
169 zhangbin 200 false false 791 true
185 wangjing 200 false false 791 true
224 liuxin 200 false false 791 true
260 yanglin 200 false false 790 true
354 likai 200 false false 791 true
390 lixiang 200 false false 791 true
435 zhangbo 200 false false 791 true
436 wangxin 200 false false 791 true

A recommended site for testing regular expressions:

https://regex101.com/

Open it and test the regular expression against the data.

The site can generate a Python script for the match directly.

Copy the generated code:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(\d+\s+)(\w+)(\s\d+\s\w+\s\w+\s\d+\s\w+)"

test_str = ("2 wangwei 200 false false 791 true \n"
"8 wangjing 200 false false 791 true \n"
"72 lihong 200 false false 791 true \n"
"94 wangxin 200 false false 791 true \n"
"107 liujuan 200 false false 791 true \n"
"119 zhangbo 200 false false 791 true \n"
"145 zhanghao 200 false false 791 true \n"
"169 zhangbin 200 false false 791 true \n"
"185 wangjing 200 false false 791 true \n"
"224 liuxin 200 false false 791 true \n"
"260 yanglin 200 false false 790 true \n"
"354 likai 200 false false 791 true \n"
"390 lixiang 200 false false 791 true \n"
"435 zhangbo 200 false false 791 true \n"
"436 wangxin 200 false false 791 true ")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Change it to print only the second group, the name column:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(\d+\s+)(\w+)(\s\d+\s\w+\s\w+\s\d+\s\w+)"

test_str = ("2 wangwei 200 false false 791 true \n"
"8 wangjing 200 false false 791 true \n"
"72 lihong 200 false false 791 true \n"
"94 wangxin 200 false false 791 true \n"
"107 liujuan 200 false false 791 true \n"
"119 zhangbo 200 false false 791 true \n"
"145 zhanghao 200 false false 791 true \n"
"169 zhangbin 200 false false 791 true \n"
"185 wangjing 200 false false 791 true \n"
"224 liuxin 200 false false 791 true \n"
"260 yanglin 200 false false 790 true \n"
"354 likai 200 false false 791 true \n"
"390 lixiang 200 false false 791 true \n"
"435 zhangbo 200 false false 791 true \n"
"436 wangxin 200 false false 791 true ")

matches = re.finditer(regex, test_str, re.MULTILINE)

for i in matches:
    print(i.group(2))
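
Once the columns are captured, further filtering is straightforward. The sketch below is one possible extension, not part of the original script: it uses named groups, removes duplicate names, and flags rows whose sixth column differs from the most common value. The group names (id, name, status, size) are assumptions made for readability, since the source does not say what the columns mean. Only a few of the rows are reproduced here.

```python
import re
from collections import Counter

data = (
    "2 wangwei 200 false false 791 true\n"
    "8 wangjing 200 false false 791 true\n"
    "185 wangjing 200 false false 791 true\n"
    "260 yanglin 200 false false 790 true\n"
)

# Column names are hypothetical labels for the 1st, 2nd, 3rd and 6th fields.
ROW = re.compile(
    r"^(?P<id>\d+)\s+(?P<name>\w+)\s+(?P<status>\d+)"
    r"\s+\w+\s+\w+\s+(?P<size>\d+)\s+\w+",
    re.MULTILINE,
)

records = [m.groupdict() for m in ROW.finditer(data)]

# Deduplicate names while keeping first-seen order.
names = list(dict.fromkeys(r["name"] for r in records))

# Rows whose sixth column differs from the most common value.
common_size = Counter(r["size"] for r in records).most_common(1)[0][0]
outliers = [r["name"] for r in records if r["size"] != common_size]

print(names)
print(outliers)
```

With the full data set, the same approach would single out the one row whose sixth column is 790 rather than 791.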


Usage 2

Regex substitution in vim:

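:s/abc/def/g    replaces abc with def on the current line; g (global) replaces every occurrence on the line, not just the first
:%s/200.*//g    the % range applies the substitution to every line; this deletes everything from 200 to the end of each line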


:%s/\(\d\+\s\)//g    deletes each run of digits that is followed by a whitespace character
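
For comparison, the two vim substitutions can be approximated in Python with re.sub. This is a sketch under the assumption that they are applied one after the other to the sample data. Note that Python's syntax does not need the backslashes that vim's default mode requires before ( ) and +.

```python
import re

data = (
    "2 wangwei 200 false false 791 true\n"
    "8 wangjing 200 false false 791 true"
)

# Rough equivalent of :%s/200.*//g
# "." does not match a newline, so each line is trimmed separately.
step1 = re.sub(r"200.*", "", data)

# Rough equivalent of :%s/\(\d\+\s\)//g
# The group is not needed for deletion, so it is left out here.
step2 = re.sub(r"\d+\s", "", step1)

# Strip the leftover trailing space on each line.
names = [line.strip() for line in step2.splitlines()]

print(names)
```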


References:

https://regex101.com/

https://www.runoob.com/python/python-reg-expressions.html

https://www.cnblogs.com/penseur/archive/2011/02/25/1964522.html

https://gist.github.com/JavaCS3/e36e494e78a02049950bfa7c7ebeb929

From: ol4three.com | Author: ol4three

Reposted at CN-SEC: http://cn-sec.com/archives/721212.html
