Regex Tips and Data Filtering

Posted by admin on January 6, 2022, 01:09:53

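Regex basics

To start with, regex rules come down to two questions: what to match (What) and how many times to match it (How). Nothing else needs attention at first.

Matching characters (What)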

  • .
  • [abcd] [a-zA-Z] [^abcd]
  • \d \s \t \w
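
As a rough illustration of the character classes listed above, the following Python sketch runs each of them against a made-up string. The sample text and variable names are assumptions chosen only for demonstration.

```python
import re

# Hypothetical sample text: "id7", a tab, "bob", a space, "x".
sample = "id7\tbob x"

any_char = re.findall(r"i.", sample)          # '.' matches any character except a newline
in_set = re.findall(r"[bdx]", sample)         # one character out of b, d, x
not_in_set = re.findall(r"[^a-z\s]", sample)  # one character that is neither a-z nor whitespace
digits = re.findall(r"\d", sample)            # \d: a digit
spaces = re.findall(r"\s", sample)            # \s: any whitespace (tab and space here)
tabs = re.findall(r"\t", sample)              # \t: a tab character only
words = re.findall(r"\w+", sample)            # \w: letters, digits, underscore

print(any_char, in_set, not_in_set, digits, spaces, tabs, words)
```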

Match count (How)

  • *
  • +
  • ?
  • {n}
  • {n,}
  • {n,m}
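
To see how the quantifiers above differ, here is a small Python sketch that checks which of a few made-up tokens each pattern matches in full. The token list and the helper `matching` are hypothetical and exist only for this example.

```python
import re

# Hypothetical tokens: zero to three "a" characters followed by "b".
tokens = ["b", "ab", "aab", "aaab"]

def matching(pattern):
    """Return the tokens that the pattern matches from start to end."""
    return [t for t in tokens if re.fullmatch(pattern, t)]

star = matching(r"a*b")          # * : zero or more
plus = matching(r"a+b")          # + : one or more
optional = matching(r"a?b")      # ? : zero or one
exactly = matching(r"a{2}b")     # {n} : exactly n
at_least = matching(r"a{2,}b")   # {n,} : n or more
between = matching(r"a{1,2}b")   # {n,m} : between n and m

print(star, plus, optional, exactly, at_least, between)
```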

Submatches (groups)

  • ()
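
A minimal Python sketch of submatches, using one row of the sample data from this article. The names `row_id`, `name`, and `status` are assumptions about what the columns mean; the source does not label them.

```python
import re

# One row taken from the sample data used later in this article.
line = "72 lihong 200 false false 791 true"

# Each pair of parentheses captures one submatch (group).
m = re.match(r"(\d+)\s+(\w+)\s+(\d+)", line)

whole = m.group(0)                   # group 0 is the entire match
row_id, name, status = m.groups()    # groups 1..3, in order of their opening parenthesis

print(whole, row_id, name, status)
```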

Other

  • a* Greedy
  • a*? Lazy
  • |
  • ^ $
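
The items above can be sketched in Python as follows. The HTML-like string is a made-up example; the other strings are cut from this article's sample data.

```python
import re

# Greedy vs. lazy: a* takes as many "a" as possible, a*? as few as possible.
greedy_a = re.match(r"a*", "aaa").group()
lazy_a = re.match(r"a*?", "aaa").group()

# The same difference on a hypothetical tagged string.
tagged = "<b>one</b><b>two</b>"
greedy_tags = re.findall(r"<b>.*</b>", tagged)   # one match spanning both tags
lazy_tags = re.findall(r"<b>.*?</b>", tagged)    # one match per tag

# | : alternation, either side may match.
flags = re.findall(r"true|false", "200 false false 791 true")

# ^ and $ : start and end of a line when re.MULTILINE is set.
two_rows = "2 wangwei\n8 wangjing"
line_starts = re.findall(r"^\d+", two_rows, re.MULTILINE)
line_ends = re.findall(r"\w+$", two_rows, re.MULTILINE)

print(greedy_a, repr(lazy_a), greedy_tags, lazy_tags, flags, line_starts, line_ends)
```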

Usage 1

The sample data is as follows:

2	wangwei	200	false	false	791	true	
8 wangjing 200 false false 791 true
72 lihong 200 false false 791 true
94 wangxin 200 false false 791 true
107 liujuan 200 false false 791 true
119 zhangbo 200 false false 791 true
145 zhanghao 200 false false 791 true
169 zhangbin 200 false false 791 true
185 wangjing 200 false false 791 true
224 liuxin 200 false false 791 true
260 yanglin 200 false false 790 true
354 likai 200 false false 791 true
390 lixiang 200 false false 791 true
435 zhangbo 200 false false 791 true
436 wangxin 200 false false 791 true

A site worth recommending:

https://regex101.com/

Open it and test the regex against the data.


The site can generate a Python script directly.


Copy the generated code:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(\d+\s+)(\w+)(\s\d+\s\w+\s\w+\s\d+\s\w+)"

test_str = ("2 wangwei 200 false false 791 true \n"
"8 wangjing 200 false false 791 true \n"
"72 lihong 200 false false 791 true \n"
"94 wangxin 200 false false 791 true \n"
"107 liujuan 200 false false 791 true \n"
"119 zhangbo 200 false false 791 true \n"
"145 zhanghao 200 false false 791 true \n"
"169 zhangbin 200 false false 791 true \n"
"185 wangjing 200 false false 791 true \n"
"224 liuxin 200 false false 791 true \n"
"260 yanglin 200 false false 790 true \n"
"354 likai 200 false false 791 true \n"
"390 lixiang 200 false false 791 true \n"
"435 zhangbo 200 false false 791 true \n"
"436 wangxin 200 false false 791 true ")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Modify it so that it prints only group 2 (the name in the second column):

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(\d+\s+)(\w+)(\s\d+\s\w+\s\w+\s\d+\s\w+)"

test_str = ("2 wangwei 200 false false 791 true \n"
"8 wangjing 200 false false 791 true \n"
"72 lihong 200 false false 791 true \n"
"94 wangxin 200 false false 791 true \n"
"107 liujuan 200 false false 791 true \n"
"119 zhangbo 200 false false 791 true \n"
"145 zhanghao 200 false false 791 true \n"
"169 zhangbin 200 false false 791 true \n"
"185 wangjing 200 false false 791 true \n"
"224 liuxin 200 false false 791 true \n"
"260 yanglin 200 false false 790 true \n"
"354 likai 200 false false 791 true \n"
"390 lixiang 200 false false 791 true \n"
"435 zhangbo 200 false false 791 true \n"
"436 wangxin 200 false false 791 true ")

matches = re.finditer(regex, test_str, re.MULTILINE)

for i in matches:
    print(i.group(2))
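
As a possible variation on the script above, the sketch below pulls the second column with `re.findall` and then drops repeated names while keeping their order. It uses four rows from the sample data; the deduplication step is an assumption about what "filtering" might involve and is not part of the original script.

```python
import re

# Four rows taken from the sample data above.
rows = """2 wangwei 200 false false 791 true
8 wangjing 200 false false 791 true
185 wangjing 200 false false 791 true
260 yanglin 200 false false 790 true"""

# With a single group, findall returns just that group for every match.
names = re.findall(r"^\d+\s+(\w+)", rows, re.MULTILINE)

# dict.fromkeys keeps the first occurrence of each name, in order.
unique_names = list(dict.fromkeys(names))

print(names)
print(unique_names)
```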


Usage 2

Regex matching in vim

:s/abc/def/g    replaces abc with def; g stands for global, i.e. every occurrence on the line
:%s/200.*//g


:%s/\(\d\+\s\)//g
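
For comparison, the two vim substitutions can be approximated in Python with `re.sub`. This is only a sketch on two rows of the sample data. One difference to keep in mind: Python's `\s` also matches a newline, while vim's `\s` matches only spaces and tabs, so the two are not guaranteed to behave the same on other input.

```python
import re

# Two rows taken from the sample data.
text = "2 wangwei 200 false false 791 true\n8 wangjing 200 false false 791 true"

# Rough equivalent of :%s/200.*//g
# '.' does not cross a newline, so each line is cut from "200" to its end.
step1 = re.sub(r"200.*", "", text)

# Rough equivalent of :%s/\(\d\+\s\)//g
# Removes each run of digits together with the whitespace character after it.
step2 = re.sub(r"\d+\s", "", step1)

names = step2.split()
print(names)
```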


References:

https://regex101.com/

https://www.runoob.com/python/python-reg-expressions.html

https://www.cnblogs.com/penseur/archive/2011/02/25/1964522.html

https://gist.github.com/JavaCS3/e36e494e78a02049950bfa7c7ebeb929

From: ol4three.com | Author: ol4three

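  • When reposting, please keep the link to this article (CN-SEC中文网; thanks to the original author for the hard work):
                   Regex Tips and Data Filtering http://cn-sec.com/archives/721212.html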
