Table of Contents:
I. RNN Text Classification
 1. RNN
 2. Text Classification
II. Text Classification with Traditional Machine Learning (Naive Bayes)
 1. MultinomialNB + TF-IDF Text Classification
 2. GaussianNB + Word2Vec Text Classification
III. RNN Text Classification with Keras
 1. The IMDB Dataset and Sequence Preprocessing
 2. Training a Word-Embedding Model
 3. RNN Text Classification
IV. RNN Text Classification on a Chinese Dataset
 1. RNN + Word2Vec Text Classification
 2. LSTM + Word2Vec Text Classification
 3. LSTM + TF-IDF Text Classification
 4. Comparing Machine Learning and Deep Learning
V. Summary

All code for this article is open-sourced at:
- https://github.com/eastmountyxz/AI-for-TensorFlow
- https://github.com/eastmountyxz/AI-for-Keras
I have been learning Python for nearly ten years and have met many great mentors and friends along the way, for which I am grateful. My aim is to help more beginners get started, so all of the code is open-sourced on GitHub and the articles are published in sync on my WeChat public account. I know how much I still have to learn, so I keep pushing forward; there is no shortcut in programming, you just do the work. I hope to study and write ever more thoroughly, and to learn truly independent research during my PhD years. Many thanks as well to the authors in the references for their articles and generosity.
Blog: https://blog.csdn.net/eastmount
I. RNN Text Classification
1. RNN
2. Text Classification
II. Text Classification with Traditional Machine Learning (Naive Bayes)
1. MultinomialNB + TF-IDF Text Classification
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 22:10:20 2020
@author: Eastmount CSDN
"""
from jieba import lcut
#--------------------------------Load data and preprocessing-------------------------------
data = [
[0, '小米粥是以小米作为主要食材熬制而成的粥,口味清淡,清香味,具有简单易制,健胃消食的特点'],
[0, '煮粥时一定要先烧开水然后放入洗净后的小米'],
[0, '蛋白质及氨基酸、脂肪、维生素、矿物质'],
[0, '小米是传统健康食品,可单独焖饭和熬粥'],
[0, '苹果,是水果中的一种'],
[0, '粥的营养价值很高,富含矿物质和维生素,含钙量丰富,有助于代谢掉体内多余盐分'],
[0, '鸡蛋有很高的营养价值,是优质蛋白质、B族维生素的良好来源,还能提供脂肪、维生素和矿物质'],
[0, '这家超市的苹果都非常新鲜'],
[0, '在北方小米是主要食物之一,很多地区有晚餐吃小米粥的习俗'],
[0, '小米营养价值高,营养全面均衡 ,主要含有碳水化合物'],
[0, '蛋白质及氨基酸、脂肪、维生素、盐分'],
[1, '小米、三星、华为,作为安卓三大手机旗舰'],
[1, '别再管小米华为了!魅族手机再曝光:这次真的完美了'],
[1, '苹果手机或将重陷2016年困境,但这次它无法再大幅提价了'],
[1, '三星想要继续压制华为,仅凭A70还不够'],
[1, '三星手机屏占比将再创新高,超华为及苹果旗舰'],
[1, '华为P30、三星A70爆卖,斩获苏宁最佳手机营销奖'],
[1, '雷军,用一张图告诉你:小米和三星的差距在哪里'],
[1, '小米米聊APP官方Linux版上线,适配深度系统'],
[1, '三星刚刚更新了自家的可穿戴设备APP'],
[1, '华为、小米跨界并不可怕,可怕的打不破内心的“天花板”'],
]
#Chinese word segmentation
X, Y = [' '.join(lcut(i[1])) for i in data], [i[0] for i in data]
print(X)
print(Y)
#['煮粥 时 一定 要 先烧 开水 然后 放入 洗净 后 的 小米', ...]
#--------------------------------------Compute word frequencies------------------------------------
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
#Convert the words in each text into a word-frequency matrix
vectorizer = CountVectorizer()
#Count how many times each word occurs
X_data = vectorizer.fit_transform(X)
print(X_data)
#Get all keywords in the bag of words
word = vectorizer.get_feature_names()  #use get_feature_names_out() in scikit-learn >= 1.2
print('[Vocabulary]')
for w in word:
    print(w, end=" ")
print("\n")
#Word-frequency matrix
print(X_data.toarray())
#Convert the word-frequency matrix into TF-IDF weights
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(X_data)
#weight[i][j] is the TF-IDF weight of word j in document i
weight = tfidf.toarray()
print(weight)
#--------------------------------------Classification------------------------------------
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(weight, Y)
print(len(X_train), len(X_test))
print(len(y_train), len(y_test))
print(X_train)
#Fit a MultinomialNB classifier
clf = MultinomialNB().fit(X_train, y_train)
pre = clf.predict(X_test)
print("Prediction:", pre)
print("Ground truth:", y_test)
print(classification_report(y_test, pre))
#--------------------------------------Visualization------------------------------------
#Reduce to 2-D with PCA and plot
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
pca = PCA(n_components=2)
newData = pca.fit_transform(weight)
print(newData)
L1 = [n[0] for n in newData]
L2 = [n[1] for n in newData]
plt.scatter(L1, L2, c=Y, s=200)
plt.show()
[
'煮粥 时 一定 要 先烧 开水 然后 放入 洗净 后 的 小米',
'蛋白质 及 氨基酸 、 脂肪 、 维生素 、 矿物质',
...
'三星 刚刚 更新 了 自家 的 可 穿戴 设备 APP',
'华为 、 小米 跨界 并 不 可怕 , 可怕 的 打 不破 内心 的 “ 天花板 ”']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[Vocabulary]
2016 app linux p30 一定 一张 一种 三星 健康 ... 雷军 食品 食材 食物 魅族 鸡蛋
(word-frequency matrix and TF-IDF weight matrix omitted)
15 6
15 6
(X_train TF-IDF matrix omitted)
Prediction: [0 1 0 1 1 1]
Ground truth: [0, 0, 0, 0, 1, 1]
              precision    recall  f1-score   support

           0       1.00      0.50      0.67         4
           1       0.50      1.00      0.67         2

    accuracy                           0.67         6
   macro avg       0.75      0.75      0.67         6
weighted avg       0.83      0.67      0.67         6
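As a usage sketch of mine (not in the original post), the fitted vectorizer, transformer, and clf from the code above can classify a brand-new sentence by pushing it through the same fitted pipeline; sklearn's TfidfVectorizer would combine the two transform steps into one. The example sentence is hypothetical:

#Sketch: classify a new, unseen sentence with the fitted pipeline above
new_doc = ' '.join(lcut('华为新手机的屏幕非常出色'))  #hypothetical example sentence
new_counts = vectorizer.transform([new_doc])         #reuse the fitted vocabulary
new_tfidf = transformer.transform(new_counts)        #reuse the fitted IDF weights
print(clf.predict(new_tfidf.toarray()))              #expected to print a label such as [1]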
2. GaussianNB + Word2Vec Text Classification
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 22:10:20 2020
@author: Eastmount CSDN
"""
from jieba import lcut
from numpy import zeros
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split
from tensorflow.python.keras.preprocessing.sequence import pad_sequences
max_features = 20  #word-vector dimension
maxlen = 40        #maximum sequence length
#--------------------------------Load data and preprocessing-------------------------------
data = [
[0, '小米粥是以小米作为主要食材熬制而成的粥,口味清淡,清香味,具有简单易制,健胃消食的特点'],
[0, '煮粥时一定要先烧开水然后放入洗净后的小米'],
[0, '蛋白质及氨基酸、脂肪、维生素、矿物质'],
[0, '小米是传统健康食品,可单独焖饭和熬粥'],
[0, '苹果,是水果中的一种'],
[0, '粥的营养价值很高,富含矿物质和维生素,含钙量丰富,有助于代谢掉体内多余盐分'],
[0, '鸡蛋有很高的营养价值,是优质蛋白质、B族维生素的良好来源,还能提供脂肪、维生素和矿物质'],
[0, '这家超市的苹果都非常新鲜'],
[0, '在北方小米是主要食物之一,很多地区有晚餐吃小米粥的习俗'],
[0, '小米营养价值高,营养全面均衡 ,主要含有碳水化合物'],
[0, '蛋白质及氨基酸、脂肪、维生素、盐分'],
[1, '小米、三星、华为,作为安卓三大手机旗舰'],
[1, '别再管小米华为了!魅族手机再曝光:这次真的完美了'],
[1, '苹果手机或将重陷2016年困境,但这次它无法再大幅提价了'],
[1, '三星想要继续压制华为,仅凭A70还不够'],
[1, '三星手机屏占比将再创新高,超华为及苹果旗舰'],
[1, '华为P30、三星A70爆卖,斩获苏宁最佳手机营销奖'],
[1, '雷军,用一张图告诉你:小米和三星的差距在哪里'],
[1, '小米米聊APP官方Linux版上线,适配深度系统'],
[1, '三星刚刚更新了自家的可穿戴设备APP'],
[1, '华为、小米跨界并不可怕,可怕的打不破内心的“天花板”'],
]
#Chinese word segmentation
X, Y = [lcut(i[1]) for i in data], [i[0] for i in data]
#Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y)
#print(X_train)
print(len(X_train), len(X_test))
print(len(y_train), len(y_test))
"""['三星', '刚刚', '更新', '了', '自家', '的', '可', '穿戴', '设备', 'APP']"""
#--------------------------------Word2Vec word vectors-------------------------------
word2vec = Word2Vec(X_train, size=max_features, min_count=1)  #20-dim vectors, keep words with frequency >= 1
print(word2vec)
#Map each word to its index
w2i = {w: i for i, w in enumerate(word2vec.wv.index2word)}
print("[Words]")
print(word2vec.wv.index2word)
print(w2i)
"""['小米', '三星', '是', '维生素', '蛋白质', '及', 'APP', '氨基酸',..."""
"""{',': 0, '的': 1, '小米': 2, '、': 3, '华为': 4, ....}"""
#Word-vector matrix
vectors = word2vec.wv.vectors
print("[Word-vector matrix]")
print(vectors.shape)
print(vectors)
#Helper: look up a word's vector (zeros for out-of-vocabulary words)
def w2v(w):
    i = w2i.get(w)
    #test "i is not None": index 0 is a valid word, not a miss
    return vectors[i] if i is not None else zeros(max_features)
#Helper: pad/truncate every sentence to maxlen word vectors
def pad(ls_of_words):
    a = [[w2v(i) for i in x] for x in ls_of_words]
    return pad_sequences(a, maxlen, dtype='float')
#Convert token sequences into padded word-vector sequences
X_train, X_test = pad(X_train), pad(X_test)
print(X_train.shape)
print(X_test.shape)
"""(15, 40, 20): 15 samples, 40 time steps, a 20-dim word vector per step"""
#Flatten: (15, 40, 20) => (15, 40*20) and (6, 40, 20) => (6, 40*20)
X_train = X_train.reshape(len(y_train), maxlen*max_features)
X_test = X_test.reshape(len(y_test), maxlen*max_features)
print(X_train.shape)
print(X_test.shape)
#--------------------------------Build and train the model-------------------------------
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report
#Fit a GaussianNB classifier
clf = GaussianNB().fit(X_train, y_train)
pre = clf.predict(X_test)
print("Prediction:", pre)
print("Ground truth:", y_test)
print(classification_report(y_test, pre))
15 6
15 6
Word2Vec(vocab=126, size=20, alpha=0.025)
[Words]
[',', '、', '小米', '的', '华为', '手机', '苹果', '维生素', 'APP', '官方', 'Linux', ... '安卓三大', '旗舰']
{',': 0, '、': 1, '小米': 2, '的': 3, '华为': 4, '手机': 5, '苹果': 6, ..., '安卓三大': 124, '旗舰': 125}
[Word-vector matrix]
(126, 20)
[[ 0.02041552 -0.00929706 -0.00743623 ... -0.00246041 -0.00825108
0.02341811]
[-0.00256093 -0.01301112 -0.00697959 ... -0.00449076 -0.00551124
-0.00240511]
[ 0.01535473 0.01690796 -0.00262145 ... -0.01624218 0.00871249
-0.01159615]
...
[ 0.00631155 0.00369085 -0.00382834 ... 0.02468265 0.00945442
-0.0155745 ]
[-0.01198495 0.01711261 0.01097644 ... 0.01003117 0.01074963
0.01960118]
[ 0.00450704 -0.01114052 0.0186879 ... 0.00804681 0.01060277
0.01836049]]
(15, 40, 20)
(6, 40, 20)
(15, 800)
(6, 800)
Prediction: [1 1 1 0 1 0]
Ground truth: [0, 1, 1, 0, 1, 0]
              precision    recall  f1-score   support

           0       1.00      0.67      0.80         3
           1       0.75      1.00      0.86         3

    accuracy                           0.83         6
   macro avg       0.88      0.83      0.83         6
weighted avg       0.88      0.83      0.83         6
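A compatibility note of mine: the code above targets gensim 3.x. In gensim 4.0 and later, the `size` parameter was renamed `vector_size` and `wv.index2word` became `wv.index_to_key`, so on a current install the equivalent calls are:

#gensim 4.x equivalents of the Word2Vec calls used in this article
word2vec = Word2Vec(X_train, vector_size=max_features, min_count=1)  #size -> vector_size
w2i = {w: i for i, w in enumerate(word2vec.wv.index_to_key)}         #index2word -> index_to_key
vectors = word2vec.wv.vectors                                        #unchanged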
III. RNN Text Classification with Keras
1. The IMDB Dataset and Sequence Preprocessing
Keras ships the IMDB movie-review dataset with every review already encoded as a sequence of word indices. Below are a few encoded training sequences, followed by an excerpt of the word-to-index dictionary returned by imdb.get_word_index().
[list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, ...])
list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, ...])
list([1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5974, 54, 61, 369, 13, 71, 149, 14, 22, 112, 4, 2401, 311, 12, 16, 3711, 33, 75, 43, 1829, 296, 4, 86, 320, 35, ...])
...
list([1, 11, 6, 230, 245, 6401, 9, 6, 1225, 446, 2, 45, 2174, 84, 8322, 4007, 21, 4, 912, 84, 14532, 325, 725, 134, 15271, 1715, 84, 5, 36, 28, 57, 1099, 21, 8, 140, ...])
list([1, 1446, 7079, 69, 72, 3305, 13, 610, 930, 8, 12, 582, 23, 5, 16, 484, 685, 54, 349, 11, 4120, 2959, 45, 58, 1466, 13, 197, 12, 16, 43, 23, 2, 5, 62, 30, 145, ...])
list([1, 17, 6, 194, 337, 7, 4, 204, 22, 45, 254, 8, 106, 14, 123, 4, 12815, 270, 14437, 5, 16923, 12255, 732, 2098, 101, 405, 39, 14, 1034, 4, 1310, 9, 115, 50, 305, ...])] train sequences
{"fawn": 34701, "tsukino": 52006,..., "paget": 18509, "expands": 20597}
Keras provides pad_sequences to truncate or pad these variable-length sequences to one fixed length:
keras.preprocessing.sequence.pad_sequences(
    sequences,
    maxlen=None,
    dtype='int32',
    padding='pre',
    truncating='pre',
    value=0.
)
from keras.preprocessing.sequence import pad_sequences
print(pad_sequences([[1, 2, 3], [1]], maxlen=2))
"""[[2 3] [0 1]]"""
print(pad_sequences([[1, 2, 3], [1]], maxlen=3, value=9))
"""[[1 2 3] [9 9 1]]"""
print(pad_sequences([[2,3,4]], maxlen=10))
"""[[0 0 0 0 0 0 0 2 3 4]]"""
print(pad_sequences([[1,2,3,4,5],[6,7]], maxlen=10))
"""[[0 0 0 0 0 1 2 3 4 5] [0 0 0 0 0 0 0 0 6 7]]"""
print(pad_sequences([[1, 2, 3], [1]], maxlen=2, padding='post'))
"""pad at the end: [[2 3] [1 0]]"""
print(pad_sequences([[1, 2, 3], [1]], maxlen=4, truncating='post'))
"""pad at the start: [[0 1 2 3] [0 0 0 1]]"""

> tokenizer.texts_to_sequences(["下 雨 我 加班"])
[[4, 5, 6, 7]]
> keras.preprocessing.sequence.pad_sequences(tokenizer.texts_to_sequences(["下 雨 我 加班"]), maxlen=20)
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 6, 7]], dtype=int32)
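The `tokenizer` in this transcript is not defined in the excerpt; a minimal sketch of how such a tokenizer is built follows, where the toy training corpus is my assumption, so the produced indices differ from those above:

#Minimal sketch: building a Keras Tokenizer for whitespace-segmented Chinese text
from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(["今天 北京 下 雨 了", "我 今天 加班"])  #assumed toy corpus
print(tokenizer.texts_to_sequences(["下 雨 我 加班"]))          #[[3, 4, 6, 7]] for this corpus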
2. Training a Word-Embedding Model
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 17:08:28 2020
@author: Eastmount CSDN
"""
from keras.datasets import imdb #Movie Database
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Flatten, Embedding
#-----------------------------------Parameters-----------------------------------
max_features = 20000      #keep only the 20,000 most frequent words
input_dim = max_features  #vocabulary size, must be >= max_features
maxlen = 80               #maximum sentence length
batch_size = 128          #batch size
output_dim = 40           #word-vector dimension
epochs = 2                #number of training epochs
#--------------------------------Load data and preprocessing-------------------------------
#Load the IMDB data
(trainX, trainY), (testX, testY) = imdb.load_data(path="imdb.npz", num_words=max_features)
print(trainX.shape, trainY.shape) #(25000,) (25000,)
print(testX.shape, testY.shape)   #(25000,) (25000,)
#Truncate or pad all sequences to the same length
trainX = sequence.pad_sequences(trainX, maxlen=maxlen)
testX = sequence.pad_sequences(testX, maxlen=maxlen)
print('trainX shape:', trainX.shape)
print('testX shape:', testX.shape)
#------------------------------------Build the model------------------------------------
model = Sequential()
#Embedding layer: vocabulary size, word-vector dimension, fixed sequence length
model.add(Embedding(input_dim, output_dim, input_length=maxlen))
#Flatten to a maxlen*output_dim vector
model.add(Flatten())
#Output layer: binary classification
model.add(Dense(units=1, activation='sigmoid'))
#RMSprop optimizer, binary cross-entropy loss
model.compile('rmsprop', 'binary_crossentropy', ['acc'])
#Train
model.fit(trainX, trainY, batch_size, epochs)
#Print the model structure
model.summary()
(25000,) (25000,)
(25000,) (25000,)
trainX shape: (25000, 80)
testX shape: (25000, 80)
Epoch 1/2
25000/25000 [==============================] - 2s 98us/step - loss: 0.6111 - acc: 0.6956
Epoch 2/2
25000/25000 [==============================] - 2s 69us/step - loss: 0.3578 - acc: 0.8549
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, 80, 40) 800000
_________________________________________________________________
flatten_2 (Flatten) (None, 3200) 0
_________________________________________________________________
dense_2 (Dense) (None, 1) 3201
=================================================================
Total params: 803,201
Trainable params: 803,201
Non-trainable params: 0
_________________________________________________________________
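As a sanity check of mine on the summary above, the parameter counts follow directly from the layer shapes:

#Where the 803,201 parameters come from
embedding_params = 20000 * 40           #input_dim * output_dim = 800,000
dense_params = 80 * 40 * 1 + 1          #maxlen*output_dim weights + 1 bias = 3,201
print(embedding_params + dense_params)  #803201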
3. RNN Text Classification
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 17:08:28 2020
@author: Eastmount CSDN
"""
from keras.datasets import imdb #Movie Database
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Flatten, Embedding
from keras.layers import SimpleRNN
#-----------------------------------Parameters-----------------------------------
max_features = 20000      #keep only the 20,000 most frequent words
input_dim = max_features  #vocabulary size, must be >= max_features
maxlen = 40               #maximum sentence length
batch_size = 128          #batch size
output_dim = 40           #word-vector dimension
epochs = 3                #number of training epochs
units = 32                #number of RNN units
#--------------------------------Load data and preprocessing-------------------------------
#Load the IMDB data
(trainX, trainY), (testX, testY) = imdb.load_data(path="imdb.npz", num_words=max_features)
print(trainX.shape, trainY.shape) #(25000,) (25000,)
print(testX.shape, testY.shape)   #(25000,) (25000,)
#Truncate or pad all sequences to the same length
trainX = sequence.pad_sequences(trainX, maxlen=maxlen)
testX = sequence.pad_sequences(testX, maxlen=maxlen)
print('trainX shape:', trainX.shape)
print('testX shape:', testX.shape)
#-----------------------------------Build the RNN model-----------------------------------
model = Sequential()
#Embedding layer: vocabulary size, word-vector dimension, fixed sequence length
model.add(Embedding(input_dim, output_dim, input_length=maxlen))
#RNN cells
model.add(SimpleRNN(units, return_sequences=True))   #return the output at every time step
model.add(SimpleRNN(units, return_sequences=False))  #return only the last time step's output
#Output layer: binary classification
model.add(Dense(units=1, activation='sigmoid'))
#Print the model structure
model.summary()
#-----------------------------------Compile and train-----------------------------------
#Compile the network
model.compile(optimizer='rmsprop',         #RMSprop optimizer
              loss='binary_crossentropy',  #binary cross-entropy loss
              metrics=['accuracy'])
#Train
history = model.fit(trainX,
                    trainY,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=2,
                    validation_split=.1)   #hold out 10% of the training samples for validation
#-----------------------------------Prediction and visualization-----------------------------------
import matplotlib.pyplot as plt
accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
plt.plot(range(epochs), accuracy)
plt.plot(range(epochs), val_accuracy)
plt.show()
(25000,) (25000,)
(25000,) (25000,)
trainX shape: (25000, 40)
testX shape: (25000, 40)
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, 40, 40) 800000
_________________________________________________________________
simple_rnn_3 (SimpleRNN) (None, 40, 32) 2336
_________________________________________________________________
simple_rnn_4 (SimpleRNN) (None, 32) 2080
_________________________________________________________________
dense_2 (Dense) (None, 1) 33
=================================================================
Total params: 804,449
Trainable params: 804,449
Non-trainable params: 0
_________________________________________________________________
Train on 22500 samples, validate on 2500 samples
Epoch 1/3
- 11s - loss: 0.5741 - accuracy: 0.6735 - val_loss: 0.4462 - val_accuracy: 0.7876
Epoch 2/3
- 14s - loss: 0.3572 - accuracy: 0.8430 - val_loss: 0.4928 - val_accuracy: 0.7616
Epoch 3/3
- 12s - loss: 0.2329 - accuracy: 0.9075 - val_loss: 0.5050 - val_accuracy: 0.7844
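The script above trains with a 10% validation split but never scores testX/testY; a small addition of mine evaluates the held-out IMDB test set:

#Sketch: evaluate the trained RNN on the untouched IMDB test split
score = model.evaluate(testX, testY, batch_size=batch_size, verbose=0)
print('test loss:', score[0])
print('test accuracy:', score[1])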
IV. RNN Text Classification on a Chinese Dataset
1. RNN + Word2Vec Text Classification
data = [
[0, '小米粥是以小米作为主要食材熬制而成的粥,口味清淡,清香味,具有简单易制,健胃消食的特点'],
[0, '煮粥时一定要先烧开水然后放入洗净后的小米'],
[0, '蛋白质及氨基酸、脂肪、维生素、矿物质'],
[0, '小米是传统健康食品,可单独焖饭和熬粥'],
[0, '苹果,是水果中的一种'],
[0, '粥的营养价值很高,富含矿物质和维生素,含钙量丰富,有助于代谢掉体内多余盐分'],
[0, '鸡蛋有很高的营养价值,是优质蛋白质、B族维生素的良好来源,还能提供脂肪、维生素和矿物质'],
[0, '这家超市的苹果都非常新鲜'],
[0, '在北方小米是主要食物之一,很多地区有晚餐吃小米粥的习俗'],
[0, '小米营养价值高,营养全面均衡 ,主要含有碳水化合物'],
[0, '蛋白质及氨基酸、脂肪、维生素、盐分'],
[1, '小米、三星、华为,作为安卓三大手机旗舰'],
[1, '别再管小米华为了!魅族手机再曝光:这次真的完美了'],
[1, '苹果手机或将重陷2016年困境,但这次它无法再大幅提价了'],
[1, '三星想要继续压制华为,仅凭A70还不够'],
[1, '三星手机屏占比将再创新高,超华为及苹果旗舰'],
[1, '华为P30、三星A70爆卖,斩获苏宁最佳手机营销奖'],
[1, '雷军,用一张图告诉你:小米和三星的差距在哪里'],
[1, '小米米聊APP官方Linux版上线,适配深度系统'],
[1, '三星刚刚更新了自家的可穿戴设备APP'],
[1, '华为、小米跨界并不可怕,可怕的打不破内心的“天花板”'],
]
#Chinese word segmentation
X, Y = [lcut(i[1]) for i in data], [i[0] for i in data]
#Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y)
#print(X_train)
print(len(X_train), len(X_test))
print(len(y_train), len(y_test))
"""['三星', '刚刚', '更新', '了', '自家', '的', '可', '穿戴', '设备', 'APP']"""
#--------------------------------Word2Vec word vectors-------------------------------
word2vec = Word2Vec(X_train, size=max_features, min_count=1)  #20-dim vectors, keep words with frequency >= 1
print(word2vec)
#Map each word to its index
w2i = {w: i for i, w in enumerate(word2vec.wv.index2word)}
print("[Words]")
print(word2vec.wv.index2word)
print(w2i)
"""['小米', '三星', '是', '维生素', '蛋白质', '及', 'APP', '氨基酸',..."""
"""{',': 0, '的': 1, '小米': 2, '、': 3, '华为': 4, ....}"""
#Word-vector matrix
vectors = word2vec.wv.vectors
print("[Word-vector matrix]")
print(vectors.shape)
print(vectors)
#Helper: look up a word's vector (zeros for out-of-vocabulary words)
def w2v(w):
    i = w2i.get(w)
    #test "i is not None": index 0 is a valid word, not a miss
    return vectors[i] if i is not None else zeros(max_features)
#Helper: pad/truncate every sentence to maxlen word vectors
def pad(ls_of_words):
    a = [[w2v(i) for i in x] for x in ls_of_words]
    return pad_sequences(a, maxlen, dtype='float')
#Convert token sequences into padded word-vector sequences
X_train, X_test = pad(X_train), pad(X_test)
15 6
15 6
Word2Vec(vocab=120, size=20, alpha=0.025)
[Words]
[',', '的', '、', '小米', '三星', '是', '维生素', '蛋白质', '及',
'脂肪', '华为', '苹果', '可', 'APP', '氨基酸', '在', '手机', '旗舰',
'矿物质', '主要', '有', '小米粥', '作为', '刚刚', '更新', '设备', ...]
{',': 0, '的': 1, '、': 2, '小米': 3, '三星': 4, '是': 5,
'维生素': 6, '蛋白质': 7, '及': 8, '脂肪': 9, '和': 10,
'华为': 11, '苹果': 12, '可': 13, 'APP': 14, '氨基酸': 15, ...}
[Word-vector matrix]
(120, 20)
[[ 0.00219526 0.00936278 0.00390177 ... -0.00422463 0.01543128
0.02481441]
[ 0.02346811 -0.01520025 -0.00563479 ... -0.01656673 -0.02222313
0.00438196]
[-0.02253242 -0.01633896 -0.02209039 ... 0.01301584 -0.01016752
0.01147605]
...
[ 0.01793107 0.01912305 -0.01780855 ... -0.00109831 0.02460653
-0.00023512]
[-0.00599797 0.02155897 -0.01874896 ... 0.00149929 0.00200266
0.00988515]
[ 0.0050361 -0.00848463 -0.0235001 ... 0.01531716 -0.02348576
0.01051775]]
#--------------------------------Build and train the model-------------------------------
model = Sequential()
#Bidirectional RNN with GRU cells
model.add(Bidirectional(GRU(units), input_shape=(maxlen, max_features)))
#Output layer: binary classification
model.add(Dense(units=1, activation='sigmoid'))
#Print the model structure
model.summary()
#Compile the network
model.compile(optimizer='rmsprop',         #RMSprop optimizer
              loss='binary_crossentropy',  #binary cross-entropy loss
              metrics=['acc'])
#Train
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                    verbose=verbose, validation_data=(X_test, y_test))
#----------------------------------Evaluation and visualization------------------------------
#Evaluate
score = model.evaluate(X_test, y_test, batch_size=batch_size)
print('test loss:', score[0])
print('test accuracy:', score[1])
#Visualize
acc = history.history['acc']
val_acc = history.history['val_acc']
#Axis labels
plt.xlabel("Iterations")
plt.ylabel("Accuracy")
#Plot
plt.plot(range(epochs), acc, "bo-", linewidth=2, markersize=12, label="accuracy")
plt.plot(range(epochs), val_acc, "gs-", linewidth=2, markersize=12, label="val_accuracy")
plt.legend(loc="upper left")
plt.title("RNN-Word2vec")
plt.show()
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 22:10:20 2020
@author: Eastmount CSDN
"""
from jieba import lcut
from numpy import zeros
import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split
from tensorflow.python.keras.preprocessing.sequence import pad_sequences
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, GRU, Bidirectional
from tensorflow.python.keras.callbacks import EarlyStopping
#-----------------------------------Parameters----------------------------------
max_features = 20  #word-vector dimension
units = 30         #number of RNN units
maxlen = 40        #maximum sequence length
epochs = 9         #maximum number of training epochs
batch_size = 12    #batch size
verbose = 1        #verbosity of training output
patience = 1       #epochs without improvement before early stopping
callbacks = [EarlyStopping('val_acc', patience=patience)]  #defined but unused below; pass callbacks=callbacks to model.fit to enable
#--------------------------------Load data and preprocessing-------------------------------
data = [
[0, '小米粥是以小米作为主要食材熬制而成的粥,口味清淡,清香味,具有简单易制,健胃消食的特点'],
[0, '煮粥时一定要先烧开水然后放入洗净后的小米'],
[0, '蛋白质及氨基酸、脂肪、维生素、矿物质'],
[0, '小米是传统健康食品,可单独焖饭和熬粥'],
[0, '苹果,是水果中的一种'],
[0, '粥的营养价值很高,富含矿物质和维生素,含钙量丰富,有助于代谢掉体内多余盐分'],
[0, '鸡蛋有很高的营养价值,是优质蛋白质、B族维生素的良好来源,还能提供脂肪、维生素和矿物质'],
[0, '这家超市的苹果都非常新鲜'],
[0, '在北方小米是主要食物之一,很多地区有晚餐吃小米粥的习俗'],
[0, '小米营养价值高,营养全面均衡 ,主要含有碳水化合物'],
[0, '蛋白质及氨基酸、脂肪、维生素、盐分'],
[1, '小米、三星、华为,作为安卓三大手机旗舰'],
[1, '别再管小米华为了!魅族手机再曝光:这次真的完美了'],
[1, '苹果手机或将重陷2016年困境,但这次它无法再大幅提价了'],
[1, '三星想要继续压制华为,仅凭A70还不够'],
[1, '三星手机屏占比将再创新高,超华为及苹果旗舰'],
[1, '华为P30、三星A70爆卖,斩获苏宁最佳手机营销奖'],
[1, '雷军,用一张图告诉你:小米和三星的差距在哪里'],
[1, '小米米聊APP官方Linux版上线,适配深度系统'],
[1, '三星刚刚更新了自家的可穿戴设备APP'],
[1, '华为、小米跨界并不可怕,可怕的打不破内心的“天花板”'],
]
#Chinese word segmentation
X, Y = [lcut(i[1]) for i in data], [i[0] for i in data]
#Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y)
#print(X_train)
print(len(X_train), len(X_test))
print(len(y_train), len(y_test))
"""['三星', '刚刚', '更新', '了', '自家', '的', '可', '穿戴', '设备', 'APP']"""
#--------------------------------Word2Vec word vectors-------------------------------
word2vec = Word2Vec(X_train, size=max_features, min_count=1)  #20-dim vectors, keep words with frequency >= 1
print(word2vec)
#Map each word to its index
w2i = {w: i for i, w in enumerate(word2vec.wv.index2word)}
print("[Words]")
print(word2vec.wv.index2word)
print(w2i)
"""['小米', '三星', '是', '维生素', '蛋白质', '及', 'APP', '氨基酸',..."""
"""{',': 0, '的': 1, '小米': 2, '、': 3, '华为': 4, ....}"""
#Word-vector matrix
vectors = word2vec.wv.vectors
print("[Word-vector matrix]")
print(vectors.shape)
print(vectors)
#Helper: look up a word's vector (zeros for out-of-vocabulary words)
def w2v(w):
    i = w2i.get(w)
    #test "i is not None": index 0 is a valid word, not a miss
    return vectors[i] if i is not None else zeros(max_features)
#Helper: pad/truncate every sentence to maxlen word vectors
def pad(ls_of_words):
    a = [[w2v(i) for i in x] for x in ls_of_words]
    return pad_sequences(a, maxlen, dtype='float')
#Convert token sequences into padded word-vector sequences
X_train, X_test = pad(X_train), pad(X_test)
#--------------------------------Build and train the model-------------------------------
model = Sequential()
#Bidirectional RNN with GRU cells
model.add(Bidirectional(GRU(units), input_shape=(maxlen, max_features)))
#Output layer: binary classification
model.add(Dense(units=1, activation='sigmoid'))
#Print the model structure
model.summary()
#Compile the network
model.compile(optimizer='rmsprop',         #RMSprop optimizer
              loss='binary_crossentropy',  #binary cross-entropy loss
              metrics=['acc'])
#Train
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                    verbose=verbose, validation_data=(X_test, y_test))
#----------------------------------Evaluation and visualization------------------------------
#Evaluate
score = model.evaluate(X_test, y_test, batch_size=batch_size)
print('test loss:', score[0])
print('test accuracy:', score[1])
#Visualize
acc = history.history['acc']
val_acc = history.history['val_acc']
#Axis labels
plt.xlabel("Iterations")
plt.ylabel("Accuracy")
#Plot
plt.plot(range(epochs), acc, "bo-", linewidth=2, markersize=12, label="accuracy")
plt.plot(range(epochs), val_acc, "gs-", linewidth=2, markersize=12, label="val_accuracy")
plt.legend(loc="upper left")
plt.title("RNN-Word2vec")
plt.show()
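With only 21 samples and an unseeded train_test_split, every run of this script yields different numbers; a small tweak of mine (hypothetical seed values) makes runs repeatable:

#Sketch: fix random seeds for reproducible splits and weights
import numpy as np
import tensorflow as tf

np.random.seed(42)      #hypothetical seed
tf.random.set_seed(42)  #TF 2.x API; TF 1.x uses tf.set_random_seed
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=42)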
2. LSTM + Word2Vec Text Classification
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 22:10:20 2020
@author: Eastmount CSDN
"""
from jieba import lcut
from numpy import zeros
import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split
from tensorflow.python.keras.preprocessing.sequence import pad_sequences
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, LSTM, GRU, Embedding
from tensorflow.python.keras.callbacks import EarlyStopping
#-----------------------------------Parameters----------------------------------
max_features = 20  #word-vector dimension
units = 30         #number of RNN units
maxlen = 40        #maximum sequence length
epochs = 9         #maximum number of training epochs
batch_size = 12    #batch size
verbose = 1        #verbosity of training output
patience = 1       #epochs without improvement before early stopping
callbacks = [EarlyStopping('val_acc', patience=patience)]  #defined but unused below; pass callbacks=callbacks to model.fit to enable
#--------------------------------Load data and preprocessing-------------------------------
data = [
[0, '小米粥是以小米作为主要食材熬制而成的粥,口味清淡,清香味,具有简单易制,健胃消食的特点'],
[0, '煮粥时一定要先烧开水然后放入洗净后的小米'],
[0, '蛋白质及氨基酸、脂肪、维生素、矿物质'],
[0, '小米是传统健康食品,可单独焖饭和熬粥'],
[0, '苹果,是水果中的一种'],
[0, '粥的营养价值很高,富含矿物质和维生素,含钙量丰富,有助于代谢掉体内多余盐分'],
[0, '鸡蛋有很高的营养价值,是优质蛋白质、B族维生素的良好来源,还能提供脂肪、维生素和矿物质'],
[0, '这家超市的苹果都非常新鲜'],
[0, '在北方小米是主要食物之一,很多地区有晚餐吃小米粥的习俗'],
[0, '小米营养价值高,营养全面均衡 ,主要含有碳水化合物'],
[0, '蛋白质及氨基酸、脂肪、维生素、盐分'],
[1, '小米、三星、华为,作为安卓三大手机旗舰'],
[1, '别再管小米华为了!魅族手机再曝光:这次真的完美了'],
[1, '苹果手机或将重陷2016年困境,但这次它无法再大幅提价了'],
[1, '三星想要继续压制华为,仅凭A70还不够'],
[1, '三星手机屏占比将再创新高,超华为及苹果旗舰'],
[1, '华为P30、三星A70爆卖,斩获苏宁最佳手机营销奖'],
[1, '雷军,用一张图告诉你:小米和三星的差距在哪里'],
[1, '小米米聊APP官方Linux版上线,适配深度系统'],
[1, '三星刚刚更新了自家的可穿戴设备APP'],
[1, '华为、小米跨界并不可怕,可怕的打不破内心的“天花板”'],
]
#Chinese word segmentation
X, Y = [lcut(i[1]) for i in data], [i[0] for i in data]
#Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y)
#print(X_train)
print(len(X_train), len(X_test))
print(len(y_train), len(y_test))
"""['三星', '刚刚', '更新', '了', '自家', '的', '可', '穿戴', '设备', 'APP']"""
#--------------------------------Word2Vec word vectors-------------------------------
word2vec = Word2Vec(X_train, size=max_features, min_count=1)  #20-dim vectors, keep words with frequency >= 1
print(word2vec)
#Map each word to its index
w2i = {w: i for i, w in enumerate(word2vec.wv.index2word)}
print("[Words]")
print(word2vec.wv.index2word)
print(w2i)
"""['小米', '三星', '是', '维生素', '蛋白质', '及', 'APP', '氨基酸',..."""
"""{',': 0, '的': 1, '小米': 2, '、': 3, '华为': 4, ....}"""
#Word-vector matrix
vectors = word2vec.wv.vectors
print("[Word-vector matrix]")
print(vectors.shape)
print(vectors)
#Helper: look up a word's vector (zeros for out-of-vocabulary words)
def w2v(w):
    i = w2i.get(w)
    #test "i is not None": index 0 is a valid word, not a miss
    return vectors[i] if i is not None else zeros(max_features)
#Helper: pad/truncate every sentence to maxlen word vectors
def pad(ls_of_words):
    a = [[w2v(i) for i in x] for x in ls_of_words]
    return pad_sequences(a, maxlen, dtype='float')
#Convert token sequences into padded word-vector sequences
X_train, X_test = pad(X_train), pad(X_test)
print(X_train.shape)
print(X_test.shape)
"""(15, 40, 20): 15 samples, 40 time steps, a 20-dim word vector per step"""
#Flatten: (15, 40, 20) => (15, 40*20) and (6, 40, 20) => (6, 40*20)
X_train = X_train.reshape(len(y_train), maxlen*max_features)
X_test = X_test.reshape(len(y_test), maxlen*max_features)
#--------------------------------Build and train the model-------------------------------
model = Sequential()
#Embedding layer; 128 is the embedding dimension
model.add(Embedding(max_features, 128))
#LSTM layer
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
#Fully connected output layer
#note: the LSTM above returns only its last time step; set return_sequences=True to get every step
model.add(Dense(units=1, activation='sigmoid'))
#Print the model structure
model.summary()
#Compile the network
model.compile(optimizer='rmsprop',         #RMSprop optimizer
              loss='binary_crossentropy',  #binary cross-entropy loss
              metrics=['acc'])
#Train
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                    verbose=verbose, validation_data=(X_test, y_test))
#----------------------------------Evaluation and visualization------------------------------
#Evaluate
score = model.evaluate(X_test, y_test, batch_size=batch_size)
print('test loss:', score[0])
print('test accuracy:', score[1])
#Visualize
acc = history.history['acc']
val_acc = history.history['val_acc']
#Axis labels
plt.xlabel("Iterations")
plt.ylabel("Accuracy")
#Plot
plt.plot(range(epochs), acc, "bo-", linewidth=2, markersize=12, label="accuracy")
plt.plot(range(epochs), val_acc, "gs-", linewidth=2, markersize=12, label="val_accuracy")
plt.legend(loc="upper left")
plt.title("LSTM-Word2vec")
plt.show()
Model: "sequential_22"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_8 (Embedding) (None, None, 128) 2560
_________________________________________________________________
lstm_8 (LSTM) (None, 128) 131584
_________________________________________________________________
dense_21 (Dense) (None, 1) 129
=================================================================
Total params: 134,273
Trainable params: 134,273
Non-trainable params: 0
_________________________________________________________________
Train on 15 samples, validate on 6 samples
Epoch 1/9
15/15 [==============================] - 8s 552ms/sample - loss: 0.6971 - acc: 0.5333 - val_loss: 0.6911 - val_acc: 0.6667
Epoch 2/9
15/15 [==============================] - 5s 304ms/sample - loss: 0.6910 - acc: 0.7333 - val_loss: 0.7111 - val_acc: 0.3333
Epoch 3/9
15/15 [==============================] - 3s 208ms/sample - loss: 0.7014 - acc: 0.4667 - val_loss: 0.7392 - val_acc: 0.3333
Epoch 4/9
15/15 [==============================] - 4s 261ms/sample - loss: 0.6890 - acc: 0.5333 - val_loss: 0.7471 - val_acc: 0.3333
Epoch 5/9
15/15 [==============================] - 4s 248ms/sample - loss: 0.6912 - acc: 0.5333 - val_loss: 0.7221 - val_acc: 0.3333
Epoch 6/9
15/15 [==============================] - 3s 210ms/sample - loss: 0.6857 - acc: 0.5333 - val_loss: 0.7143 - val_acc: 0.3333
Epoch 7/9
15/15 [==============================] - 3s 187ms/sample - loss: 0.6906 - acc: 0.5333 - val_loss: 0.7346 - val_acc: 0.3333
Epoch 8/9
15/15 [==============================] - 3s 185ms/sample - loss: 0.7066 - acc: 0.5333 - val_loss: 0.7578 - val_acc: 0.3333
Epoch 9/9
15/15 [==============================] - 4s 235ms/sample - loss: 0.7197 - acc: 0.5333 - val_loss: 0.7120 - val_acc: 0.3333
6/6 [==============================] - 0s 43ms/sample - loss: 0.7120 - acc: 0.3333
test loss: 0.712007462978363
test accuracy: 0.33333334
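My reading of the 33% accuracy above: an Embedding layer expects integer token indices in [0, max_features), but this model feeds it flattened real-valued word vectors, so the lookup is meaningless. One sketch of a fix is to drop the Embedding layer and feed the unflattened (samples, maxlen, max_features) word-vector sequences straight into the LSTM:

#Sketch: consume the 3-D word-vector sequences directly (skip the reshape/flatten step above)
model = Sequential()
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2,
               input_shape=(maxlen, max_features)))  #LSTM over 40 steps of 20-dim vectors
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])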
3. LSTM + TF-IDF Text Classification
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 28 22:10:20 2020
@author: Eastmount CSDN
"""
from jieba import lcut
from numpy import zeros
import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split
from tensorflow.python.keras.preprocessing.sequence import pad_sequences
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, LSTM, GRU, Embedding
from tensorflow.python.keras.callbacks import EarlyStopping
#-----------------------------------Parameters----------------------------------
max_features = 20  #word-vector dimension
units = 30         #number of RNN units
maxlen = 40        #maximum sequence length
epochs = 9         #maximum number of training epochs
batch_size = 12    #batch size
verbose = 1        #verbosity of training output
patience = 1       #epochs without improvement before early stopping
callbacks = [EarlyStopping('val_acc', patience=patience)]  #defined but unused below; pass callbacks=callbacks to model.fit to enable
#--------------------------------Load data and preprocessing-------------------------------
data = [
[0, '小米粥是以小米作为主要食材熬制而成的粥,口味清淡,清香味,具有简单易制,健胃消食的特点'],
[0, '煮粥时一定要先烧开水然后放入洗净后的小米'],
[0, '蛋白质及氨基酸、脂肪、维生素、矿物质'],
[0, '小米是传统健康食品,可单独焖饭和熬粥'],
[0, '苹果,是水果中的一种'],
[0, '粥的营养价值很高,富含矿物质和维生素,含钙量丰富,有助于代谢掉体内多余盐分'],
[0, '鸡蛋有很高的营养价值,是优质蛋白质、B族维生素的良好来源,还能提供脂肪、维生素和矿物质'],
[0, '这家超市的苹果都非常新鲜'],
[0, '在北方小米是主要食物之一,很多地区有晚餐吃小米粥的习俗'],
[0, '小米营养价值高,营养全面均衡 ,主要含有碳水化合物'],
[0, '蛋白质及氨基酸、脂肪、维生素、盐分'],
[1, '小米、三星、华为,作为安卓三大手机旗舰'],
[1, '别再管小米华为了!魅族手机再曝光:这次真的完美了'],
[1, '苹果手机或将重陷2016年困境,但这次它无法再大幅提价了'],
[1, '三星想要继续压制华为,仅凭A70还不够'],
[1, '三星手机屏占比将再创新高,超华为及苹果旗舰'],
[1, '华为P30、三星A70爆卖,斩获苏宁最佳手机营销奖'],
[1, '雷军,用一张图告诉你:小米和三星的差距在哪里'],
[1, '小米米聊APP官方Linux版上线,适配深度系统'],
[1, '三星刚刚更新了自家的可穿戴设备APP'],
[1, '华为、小米跨界并不可怕,可怕的打不破内心的“天花板”'],
]
#Chinese word segmentation
X, Y = [' '.join(lcut(i[1])) for i in data], [i[0] for i in data]
print(X)
print(Y)
#['煮粥 时 一定 要 先烧 开水 然后 放入 洗净 后 的 小米', ...]
#--------------------------------------Compute word frequencies------------------------------------
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
#Convert the words in each text into a word-frequency matrix
vectorizer = CountVectorizer()
#Count how many times each word occurs
X_data = vectorizer.fit_transform(X)
print(X_data)
#Get all keywords in the bag of words
word = vectorizer.get_feature_names()  #use get_feature_names_out() in scikit-learn >= 1.2
print('[Vocabulary]')
for w in word:
    print(w, end=" ")
print("\n")
#Word-frequency matrix
print(X_data.toarray())
#Convert the word-frequency matrix into TF-IDF weights
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(X_data)
#weight[i][j] is the TF-IDF weight of word j in document i
weight = tfidf.toarray()
print(weight)
#Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(weight, Y)
print(X_train.shape, X_test.shape)
print(len(y_train), len(y_test))
#(15, 117) (6, 117) 15 6
#--------------------------------Build and train the model-------------------------------
model = Sequential()
#Embedding layer; 128 is the embedding dimension
model.add(Embedding(max_features, 128))
#LSTM layer
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
#Fully connected output layer
#note: the LSTM above returns only its last time step; set return_sequences=True to get every step
model.add(Dense(units=1, activation='sigmoid'))
#Print the model structure
model.summary()
#Compile the network
model.compile(optimizer='rmsprop',         #RMSprop optimizer
              loss='binary_crossentropy',  #binary cross-entropy loss
              metrics=['acc'])
#Train
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                    verbose=verbose, validation_data=(X_test, y_test))
#----------------------------------Evaluation and visualization------------------------------
#Evaluate
score = model.evaluate(X_test, y_test, batch_size=batch_size)
print('test loss:', score[0])
print('test accuracy:', score[1])
#Visualize
acc = history.history['acc']
val_acc = history.history['val_acc']
#Axis labels
plt.xlabel("Iterations")
plt.ylabel("Accuracy")
#Plot
plt.plot(range(epochs), acc, "bo-", linewidth=2, markersize=12, label="accuracy")
plt.plot(range(epochs), val_acc, "gs-", linewidth=2, markersize=12, label="val_accuracy")
plt.legend(loc="upper left")
plt.title("LSTM-TFIDF")
plt.show()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, None, 128) 2560
_________________________________________________________________
lstm_1 (LSTM) (None, 128) 131584
_________________________________________________________________
dense_1 (Dense) (None, 1) 129
=================================================================
Total params: 134,273
Trainable params: 134,273
Non-trainable params: 0
_________________________________________________________________
Train on 15 samples, validate on 6 samples
Epoch 1/9
15/15 [==============================] - 2s 148ms/sample - loss: 0.6898 - acc: 0.5333 - val_loss: 0.7640 - val_acc: 0.3333
Epoch 2/9
15/15 [==============================] - 1s 48ms/sample - loss: 0.6779 - acc: 0.6000 - val_loss: 0.7773 - val_acc: 0.3333
Epoch 3/9
15/15 [==============================] - 1s 36ms/sample - loss: 0.6769 - acc: 0.6000 - val_loss: 0.7986 - val_acc: 0.3333
Epoch 4/9
15/15 [==============================] - 1s 47ms/sample - loss: 0.6722 - acc: 0.6000 - val_loss: 0.8097 - val_acc: 0.3333
Epoch 5/9
15/15 [==============================] - 1s 42ms/sample - loss: 0.7021 - acc: 0.6000 - val_loss: 0.7680 - val_acc: 0.3333
Epoch 6/9
15/15 [==============================] - 1s 36ms/sample - loss: 0.6890 - acc: 0.6000 - val_loss: 0.8147 - val_acc: 0.3333
Epoch 7/9
15/15 [==============================] - 1s 37ms/sample - loss: 0.6906 - acc: 0.6000 - val_loss: 0.8599 - val_acc: 0.3333
Epoch 8/9
15/15 [==============================] - 1s 43ms/sample - loss: 0.6819 - acc: 0.6000 - val_loss: 0.8303 - val_acc: 0.3333
Epoch 9/9
15/15 [==============================] - 1s 40ms/sample - loss: 0.6884 - acc: 0.6000 - val_loss: 0.7695 - val_acc: 0.3333
6/6 [==============================] - 0s 7ms/sample - loss: 0.7695 - acc: 0.3333
test loss: 0.7694947719573975
test accuracy: 0.33333334
4. Comparing Machine Learning and Deep Learning
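For reference, the test accuracies reported by the runs in this article are summarized below. With a 21-sentence corpus and random splits these numbers are highly noisy, but the Naive Bayes pipelines clearly beat the LSTM models, whose accuracy stayed at chance level (see the note after the LSTM+Word2Vec run about the mis-shaped Embedding inputs).

Method                   Test accuracy
MultinomialNB + TF-IDF   0.67
GaussianNB + Word2Vec    0.83
LSTM + Word2Vec          0.33
LSTM + TF-IDF            0.33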
V. Summary
Good luck to everyone preparing for graduate and PhD entrance exams; in my heart, you are the best. This year we encouraged and accompanied each other, and it has been genuinely hard. I hope you all get into the schools you dream of. Finally, use your time well, fill every answer sheet, tie the hard questions back to the knowledge points you know, and keep going!
This article originally appeared on the WeChat public account 娜璋AI安全之家: [Python AI] Part 20. Text Classification with Keras+RNN vs. Text Classification with Traditional Machine Learning