The Road to Machine Learning: Predicting News Categories with the Python Naive Bayes Classifier MultinomialNB
Published: 2019-06-08


Learning the naive Bayes classification API with Python 3.

This involves extracting feature vectors from strings.
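To illustrate what "extracting feature vectors from strings" means, here is a minimal bag-of-words sketch in plain Python. It only approximates the default behaviour of CountVectorizer (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary); the toy sentences and the helper names `tokenize` and `to_count_vector` are made up for illustration and are not part of scikit-learn.

```python
import re
from collections import Counter

# Hypothetical toy corpus, not taken from the 20 Newsgroups data.
docs = [
    "The Pens beat the Devils",
    "The Devils lost the game",
]

# Approximates CountVectorizer's defaults: lowercase the text and keep
# tokens made of two or more word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())


# Vocabulary: every distinct token, sorted, mapped to a column index.
all_tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
vocabulary = {tok: idx for idx, tok in enumerate(all_tokens)}


def to_count_vector(text):
    counts = Counter(tokenize(text))
    # Words outside the vocabulary are ignored, much like calling
    # transform() on text that contains unseen words.
    return [counts.get(tok, 0) for tok in vocabulary]


vectors = [to_count_vector(doc) for doc in docs]
print(list(vocabulary))  # ['beat', 'devils', 'game', 'lost', 'pens', 'the']
print(vectors)           # [[1, 1, 0, 0, 1, 2], [0, 1, 1, 1, 0, 2]]
```

Each document becomes one row of word counts. The real CountVectorizer returns the same kind of matrix in sparse form, which matters when the vocabulary has tens of thousands of columns.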

You are welcome to download the source code from my Git repository: https://github.com/linyi0604/MachineLearning

 

from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
# Text feature-vector extraction module
from sklearn.feature_extraction.text import CountVectorizer
# Naive Bayes model
from sklearn.naive_bayes import MultinomialNB
# Model evaluation module
from sklearn.metrics import classification_report

'''
Naive Bayes models are widely used for large-scale Internet text classification.
Because the features are assumed to be conditionally independent, the number of parameters to estimate drops from exponential to roughly linear, which saves memory and computation time.
However, the model cannot take relationships between features into account, so it performs poorly on classification tasks where the features are strongly correlated.
'''

'''
1 Load the data
'''
# This API downloads the data over the network when called
news = fetch_20newsgroups(subset="all")
# Check the size and details of the data
# print(len(news.data))
# print(news.data[0])
'''
18846

From: Mamatha Devineni Ratnam
Subject: Pens fans reactions
Organization: Post Office, Carnegie Mellon, Pittsburgh, PA
Lines: 12
NNTP-Posting-Host: po4.andrew.cmu.edu

I am sure some bashers of Pens fans are pretty confused about the lack
of any kind of posts about the recent Pens massacre of the Devils. Actually,
I am bit puzzled too and a bit relieved. However, I am going to put an end
to non-PIttsburghers' relief with a bit of praise for the Pens. Man, they
are killing those Devils worse than I thought. Jagr just showed you why
he is much better than his regular season stats. He is also a lot
fo fun to watch in the playoffs. Bowman should let JAgr have a lot of
fun in the next couple of games since the Pens are going to beat the pulp out of Jersey anyway. I was very disappointed not to see the Islanders lose the final
regular season game. PENS RULE!!!
'''

'''
2 Split the data
'''
x_train, x_test, y_train, y_test = train_test_split(news.data,
                                                    news.target,
                                                    test_size=0.25,
                                                    random_state=33)

'''
3 Predict news categories with the naive Bayes classifier
'''
# Convert the text into feature vectors
vec = CountVectorizer()
x_train = vec.fit_transform(x_train)
x_test = vec.transform(x_test)
# Initialize the naive Bayes model
mnb = MultinomialNB()
# Train on the training set to estimate the parameters
mnb.fit(x_train, y_train)
# Predict on the test set and keep the predictions
y_predict = mnb.predict(x_test)

'''
4 Evaluate the model
'''
print("Accuracy:", mnb.score(x_test, y_test))
print("Other metrics:\n", classification_report(y_test, y_predict, target_names=news.target_names))
'''
Accuracy: 0.8397707979626485
Other metrics:
                           precision    recall  f1-score   support

             alt.atheism       0.86      0.86      0.86       201
           comp.graphics       0.59      0.86      0.70       250
 comp.os.ms-windows.misc       0.89      0.10      0.17       248
comp.sys.ibm.pc.hardware       0.60      0.88      0.72       240
   comp.sys.mac.hardware       0.93      0.78      0.85       242
          comp.windows.x       0.82      0.84      0.83       263
            misc.forsale       0.91      0.70      0.79       257
               rec.autos       0.89      0.89      0.89       238
         rec.motorcycles       0.98      0.92      0.95       276
      rec.sport.baseball       0.98      0.91      0.95       251
        rec.sport.hockey       0.93      0.99      0.96       233
               sci.crypt       0.86      0.98      0.91       238
         sci.electronics       0.85      0.88      0.86       249
                 sci.med       0.92      0.94      0.93       245
               sci.space       0.89      0.96      0.92       221
  soc.religion.christian       0.78      0.96      0.86       232
      talk.politics.guns       0.88      0.96      0.92       251
   talk.politics.mideast       0.90      0.98      0.94       231
      talk.politics.misc       0.79      0.89      0.84       188
      talk.religion.misc       0.93      0.44      0.60       158

             avg / total       0.86      0.84      0.82      4712
'''
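To make the model in step 3 more concrete, the following is a minimal pure-Python sketch of multinomial naive Bayes with Laplace smoothing (alpha = 1.0, the same default value MultinomialNB uses). It is not scikit-learn's implementation: the toy training set, the labels, and the helper names `log_posterior` and `predict` are hypothetical and exist only to show how the conditional-independence assumption turns prediction into a sum of per-word log probabilities.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (tokens, label).
train = [
    (["pens", "win", "game"], "hockey"),
    (["devils", "lose", "game"], "hockey"),
    (["gun", "law", "debate"], "politics"),
]

vocab = sorted({tok for tokens, _ in train for tok in tokens})
alpha = 1.0  # Laplace smoothing strength (assumed; matches MultinomialNB's default)

# Number of documents per class, and per-class word counts.
class_docs = Counter(label for _, label in train)
token_counts = defaultdict(Counter)
for tokens, label in train:
    token_counts[label].update(tokens)


def log_posterior(tokens, label):
    """log P(label) + sum of log P(token | label), up to a shared constant."""
    score = math.log(class_docs[label] / len(train))
    total = sum(token_counts[label].values())
    for tok in tokens:
        if tok not in vocab:
            continue  # words never seen in training are skipped
        smoothed = (token_counts[label][tok] + alpha) / (total + alpha * len(vocab))
        score += math.log(smoothed)
    return score


def predict(tokens):
    # Pick the class with the highest log posterior.
    return max(class_docs, key=lambda label: log_posterior(tokens, label))


print(predict(["pens", "game"]))   # hockey
print(predict(["gun", "debate"]))  # politics
```

Only one smoothed probability per (class, word) pair is stored, which is why the number of parameters grows roughly linearly with the vocabulary size. It is also why word combinations carry no extra information for this model.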

 

Reposted from: https://www.cnblogs.com/Lin-Yi/p/8970522.html
