Unbalanced text classification method based on support vector machine
GAO Chao, XU Hanlin
Abstract:
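Support vector machines (SVMs) perform very well when classifying balanced text datasets. When a text dataset is imbalanced, however, and especially when the imbalance ratio is large, SVM classification is prone to failure. To address imbalanced text datasets, a hybrid PSO-SMOTE algorithm is proposed. The SMOTE (synthetic minority over-sampling technique) algorithm generates interpolated samples to balance the dataset, and the PSO (particle swarm optimization) algorithm iteratively evolves these interpolated samples to obtain the best ones, thereby improving the text classification ability of the SVM. Experimental results show that the new algorithm substantially improves the SVM's ability to classify imbalanced text datasets.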
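The abstract does not specify the particle encoding, fitness function, or parameters of PSO-SMOTE, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each PSO particle holds the SMOTE interpolation coefficients ("gaps") for a fixed set of synthetic samples, and that fitness is the G-mean on a labeled validation set. To stay dependency-free, a nearest-centroid classifier stands in for the SVM. All function names and the values of `w`, `c1`, `c2`, `n_particles`, and `iters` are hypothetical.

```python
import math
import random

# Illustrative sketch only: helper names, fitness, and PSO parameters are
# assumptions, not the PSO-SMOTE implementation described in the paper.

def smote_pairs(minority, n_new, k, rng):
    """Choose (base, neighbor) pairs as SMOTE does: a random minority sample
    plus one of its k nearest minority-class neighbors (Euclidean)."""
    pairs = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(minority[i], m))[:k]
        pairs.append((minority[i], rng.choice(neighbors)))
    return pairs


def interpolate(pairs, gaps):
    """SMOTE interpolation: x_new = x + gap * (x_nn - x), with gap in [0, 1]."""
    return [tuple(a + g * (b - a) for a, b in zip(x, nn))
            for (x, nn), g in zip(pairs, gaps)]


def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def gmean_fitness(majority, minority, val):
    """Assumed fitness: G-mean of a nearest-centroid classifier on a labeled
    validation set (label 1 = minority). Stand-in for an SVM score."""
    c_maj = centroid(majority)

    def score(synthetic):
        c_min = centroid(minority + synthetic)
        correct = {0: 0, 1: 0}
        total = {0: 0, 1: 0}
        for x, y in val:
            pred = 1 if math.dist(x, c_min) < math.dist(x, c_maj) else 0
            correct[y] += int(pred == y)
            total[y] += 1
        return math.sqrt((correct[0] / total[0]) * (correct[1] / total[1]))

    return score


def pso_smote(pairs, fitness, n_particles=10, iters=30,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over the interpolation gaps; each particle is one gap
    vector, i.e. one candidate set of synthetic samples."""
    rng = random.Random(seed)
    dim = len(pairs)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(interpolate(pairs, p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so every gap stays in SMOTE's valid range [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(interpolate(pairs, pos[i]))
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return interpolate(pairs, gbest), gbest_f


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(5)]
    val = ([((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(20)]
           + [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(20)])
    pairs = smote_pairs(minority, n_new=35, k=3, rng=rng)
    synthetic, best = pso_smote(pairs, gmean_fitness(majority, minority, val))
    print(len(synthetic), "synthetic samples, best G-mean:", round(best, 3))
```

In a real text pipeline, the fitness function would presumably train an SVM on vectorized documents (original plus synthetic) and return a validation metric such as G-mean. Each evaluation would then be far more expensive, so the particle count and iteration budget become the main cost levers.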
Keywords: hybrid algorithm; support vector machine; imbalanced dataset; interpolated sample; text classification; iterative evolution
DOI: 10.16652/j.issn.1004-373x.2018.15.041