An improved ID3 algorithm for decision tree
WANG Zijing, LIU Yu (王子京,刘毓)
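Abstract:
The traditional ID3 algorithm cannot handle datasets whose attribute values are continuous, so an improved algorithm is designed for processing continuous evaluation data. The improved algorithm first discretizes the continuous attribute values with a clustering algorithm, then computes the roughness of each attribute and uses it as the attribute-splitting criterion, and finally generates the decision tree with the improved ID3 algorithm. The prediction accuracy of the method is verified by simulation, and its application conditions are discussed. The experimental results show that, without reducing accuracy, the algorithm can process data with continuous attribute values, produces more readable trees, and requires less computation.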
Keywords: data mining; decision tree; rough set; ID3 algorithm; big data; algorithm improvement
Foundation: Shaanxi Province Industrial Research Project (2016GY-113)
DOI: 10.16652/j.issn.1004-373x.2018.15.009
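
The three stages named in the abstract (cluster-based discretization, a roughness-based split criterion, ID3-style tree growth) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract specifies neither the clustering method nor the exact roughness formula, so the sketch assumes plain one-dimensional k-means and the standard rough-set roughness, 1 - |lower approximation| / |upper approximation|, averaged over the decision classes. The function names `kmeans_1d`, `roughness`, `build_tree`, and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def kmeans_1d(values, k, iters=50):
    """Discretize one continuous column with plain 1-D k-means (assumed method).

    Returns one integer bin label per value; bins are numbered by ascending center.
    """
    s = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    order = sorted(range(k), key=lambda c: centers[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[min(range(k), key=lambda c: abs(v - centers[c]))] for v in values]


def roughness(column, decisions):
    """Mean roughness of the decision classes with respect to one attribute.

    Assumed definition: for a decision class X, 1 - |lower(X)| / |upper(X)|.
    """
    blocks = defaultdict(list)  # equivalence classes induced by the attribute value
    for value, d in zip(column, decisions):
        blocks[value].append(d)
    classes = set(decisions)
    total = 0.0
    for x in classes:
        lower = sum(len(b) for b in blocks.values() if all(d == x for d in b))
        upper = sum(len(b) for b in blocks.values() if any(d == x for d in b))
        total += 1 - lower / upper
    return total / len(classes)


def build_tree(rows, decisions, attrs):
    """ID3-style recursion that splits on the attribute with the lowest roughness."""
    if len(set(decisions)) == 1 or not attrs:
        return Counter(decisions).most_common(1)[0][0]  # leaf: majority class
    best = min(attrs, key=lambda a: roughness([r[a] for r in rows], decisions))
    rest = [a for a in attrs if a != best]
    node = {"attr": best, "children": {}}
    for value in sorted(set(r[best] for r in rows)):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [decisions[i] for i in idx], rest
        )
    return node


def predict(tree, row, default=None):
    """Walk the tree; return `default` for a bin value unseen during training."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], default)
    return tree


# Toy data (invented for illustration): column 0 separates the classes, column 1 does not.
raw = [(1.0, 3.0), (1.2, 7.0), (0.9, 3.1), (5.0, 7.2), (5.2, 2.9), (4.8, 6.9)]
labels = ["low", "low", "low", "high", "high", "high"]
binned_cols = [kmeans_1d([r[j] for r in raw], k=2) for j in range(2)]
rows = list(zip(*binned_cols))
tree = build_tree(rows, labels, attrs=[0, 1])
```

On this toy data the first column has roughness 0 and the second has roughness 1, so the tree splits once on the first column. Choosing the attribute with the lowest roughness mirrors ID3's choice of the highest information gain, but it needs only set counts rather than logarithms, which is one plausible reading of the lower computational cost claimed in the abstract.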