Design of clustering algorithm for redundancy feature removal in big data sets
HOU Lisha (侯莉莎)
Abstract:
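Traditional microRNA clustering algorithms place high demands on the new features of the data and do not fully analyze the redundant features within big data sets, so their clustering results are poorly balanced. A clustering algorithm for redundant feature removal in big data sets is therefore proposed. It uses a clustering ensemble algorithm: during the ensemble construction stage, the same clustering algorithm is applied to different sampled subsets, which removes the redundant features of the big data and yields a clustering result for the big data set with redundant features excluded. The classification ability and the feature correlation of the resulting clustered features are then measured, and a miRNA identification algorithm based on feature clustering and random subspaces is used to cluster the redundant features of the big data set. Experimental results show that the proposed algorithm removes redundant data effectively and produces well-balanced, high-quality clustering of big data.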
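The abstract does not give the details of the feature clustering or random subspace steps, so the following is only a minimal sketch of the general idea, not the author's algorithm. It assumes that redundancy is measured by absolute Pearson correlation, that features are grouped greedily against the first member of each cluster, and that each random subspace takes one feature per cluster. The function names and the threshold of 0.9 are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster_features(columns, threshold=0.9):
    """Greedily group feature indices (assumed rule, not from the paper).

    A feature joins the first cluster whose first member it correlates with
    at |r| >= threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs(pearson(col, columns[cluster[0]])) >= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def random_subspaces(clusters, n_subspaces, rng):
    """Build feature subspaces by drawing one feature index from each cluster."""
    return [[rng.choice(c) for c in clusters] for _ in range(n_subspaces)]


# Toy data: each inner list is one feature (column) observed on four samples.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # exact multiple of feature 0, hence redundant
    [4.0, 1.0, 3.0, 2.0],   # weakly correlated with feature 0
    [1.1, 2.0, 3.2, 3.9],   # noisy copy of feature 0
]
clusters = cluster_features(columns)
subspaces = random_subspaces(clusters, n_subspaces=3, rng=random.Random(0))
print(clusters)
print(subspaces)
```

Each subspace holds one feature per cluster, so mutually redundant features never appear together; a base learner could then be trained on each subspace and the results combined, which is one common way to use random subspaces in an ensemble.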
Keywords: big data set; redundant feature removal; clustering algorithm; feature correlation; random subspace; miRNA identification algorithm
Foundation: 2014 Scientific Research Program Project of the Tianjin Municipal Education Commission (20140811)
DOI: 10.16652/j.issn.1004-373x.2018.14.012