Data hierarchical clustering algorithm based on cloud computing and NMF
赵艳萍 (ZHAO Yanping), 徐胜超 (XU Shengchao)
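Abstract:
To improve the performance of traditional data clustering algorithms in big data mining applications, a parallel hierarchical data clustering algorithm is designed and implemented using cloud computing techniques combined with non-negative matrix factorization (NMF). The algorithm is built on the MapReduce programming platform and stores large volumes of telecom operator data in Hadoop's HDFS. The working mechanism and workflow of parallel hierarchical clustering under MapReduce are described. Through the master-slave programming model of Map and Reduce, the subtasks of hierarchical clustering can be run conveniently on a Hadoop PC cluster. Experimental results show that the method achieves better running time and speedup than the traditional NMF method for data clustering, and that it can process telecom operator big data within an acceptable time.
Keywords: cloud computing; hierarchical clustering; MapReduce; non-negative matrix factorization; clustering algorithm; parallel data
Foundation: Project of the Open University of China: Application of the open university mobile learning model in underdeveloped ethnic-minority regions, taking Qinghai as an example (Q0082A-309Y)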
DOI: 10.16652/j.issn.1004-373x.2018.05.013
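The abstract describes NMF-based clustering whose subtasks are split between Map and Reduce steps, but it does not give the actual algorithm. The following is only a minimal single-machine sketch of how such a split could look, under several assumptions that are not taken from the paper: standard multiplicative-update NMF, a row-wise partition of the data matrix `V`, cluster labels read from the largest entry in each row of `W`, and a single clustering level rather than a full hierarchy. All function names (`map_update_w_row`, `map_partials`, `reduce_partials`, `nmf_cluster`) are hypothetical, and Python's built-in `map` and `functools.reduce` stand in for Hadoop tasks.

```python
import random
from functools import reduce

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def map_update_w_row(v_row, w_row, H, HHt):
    """Map step 1: update one row of W with H fixed (rows are independent)."""
    k = len(H)
    num = [sum(v * h for v, h in zip(v_row, H[a])) for a in range(k)]
    den = [sum(w_row[b] * HHt[b][a] for b in range(k)) for a in range(k)]
    return [w_row[a] * num[a] / (den[a] + EPS) for a in range(k)]


def map_partials(v_row, w_row):
    """Map step 2: emit this row's contribution to W^T V and W^T W."""
    wtv = [[w * v for v in v_row] for w in w_row]
    wtw = [[wa * wb for wb in w_row] for wa in w_row]
    return wtv, wtw


def add_matrices(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def reduce_partials(p, q):
    """Reduce step: sum the partial matrices emitted for different rows."""
    return add_matrices(p[0], q[0]), add_matrices(p[1], q[1])


def nmf_cluster(V, k, iters=300, seed=0):
    """Factor V (n x m) as W (n x k) times H (k x m); label rows by argmax of W."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Small k x k matrix that every mapper would receive alongside H.
        HHt = [[sum(x * y for x, y in zip(H[a], H[b])) for b in range(k)]
               for a in range(k)]
        W = list(map(lambda vw: map_update_w_row(vw[0], vw[1], H, HHt),
                     zip(V, W)))
        # Map over rows, then reduce the partial sums into W^T V and W^T W.
        WtV, WtW = reduce(reduce_partials, map(map_partials, V, W))
        H = [[H[a][j] * WtV[a][j]
              / (sum(WtW[a][b] * H[b][j] for b in range(k)) + EPS)
              for j in range(m)] for a in range(k)]
    labels = [max(range(k), key=lambda a: row[a]) for row in W]
    return labels, W, H


# Toy data (made up for illustration): two groups of rows with disjoint columns.
V = [[5, 5, 0, 0], [2, 2, 0, 0], [0, 0, 3, 3], [0, 0, 6, 6]]
labels, W, H = nmf_cluster(V, k=2)
print(labels)
```

In this sketch only the small matrices `H`, `H H^T`, `W^T V` and `W^T W` would need to be shared between workers, while the rows of `V` and `W` stay with their partition. A hierarchy could be obtained by applying `nmf_cluster` again to the rows of each resulting cluster, though whether the paper does this is not stated in the abstract.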