Design of massive repeating data elimination system based on association rules
LIAN Yanping
Abstract:
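Traditional elimination systems suffer from uncontrollable duplicate-data throughput and long completion times for similarity judgment. To address these problems, an improved massive duplicate data elimination system is designed using association rules. To speed up the search for similar data and ensure reliable elimination, the overall framework of the system is designed first. Following this framework, the hardware is re-divided into four main modules: duplicate data detection, total throughput improvement, elimination reliability assurance, and system security. A combination of the TCP/IP, NetBEUI and IPX/SPX protocols replaces the NetBEUI transmission protocol used by the traditional system as the core operating basis. Association rules over base data sequences are then established to optimize the duplicate data detection coding, which keeps the massive duplicate data elimination process efficient and stable and completes the system design. Experiments comparing the traditional system with the improved system show that the duplicate data throughput of the proposed system always stays within a controllable range, and that its similarity judgment completion time can be reduced to roughly 1/2 of that of the traditional system.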
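The abstract does not specify how the association rules are built or how the detection coding works, so the following is only a minimal sketch under stated assumptions. It assumes data is split into fixed-size chunks, each chunk is identified by a SHA-1 fingerprint, repeated fingerprints are eliminated, and simple sequence association rules of the form "chunk A is immediately followed by chunk B" are mined from the fingerprint sequence using the standard support and confidence measures. The function names (`chunk_fingerprints`, `eliminate_duplicates`, `sequence_rules`), the fixed chunk size and the thresholds are hypothetical illustrations, not the author's design.

```python
import hashlib
from collections import Counter


def chunk_fingerprints(data, chunk_size=4):
    """Split data into fixed-size chunks (hypothetical scheme) and return a SHA-1 hex fingerprint per chunk."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def eliminate_duplicates(fps):
    """Keep the first occurrence of each fingerprint; return (unique list, number removed)."""
    seen = set()
    unique = []
    for fp in fps:
        if fp not in seen:
            seen.add(fp)
            unique.append(fp)
    return unique, len(fps) - len(unique)


def sequence_rules(fps, min_support, min_confidence):
    """Mine rules A -> B where chunk B immediately follows chunk A.

    support    = count(A followed by B) / number of adjacent pairs
    confidence = count(A followed by B) / count(A in an antecedent position)
    """
    pairs = Counter(zip(fps, fps[1:]))   # adjacent (A, B) pairs
    antecedents = Counter(fps[:-1])      # every chunk that has a successor
    n_pairs = len(fps) - 1
    rules = {}
    for (a, b), count in pairs.items():  # empty when fewer than two chunks
        support = count / n_pairs
        confidence = count / antecedents[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


fps = chunk_fingerprints(b"ABCDABCDEFGHABCDEFGH")
unique, removed = eliminate_duplicates(fps)
print(len(unique), removed)
print(list(sequence_rules(fps, 0.3, 0.6).values()))
```

Here the five chunks reduce to two unique ones, and the only rule passing the thresholds is "ABCD" followed by "EFGH" (support 0.5, confidence 2/3). A high-confidence rule A → B suggests that, once A is detected as a duplicate, the next chunk can be compared against B first. This is one plausible way such rules could shorten similarity judgment, not a description of the paper's actual coding.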
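Keywords: association rules; data elimination; system framework; redundancy correction; communication interface; data sequence; detection coding; elimination process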
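Foundation: Natural Science Foundation of Fujian Province (2017J01406); Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (JA15527); Scientific Research Start-up Project for High-level Introduced Talents of Wuyi University (YJ201607)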
Author: LIAN Yanping
DOI: 10.16652/j.issn.1004-373x.2018.23.007