Implementation of Density-based Outlier Detection in High-dimensional Space
Xiong Junli (熊君丽)
Abstract:
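Outliers are data in a data warehouse that exhibit anomalous behavior. This paper studies the properties of outliers in high-dimensional space and adopts a strategy of projecting high-dimensional data onto a low-dimensional subspace before detection, which addresses the problem that data sparsity in high-dimensional space makes it difficult to judge outlierness from the distances between data points. The implementation selects dimensions that are closely correlated with one another, defines the distance between data points in terms of nearest neighbors, and applies a density-based outlier detection method, so that outliers can be detected more effectively within local regions of the space.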
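The pipeline summarized in the abstract (select correlated dimensions, project, score by nearest-neighbor density) can be illustrated with a minimal sketch. The abstract does not state how the dimensions are chosen or which density measure is used, so everything below is an assumption made for illustration: dimensions are ranked by summed absolute Pearson correlation, and the score is a simplified local density ratio (a point's mean distance to its k nearest neighbors divided by the same quantity averaged over those neighbors), not the exact measure of the paper. The function names and the parameters `n_dims`, `k`, and `threshold` are hypothetical.

```python
import numpy as np


def select_correlated_dims(X, n_dims=2):
    """Return the indices of the n_dims columns most correlated with the rest.

    Assumption: "closely correlated dimensions" is approximated by the sum of
    absolute Pearson correlations of each column with all other columns.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)           # ignore self-correlation
    strength = corr.sum(axis=1)
    return np.sort(np.argsort(strength)[-n_dims:])


def knn_density_scores(P, k=3):
    """Simplified local density ratio for each row of the projected data P.

    A score near 1 means the point is about as dense as its neighbors;
    a score well above 1 suggests a local outlier.
    """
    diff = P[:, None, :] - P[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    nn_idx = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbors
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)
    mean_knn = nn_dist.mean(axis=1)           # inverse of a crude local density
    neighbor_mean = mean_knn[nn_idx].mean(axis=1)
    return mean_knn / (neighbor_mean + 1e-12)  # epsilon guards duplicate points


def detect_outliers(X, n_dims=2, k=3, threshold=2.0):
    """Project X onto the selected dimensions and flag high-scoring rows."""
    dims = select_correlated_dims(X, n_dims)
    scores = knn_density_scores(X[:, dims], k)
    return dims, scores, np.flatnonzero(scores > threshold)
```

Calling `detect_outliers(X)` on an array of shape `(n_points, n_features)` returns the selected dimension indices, the per-point scores, and the indices of the flagged points. The pairwise distance matrix takes O(n²) memory, so this form only suits small data sets; the threshold would need tuning for real data.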
Keywords: outlier detection; nearest neighbor; high-dimensional space; density-based; data mining
Author: Xiong Junli (熊君丽)