A method of extracting domain knowledge from DBpedia based on ontology and semantic distance
ZHANG Zhishen, WANG Huiyong, ZHANG Xiaoming, AI Qing, MENG Mingming
Abstract:
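Linked Open Data (LOD) contains a large amount of knowledge from many different domains, but most current methods for extracting the knowledge of a specific domain from it require manual participation. To extract domain knowledge automatically, a method is proposed that extracts domain-specific knowledge from DBpedia according to a domain ontology. The domain ontology, Wikipedia, and a topic extraction algorithm are used to obtain the seed keyword set that drives the extraction. A parameter representing the weight of edge direction is added to the direct-link subgraph semantic distance algorithm, which is then used to extract domain knowledge, and the extracted knowledge is filtered with a strategy based on the ontology and string similarity comparison. Experiments show that the method achieves good extraction results and requires neither manual keyword selection nor manual participation in the filtering process, which saves considerable time and effort.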
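The idea of a direct-link semantic distance with a direction weight can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: the function names, the toy triples, the threshold `max_distance`, and in particular the way the hypothetical parameter `alpha` weights outgoing links against incoming links. It should not be read as the authors' algorithm.

```python
# Toy (subject, predicate, object) triples, invented for illustration only.
TRIPLES = [
    ("Steel", "dbo:material", "Iron"),
    ("Steel", "dct:subject", "Iron"),
    ("Iron", "dbo:wikiPageWikiLink", "Steel"),
    ("Steel", "dbo:wikiPageWikiLink", "Carbon"),
]


def direct_links(triples, source, target):
    """Count the distinct predicates that link source directly to target."""
    return len({p for s, p, o in triples if s == source and o == target})


def directed_distance(triples, a, b, alpha=0.5):
    """Direct-link distance between resources a and b.

    alpha (hypothetical) weights links a -> b; (1 - alpha) weights links b -> a.
    The result lies in (0, 1]; 1.0 means there is no direct link at all.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    outgoing = direct_links(triples, a, b)
    incoming = direct_links(triples, b, a)
    return 1.0 / (1.0 + alpha * outgoing + (1.0 - alpha) * incoming)


def select_related(triples, seed, candidates, max_distance=0.6, alpha=0.5):
    """Keep candidates within max_distance of the seed, nearest first."""
    scored = [(c, directed_distance(triples, seed, c, alpha)) for c in candidates]
    kept = [(c, d) for c, d in scored if d <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, dist in select_related(
        TRIPLES, "Steel", ["Iron", "Carbon", "Gold"], alpha=0.7
    ):
        print(f"{name}: {dist:.3f}")
```

In this sketch a seed keyword such as `Steel` plays the role of a starting resource, and resources with more (and more heavily weighted) direct links to it receive a smaller distance. A resource with no direct link keeps the maximum distance of 1.0 and is dropped by the threshold.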
Keywords: DBpedia; domain ontology; direct-link subgraph semantic distance algorithm; knowledge extraction; extraction strategy; filtering
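Foundation: National Natural Science Foundation of China (51271033); National Natural Science Foundation of China (71271076); Natural Science Foundation of Hebei Province (F2018208116); Hebei Province Science and Technology Program (16210312D); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2015207)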
DOI: 10.16652/j.issn.1004-373x.2018.13.029