Prediction method of college students' scores toward uncertain missing data
曹歆雨,曹卫权,李峥,孙金德
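Abstract:
College students have considerable freedom in planning their courses, which makes grade data irregular and makes it hard for researchers to analyze and use students' prior-course grades effectively. Existing grade prediction methods generally do not account for missing prior-course grades, and their prediction accuracy suffers as a result. This paper proposes a missing-data imputation method based on K-nearest-neighbor locally optimal reconstruction, which effectively suppresses the effect of missing prior-course grades on the accuracy of the prediction model. Experiments show that the method completes missing values better than existing approaches such as mean imputation and GMM imputation and, combined with a random forest model, achieves effective grade prediction, providing an objective reference for student grade management and employability early warning.
Keywords: grade prediction; missing data; data imputation; data mining; machine learning; random forest model
Foundation: National Natural Science Foundation of China (61403301, 61773310)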
DOI: 10.16652/j.issn.1004-373x.2018.06.035
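
The imputation idea summarized in the abstract can be illustrated with a minimal Python sketch. It assumes one plausible reading of K-nearest-neighbor local reconstruction: each student with missing grades is expressed as a least-squares combination of the k most similar students who have complete records, and the same weights are used to fill in the gaps. The function name `knn_reconstruction_impute`, the parameters `k` and `ridge`, the small ridge term, and the toy grade matrix are all assumptions made for illustration; the abstract does not specify the paper's exact formulation.

```python
import numpy as np


def knn_reconstruction_impute(X, k=3, ridge=1e-3):
    """Fill NaNs in X with a least-squares blend of the k nearest complete rows.

    Illustrative sketch only, not the paper's exact algorithm.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Donor pool: students with no missing grades.
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("at least one complete row is required")
    k = min(k, len(complete))

    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        if not observed.any():
            # Nothing to compare on: fall back to the donor mean.
            filled[i] = complete.mean(axis=0)
            continue
        # Rank donors by Euclidean distance on the observed courses only.
        dists = np.linalg.norm(complete[:, observed] - row[observed], axis=1)
        nearest = complete[np.argsort(dists)[:k]]
        # Solve min_w ||row_obs - A^T w||^2 + ridge * ||w||^2 (assumed form).
        A = nearest[:, observed]                      # shape (k, n_observed)
        gram = A @ A.T + ridge * np.eye(k)
        w = np.linalg.solve(gram, A @ row[observed])  # shape (k,)
        # Apply the same weights to the donors' values in the missing courses.
        filled[i, missing] = w @ nearest[:, missing]
    return filled


# Toy grade matrix (hypothetical): rows are students, columns are courses.
grades = np.array([
    [80.0, 70.0, 90.0],
    [60.0, 65.0, 70.0],
    [90.0, 85.0, 95.0],
    [80.0, 70.0, np.nan],   # third course not yet taken
])
completed = knn_reconstruction_impute(grades, k=2)
print(completed[3])
```

The completed matrix could then serve as input to a random forest regressor (for example scikit-learn's `RandomForestRegressor`), matching in outline the pipeline the abstract describes.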