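CLC number: TP391 Document code: A DOI: 10.19358/j.issn.2096-5133.2020.11.006 Citation format: Jing Hongli, Huang Na, Li Jianguo. Research progress and challenges of malware detection method based on machine learning[J]. Information Technology and Network Security, 2020, 39(11): 38-44, 68.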
Research progress and challenges of malware detection method based on machine learning
Jing Hongli1,Huang Na1,2,Li Jianguo1
1. Beijing Topsec Science & Technology Inc., Beijing 100085, China; 2. Beijing University of Technology, Beijing 100124, China
Abstract: As the volume of malware grows and attack techniques continue to evolve, combining malware detection with machine learning has become a new direction for the field. This paper first briefly introduces static and dynamic malware detection methods, summarizes the general workflow of machine-learning-based malware detection, and reviews existing methods and their research progress. Using the Ember 2017 and Ember 2018 data sets, methods based on structured features, including RF (Random Forest), LightGBM, SVM (Support Vector Machine), K-means and CNN (Convolutional Neural Network), are analyzed and validated; a 2019 sample set is used to validate methods based on sequential features, including several common deep learning models. The accuracy, precision, recall and F1 score of the trained models on different test sets are computed as evaluation metrics. Based on the experimental results, the advantages and disadvantages of the various methods are discussed, with particular emphasis on verifying and analyzing the generalization ability of the tree models. The results show that models generally degrade as samples continue to evolve. Finally, directions for further research are pointed out.
Key words: malware detection; static detection of malware; machine learning; LightGBM; random forest
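
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels with 1 for malicious and 0 for benign; the function name `binary_metrics`, the `Metrics` container and the toy labels are hypothetical and are not taken from the paper or its experiments.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: label 1 means malicious (positive), 0 means benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Computing the same metrics separately on each test set (for example, one set per year) is one way to make the degradation described in the abstract visible as a drop in recall or F1 on newer samples.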