
Research on identification and classification methods of APP involving privacy infringement

Information Technology and Network Security (信息技术与网络安全), Issue 12

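Yi Li1, Qiu Xiulian1, Ma Fang1, Peng Yanbing1, Cheng Guang2

(1. Nanjing FiberHome Software Technology Co., Ltd., Nanjing 210019, Jiangsu, China; 2. School of Cyber Science and Engineering, Southeast University, Nanjing 211189, Jiangsu, China)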

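Abstract: With the development of information infrastructure and the spread of mobile applications, application developers collect large amounts of users' personal information during use, leading to illegal leakage and use of that information and seriously threatening personal information security. To identify more efficiently and accurately whether an APP infringes on privacy and which category it belongs to, this paper proposes a recognition algorithm that combines multiple strategies over multi-modal features. First, the algorithm uses Word2vec to produce word-level feature vectors for APP-related text; these vectors are then fed into a CNN for classification. Next, application feature vectors are generated from the text classification results together with several sets of behavioral features. Finally, several different base classifiers are combined, and hard voting is used to predict privacy-infringing behavior. Experimental results show that the trained model reaches an F1 score of up to 91% on the validation set. The method can effectively identify and classify privacy-infringing APPs and helps safeguard personal information security in the era of big data.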

Keywords: multi-label text classification; feature extraction; behavioral features; model construction; machine learning

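Chinese Library Classification (CLC) number: TP391.4
Document code: A
DOI: 10.19358/j.issn.2096-5133.2021.12.002
Citation format: Yi Li, Qiu Xiulian, Ma Fang, et al. Research on identification and classification methods of APP involving privacy infringement[J]. Information Technology and Network Security, 2021, 40(12): 8-14.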

Research on identification and classification methods of APP involving privacy infringement

Yi Li1, Qiu Xiulian1, Ma Fang1, Peng Yanbing1, Cheng Guang2

(1. Nanjing FiberHome Software Technology Co., Ltd., Nanjing 210019, China; 2. School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China)

Abstract: With the development of information infrastructure and the popularization of mobile applications, application developers collect a large amount of users' personal information in the course of use. This has led to the illegal disclosure and use of personal information, which seriously threatens personal information security. To identify more efficiently and accurately whether an APP violates privacy and which category it falls into, a recognition algorithm based on multi-modal features and a combination of multiple strategies is proposed. First, the algorithm uses the Word2vec method to build word-level feature vectors from APP-related text, and the resulting vectors are input into a CNN for classification. Based on the text classification result and a variety of behavioral feature sets, it then generates application feature vectors. Finally, it combines several different base classifiers and uses hard voting to predict privacy-infringing behavior. The experimental results show that the F1 value of the trained model on the validation set can reach 91%. The method can effectively identify and classify privacy-infringing APPs, which helps protect personal information in the era of big data.

Key words: multi-label text classification; feature extraction; behavioral features; model construction; machine learning
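
The last step named in the abstract, combining several base classifiers by hard voting, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the three-element feature vector, the threshold rules standing in for trained base classifiers, and the labels `"infringing"` and `"benign"` are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical toy feature vector for one app (not the paper's actual features):
#   [text_class_score, sensitive_permission_count, uploads_contacts_flag]
Features = Sequence[float]
Classifier = Callable[[Features], str]


def by_text_score(x: Features) -> str:
    """Illustrative base classifier: relies on the text-classification score."""
    return "infringing" if x[0] >= 0.5 else "benign"


def by_permission_count(x: Features) -> str:
    """Illustrative base classifier: relies on the number of sensitive permissions."""
    return "infringing" if x[1] >= 3 else "benign"


def by_contact_upload(x: Features) -> str:
    """Illustrative base classifier: relies on a contact-upload behavior flag."""
    return "infringing" if x[2] == 1 else "benign"


def hard_vote(classifiers: Sequence[Classifier], x: Features) -> str:
    """Return the label predicted by the most base classifiers.

    Ties go to the label that was voted for earliest in classifier order.
    """
    if not classifiers:
        raise ValueError("at least one base classifier is required")
    votes = [clf(x) for clf in classifiers]
    counts = Counter(votes)
    top = max(counts.values())
    return next(label for label in votes if counts[label] == top)


ensemble = [by_text_score, by_permission_count, by_contact_upload]
app = [0.8, 1, 1]  # high text score, few permissions, uploads contacts
print(hard_vote(ensemble, app))  # two of three vote "infringing"
```

Hard voting counts only the predicted labels, so base classifiers with incompatible confidence scales can be combined without calibration.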

0 Introduction

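The 48th Statistical Report on China's Internet Development, published by the China Internet Network Information Center (CNNIC), shows that as of June 2021 China had 1.007 billion mobile Internet users. A user base of this size has immense commercial value, and the volume of personal information behind it is even more valuable in today's information age [1]. In practice, however, such heavy usage has brought problems with it. The most visible is the steady stream of personal information leaks, and infringement of users' personal information has become pervasive. While smartphone APPs bring convenience to users, they have also become one of the root causes of personal information leakage.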

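According to sensitivity and security level, users' personal information is divided into three categories: core private information, important private information, and ordinary private information [2]. Core private information covers address-book contacts, phone accounts, account passwords, chat records, and the user's current location. Important private information covers sending and receiving SMS messages, placing phone calls, and access to the phone's built-in camera. Ordinary private information covers Wi-Fi network connections, Bluetooth device connections, and mobile data usage.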
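
The three-tier taxonomy above can be written down as a small lookup table. The sketch below is only an illustration: the item identifiers (such as `"contacts"` or `"wifi"`) and the helper `highest_tier` are hypothetical names, and the numeric ordering of the tiers is an assumption used to compare sensitivity.

```python
from enum import IntEnum
from typing import Iterable, Optional


class PrivacyTier(IntEnum):
    """Sensitivity tiers; a larger value means more sensitive (assumed ordering)."""
    ORDINARY = 1
    IMPORTANT = 2
    CORE = 3


# Hypothetical identifiers for the information types listed in the text.
TIER_BY_ITEM = {
    # Core private information
    "contacts": PrivacyTier.CORE,
    "phone_account": PrivacyTier.CORE,
    "account_password": PrivacyTier.CORE,
    "chat_records": PrivacyTier.CORE,
    "location": PrivacyTier.CORE,
    # Important private information
    "sms": PrivacyTier.IMPORTANT,
    "phone_call": PrivacyTier.IMPORTANT,
    "camera": PrivacyTier.IMPORTANT,
    # Ordinary private information
    "wifi": PrivacyTier.ORDINARY,
    "bluetooth": PrivacyTier.ORDINARY,
    "mobile_data": PrivacyTier.ORDINARY,
}


def highest_tier(items: Iterable[str]) -> Optional[PrivacyTier]:
    """Return the most sensitive tier among the known items, or None if none are known."""
    tiers = [TIER_BY_ITEM[item] for item in items if item in TIER_BY_ITEM]
    return max(tiers) if tiers else None


# An app touching Wi-Fi state and the camera reaches the IMPORTANT tier.
print(highest_tier(["wifi", "camera"]).name)
```

Using an ordered enumeration lets the most sensitive item an app touches determine its overall tier with a single `max` call.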

For the full text of this article, please download: http://www.chinaaet.com/resource/share/2000003889

