How to Find APP Relationship: An Iterative Process. Ming Liu, Harbin Institute of Technology, School of Computer Science and Technology
Background Plenty of apps are released to help users make the best use of their phones. With so many apps available, app retrieval and app recommendation are good ways for users to find the apps they want. Recent methods mostly depend on user logs or on latent context similarity between apps. They can only detect whether two apps are downloaded or installed together, or whether they provide similar functions.
APP Relationship Apps can have deeper relationships, such as one app needing another app to cooperate in order to fulfill its task. This kind of relationship cannot be discovered from user logs or the latent contexts of apps alone. Example: “Hotels.com” and “Alipay”. https://play.google.com/store/apps/details?id=com.hcom.android&hl=zh_CN https://play.google.com/store/apps/details?id=com.alipay.android.client.pad&hl=zh_CN
The Role of Reviews Reviews contain useful information about apps, such as users' viewpoints. Given two apps (marked as app 1 and app 2), if a review of app 1 asks for a service that app 1 cannot provide, and a review of app 2 states that app 2 provides this service, then app 1 and app 2 are possibly relevant. This relationship is not similarity.
Challenges Reviews are too short to be fully exploited. Most reviews do not describe the app directly and only contain the user's viewpoint, so it is hard to extract attributes from them. Approach: an iterative process that combines review similarity and app relationship into one calculation.
Related Work An app is just an entity, so methods for calculating entity relationship can be applied directly to app relationship. The dictionary-based approach (sometimes called the knowledge-based approach) relies on professional thesauri to extract attributes that define the relationship among entities (or apps). The statistics-based approach (sometimes called the corpus-based approach) mines the relationship among entities from a large-scale corpus.
Defects Dictionary-based approach: with a hierarchical structure (e.g. WordNet), one can easily tell the relationship between entities from their positions in the hierarchy. However, most current thesauri do not include apps as terms, so it is impossible to extract attributes from them to represent apps. Statistics-based approach: it seldom runs into missing data. However, it relies on contextual similarity computed from attributes extracted from the corpus, so it can only detect entity similarity.
Organization We use a matrix M to organize reviews and apps, and use app vectors and review vectors to represent apps and reviews respectively. M is an n×k matrix built with the Vector Space Model. Each column of M represents one app in the app set APP. Each row of M represents one review in the review set RC. The tf-idf metric, which is both effective and efficient, is used to compute the value of each entry in these vectors.
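A minimal Python sketch of how such a matrix could be filled. The slide only states that tf-idf is used, so everything specific here is an assumption for illustration: the function name `tfidf_matrix`, the raw count input, and the idf formula log(n / df).

```python
import numpy as np

def tfidf_matrix(counts):
    """Turn a reviews x apps count matrix into tf-idf weights.

    Assumptions (not from the slides): counts[i, j] is how often review i
    refers to app j, tf is the raw count, and idf is log(n / df), where df
    is the number of reviews that refer to the app.
    """
    counts = np.asarray(counts, dtype=float)
    n_reviews = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)        # reviews referring to each app
    idf = np.log(n_reviews / np.maximum(df, 1))  # guard against df == 0
    return counts * idf                          # idf is broadcast across rows

# Hypothetical toy data: 3 reviews (rows) and 2 apps (columns).
counts = [[2, 1],
          [0, 1],
          [0, 1]]
M = tfidf_matrix(counts)
app_vector = M[:, 0]      # a column of M is an app vector
review_vector = M[0, :]   # a row of M is a review vector
```

With this weighting, an app that is referred to by every review receives weight 0, which is the usual tf-idf behavior for uninformative terms.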
APP Relationship Calculation Generally speaking, the straightforward way to calculate the relationship between two apps (e.g. app_p and app_q) is to compare their app vectors, where tf_cp and tf_cq denote the values of the c-th entry in V(app_p) and V(app_q) respectively.
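The equation for this step is not part of the slide text. As a hedged illustration, the sketch below assumes the vector comparison is the cosine measure, which the deck later names as one of its measurements. The function name `app_relationship` and the toy matrix are hypothetical.

```python
import numpy as np

def app_relationship(M, p, q):
    """Cosine between the app vectors stored in columns p and q of M.

    Assumption: cosine is used as the vector-based measurement.
    """
    v_p, v_q = M[:, p], M[:, q]
    norm = np.linalg.norm(v_p) * np.linalg.norm(v_q)
    return float(v_p @ v_q / norm) if norm > 0 else 0.0

# Hypothetical toy matrix: 2 reviews (rows), 3 apps (columns).
# App 0 and app 1 never appear in the same review; app 2 appears in both.
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

r_01 = app_relationship(M, 0, 1)  # apps with no common review
r_02 = app_relationship(M, 0, 2)  # apps sharing one review
```

Under this assumption, two apps that share no review always get a value of exactly 0, whatever their actual relationship is.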
Drawback The previous equation rests on the idea that two apps frequently appearing in the same reviews are possibly similar to each other. Consequence: many similarity values are close to 0. Reason: many apps share no common reviews even when they really are similar.
Expand Reviews often exhibit topic similarity. For example, given two reviews rc_i and rc_j, corresponding to app_p and app_q respectively, if users in rc_i ask for a service that app_p does not provide, and users in rc_j state that app_q provides this service, then rc_i and rc_j are topic-similar. It follows that if two apps frequently appear in topic-similar reviews, the two apps are relevant.
Results V'(app_p) and V'(app_q) denote the app vectors of app_p and app_q after expansion by topic similarity among reviews; their c-th entries are recomputed from that similarity.
Review Similarity Evaluating the previous equations requires the topic similarity among reviews to be calculated beforehand. As with app relationship, vector-based measurements can be applied directly to review vectors to calculate topic similarity.
Expand Just as reviews exhibit topic similarity, apps exhibit semantic relationships with one another, so two reviews that share no common app may still present a similar topic.
Assumption The calculations of app relationship and review similarity depend on each other. To calculate review similarity, app relationship must be calculated in advance: given rc_i and rc_j, calculating Sim(rc_i, rc_j) requires P_gc, the app relationship between app_g and app_c. Conversely, to calculate app relationship, review similarity must be calculated in advance.
Simulating Results Columns: similar, relevant, irrelevant. Apps: Google Map Baidu Map Booking.com Alipay Calculator PGA Tour Sohu Video Youku Video Effective Weight Loss Nike+ Running MX Player Chrome Neuro Desktop AutoCAD Cameringo Demo Trip Advisor m.Weather Football 2014 Medical Directory Medi.Diary Basic Photo Art Studio Pic Frames Fun Weight Loss Amazon English Dictionary Word Web Discount Calculator Ebay Change Voice Tube
Two Ways [Figure: app relationship obtained by the two ways]
Reasons for the Observations Our process combines app relationship and review similarity in one iterative calculation. App relationship is mined from reviews and then guides the review similarity calculation. Review similarity is found through the relationship among apps and then guides the app relationship calculation. Through this iterative process, deep relationships among apps can be obtained.
Initialization To run our two-way-alternative process, we need to set one initial parameter: either the initial app relationship R_0(app_p, app_q) or the initial review similarity Sim_0(rc_i, rc_j) (initial parameters). We also need to choose the measurement used to calculate app relationship and review similarity from app vectors and review vectors, that is, to decide how Sim(rc_i, rc_j) and R(app_p, app_q) are calculated (initial measurement).
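The alternation between the two quantities can be sketched in Python as follows. This is only an illustration under stated assumptions, not the update rule from the talk, whose equations are not in the slide text. The assumptions are: review similarity is taken as the normalized product M R Mᵀ, app relationship as the normalized product Mᵀ Sim M, and the initial parameter is set by cosine. All function names are hypothetical.

```python
import numpy as np

def normalize(K):
    """Rescale a symmetric similarity matrix so that its diagonal is 1."""
    d = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    return K / np.outer(d, d)

def review_similarity(M, R):
    # Assumed update: reviews are similar when the apps they refer to are related.
    return normalize(M @ R @ M.T)

def app_relationship(M, S):
    # Assumed update: apps are related when they appear in similar reviews.
    return normalize(M.T @ S @ M)

def way_from_app_relationship(M, R0, steps=1):
    """First way: start from an initial app relationship R0."""
    R = R0
    for _ in range(steps):
        S = review_similarity(M, R)
        R = app_relationship(M, S)
    return R

def way_from_review_similarity(M, S0, steps=1):
    """Second way: start from an initial review similarity S0."""
    S = S0
    for _ in range(steps):
        R = app_relationship(M, S)
        S = review_similarity(M, R)
    return R

# Hypothetical toy matrix: 2 reviews (rows), 3 apps (columns).
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
R0 = normalize(M.T @ M)   # initial parameter set by cosine between app vectors
S0 = normalize(M @ M.T)   # initial parameter set by cosine between review vectors
R_way1 = way_from_app_relationship(M, R0, steps=1)
R_way2 = way_from_review_similarity(M, S0, steps=1)
```

In the toy matrix, app 0 and app 1 share no review, so their initial relationship is 0; after one alternation it becomes positive because both co-occur with app 2. Updates of this normalized form may push all values toward 1 when repeated many times, so the number of steps is kept small here.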
Data Sets and Measurements Miller-65: 65 entity (or word) pairs selected from WordNet whose relationship values are already defined by experts. Metrics: Pearson correlation and Spearman correlation. APP Collection: 1000 apps collected from Google Play to form one artificial test collection. Metric: F1.
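For reference, the three metrics can be computed as in the following Python sketch. The function names are hypothetical, the Spearman version ignores ties for brevity, and treating F1 as a comparison between a predicted set and a labeled set of app pairs is an assumption about how the APP Collection is scored.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """Rank positions (0 = smallest); ties are not averaged in this sketch."""
    return np.argsort(np.argsort(np.asarray(values)))

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def f1_score(predicted, gold):
    """F1 between a predicted set and a gold set of items (e.g. app pairs)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical scores: system values vs. expert values for four pairs.
system = [1.0, 4.0, 9.0, 16.0]
expert = [1.0, 2.0, 3.0, 4.0]
rho = spearman(system, expert)   # same ordering in both lists
r = pearson(system, expert)      # relation is monotone but not linear
f1 = f1_score({("a", "b"), ("c", "d")}, {("c", "d"), ("e", "f")})
```

Spearman only compares orderings, so it is insensitive to the scale of the relationship values, whereas Pearson rewards a linear match with the expert scores.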
Results when varying the method used to set the initial parameters while fixing Cosine as the initial measurement. Miller-65 APP Collection HCT-HCG Cosine CH TLDA Cucerzan TPCA Cucerzan KL GMEL TLSI ESBM Euclidean ESBM LC ELPM WC Snowball TLSI ERPM TSR TWI1 Pearson Spearman TWI2 Pearson Spearman TWI1 F1 TWI2 F1 HCT-HCG Cosine 53.78 54.69 CH TLDA 56.91 84.35 69.34 79.68 83.52 82.14 68.89 67.54 88.08 87.01 87.41 86.16 70.81 69.35 81.81 80.21 Cucerzan TPCA Cucerzan KL 65.57 63.58 GMEL TLSI 68.77 81.13 ESBM Euclidean 61.56 60.83 65.43 76.73 68.72 67.25 68.53 66.83 84.11 82.56 83.75 82.33 66.35 64.78 67.19 65.64 68.48 67.13 81.73 80.21 ESBM LC 82.31 81.88 85.39 83.72 85.03 83.39 ELPM WC 85.22 83.86 82.12 80.44 Snowball TLSI 82.45 79.76 84.46 83.03 84.49 83.02 ERPM TSR 81.63 81.14
Results when fixing Cosine to set the initial parameters while varying the initial measurement used to calculate vectors
Miller-65
Measurement  TWI1 Pearson  TWI1 Spearman  TWI2 Pearson  TWI2 Spearman
Cosine       68.11         66.63          65.94         64.22
Euclidean    67.67         66.15          65.42         63.78
KL           68.59         67.05          66.31         64.67
TLDA         68.78         67.30          66.52         64.83
TPCA         68.75         67.26          66.47         64.76
TLSI         68.93         67.40          66.69         64.98
LC           68.85         67.31          66.52         64.83
WC           69.03         67.47          66.71         64.97
TSR          69.15         67.59          66.85         65.11
APP Collection
Measurement  TWI1 F1  TWI2 F1
Cosine       64.83    62.84
Euclidean    64.31    62.37
KL           65.22    63.25
TLDA         65.48    63.47
TPCA         65.52    63.33
TLSI         65.76    63.74
LC           65.71    63.57
WC           65.84    63.82
TSR          65.92    63.89
Reasons Initial parameters strongly affect the results because they are the only predefined factors that influence the subsequent calculation. Initial measurements do not affect the results because app vectors and review vectors are already expanded by the semantic relationship among apps and the topic similarity among reviews. Different initial measurements therefore cannot bring in extra semantics, and so cannot change the results.
Conclusions To obtain high-quality results, we only need to focus on choosing an effective method to set the initial parameters. However, determining which method is effective is not easy. For this reason, we hope to obtain high-quality results even with weak initial parameters.
Two Definitions Weak initial parameters (or weak initial measurements) are obtained by simple co-occurrence-based methods, for example Cosine, KL, or Euclidean distance. Effective initial parameters (or effective initial measurements) are obtained by compression-based or semantic-based methods, for example Word Clustering, Lexical Cohesion, LSI, PCA, or LDA.
“Booking.com” and “Alipay” The two ways initialized by the same combined method are marked with the same symbol. • With effective initial parameters the final results are large; with weak initial parameters the final results are small. • With effective initial parameters, the results from the two ways are closer to each other than with weak initial parameters.
Observations and Conclusions Observations: The results with effective initial parameters are closer to the optimal results than those with weak initial parameters. In the early stages, the range of the results with weak initial parameters contains the range of the results with effective initial parameters. Conclusions: When the initial parameters for both ways are optimal, the calculated results should be the same at any time. The optimal results lie between the two results obtained by the two ways of our two-way-alternative process.
Two Ways Combination With effective initial parameters the tracks are smooth, whereas with weak initial parameters they are rough. Because the results with effective initial parameters are closer to the optimal results, it is reasonable to treat the results from the smoother way as more credible and to give them more weight in the combined results.
Selected Publications
1. Ming Liu, Chong Wu, Yuanchao Liu. A Vector Reconstruction based Clustering Algorithm Particularly for Large-Scale Text Collection. Neural Networks. 2014, Accepted. (SCI)
2. Ming Liu, Chong Wu, Yuanchao Liu. A Similarity Calculation Method Based on Feature Weight Quantification (in Chinese). Chinese Journal of Computers. 2014, Accepted.
3. Ming Liu, Chong Wu, Yuanchao Liu. Weight Evaluation for Features via Constrained Data-Pairs. Information Sciences. 2014, Accepted. (SCI)
4. Ming Liu, Yuanchao Liu, Bingquan Liu, Lei Lin. Probability based Text Clustering Algorithm by Alternately Repeating Two Operations. Journal of Information Science. 2013, 39(3): 372-383. (SCI, IDS: 149 BC)
5. Ming Liu, Lei Lin, Lili Shan, Chengjie Sun. A Novel Self-Adaptive Clustering Algorithm for Dynamic Data. ICONIP 2012, Doha, Qatar, 2012: 42-49.
6. Ming Liu, Bingquan Liu, Yuanchao Liu, Chengjie Sun. Data Evolvement Analysis Based on Topology Self-Adaptive Clustering Algorithm. Information Technology and Control. 2012, 41(2): 162-172. (SCI, IDS: 967 UJ)
7. Ming Liu, Xiaolong Wang, Yuanchao Liu. Research on a Keyphrase Extraction Method Based on Lexical Chains (in Chinese). Chinese Journal of Computers. 2010, 33(7): 1246-1255.
8. Ming Liu, Xiaolong Wang, Yuanchao Liu. A Fast Clustering Algorithm for Large-Scale High-Dimensional Data (in Chinese). Acta Automatica Sinica. 2009, 35(7): 859-866.
End Thank you!