情报学报 (Journal of the China Society for Scientific and Technical Information)
2015, No. 4, pp. 424-431 (8 pages)
Keywords: microblog; journalism; event detection; feature selection
In recent years, microblogs have shown increasing influence on emergency events and public safety. Without effective means of detecting and monitoring emergent events in microblogs, governments and enterprises are left in a passive position when such events occur, and social stability may even be threatened. At present, researchers usually employ feature-based classification approaches to detect events in microblogs. However, different feature sets often yield very different results, even on the same microblog dataset. How to select an appropriate feature set for microblog event detection is therefore a critical open issue. In this paper, we propose a journalism-based approach to selecting features for microblog event detection. Our approach draws on the “5W” elements of news events identified in journalism research. Specifically, we first construct an initial feature set representing the “5W” elements of microblog events, and then apply a cross-fold, iterative procedure to select the best feature set from it. We conduct experiments on a real microblog dataset to compare our approach with four existing feature selection strategies in terms of several metrics, including precision, recall, and F-measure. The results show that our approach reduces the number of selected features and achieves high recall and F-measure in microblog event detection.
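The abstract does not spell out the exact “5W” features, the classifier, or the details of the cross-fold, iterative procedure, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes hypothetical binary features named after the five W's, a toy nearest-centroid classifier, and greedy backward elimination: starting from the full “5W” set, it repeatedly drops any feature whose removal does not lower the k-fold cross-validated F-measure. All names (`FIVE_W`, `nearest_centroid`, `cross_val_f1`, `select_features`, `toy_data`) are hypothetical.

```python
from statistics import mean

# Hypothetical feature names standing in for the "5W" elements
# (who, what, when, where, why); the paper's real features are not listed here.
FIVE_W = ["who", "what", "when", "where", "why"]


def prf(y_true, y_pred):
    """Precision, recall and F1 for the positive (event) class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def nearest_centroid(train_X, train_y, test_X, features):
    """Toy stand-in classifier: predict the class whose centroid is closer.
    Assumes both classes occur in the training split."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = {f: mean(r[f] for r in rows) for f in features}

    def sq_dist(x, label):
        return sum((x[f] - centroids[label][f]) ** 2 for f in features)

    return [0 if sq_dist(x, 0) <= sq_dist(x, 1) else 1 for x in test_X]


def cross_val_f1(X, y, features, k=5):
    """Mean F1 over k interleaved folds (assumes len(X) >= k)."""
    n = len(X)
    scores = []
    for fold in range(k):
        test = list(range(fold, n, k))
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        preds = nearest_centroid([X[i] for i in train], [y[i] for i in train],
                                 [X[i] for i in test], features)
        scores.append(prf([y[i] for i in test], preds)[2])
    return mean(scores)


def select_features(X, y, initial, k=5):
    """Iteratively drop any feature whose removal does not lower the CV F1."""
    current = list(initial)
    best = cross_val_f1(X, y, current, k)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in current:
            candidate = [g for g in current if g != f]
            score = cross_val_f1(X, y, candidate, k)
            if score >= best:
                best, current, improved = score, candidate, True
                break  # restart the scan on the smaller feature set
    return current, best


def toy_data(n=20):
    """Synthetic posts: 'what' and 'when' track the label, the rest are noise."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else 0
        X.append({"who": (i // 2) % 2, "what": label, "when": label,
                  "where": (i // 3) % 2, "why": (i // 5) % 2})
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    selected, score = select_features(X, y, FIVE_W)
    print("selected:", selected, "cv F1:", round(score, 3))
```

Accepting a removal whenever the score does not drop (`>=`) biases the search toward smaller feature sets, which mirrors the abstract's stated goal of reducing the number of selected features; each pass shrinks the set by one, so the loop always terminates.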