通信学报
JOURNAL OF CHINA INSTITUTE OF COMMUNICATIONS
2013
Issue 8
pp. 1-9
9 pages in total
topic detection and tracking%dependency connection weight%associated word pair%story link detection%vector space model
News topics exhibit bursty reporting, similar hot topics, and a rich hierarchy of subtopics. To address this, a sub-topic detection and tracking (sTDT) method is proposed that performs association analysis with a vector space model (VSM) weighted by dependency connections. Feature dimensions are first constructed from incremental TF-IDF values to generate global vectors. A local adjacency graph of feature connection weights is then built within a time window, and dependency parsing is used to reduce its dimensionality. Finally, the weights are adjusted with a domain dictionary and attenuated by a time threshold. Experiments show that dependency-based association analysis changes the text representation from a linear to a planar structure and extracts and describes subtopics effectively; on a manually annotated test corpus, the minimum DET cost is at least 2.2% lower than that of classical methods.
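The first and last steps of the pipeline described above (incremental TF-IDF vectors and time-based attenuation of similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names, the smoothed IDF formula, the exponential decay form, and the 7-day window are all assumptions made for illustration. The dependency connection weights and the domain-dictionary weighting are not modeled here.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Running document-frequency table, updated as each story arrives."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()

    def add(self, tokens):
        """Update corpus statistics with one new story, then return its vector."""
        self.n_docs += 1
        self.doc_freq.update(set(tokens))
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): log((N + 1) / (df + 0.5)) stays positive.
        return {
            term: count * math.log((self.n_docs + 1) / (self.doc_freq[term] + 0.5))
            for term, count in tf.items()
        }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def decayed_similarity(u, v, gap_days, window_days=7.0):
    """Cosine similarity attenuated by the time gap; zero outside the window.

    The exponential decay and the window length are hypothetical choices.
    """
    if gap_days > window_days:
        return 0.0
    return cosine(u, v) * math.exp(-gap_days / window_days)
```

In such a sketch, two stories would be linked to the same subtopic when `decayed_similarity` exceeds a tuned threshold; the abstract does not state which threshold or decay function the authors actually use.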