Journal of Wuhan University of Technology (Information & Management Engineering Edition)
JOURNAL OF WUHAN AUTOMOTIVE POLYTECHNIC UNIVERSITY
2014, No. 6, pp. 759-763 (5 pages)
Yang Ting; Feng Hongcai; Jin Kai; Zhao Jiexue
competition; multi-modality; similarity measurement; canonical correlation; scene segmentation
To bridge the "semantic gap" between low-level features and high-level semantics in video scene segmentation, a scene segmentation algorithm based on multimodal feature fusion and inter-shot competition was proposed. Image, text, and audio features were extracted as the low-level features of each video frame. Euclidean distance and cosine similarity were used to measure the similarity of same-modality data, while canonical correlation analysis (CCA) was used to compute the correlation of cross-modality data. Shot similarity and shot relevance were then obtained by fusing the per-modality similarities and correlations, respectively. A competition analysis of splitting and merging forces between shots segmented the video into scenes along both the similarity and the relevance dimensions, and the final scene boundaries were obtained by taking the intersection of the two resulting boundary sets. Experimental results show that the proposed method segments video scenes effectively, with recall and precision reaching 82.1% and 86.7%, respectively.
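The building blocks named in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: the similarity-to-distance mapping `1 / (1 + d)`, the regularization term `eps`, and the toy boundary lists are assumptions for demonstration; the first canonical correlation is computed via a standard whitened-cross-covariance SVD rather than whatever CCA variant the paper uses.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two same-modality feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_similarity(a, b):
    """Euclidean distance mapped into (0, 1] via 1/(1+d) (an assumed mapping)."""
    return float(1.0 / (1.0 + np.linalg.norm(a - b)))


def first_canonical_correlation(X, Y, eps=1e-8):
    """First canonical correlation between two feature matrices (rows = samples).

    Whitens the within-set covariances (Cholesky), then takes the top
    singular value of the whitened cross-covariance; `eps` regularizes
    the covariance matrices for numerical stability.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0] - 1
    Sxx = Xc.T @ Xc / n + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    Wx = np.linalg.inv(np.linalg.cholesky(Sxx))  # Sxx^{-1/2} (lower-triangular)
    Wy = np.linalg.inv(np.linalg.cholesky(Syy))
    M = Wx @ Sxy @ Wy.T
    s = np.linalg.svd(M, compute_uv=False)
    return float(min(1.0, s[0]))  # clamp tiny numerical overshoot


def intersect_boundaries(boundaries_a, boundaries_b):
    """Final scene boundaries: intersection of the two boundary sets."""
    return sorted(set(boundaries_a) & set(boundaries_b))
```

In this sketch, same-modality shot pairs would be scored with `cosine_similarity` or `euclidean_similarity`, cross-modality pairs (e.g. image features vs. audio features) with `first_canonical_correlation`, and the two per-dimension segmentations combined with `intersect_boundaries`.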