红外与激光工程
INFRARED AND LASER ENGINEERING
2014
Issue 7
Pages 2354-2361
8 pages in total
particle filter; object tracking; multicore DSP; parallel computing
A servo system for object tracking requires a very low delay between the object's motion and the start of the servo's rotation, yet the inherently heavy computational load of the particle filter (PF) tracking algorithm degrades tracking precision. This paper proposes a fast implementation of the particle filter tracking algorithm on a multicore DSP system. First, the on-chip packet accelerator (PA) is used to reduce both the capture delay of the GigE (Ethernet) camera and the CPU occupancy, which falls from 31% to 10%. Second, by manually flushing the cache after writes and invalidating it before reads, the memory consistency problem caused by several cores sharing image data at the same time is solved, so that every core can fetch the cacheable shared image data quickly through its cache. Finally, a multicore parallel computing mechanism is built by setting up a proxy task on each core: the computationally intensive stages of the particle filter are dispatched to the cores and executed simultaneously, which gives the algorithm a low latency. Experimental results show that the tracking response time decreases and that the speedup on 8 cores exceeds 7, better than the parallel optimization obtained with OpenMP.
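The dispatch idea in the last step can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not describe the proxy-task mechanism in detail, so worker threads stand in for the per-core proxy tasks here, and the Gaussian likelihood, the 2-D position state, and all function names (`likelihood`, `weigh_chunk`, `split`, `parallel_weights`) are assumptions made for illustration. Python threads share one interpreter lock, so the sketch shows only the structure of splitting the weighting stage into 8 slices and merging the results, not the speedup reported above.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # assumption: one worker per core, as in the 8-core experiment


def likelihood(particle, observation, sigma=10.0):
    """Hypothetical Gaussian likelihood of a 2-D position particle."""
    dx = particle[0] - observation[0]
    dy = particle[1] - observation[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def weigh_chunk(chunk, observation):
    """Work done by one worker: weigh its own slice of the particles."""
    return [likelihood(p, observation) for p in chunk]


def split(items, parts):
    """Split items into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def parallel_weights(particles, observation, workers=NUM_WORKERS):
    """Dispatch the weighting stage to the workers, then normalize serially."""
    chunks = split(particles, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the chunks, so weights stay aligned
        # with the particles they belong to.
        results = list(pool.map(weigh_chunk, chunks, [observation] * workers))
    weights = [w for part in results for w in part]
    total = sum(weights)
    return [w / total for w in weights]


def estimate(particles, weights):
    """Weighted mean of the particle positions (cheap serial stage)."""
    x = sum(p[0] * w for p, w in zip(particles, weights))
    y = sum(p[1] * w for p, w in zip(particles, weights))
    return x, y


rng = random.Random(0)
particles = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(1000)]
observation = (40.0, 60.0)  # hypothetical measured target position
weights = parallel_weights(particles, observation)
print(estimate(particles, weights))
```

Only the per-particle weighting is distributed; normalization and the state estimate remain serial because they are cheap by comparison. The same split would apply to any other stage whose cost grows with the number of particles.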