北京大学学报(自然科学版)
ACTA SCIENTIARUM NATURALIUM UNIVERSITATIS PEKINENSIS
2008
Issue 4
pp. 522-526
5 pages in total
彭春干%于敦山%曹喜信%盛世敏
H.264%VLSI architecture%video coding
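In VLSI implementations of 52-level scalar quantization, a key technique of the H.264 video coding standard, the speed and area of conventional architectures cannot effectively meet the real-time requirements of high-speed, highly parallel H.264 encoding. By using unsigned compressed shift-adder trees based on partial CSD coding, reference-level wiring, and a regrouped, segmented encoding of the quantization coefficients and step sizes, the matrix multiplications, table look-ups, and divisions that arise in H.264 scalar quantization and are ill-suited to hardware acceleration are effectively replaced. A VLSI architecture with 4×4-block parallelism that is well suited to pipelined acceleration is proposed; its speed can be tuned by controlling the number of cascaded adder stages. With two stages, the block processing rate reaches 121.6 MHz, which meets the real-time processing requirement of 4096×2304@120 Hz video. The architecture also improves substantially on conventional architectures in area and power: with the SMIC 0.13 μm standard-cell library and a synthesis clock frequency of 100 MHz, equivalent gate count and power consumption are reduced by 38% and 30%, respectively.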
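The idea of replacing a constant multiplication with shifts and additions can be illustrated in software. The following is a minimal sketch only, not the paper's circuit: it uses a generic canonical signed digit (CSD) recoding rather than the partial CSD, unsigned compressed adder tree described in the abstract. The multiplier constants in `MF_TABLE`, the rounding offset, and the shift `15 + QP // 6` are assumptions taken from the commonly cited H.264 reference-model formulation of forward quantization, and all function names are hypothetical.

```python
# Illustrative sketch: CSD recoding and shift-add multiplication for
# H.264-style quantization. Names and constants are assumptions, not
# taken from the paper.

# Commonly cited forward-quantization multipliers, indexed by QP % 6.
# Columns: positions (0,0)-type, (1,1)-type, and all other positions.
MF_TABLE = [
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
]


def to_csd(n):
    """Recode a non-negative integer into CSD digits (-1, 0, +1),
    least significant digit first, with no two adjacent nonzero digits."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # n % 4 == 1 gives +1, n % 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in `digits` using only
    shifts, additions, and subtractions."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def quantize_coeff(w, mf, qp, intra=True):
    """Quantize one transform coefficient without a hardware multiplier
    or divider: level = (|w| * mf + f) >> (15 + qp // 6), sign restored."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)  # assumed rounding offset
    level = (csd_multiply(abs(w), to_csd(mf)) + f) >> qbits
    return -level if w < 0 else level
```

In this sketch the division by the quantization step becomes a right shift, and each constant multiplication becomes a small sum of shifted operands, which is the general property that makes such a structure attractive for pipelined hardware.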
52-level scalar quantization plays an important role in H.264/AVC. A novel parallel VLSI architecture is proposed for its hardware implementation, in which the 4×4 matrix multiplications are replaced by 16 unsigned compressed shift-adder trees using a partial CSD coding scheme, switched reference wiring substitutes for look-up operations, division is avoided, and no ROM or RAM is used anywhere in the quantizer. It performs all the quantization calculations for all H.264 hybrid transforms with 4×4-block parallelism. Its block throughput reaches 121.6 MHz, which meets the real-time requirement for 4096×2304@120 Hz (119.43936 M/s) video compression. Compared with the conventional architecture, it saves 38% of the cost (equivalent gate count) and 30% of the power. In terms of both speed and cost, the architecture is well suited to pipeline acceleration and is a useful IP core for VLSI realization of high-resolution H.264 encoders.
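The quoted requirement of 119.43936 M/s can be reproduced with a short calculation. The sketch below assumes 16×16 macroblocks and 27 quantization passes per macroblock (16 luma 4×4 blocks, 8 chroma 4×4 blocks for 4:2:0 video, 1 luma DC block, and 2 chroma DC blocks). That breakdown is an inference that happens to match the figure in the abstract; the abstract itself does not state it, and the function name is hypothetical.

```python
# Worked check of the real-time block rate quoted in the abstract.
# The 27-blocks-per-macroblock breakdown is an assumption (see lead-in).

def required_block_rate(width, height, fps, blocks_per_mb=27):
    """Return the number of block quantizations needed per second."""
    mbs_per_frame = (width // 16) * (height // 16)  # 16x16 macroblocks
    return mbs_per_frame * fps * blocks_per_mb


rate = required_block_rate(4096, 2304, 120)
throughput = 121_600_000  # block throughput reported in the abstract (121.6 MHz)
headroom = throughput - rate

print(f"required: {rate / 1e6:.5f} M blocks/s")
print(f"headroom: {headroom / 1e6:.5f} M blocks/s")
```

Under these assumptions the required rate is 119,439,360 blocks per second, so the reported 121.6 MHz throughput exceeds it by a margin of roughly 1.8%.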