COMPUTER ENGINEERING AND DESIGN
2014, No. 7, pp. 2515-2519 (5 pages)
Keywords: spectrum sharing; multi-armed bandit; online learning; partially observable Markov decision process; optimal transmission
Spectrum-sharing schemes that model the channel state as a Markov chain with complete knowledge have limited applicability. To address this, a channel-aware online learning approach for different channels is proposed. Depending on whether the licensed (authorized) user is present on the current channel, the secondary user chooses either aggressive or conservative transmission. Because the channel state cannot be observed during conservative transmission, the channel is modeled as a partially observable Markov decision process (POMDP), and the optimal transmission policy for an unknown channel is modeled as a multi-armed bandit. Simulation results show that, when the channel is not fully known, the multi-armed bandit online learning algorithm obtains the optimal K-step conservative policy, and that the UCB-TUNED algorithm speeds up the convergence of this K-step conservative policy for optimal transmission.
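The abstract does not give the exact policy or reward model, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes a two-state (idle/busy) Markov model of licensed-user occupancy with hypothetical transition probabilities `p01` and `p10`. It also assumes a hypothetical "K-step conservative" policy, in which a collision during aggressive transmission triggers K conservative slots, together with illustrative per-slot rewards `r_aggr` and `r_cons`, and one bandit arm per candidate K. `belief_after` shows the POMDP belief prediction applied while the state is unobserved, and `ucb_tuned` implements the standard UCB-Tuned index of Auer, Cesa-Bianchi and Fischer (2002). All function names and parameter values are hypothetical.

```python
import math
import random


def belief_after(k, b0, p01, p10):
    """POMDP belief P(PU busy) after k unobserved (conservative) slots.

    Two-state Markov channel; one prediction step is
    b' = b * (1 - p10) + (1 - b) * p01.
    """
    b = b0
    for _ in range(k):
        b = b * (1.0 - p10) + (1.0 - b) * p01
    return b


def run_episode(K, rng, p01=0.2, p10=0.3, slots=50, r_aggr=1.0, r_cons=0.4):
    """Average per-slot SU reward in [0, 1] of a HYPOTHETICAL K-step
    conservative policy: an aggressive slot that hits a busy PU earns 0 and
    triggers K conservative slots; afterwards the SU is aggressive again.
    All parameter defaults are illustrative assumptions."""
    busy = rng.random() < p01 / (p01 + p10)  # initial state from stationary law
    backoff = 0  # conservative slots still to serve
    total = 0.0
    for _ in range(slots):
        if backoff > 0:
            total += r_cons   # low power: no collision, PU state not observed
            backoff -= 1
        elif busy:
            backoff = K       # collision reveals the PU: stay conservative K slots
        else:
            total += r_aggr   # full power on an idle channel
        # PU occupancy evolves as a two-state Markov chain
        busy = (rng.random() >= p10) if busy else (rng.random() < p01)
    return total / slots


def ucb_tuned(arms, pull, horizon):
    """UCB-Tuned over a finite arm set; pull(arm) must return a reward in [0, 1]."""
    m = len(arms)
    n = [0] * m      # plays per arm
    s = [0.0] * m    # sum of rewards per arm
    s2 = [0.0] * m   # sum of squared rewards per arm
    for t in range(horizon):
        if t < m:
            j = t  # initialisation: play every arm once
        else:
            log_t = math.log(t)  # t = total plays so far

            def index(i):
                mean = s[i] / n[i]
                # variance upper bound V_i = mean of squares - mean^2 + sqrt(2 ln t / n_i)
                v = s2[i] / n[i] - mean * mean + math.sqrt(2.0 * log_t / n[i])
                return mean + math.sqrt(log_t / n[i] * min(0.25, max(0.0, v)))

            j = max(range(m), key=index)
        r = pull(arms[j])
        n[j] += 1
        s[j] += r
        s2[j] += r * r
    return n, [s[i] / n[i] if n[i] else 0.0 for i in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    Ks = list(range(9))  # hypothetical candidate K values, one arm each
    counts, means = ucb_tuned(Ks, lambda K: run_episode(K, rng), horizon=3000)
    best_K = Ks[max(range(len(Ks)), key=counts.__getitem__)]
    print("most-played K:", best_K)
    print("estimated mean reward per K:", [round(x, 3) for x in means])
```

UCB-Tuned scales its exploration bonus by an empirical variance estimate capped at 1/4 (the largest possible variance of a reward in [0, 1]), so arms whose episode rewards vary little are explored less; this is the usual reason it tends to converge faster than plain UCB1 in practice.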