电子学报
ACTA ELECTRONICA SINICA
2014
Issue 7
pp. 1429-1434
6 pages
仵博%郑红燕%冯延蓬%陈鑫
Markov decision processes%Bayesian reinforcement learning%dynamic Bayesian networks
The enormous number of parameters and the slow convergence of model-based Bayesian reinforcement learning are the major obstacles to online learning. To address them, this paper presents a model-based factored Bayesian reinforcement learning approach. First, a factored representation is used to describe the dynamics with fewer parameters. Then, based on prior knowledge and observed data, the model is learned with Bayesian methods, which provides an elegant solution to the optimal exploration-exploitation tradeoff. Finally, a point-based Bayesian reinforcement learning approach is proposed to speed up convergence and thereby achieve online learning. The experimental results show that the proposed approach approximates the underlying Bayesian reinforcement learning task well while meeting real-time performance requirements.
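As a rough illustration of why a factored representation reduces the number of parameters to learn, the Python sketch below compares a flat transition model with a factored, dynamic-Bayesian-network-style one, and shows a simple Bayesian update of one factor from observed data. It is not the authors' algorithm. The names `flat_param_count`, `factored_param_count` and `DirichletCPT` are hypothetical, the state variables are assumed to be discrete with a bounded number of parents, the Dirichlet prior is an assumed (though common) choice, and the point-based planning step is omitted.

```python
from collections import defaultdict


def flat_param_count(n_vars, n_actions, arity=2):
    """Free parameters of a flat transition model P(s' | s, a)."""
    n_states = arity ** n_vars
    # One distribution over next states per (state, action) pair.
    return n_states * n_actions * (n_states - 1)


def factored_param_count(n_vars, n_actions, max_parents, arity=2):
    """Free parameters when each next-state variable depends on at most
    `max_parents` current-state variables (assumed factorization)."""
    return n_vars * n_actions * (arity ** max_parents) * (arity - 1)


class DirichletCPT:
    """Dirichlet counts for one factor P(x_i' | parent values, action)."""

    def __init__(self, arity=2, prior=1.0):
        self.arity = arity
        # Each (parent values, action) context starts from the prior counts.
        self.counts = defaultdict(lambda: [prior] * arity)

    def update(self, parent_values, action, next_value):
        """Add one observed transition to the posterior counts."""
        self.counts[(parent_values, action)][next_value] += 1.0

    def mean(self, parent_values, action):
        """Posterior mean of P(x_i' | parent values, action)."""
        c = self.counts[(parent_values, action)]
        total = sum(c)
        return [x / total for x in c]


if __name__ == "__main__":
    print(flat_param_count(10, 4))         # flat model
    print(factored_param_count(10, 4, 2))  # factored model
    cpt = DirichletCPT()
    for _ in range(3):
        cpt.update((0, 1), 0, 1)
    print(cpt.mean((0, 1), 0))
```

Under these assumptions, 10 binary state variables and 4 actions give 4,190,208 free parameters for the flat model, against 160 when each variable has at most 2 parents.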