模式识别与人工智能 (Pattern Recognition and Artificial Intelligence)
2014, No. 2, pp. 103-110 (8 pages)
Chen Xingguo, Gao Yang, Fan Shunguo, Yu Yajun
Keywords: Reinforcement Learning; Continuous Action Space; Function Approximation; Kernel Method
In reinforcement learning, algorithms frequently have to handle both continuous state and continuous action spaces to achieve accurate control. This paper combines the strength of the actor-critic method in dealing with continuous action spaces with the capacity of kernel methods for handling continuous state spaces, and proposes kernel-based continuous-action actor-critic learning (KCACL). In KCACL, the actor updates each action probability according to the reward-inaction principle, and the critic learns the state value function with online selective kernel-based temporal difference (OSKTD) learning. Comparative experiments demonstrate the effectiveness of the proposed algorithm.
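The two updates named in the abstract can be sketched in miniature. This is an illustrative, tabular stand-in only: the paper's critic uses a kernel-based value function over continuous states, and all function names and step sizes below are assumptions, not taken from the paper. It shows (1) a temporal-difference update of a state-value function and (2) a reward-inaction rule that reinforces the chosen action only on a positive TD error and leaves probabilities untouched otherwise.

```python
import numpy as np

def td_update(v, s, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular temporal-difference update of the state-value function v.

    Returns the TD error delta = r + gamma * v(s') - v(s)."""
    delta = r + gamma * v[s_next] - v[s]
    v[s] += alpha * delta
    return delta

def reward_inaction_update(p, a, delta, beta=0.05):
    """Reward-inaction rule: if the TD error is positive (a 'reward'),
    move the chosen action's probability toward 1 and shrink the others
    proportionally; on a non-positive signal, do nothing ('inaction')."""
    if delta > 0:
        for i in range(len(p)):
            if i == a:
                p[i] += beta * (1.0 - p[i])  # reinforce the chosen action
            else:
                p[i] -= beta * p[i]          # shrink the rest; sum stays 1
    return p

# Illustrative single step: 3 states, 2 discrete actions.
v = np.zeros(3)
p = np.array([0.5, 0.5])
delta = td_update(v, s=0, r=1.0, s_next=1)   # positive TD error
p = reward_inaction_update(p, a=0, delta=delta)
```

After this step the chosen action's probability rises (0.5 → 0.525) while the distribution still sums to one; a non-positive `delta` would have left `p` unchanged, which is the defining property of reward-inaction schemes.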