机器人
ROBOT
2009
Issue 4
pp. 320-326
7 pages in total
多agent强化学习; Q学习; 策略再用; 基于案例的推理; 追捕问题
multiagent reinforcement learning; Q-learning; policy reuse; case-based reasoning (CBR); pursuit problem
提出一种基于案例推理的多agent强化学习方法.构建了系统策略案例库,通过判断agent之间的协作关系选择相应案例库子集.利用模拟退火方法从中寻找最合适的可再用案例策略,agent按照案例指导执行动作选择.在没有可用案例的情况下,agent执行联合行为学习(JAL).在学习结果的基础上实时更新系统策略案例库.追捕问题的仿真结果表明所提方法明显提高了学习速度与收敛性.
A multiagent reinforcement learning approach based on case-based reasoning (CBR) is proposed. A system policy case library is built, and the relevant subset of policy cases is chosen by judging the cooperation relationship between the agents. Simulated annealing is used to find the most suitable reusable case policy in that subset, and the agents then select their actions under the guidance of that case. If no usable case exists in the library, the agents carry out joint action learning (JAL). The system policy case library is updated in real time on the basis of the learning results. Simulation results on the pursuit problem show that the proposed method markedly improves learning speed and convergence.
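
The case-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the case representation, the fitness measure, or the annealing schedule, so everything here is an assumption made for illustration: the dictionary layout of the case library, the `score` field, the `select_case` function, and the `temperature`, `cooling`, and `steps` parameters are all hypothetical and are not taken from the paper.

```python
import math
import random


def select_case(case_library, relation, temperature=1.0, cooling=0.9,
                steps=50, rng=None):
    """Pick a reusable policy case for a given cooperation relation.

    case_library maps a relation label to a list of cases; each case is a
    dict with a hypothetical numeric "score" (higher means a better fit).
    Returns None when the subset is empty, which the caller can treat as
    the signal to fall back to joint action learning (JAL).
    """
    rng = rng or random.Random()
    subset = case_library.get(relation, [])
    if not subset:
        return None

    current = rng.choice(subset)
    best = current
    for _ in range(steps):
        candidate = rng.choice(subset)
        delta = candidate["score"] - current["score"]
        # Simulated-annealing acceptance: always take an improvement,
        # take a worse case with probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
        if current["score"] > best["score"]:
            best = current
        # Geometric cooling, bounded below to avoid division by zero.
        temperature = max(temperature * cooling, 1e-6)
    return best


library = {
    "cooperative": [
        {"policy": "flank", "score": 0.2},
        {"policy": "surround", "score": 0.9},
    ],
    "solo": [{"policy": "chase", "score": 0.5}],
}
chosen = select_case(library, "cooperative", rng=random.Random(0))
fallback = select_case(library, "independent")  # no cases: use JAL instead
```

In this sketch an empty subset returns None rather than raising, so the learning loop can switch to JAL and later append the learned policy to the library as a new case.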