Computer Applications and Software
2015
Issue 10
101-104
, 4 pages in total
Li Xuejun (李学俊)%Chen Shiyang (陈士洋)%Zhang Yiwen (张以文)%Li Longshu (李龙澍)
RoboCup%Keepaway%Reinforcement learning%Ball-stealing strategy
In RoboCup Keepaway training tasks, traditional hand-coded ball-stealing strategies are highly subjective and adapt poorly to changes in the training situation, so the takers need a long time to complete the task and achieve a low steal success rate. To address this problem, we apply reinforcement learning to the high-level action decisions of the takers in Keepaway. Based on an analysis of the characteristics of the ball-stealing task, we design the state space, action space, and reward of the takers' reinforcement learning model, and give a corresponding reinforcement learning algorithm for the takers. Experimental results show that, after reinforcement learning, the takers make more objective decisions according to the game situation, and that these decisions are significantly more effective than the hand-coded strategy. For typical 4v3 and 5v4 Keepaway tasks, takers using the learned strategy shorten the time needed to complete the ball-stealing task by at least 7.1% and raise the steal success rate by at least 15.0%.