TY - GEN
T1 - Decomposing large-scale POMDP via belief state analysis
AU - Li, Xin
AU - Cheung, Kwok Wai
AU - Liu, Jiming
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2005
Y1 - 2005
N2 - The partially observable Markov decision process (POMDP) is commonly used to model a stochastic environment with unobservable states in support of optimal decision making. Computing the optimal policy for a large-scale POMDP is known to be intractable. Belief compression, an approximate solution technique, has recently been proposed to reduce the dimension of the POMDP's belief state space and has been shown to improve tractability. In this paper, conjecturing that temporally close belief states can be characterized by a lower intrinsic dimension, we propose a spatio-temporal belief clustering that considers both the spatial (in the belief space) and temporal similarities of belief states, and we incorporate it into the belief compression algorithm. The proposed clustering yields belief state clusters that serve as sub-POMDPs of much lower dimension, which can be distributed to a set of agents for collaborative problem solving. The proposed method has been tested on a synthesized navigation problem (Hallway2) and empirically shown to yield policies with superior long-term rewards compared with those based solely on belief compression. Some future research directions for extending this belief state analysis approach are also included.
AB - The partially observable Markov decision process (POMDP) is commonly used to model a stochastic environment with unobservable states in support of optimal decision making. Computing the optimal policy for a large-scale POMDP is known to be intractable. Belief compression, an approximate solution technique, has recently been proposed to reduce the dimension of the POMDP's belief state space and has been shown to improve tractability. In this paper, conjecturing that temporally close belief states can be characterized by a lower intrinsic dimension, we propose a spatio-temporal belief clustering that considers both the spatial (in the belief space) and temporal similarities of belief states, and we incorporate it into the belief compression algorithm. The proposed clustering yields belief state clusters that serve as sub-POMDPs of much lower dimension, which can be distributed to a set of agents for collaborative problem solving. The proposed method has been tested on a synthesized navigation problem (Hallway2) and empirically shown to yield policies with superior long-term rewards compared with those based solely on belief compression. Some future research directions for extending this belief state analysis approach are also included.
UR - http://www.scopus.com/inward/record.url?scp=33846314380&partnerID=8YFLogxK
U2 - 10.1109/IAT.2005.63
DO - 10.1109/IAT.2005.63
M3 - Conference proceeding
AN - SCOPUS:33846314380
SN - 0769524168
SN - 9780769524160
T3 - Proceedings - 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT'05
SP - 428
EP - 434
BT - Proceedings - 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT'05
T2 - 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology
Y2 - 19 September 2005 through 22 September 2005
ER -