TY - JOUR
T1 - Improving POMDP tractability via belief compression and clustering
AU - Li, Xin
AU - CHEUNG, Kwok Wai
AU - LIU, Jiming
N1 - Funding Information:
Manuscript received May 28, 2008; revised January 17, 2009. First published July 31, 2009; current version published October 30, 2009. This work was supported in part by the Hong Kong Baptist University under Faculty Research Grant FRG/05-06/II-81. This paper was recommended by Associate Editor S. E. Shimony.
PY - 2010/2
Y1 - 2010/2
N2 - Partially observable Markov decision process (POMDP) is a commonly adopted mathematical framework for solving planning problems in stochastic environments. However, computing the optimal policy of POMDP for large-scale problems is known to be intractable, where the high dimensionality of the underlying belief space is one of the major causes. In this paper, we propose a hybrid approach that integrates two different approaches for reducing the dimensionality of the belief space: 1) belief compression and 2) value-directed compression. In particular, a novel orthogonal nonnegative matrix factorization is derived for the belief compression, which is then integrated in a value-directed framework for computing the policy. In addition, with the conjecture that a properly partitioned belief space can have its per-cluster intrinsic dimension further reduced, we propose to apply a κ-means-like clustering technique to partition the belief space to form a set of sub-POMDPs before applying the dimension reduction techniques to each of them. We have evaluated the proposed belief compression and clustering approaches based on a set of benchmark problems and demonstrated their effectiveness in reducing the cost for computing policies, with the quality of the policies being retained.
AB - Partially observable Markov decision process (POMDP) is a commonly adopted mathematical framework for solving planning problems in stochastic environments. However, computing the optimal policy of POMDP for large-scale problems is known to be intractable, where the high dimensionality of the underlying belief space is one of the major causes. In this paper, we propose a hybrid approach that integrates two different approaches for reducing the dimensionality of the belief space: 1) belief compression and 2) value-directed compression. In particular, a novel orthogonal nonnegative matrix factorization is derived for the belief compression, which is then integrated in a value-directed framework for computing the policy. In addition, with the conjecture that a properly partitioned belief space can have its per-cluster intrinsic dimension further reduced, we propose to apply a κ-means-like clustering technique to partition the belief space to form a set of sub-POMDPs before applying the dimension reduction techniques to each of them. We have evaluated the proposed belief compression and clustering approaches based on a set of benchmark problems and demonstrated their effectiveness in reducing the cost for computing policies, with the quality of the policies being retained.
KW - Belief clustering
KW - Belief compression
KW - Nonnegative matrix factorization (NMF)
KW - Partially observable Markov decision process (POMDP)
UR - http://www.scopus.com/inward/record.url?scp=77955087265&partnerID=8YFLogxK
U2 - 10.1109/TSMCB.2009.2021573
DO - 10.1109/TSMCB.2009.2021573
M3 - Journal article
AN - SCOPUS:77955087265
SN - 1083-4419
VL - 40
SP - 125
EP - 136
JO - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
JF - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
IS - 1
ER -