Abstract
The partially observable Markov decision process (POMDP) is a commonly adopted mathematical framework for solving planning problems in stochastic environments. However, computing the optimal policy of a POMDP for large-scale problems is known to be intractable, and the high dimensionality of the underlying belief space is one of the major causes. In this paper, we propose a hybrid approach that integrates two different approaches for reducing the dimensionality of the belief space: 1) belief compression and 2) value-directed compression. In particular, a novel orthogonal nonnegative matrix factorization is derived for belief compression and is then integrated into a value-directed framework for computing the policy. In addition, based on the conjecture that a properly partitioned belief space can have its per-cluster intrinsic dimension further reduced, we propose applying a k-means-like clustering technique to partition the belief space into a set of sub-POMDPs before applying the dimension reduction techniques to each of them. We have evaluated the proposed belief compression and clustering approaches on a set of benchmark problems and demonstrated their effectiveness in reducing the cost of computing policies while retaining the quality of the policies.
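To make the partitioning step concrete, the following is a minimal sketch of clustering sampled belief vectors with plain Lloyd-style k-means. It is an illustration only, not the paper's algorithm: the abstract says only that the technique is "k-means-like", so the Euclidean distance, the deterministic initialisation from the first k beliefs, and the names `squared_distance` and `cluster_beliefs` are all assumptions made here. The sample beliefs are invented toy data over three states.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two belief vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_beliefs(beliefs, k, iterations=20):
    """Lloyd-style k-means over belief vectors (hypothetical helper).

    Returns one cluster label per belief and the list of cluster centroids.
    """
    # Deterministic initialisation (an assumption): the first k beliefs.
    centroids = [list(b) for b in beliefs[:k]]
    labels = [0] * len(beliefs)
    for _ in range(iterations):
        # Assignment step: attach each belief to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(b, centroids[c]))
            for b in beliefs
        ]
        # Update step: each centroid becomes the mean of its members.
        # A mean of probability vectors is itself a probability vector,
        # so every centroid remains a valid belief.
        for c in range(k):
            members = [b for b, lab in zip(beliefs, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy beliefs over three hidden states (illustrative data, not from the paper).
beliefs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.1, 0.8],
    [0.8, 0.2, 0.0],
    [0.0, 0.2, 0.8],
]
labels, centroids = cluster_beliefs(beliefs, k=2)
print(labels)  # [0, 1, 0, 1]
```

In a pipeline of the kind the abstract describes, each resulting cluster would then be treated as its own sub-POMDP and passed to a dimension reduction step; that step is not shown here.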
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 125-136 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics |
| Volume | 40 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - Feb 2010 |
Scopus Subject Areas
- Control and Systems Engineering
- Software
- Information Systems
- Human-Computer Interaction
- Computer Science Applications
- Electrical and Electronic Engineering
User-Defined Keywords
- Belief clustering
- Belief compression
- Nonnegative matrix factorization (NMF)
- Partially observable Markov decision process (POMDP)