Abstract
Statistical Policy Comparison (SPC) assesses whether two stochastic policies are equivalent (policy consistency) and has received broad attention. However, the SPC framework implicitly assumes that the decision environment is invariant, and therefore fails to address many real-world data science applications. In this work, we refer to this overlooked issue as environment consistency; together with policy consistency, it yields a generalized concept, process consistency, for systematically comparing policy trials under the Markov decision process (MDP) framework. To address process consistency, we propose a unified comparison framework that extends traditional statistical policy comparison by incorporating both policy and environment comparisons. For policy consistency, existing statistical policy comparison methods can be integrated into our framework without modification. For environment consistency (the focus of this work), we devise fine-grained return tests to capture shifts in key elements of MDPs; notably, in special cases where trajectory likelihood information is available or can be estimated, we introduce a trajectory test based on the likelihood ratio test (LRT), which offers increased testing power. Extensive experiments demonstrate that our proposed testing methods achieve higher statistical power than existing approaches in testing process consistency, establishing their effectiveness across diverse real-world scenarios. Our code is available at https://github.com/bcxyf123/MDP-Testing.git.
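To make the idea of testing environment consistency through returns concrete, the following is a minimal sketch and not the authors' method: it uses a generic two-sample permutation test on mean episode return as a stand-in for the fine-grained return tests described above. The function name `permutation_return_test`, its parameters, and the synthetic Gaussian returns are all assumptions introduced for illustration.

```python
import random
from statistics import mean


def permutation_return_test(returns_a, returns_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean episode return.

    Hypothetical helper for illustration. Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(returns_a) - mean(returns_b)
    pooled = list(returns_a) + list(returns_b)
    n_a = len(returns_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the environment labels are exchangeable,
        # so reshuffling the pooled returns simulates the null distribution.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Synthetic returns (assumed): one policy run in two environments
    # whose mean return differs.
    env_a = [data_rng.gauss(10.0, 1.0) for _ in range(50)]
    env_b = [data_rng.gauss(12.0, 1.0) for _ in range(50)]
    diff, p = permutation_return_test(env_a, env_b)
    print(f"mean difference = {diff:.3f}, p-value = {p:.4f}")
```

A small p-value suggests that the two sets of returns were not generated by the same process. A test on mean return alone cannot tell whether the shift comes from the policy or from the environment, which is the gap that process consistency is meant to address.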
| Original language | English |
|---|---|
| Title of host publication | CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management |
| Place of Publication | New York, NY, USA |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 3677–3687 |
| Number of pages | 11 |
| ISBN (Electronic) | 9798400720406 |
| ISBN (Print) | 9798400720406 |
| Publication status | Published - 10 Nov 2025 |
Publication series
| Name | CIKM: Conference on Information and Knowledge Management |
|---|---|
| Publisher | Association for Computing Machinery |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 9 Industry, Innovation, and Infrastructure
User-Defined Keywords
- Markov decision process
- policy trial
- process consistency
- statistical policy comparison
Title
From Policy Comparison to Process Consistency and Beyond