|
| MDPI_R (const shared_ptr< const MDP > &mdp, const indvec &state2observ, const Transition &initial) |
| Calls the base constructor and also constructs the corresponding robust MDP.
|
|
| MDPI_R (const MDP &mdp, const indvec &state2observ, const Transition &initial) |
| Calls the base constructor and also constructs the corresponding robust MDP.
|
|
const RMDP & | get_robust_mdp () const |
|
void | update_importance_weights (const numvec &weights) |
| Updates the weights on outcomes in the robust MDP based on the state weights provided. More...
|
|
indvec | solve_reweighted (long iterations, prec_t discount, const indvec &initobspol=indvec(0)) |
| Uses a simple iterative algorithm to solve the MDPI. More...
|
|
indvec | solve_robust (long iterations, prec_t threshold, prec_t discount, const indvec &initobspol=indvec(0)) |
| Uses a robust MDP formulation to solve the MDPI. More...
|
|
| MDPI (const shared_ptr< const MDP > &mdp, const indvec &state2observ, const Transition &initial) |
| Constructs the MDP with implementability constraints. More...
|
|
| MDPI (const MDP &mdp, const indvec &state2observ, const Transition &initial) |
| Constructs the MDP with implementability constraints. More...
|
|
size_t | obs_count () const |
|
size_t | state_count () const |
|
long | state2obs (long state) |
|
size_t | action_count (long obsid) |
|
indvec | obspol2statepol (const indvec &obspol) const |
| Converts a policy defined in terms of observations to a policy defined in terms of states. More...
|
|
void | obspol2statepol (const indvec &obspol, indvec &statepol) const |
| Converts a policy defined in terms of observations to a policy defined in terms of states. More...
|
|
Transition | transition2obs (const Transition &tran) |
| Converts a transition over states to a transition over observations, summing the probabilities of the individual states. More...
|
|
shared_ptr< const MDP > | get_mdp () |
| Internal MDP representation.
|
|
Transition | get_initial () const |
| Initial distribution of MDP.
|
|
indvec | random_policy (random_device::result_type seed=random_device{}()) |
| Constructs a random observation policy.
|
|
prec_t | total_return (prec_t discount, prec_t precision=SOLPREC) const |
| Computes the return of an observation policy. More...
|
|
void | to_csv (ostream &output_mdp, ostream &output_state2obs, ostream &output_initial, bool headers=true) const |
| Saves the MDPI to a set of 3 csv files, for transitions, observations, and the initial distribution. More...
|
|
void | to_csv_file (const string &output_mdp, const string &output_state2obs, const string &output_initial, bool headers=true) const |
| Saves the MDPI to a set of 3 csv files, for transitions, observations, and the initial distribution. More...
|
|
|
static unique_ptr< MDPI_R > | from_csv (istream &input_mdp, istream &input_state2obs, istream &input_initial, bool headers=true) |
|
static unique_ptr< MDPI_R > | from_csv_file (const string &input_mdp, const string &input_state2obs, const string &input_initial, bool headers=true) |
| Loads the class from a set of CSV files. More...
|
|
template<typename T = MDPI> |
static unique_ptr< T > | from_csv (istream &input_mdp, istream &input_state2obs, istream &input_initial, bool headers=true) |
| Loads an MDPI from a set of 3 csv files, for transitions, observations, and the initial distribution. More...
|
|
template<typename T = MDPI> |
static unique_ptr< T > | from_csv_file (const string &input_mdp, const string &input_state2obs, const string &input_initial, bool headers=true) |
|
An MDP with implementability constraints.
The class contains solution methods that rely on a robust MDP reformulation of the problem.
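A minimal usage sketch follows; it is not part of the reference itself. The CSV file names are placeholders and the craam header declaring MDPI_R is assumed to be on the include path, while the calls (from_csv_file, solve_reweighted, obspol2statepol) follow the signatures listed above.

    #include <memory>
    #include <string>
    // The craam header declaring craam::impl::MDPI_R is assumed to be included here.

    using namespace craam;
    using namespace craam::impl;

    int main() {
        // Load the implementable MDP from three CSV files: transitions,
        // state-to-observation mapping, and initial distribution (placeholder names).
        std::unique_ptr<MDPI_R> mdpi =
            MDPI_R::from_csv_file("mdp.csv", "state2obs.csv", "initial.csv");

        // Solve by iterative reweighting; the result assigns an action index
        // to each observation.
        indvec obspol = mdpi->solve_reweighted(/*iterations=*/100, /*discount=*/0.95);

        // Expand the observation policy into a per-state policy.
        indvec statepol = mdpi->obspol2statepol(obspol);
        return 0;
    }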
indvec craam::impl::MDPI_R::solve_reweighted (long iterations, prec_t discount, const indvec &initobspol = indvec(0))  [inline]
Uses a simple iterative algorithm to solve the MDPI.
The algorithm starts with a policy that assigns action 0 to every observation, then alternates between updating the distribution over robust outcomes (which correspond to MDP states) and computing the optimal solution of the reweighted RMDP.
This method modifies the stored robust MDP.
- Parameters
    iterations | Maximal number of iterations; terminates early when the policy no longer changes
    discount | Discount factor
    initobspol | Initial observation policy (optional). When omitted or of length 0, a policy that takes the first action (action 0) in every observation is used.
- Returns
    Policy for observations (an action index for each observation)
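Continuing the sketch above (the mdpi object and the concrete numbers are illustrative assumptions), the initobspol argument can seed the iteration with an explicit starting policy:

    // Start every observation at action 0 (equivalent to omitting the argument).
    indvec init(mdpi->obs_count(), 0);
    indvec obspol = mdpi->solve_reweighted(/*iterations=*/50, /*discount=*/0.9, init);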
indvec craam::impl::MDPI_R::solve_robust (long iterations, prec_t threshold, prec_t discount, const indvec &initobspol = indvec(0))

Uses a robust MDP formulation to solve the MDPI.
States within each observation are treated as outcomes of the corresponding robust MDP state. The baseline distribution over these outcomes is inferred from the provided policy.
The uncertainty is bounded by an L1-norm deviation from the baseline, limited by the provided threshold.
Like solve_reweighted, the method can run for several iterations.
- Parameters
    iterations | Maximal number of iterations; terminates early when the policy no longer changes
    threshold | Upper bound on the L1 deviation from the baseline distribution
    discount | Discount factor
    initobspol | Initial observation policy (optional). When omitted or of length 0, a policy that takes the first action (action 0) in every observation is used.
- Returns
    Policy for observations (an action index for each observation)
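As a companion sketch under the same assumptions as the earlier example, the robust variant additionally takes the L1 budget; the value 0.2 is purely illustrative:

    // Outcomes within each observation may deviate from the baseline
    // distribution by at most 0.2 in L1 norm.
    indvec robust_obspol = mdpi->solve_robust(
        /*iterations=*/50, /*threshold=*/0.2, /*discount=*/0.9);
    indvec robust_statepol = mdpi->obspol2statepol(robust_obspol);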