CRAAM 2.0.0
Robust and Approximate Markov Decision Processes

MDPI Class Reference

Represents an MDP with implementability constraints.

#include <ImMDP.hpp>
Public Member Functions

  MDPI(const shared_ptr<const MDP> &mdp, const indvec &state2observ, const Transition &initial)
      Constructs the MDP with implementability constraints.
  MDPI(const MDP &mdp, const indvec &state2observ, const Transition &initial)
      Constructs the MDP with implementability constraints.
  size_t obs_count() const
  size_t state_count() const
  long state2obs(long state)
  size_t action_count(long obsid)
  indvec obspol2statepol(const indvec &obspol) const
      Converts a policy defined in terms of observations to a policy defined in terms of states.
  void obspol2statepol(const indvec &obspol, indvec &statepol) const
      Converts a policy defined in terms of observations to a policy defined in terms of states.
  Transition transition2obs(const Transition &tran)
      Converts a transition from states to observations, adding the probabilities of individual states.
  shared_ptr<const MDP> get_mdp()
      Internal MDP representation.
  Transition get_initial() const
      Initial distribution of the MDP.
  indvec random_policy(random_device::result_type seed = random_device{}())
      Constructs a random observation policy.
  prec_t total_return(prec_t discount, prec_t precision = SOLPREC) const
      Computes the return of an observation policy.
  void to_csv(ostream &output_mdp, ostream &output_state2obs, ostream &output_initial, bool headers = true) const
      Saves the MDPI to a set of 3 CSV files: transitions, observations, and the initial distribution.
  void to_csv_file(const string &output_mdp, const string &output_state2obs, const string &output_initial, bool headers = true) const
      Saves the MDPI to a set of 3 CSV files: transitions, observations, and the initial distribution.
Static Public Member Functions

  template<typename T = MDPI>
  static unique_ptr<T> from_csv(istream &input_mdp, istream &input_state2obs, istream &input_initial, bool headers = true)
      Loads an MDPI from a set of 3 CSV files: transitions, observations, and the initial distribution.
  template<typename T = MDPI>
  static unique_ptr<T> from_csv_file(const string &input_mdp, const string &input_state2obs, const string &input_initial, bool headers = true)
      Loads an MDPI from a set of 3 CSV files given by file name.
Static Protected Member Functions

  static void check_parameters(const MDP &mdp, const indvec &state2observ, const Transition &initial)
      Checks whether the parameters are correct.
Protected Attributes

  shared_ptr<const MDP> mdp
      The underlying MDP.
  indvec state2observ
      Maps the index of a state to the index of its observation.
  Transition initial
      Initial distribution.
  long obscount
      Number of observations.
  indvec action_counts
      Number of actions for each observation.
Detailed Description

Represents an MDP with implementability constraints. Consists of an MDP and a set of observations.
MDPI(const shared_ptr<const MDP> &mdp, const indvec &state2observ, const Transition &initial)    [inline]

Constructs the MDP with implementability constraints. This constructor makes it possible to share the MDP with other data structures.

Important: if the underlying MDP changes externally, the object becomes invalid and may exhibit unpredictable behavior.

Parameters
  mdp           A non-robust base MDP model.
  state2observ  Maps each state to the index of the corresponding observation. A valid policy takes the same action in all states that share an observation. The index is 0-based.
  initial       A representation of the initial distribution. The rewards in this transition are ignored (and should be 0).
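A minimal construction sketch (not part of the generated reference): it assumes the craam and craam::impl namespaces, a Transition constructor taking index, probability, and reward vectors, and an MDP built elsewhere; adjust these assumptions to the actual API if they do not hold.

    #include "ImMDP.hpp"
    #include <memory>

    using namespace craam;        // assumed namespace
    using namespace craam::impl;  // assumed namespace containing MDPI

    void construct_shared_example(const shared_ptr<const MDP>& mdp) {
        // Two MDP states observed as a single observation (index 0), so a
        // valid policy must take the same action in both states.
        indvec state2observ = {0, 0};

        // Uniform initial distribution over the two states; rewards are
        // ignored by MDPI and set to 0. Assumes a
        // Transition(indices, probabilities, rewards) constructor.
        Transition initial({0, 1}, {0.5, 0.5}, {0.0, 0.0});

        // Shared-pointer constructor: the MDP is referenced, not copied,
        // so it must not be modified externally afterwards.
        MDPI im(mdp, state2observ, initial);
    }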
MDPI(const MDP &mdp, const indvec &state2observ, const Transition &initial)    [inline]

Constructs the MDP with implementability constraints. The MDP model is copied (using the copy constructor) and stored internally.

Parameters
  mdp           A non-robust base MDP model. It cannot be shared, to prevent direct modification.
  state2observ  Maps each state to the index of the corresponding observation. A valid policy takes the same action in all states that share an observation. The index is 0-based.
  initial       A representation of the initial distribution. The rewards in this transition are ignored (and should be 0).
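The copying overload differs only in how the model is passed; a short sketch under the same assumptions as the previous example:

    void construct_copied_example(const MDP& mdp) {
        indvec state2observ = {0, 0};
        Transition initial({0, 1}, {0.5, 0.5}, {0.0, 0.0});

        // By-value overload: the MDP is copied and stored internally, so
        // later changes to `mdp` do not affect `im`.
        MDPI im(mdp, state2observ, initial);
    }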
static void check_parameters(const MDP &mdp, const indvec &state2observ, const Transition &initial)    [inline, static, protected]

Checks whether the parameters are correct. Throws an exception if they are not.
template<typename T = MDPI>
static unique_ptr<T> from_csv(istream &input_mdp, istream &input_state2obs, istream &input_initial, bool headers = true)    [inline, static]

Loads an MDPI from a set of 3 CSV files: transitions, observations, and the initial distribution. The MDP size is defined by the transitions file.

Parameters
  input_mdp        Input stream with transition probabilities and rewards
  input_state2obs  Input stream with the mapping from states to observations
  input_initial    Input stream with the initial distribution
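A loading sketch based on the signatures above; the file names are placeholders and the craam::impl namespace is an assumption.

    #include "ImMDP.hpp"
    #include <fstream>
    #include <memory>

    using namespace std;          // for ifstream, unique_ptr
    using namespace craam::impl;  // assumed namespace containing MDPI

    unique_ptr<MDPI> load_example() {
        // Stream-based overload documented above.
        ifstream fmdp("mdp.csv"), fobs("state2obs.csv"), finit("initial.csv");
        unique_ptr<MDPI> from_streams = MDPI::from_csv(fmdp, fobs, finit);

        // File-name convenience overload.
        return MDPI::from_csv_file("mdp.csv", "state2obs.csv", "initial.csv");
    }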
indvec obspol2statepol(const indvec &obspol) const

Converts a policy defined in terms of observations to a policy defined in terms of states.

Parameters
  obspol  Policy that maps observations to the actions to take
void obspol2statepol(const indvec &obspol, indvec &statepol) const

Converts a policy defined in terms of observations to a policy defined in terms of states.

Parameters
  obspol    Policy that maps observations to the actions to take
  statepol  Output parameter that receives the resulting state policy
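A conversion sketch using only members documented on this page; the craam::impl namespace is an assumption.

    void policy_conversion_example(MDPI& im) {
        // Draw a random observation policy (one action index per observation).
        indvec obspol = im.random_policy();

        // Expand it to a per-state policy: all states that share an
        // observation receive the same action.
        indvec statepol = im.obspol2statepol(obspol);

        // Equivalent overload that writes into an existing vector.
        indvec statepol2(im.state_count());
        im.obspol2statepol(obspol, statepol2);
    }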
void to_csv(ostream &output_mdp, ostream &output_state2obs, ostream &output_initial, bool headers = true) const    [inline]

Saves the MDPI to a set of 3 CSV files: transitions, observations, and the initial distribution.

Parameters
  output_mdp        Output stream for transition probabilities and rewards
  output_state2obs  Output stream for the state-to-observation mapping
  output_initial    Output stream for the initial distribution
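A stream-based saving sketch; the output file names are placeholders.

    #include <fstream>

    void save_streams_example(const MDPI& im) {
        ofstream fmdp("mdp.csv"), fobs("state2obs.csv"), finit("initial.csv");
        // Writes transitions, the state-to-observation map, and the initial
        // distribution, including header rows.
        im.to_csv(fmdp, fobs, finit, true);
    }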
void to_csv_file(const string &output_mdp, const string &output_state2obs, const string &output_initial, bool headers = true) const    [inline]

Saves the MDPI to a set of 3 CSV files: transitions, observations, and the initial distribution.

Parameters
  output_mdp        File name for transition probabilities and rewards
  output_state2obs  File name for the state-to-observation mapping
  output_initial    File name for the initial distribution
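The file-name overload is a one-line convenience; continuing the previous sketch with placeholder names:

    // Same content as to_csv, but the three files are opened by name.
    im.to_csv_file("mdp.csv", "state2obs.csv", "initial.csv");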
prec_t total_return(prec_t discount, prec_t precision = SOLPREC) const

Computes the return of an observation policy.

Parameters
  discount  Discount factor
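A call sketch following the signature above, which takes only the discount factor and an optional precision; the 0.95 discount is an arbitrary placeholder. Continuing the previous sketch:

    // Return of the observation policy under a 0.95 discount factor,
    // solved to the default precision SOLPREC.
    prec_t ret = im.total_return(0.95);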
Transition transition2obs(const Transition &tran)    [inline]

Converts a transition from states to observations, adding the probabilities of individual states. Rewards are a convex combination of the original values.
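An aggregation sketch that reuses the initial distribution returned by get_initial(), so only members documented on this page are involved:

    void aggregate_example(MDPI& im) {
        // State-indexed transition (the initial distribution documented above).
        Transition by_state = im.get_initial();

        // Observation-indexed transition: probabilities of states that share
        // an observation are summed; rewards are combined convexly.
        Transition by_obs = im.transition2obs(by_state);
    }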