CRAAM  2.0.0
Robust and Approximate Markov Decision Processes
craam::impl::MDPI_R Class Reference

An MDP with implementability constraints. More...

#include <ImMDP.hpp>

Inheritance diagram for craam::impl::MDPI_R:
craam::impl::MDPI

Public Member Functions

 MDPI_R (const shared_ptr< const MDP > &mdp, const indvec &state2observ, const Transition &initial)
 Calls the base constructor and also constructs the corresponding robust MDP.
 
 MDPI_R (const MDP &mdp, const indvec &state2observ, const Transition &initial)
 Calls the base constructor and also constructs the corresponding robust MDP.
 
const RMDP & get_robust_mdp () const
 
void update_importance_weights (const numvec &weights)
 Updates the weights on outcomes in the robust MDP based on the state weights provided. More...
 
indvec solve_reweighted (long iterations, prec_t discount, const indvec &initobspol=indvec(0))
 Uses a simple iterative algorithm to solve the MDPI. More...
 
indvec solve_robust (long iterations, prec_t threshold, prec_t discount, const indvec &initobspol=indvec(0))
 Uses a robust MDP formulation to solve the MDPI. More...
 
- Public Member Functions inherited from craam::impl::MDPI
 MDPI (const shared_ptr< const MDP > &mdp, const indvec &state2observ, const Transition &initial)
 Constructs the MDP with implementability constraints. More...
 
 MDPI (const MDP &mdp, const indvec &state2observ, const Transition &initial)
 Constructs the MDP with implementability constraints. More...
 
size_t obs_count () const
 
size_t state_count () const
 
long state2obs (long state)
 
size_t action_count (long obsid)
 
indvec obspol2statepol (const indvec &obspol) const
 Converts a policy defined in terms of observations to a policy defined in terms of states. More...
 
void obspol2statepol (const indvec &obspol, indvec &statepol) const
 Converts a policy defined in terms of observations to a policy defined in terms of states. More...
 
Transition transition2obs (const Transition &tran)
 Converts a transition from states to observations, adding probabilities of individual states. More...
 
shared_ptr< const MDP > get_mdp ()
 Internal MDP representation.
 
Transition get_initial () const
 Initial distribution of MDP.
 
indvec random_policy (random_device::result_type seed=random_device{}())
 Constructs a random observation policy.
 
prec_t total_return (prec_t discount, prec_t precision=SOLPREC) const
 Computes a return of an observation policy. More...
 
void to_csv (ostream &output_mdp, ostream &output_state2obs, ostream &output_initial, bool headers=true) const
 Saves the MDPI to a set of 3 csv files, for transitions, observations, and the initial distribution. More...
 
void to_csv_file (const string &output_mdp, const string &output_state2obs, const string &output_initial, bool headers=true) const
 Saves the MDPI to a set of 3 csv files, for transitions, observations, and the initial distribution. More...
 

Static Public Member Functions

static unique_ptr< MDPI_R > from_csv (istream &input_mdp, istream &input_state2obs, istream &input_initial, bool headers=true)
 
static unique_ptr< MDPI_R > from_csv_file (const string &input_mdp, const string &input_state2obs, const string &input_initial, bool headers=true)
 Loads the class from a set of CSV files. More...
 
- Static Public Member Functions inherited from craam::impl::MDPI
template<typename T = MDPI>
static unique_ptr< T > from_csv (istream &input_mdp, istream &input_state2obs, istream &input_initial, bool headers=true)
 Loads an MDPI from a set of 3 csv files, for transitions, observations, and the initial distribution. More...
 
template<typename T = MDPI>
static unique_ptr< T > from_csv_file (const string &input_mdp, const string &input_state2obs, const string &input_initial, bool headers=true)
 

Protected Member Functions

void initialize_robustmdp ()
 Constructs a robust version of the implementable MDP. More...
 

Protected Attributes

RMDP robust_mdp
 Robust representation of the MDPI.
 
indvec state2outcome
Maps the index of an MDP state to the index of the corresponding outcome within its observation (there are multiple states per observation)
 
- Protected Attributes inherited from craam::impl::MDPI
shared_ptr< const MDP > mdp
 the underlying MDP
 
indvec state2observ
 maps index of a state to the index of the observation
 
Transition initial
 initial distribution
 
long obscount
 number of observations
 
indvec action_counts
 number of actions for each observation
 

Additional Inherited Members

- Static Protected Member Functions inherited from craam::impl::MDPI
static void check_parameters (const MDP &mdp, const indvec &state2observ, const Transition &initial)
 Checks whether the parameters are correct. More...
 

Detailed Description

An MDP with implementability constraints.

The class contains solution methods that rely on robust MDP reformulation of the problem.

Member Function Documentation

◆ from_csv_file()

static unique_ptr<MDPI_R> craam::impl::MDPI_R::from_csv_file ( const string &  input_mdp,
const string &  input_state2obs,
const string &  input_initial,
bool  headers = true 
)
inline static

Loads the class from a set of CSV files.

See also from_csv.

◆ get_robust_mdp()

const RMDP& craam::impl::MDPI_R::get_robust_mdp ( ) const
inline

Returns the internal robust MDP representation

◆ initialize_robustmdp()

void craam::impl::MDPI_R::initialize_robustmdp ( )
inline protected

Constructs a robust version of the implementable MDP.

◆ solve_reweighted()

indvec craam::impl::MDPI_R::solve_reweighted ( long  iterations,
prec_t  discount,
const indvec initobspol = indvec(0) 
)
inline

Uses a simple iterative algorithm to solve the MDPI.

The algorithm starts with a policy that takes action 0 in every observation, then repeatedly updates the distribution over robust outcomes (which correspond to MDP states) and computes the optimal solution of the resulting reweighted RMDP.

This method modifies the stored robust MDP.

Parameters
    iterations	Maximal number of iterations; terminates early when the policy no longer changes
    discount	Discount factor
    initobspol	Initial observation policy (optional). When omitted or of length 0, a policy that always takes the first action (action 0) is used.
Returns
    Policy for observations (an index of an action for each observation)

◆ solve_robust()

indvec craam::impl::MDPI_R::solve_robust ( long  iterations,
prec_t  threshold,
prec_t  discount,
const indvec initobspol = indvec(0) 
)
inline

Uses a robust MDP formulation to solve the MDPI.

States within each observation are treated as outcomes of the aggregated state. The baseline distribution over these outcomes is inferred from the provided policy.

The uncertainty is bounded by using an L1 norm deviation and the provided threshold.

The method can run for several iterations, like solve_reweighted.

Parameters
    iterations	Maximal number of iterations; terminates early when the policy no longer changes
    threshold	Upper bound on the L1 deviation from the baseline distribution
    discount	Discount factor
    initobspol	Initial observation policy (optional). When omitted or of length 0, a policy that always takes the first action (action 0) is used.
Returns
    Policy for observations (an index of an action for each observation)

◆ update_importance_weights()

void craam::impl::MDPI_R::update_importance_weights ( const numvec weights)
inline

Updates the weights on outcomes in the robust MDP based on the state weights provided.

This method modifies the stored robust MDP.


The documentation for this class was generated from the following file: