|
| MDPI_R (const shared_ptr< const MDP > &mdp, const indvec &state2observ, const Transition &initial) |
| Calls the base constructor and also constructs the corresponding robust MDP.
|
|
| MDPI_R (const MDP &mdp, const indvec &state2observ, const Transition &initial) |
| Calls the base constructor and also constructs the corresponding robust MDP.
|
|
const RMDP & | get_robust_mdp () const |
|
void | update_importance_weights (const numvec &weights) |
| Updates the weights on outcomes in the robust MDP based on the state weights provided. More...
|
|
indvec | solve_reweighted (long iterations, prec_t discount, const indvec &initobspol=indvec(0)) |
| Uses a simple iterative algorithm to solve the MDPI. More...
|
|
indvec | solve_robust (long iterations, prec_t threshold, prec_t discount, const indvec &initobspol=indvec(0)) |
| Uses a robust MDP formulation to solve the MDPI. More...
|
|
| MDPI (const shared_ptr< const MDP > &mdp, const indvec &state2observ, const Transition &initial) |
| Constructs the MDP with implementability constraints. More...
|
|
| MDPI (const MDP &mdp, const indvec &state2observ, const Transition &initial) |
| Constructs the MDP with implementability constraints. More...
|
|
size_t | obs_count () const |
|
size_t | state_count () const |
|
long | state2obs (long state) |
|
size_t | action_count (long obsid) |
|
indvec | obspol2statepol (const indvec &obspol) const |
| Converts a policy defined in terms of observations to a policy defined in terms of states. More...
|
|
void | obspol2statepol (const indvec &obspol, indvec &statepol) const |
| Converts a policy defined in terms of observations to a policy defined in terms of states. More...
|
|
Transition | transition2obs (const Transition &tran) |
| Converts a transition over states to a transition over observations, summing the probabilities of the individual states. More...
|
|
shared_ptr< const MDP > | get_mdp () |
| Internal MDP representation.
|
|
Transition | get_initial () const |
| Initial distribution of MDP.
|
|
indvec | random_policy (random_device::result_type seed=random_device{}()) |
| Constructs a random observation policy.
|
|
prec_t | total_return (prec_t discount, prec_t precision=SOLPREC) const |
| Computes the return of an observation policy. More...
|
|
void | to_csv (ostream &output_mdp, ostream &output_state2obs, ostream &output_initial, bool headers=true) const |
| Saves the MDPI to a set of 3 csv files, for transitions, observations, and the initial distribution. More...
|
|
void | to_csv_file (const string &output_mdp, const string &output_state2obs, const string &output_initial, bool headers=true) const |
| Saves the MDPI to a set of 3 csv files, for transitions, observations, and the initial distribution. More...
|
|
|
static unique_ptr< MDPI_R > | from_csv (istream &input_mdp, istream &input_state2obs, istream &input_initial, bool headers=true) |
|
static unique_ptr< MDPI_R > | from_csv_file (const string &input_mdp, const string &input_state2obs, const string &input_initial, bool headers=true) |
| Loads the class from a set of CSV files. More...
|
|
template<typename T = MDPI> |
static unique_ptr< T > | from_csv (istream &input_mdp, istream &input_state2obs, istream &input_initial, bool headers=true) |
| Loads an MDPI from a set of 3 csv files, for transitions, observations, and the initial distribution. More...
|
|
template<typename T = MDPI> |
static unique_ptr< T > | from_csv_file (const string &input_mdp, const string &input_state2obs, const string &input_initial, bool headers=true) |
|
An MDP with implementability constraints.
The class contains solution methods that rely on a robust MDP reformulation of the problem.
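A minimal usage sketch follows; it is not part of the reference itself. The CSV file names are placeholders and the craam header declaring MDPI_R is assumed to be on the include path, while the calls (from_csv_file, solve_reweighted, obspol2statepol) follow the signatures listed above.

    #include <memory>
    #include <string>
    // The craam header declaring craam::impl::MDPI_R is assumed to be included here.

    using namespace craam;
    using namespace craam::impl;

    int main() {
        // Load the implementable MDP from three CSV files: transitions,
        // state-to-observation mapping, and initial distribution (placeholder names).
        std::unique_ptr<MDPI_R> mdpi =
            MDPI_R::from_csv_file("mdp.csv", "state2obs.csv", "initial.csv");

        // Solve by iterative reweighting; the result assigns an action index
        // to each observation.
        indvec obspol = mdpi->solve_reweighted(/*iterations=*/100, /*discount=*/0.95);

        // Expand the observation policy into a per-state policy.
        indvec statepol = mdpi->obspol2statepol(obspol);
        return 0;
    }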
indvec craam::impl::MDPI_R::solve_reweighted (long iterations, prec_t discount, const indvec &initobspol = indvec(0))  [inline]
Uses a simple iterative algorithm to solve the MDPI.
The algorithm starts with a policy that assigns action 0 to every observation, then alternates between updating the distribution over robust outcomes (which correspond to MDP states) and computing the optimal solution of the reweighted RMDP.
This method modifies the stored robust MDP.
- Parameters
    iterations | Maximal number of iterations; terminates early when the policy no longer changes
    discount | Discount factor
    initobspol | Initial observation policy (optional). When omitted or of length 0, a policy that takes the first action (action 0) in every observation is used.
- Returns
    Policy for observations (an action index for each observation)
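Continuing the sketch above (the mdpi object and the concrete numbers are illustrative assumptions), the initobspol argument can seed the iteration with an explicit starting policy:

    // Start every observation at action 0 (equivalent to omitting the argument).
    indvec init(mdpi->obs_count(), 0);
    indvec obspol = mdpi->solve_reweighted(/*iterations=*/50, /*discount=*/0.9, init);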
indvec craam::impl::MDPI_R::solve_robust (long iterations, prec_t threshold, prec_t discount, const indvec &initobspol = indvec(0))

Uses a robust MDP formulation to solve the MDPI.
States within each observation are treated as outcomes of the corresponding robust MDP state. The baseline distribution over these outcomes is inferred from the provided policy.
The uncertainty is bounded by an L1-norm deviation from the baseline, limited by the provided threshold.
Like solve_reweighted, the method can run for several iterations.
- Parameters
    iterations | Maximal number of iterations; terminates early when the policy no longer changes
    threshold | Upper bound on the L1 deviation from the baseline distribution
    discount | Discount factor
    initobspol | Initial observation policy (optional). When omitted or of length 0, a policy that takes the first action (action 0) in every observation is used.
- Returns
    Policy for observations (an action index for each observation)
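As a companion sketch under the same assumptions as the earlier example, the robust variant additionally takes the L1 budget; the value 0.2 is purely illustrative:

    // Outcomes within each observation may deviate from the baseline
    // distribution by at most 0.2 in L1 norm.
    indvec robust_obspol = mdpi->solve_robust(
        /*iterations=*/50, /*threshold=*/0.2, /*discount=*/0.9);
    indvec robust_statepol = mdpi->obspol2statepol(robust_obspol);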