CRAAM
2.0.0
Robust and Approximate Markov Decision Processes
Main namespace that includes modeling and solving functionality.
Namespaces

algorithms
    Main namespace for algorithms that operate on MDPs and RMDPs.
impl
    A namespace with tools for implementable, interpretable, and aggregated MDPs.
msen
    A namespace for handling sampling and simulation.
Classes

class GRMDP
    A general robust Markov decision process.
class OutcomeManagement
    A class that manages creation and access to outcomes to be used by actions.
class RegularAction
    Action in a regular MDP.
class SAState
    State for sa-rectangular uncertainty (or no uncertainty) in an MDP.
class Transition
    Represents sparse transition probabilities and rewards from a single state.
class WeightedOutcomeAction
    An action in a robust MDP that allows for outcomes chosen by nature.
Typedefs

using prec_t = double
    Default precision used throughout the code.
using numvec = vector<prec_t>
    Default numerical vector.
using indvec = vector<long>
    Default index vector.
using vec_scal_t = pair<numvec, prec_t>
    Pair of a vector and a scalar.
using ind_vec_scal_t = tuple<prec_t, numvec, prec_t>
    Tuple of an index, a vector, and a scalar.
typedef GRMDP<RegularState> MDP
    Regular MDP with discrete actions and one outcome per action.
typedef GRMDP<WeightedRobustState> RMDP
    An uncertain MDP with outcomes and weights.
typedef SAState<RegularAction> RegularState
    Regular MDP state with no outcomes.
typedef SAState<WeightedOutcomeAction> WeightedRobustState
    State with uncertain outcomes with L1 constraints on the distribution.
Functions

template<class T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec)
    Prints a vector to a stream; useful for debugging.
template<typename T>
vector<size_t> sort_indexes(vector<T> const& v)
    Sort indices by values in ascending order.
template<typename T>
vector<size_t> sort_indexes_desc(vector<T> const& v)
    Sort indices by values in descending order.
pair<numvec, prec_t> worstcase_l1(numvec const& z, numvec const& q, prec_t t)
    Computes the solution of min_p p^T z s.t. ||p - q||_1 <= t, 1^T p = 1, p >= 0.
template<class Model>
void add_transition(Model& mdp, long fromid, long actionid, long outcomeid, long toid, prec_t probability, prec_t reward)
    Adds a transition probability and reward for a particular outcome.
template<class Model>
void add_transition(Model& mdp, long fromid, long actionid, long toid, prec_t probability, prec_t reward)
    Adds a transition probability and reward for a model with no outcomes.
template<class Model>
Model& from_csv(Model& mdp, istream& input, bool header = true)
    Loads a GRMDP definition from a simple CSV file.
template<class Model>
Model& from_csv_file(Model& mdp, const string& filename, bool header = true)
    Loads the transition probabilities and rewards from a CSV file.
template<class Model>
void set_uniform_outcome_dst(Model& mdp)
    Sets the outcome distribution for each state and action to be uniform.
template<class Model>
void set_outcome_dst(Model& mdp, size_t stateid, size_t actionid, const numvec& dist)
    Sets the distribution of outcomes for the given state and action.
template<class Model>
bool is_outcome_dst_normalized(const Model& mdp)
    Checks whether outcome distributions sum to 1 for all states and actions.
template<class Model>
void normalize_outcome_dst(Model& mdp)
    Normalizes outcome distributions for all states and actions.
RMDP robustify(const MDP& mdp, bool allowzeros = false)
    Adds uncertainty to a regular MDP.
Variables

constexpr prec_t SOLPREC = 0.0001
    Default solution precision.
constexpr unsigned long MAXITER = 100000
    Default maximum number of iterations.
constexpr prec_t THRESHOLD = 1e-5
    Numerical threshold.
const prec_t tolerance = 1e-5
    Tolerance for checking whether a transition probability is normalized.
Detailed Description

Main namespace that includes modeling and solving functionality. It provides value-function based methods (value iteration and policy iteration), robust MDP methods for computing value functions, and abstractions that generalize to both robust and regular MDPs.
using craam::prec_t = double

Default precision used throughout the code.
typedef GRMDP<WeightedRobustState> craam::RMDP

An uncertain MDP with outcomes and weights. See craam::L1RobustState.
template<class Model>
void craam::add_transition(Model& mdp, long fromid, long actionid, long outcomeid, long toid, prec_t probability, prec_t reward)  [inline]

Adds a transition probability and reward for a particular outcome.

Parameters:
    mdp          Model to add the transition to
    fromid       Starting state ID
    actionid     Action ID
    outcomeid    Outcome ID (a single outcome corresponds to a regular MDP)
    toid         Destination state ID
    probability  Probability of the transition (must be non-negative)
    reward       The reward associated with the transition
template<class Model>
void craam::add_transition(Model& mdp, long fromid, long actionid, long toid, prec_t probability, prec_t reward)  [inline]

Adds a transition probability and reward for a model with no outcomes.

Parameters:
    mdp          Model to add the transition to
    fromid       Starting state ID
    actionid     Action ID
    toid         Destination state ID
    probability  Probability of the transition (must be non-negative)
    reward       The reward associated with the transition
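A usage sketch of the two overloads follows; the header path, state/action/outcome IDs, probabilities, and rewards are illustrative assumptions, not taken from the library's examples:

    #include "craam/RMDP.hpp"  // assumed header path; adjust to the actual installation

    using namespace craam;

    // Regular MDP: the overload without an outcome ID.
    MDP mdp;
    add_transition(mdp, 0, 0, 1, 1.0, 1.0);  // state 0, action 0 -> state 1
    add_transition(mdp, 1, 0, 0, 1.0, 2.0);  // state 1, action 0 -> state 0

    // Robust MDP: nature chooses between two outcomes of action 0 in state 0.
    RMDP rmdp;
    add_transition(rmdp, 0, 0, 0, 1, 1.0, 1.0);  // outcome 0
    add_transition(rmdp, 0, 0, 1, 0, 1.0, 0.5);  // outcome 1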
template<class Model>
Model& craam::from_csv(Model& mdp, istream& input, bool header = true)  [inline]

Loads a GRMDP definition from a simple CSV file. States, actions, and outcomes are identified by 0-based IDs. The columns are separated by commas, and rows by new lines.

The file is formatted with the following columns: idstatefrom, idaction, idoutcome, idstateto, probability, reward.

Note that outcome distributions are not restored.

Parameters:
    mdp     Model output (also returned)
    input   Source of the RMDP
    header  Whether the first line of the file represents the header. The column names are not checked for correctness or number!
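For illustration, a minimal model can be parsed from an in-memory stream that follows the column layout above (the transition values are made up):

    #include <sstream>

    const char* csv =
        "idstatefrom,idaction,idoutcome,idstateto,probability,reward\n"
        "0,0,0,1,1.0,1.0\n"
        "1,0,0,0,1.0,2.0\n";

    craam::MDP mdp;
    std::istringstream input(csv);
    craam::from_csv(mdp, input);  // header = true by default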
template<class Model>
Model& craam::from_csv_file(Model& mdp, const string& filename, bool header = true)  [inline]

Loads the transition probabilities and rewards from a CSV file.

Parameters:
    mdp       Model output (also returned)
    filename  Name of the file
    header    Whether the first line of the file represents the header
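Usage is analogous to from_csv; the file name below is hypothetical:

    craam::MDP mdp;
    craam::from_csv_file(mdp, "transitions.csv");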
template<class Model>
bool craam::is_outcome_dst_normalized(const Model& mdp)  [inline]

Checks whether outcome distributions sum to 1 for all states and actions.

This function only applies to models that have outcomes, such as ones using "WeightedOutcomeAction" or its derivatives.
template<class Model>
void craam::normalize_outcome_dst(Model& mdp)  [inline]

Normalizes outcome distributions for all states and actions.

This function only applies to models that have outcomes, such as ones using "WeightedOutcomeAction" or its derivatives.
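A sketch of a typical validate-then-normalize pattern, using only functions documented on this page (the model values are illustrative):

    craam::RMDP rmdp;
    craam::add_transition(rmdp, 0, 0, 0, 1, 1.0, 1.0);
    craam::add_transition(rmdp, 0, 0, 1, 0, 1.0, 0.5);
    craam::set_uniform_outcome_dst(rmdp);  // both outcome weights become 1/2

    if (!craam::is_outcome_dst_normalized(rmdp))
        craam::normalize_outcome_dst(rmdp);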
template<class T>
std::ostream& craam::operator<<(std::ostream& os, const std::vector<T>& vec)

Prints a vector to a stream; useful for debugging.
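The body is not shown in this documentation; one plausible implementation consistent with the declared signature would be:

    template <class T>
    std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec) {
        os << "[";
        for (size_t i = 0; i < vec.size(); ++i) {
            if (i > 0) os << ", ";
            os << vec[i];  // requires operator<< for T
        }
        return os << "]";
    }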
RMDP craam::robustify(const MDP& mdp, bool allowzeros = false)

Adds uncertainty to a regular MDP. Turns transition probabilities into uncertain outcomes and uses the transition probabilities as the nominal weights assigned to the outcomes.

The input is an MDP \( \mathcal{M} = (\mathcal{S},\mathcal{A},P,r), \) where the states are \( \mathcal{S} = \{ s_1, \ldots, s_n \}. \) The output RMDP is \( \bar{\mathcal{M}} = (\mathcal{S},\mathcal{A},\mathcal{B},\bar{P},\bar{r},d), \) where the states and actions are the same as in the original MDP and \( d : \mathcal{S} \times \mathcal{A} \rightarrow \Delta^{\mathcal{B}} \) is the nominal probability of outcomes. Outcomes, transition probabilities, and rewards depend on whether uncertain transitions to zero-probability states are allowed:

When allowzeros = true, \( \bar{\mathcal{M}} \) also allows uncertain transitions to states that have zero probability in \( \mathcal{M} \).

When allowzeros = false, \( \bar{\mathcal{M}} \) only allows transitions to states that have non-zero transition probabilities in \( \mathcal{M} \). Let \( z_k(s,a) \) denote the \( k \)-th state with a non-zero transition probability from state \( s \) and action \( a \).

Parameters:
    mdp         MDP \( \mathcal{M} \) used as the input
    allowzeros  Whether to allow outcomes to states with zero transition probability
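A usage sketch with made-up transition values:

    craam::MDP mdp;
    craam::add_transition(mdp, 0, 0, 1, 0.6, 1.0);
    craam::add_transition(mdp, 0, 0, 0, 0.4, 0.0);

    // Each non-zero transition becomes an outcome, and 0.6 / 0.4 become the
    // nominal outcome weights d(s,a) assigned to those outcomes.
    craam::RMDP rmdp = craam::robustify(mdp, /* allowzeros = */ false);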
template<typename T>
vector<size_t> craam::sort_indexes(vector<T> const& v)  [inline]

Sort indices by values in ascending order.

Parameters:
    v  List of values
template<typename T>
vector<size_t> craam::sort_indexes_desc(vector<T> const& v)  [inline]

Sort indices by values in descending order.

Parameters:
    v  List of values
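The bodies are not reproduced here; the standard iota-and-sort idiom matches the declared behavior (sort_indexes_desc would differ only in the comparison):

    #include <algorithm>
    #include <numeric>
    #include <vector>

    template <typename T>
    std::vector<size_t> sort_indexes(std::vector<T> const& v) {
        std::vector<size_t> idx(v.size());
        std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1
        std::sort(idx.begin(), idx.end(),
                  [&v](size_t a, size_t b) { return v[a] < v[b]; });
        return idx;
    }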
pair<numvec, prec_t> craam::worstcase_l1(numvec const& z, numvec const& q, prec_t t)

Computes the solution of:

    min_p   p^T z
    s.t.    ||p - q||_1 <= t
            1^T p = 1
            p >= 0

This implementation works in O(n log n) time because of the sort. Using quickselect to choose the right quantile would reduce this to O(n).

This function does not check whether the input distribution q sums to 1.
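A worked sketch with made-up numbers. Since the L1 ball has radius t, at most t/2 of probability mass can be shifted from expensive states to the cheapest one:

    craam::numvec z{2.0, 1.0, 3.0};  // objective values per state
    craam::numvec q{0.2, 0.5, 0.3};  // nominal distribution
    craam::prec_t t = 0.4;           // L1 budget

    // result.first is the minimizing distribution p; result.second is p^T z.
    // Here t/2 = 0.2 of mass should move from the most expensive state
    // (z = 3.0) to the cheapest (z = 1.0), giving p = (0.2, 0.7, 0.1) and an
    // objective of 1.4.
    auto result = craam::worstcase_l1(z, q, t);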