CRAAM  2.0.0
Robust and Approximate Markov Decision Processes
craam Namespace Reference

Main namespace which includes modeling and solving functionality.

Namespaces

 algorithms
 Main namespace for algorithms that operate on MDPs and RMDPs.
 
 impl
 A namespace with tools for implementable, interpretable, and aggregated MDPs.
 
 msen
 A namespace for handling sampling and simulation.
 

Classes

class  GRMDP
 A general robust Markov decision process.
 
class  OutcomeManagement
 A class that manages creation and access to outcomes to be used by actions.
 
class  RegularAction
 Action in a regular MDP.
 
class  SAState
 State for sa-rectangular uncertainty (or no uncertainty) in an MDP.
 
class  Transition
 Represents sparse transition probabilities and rewards from a single state.
 
class  WeightedOutcomeAction
 An action in a robust MDP that allows for outcomes chosen by nature.
 

Typedefs

using prec_t = double
 Default precision used throughout the code.
 
using numvec = vector< prec_t >
 Default numerical vector.
 
using indvec = vector< long >
 Default index vector.
 
using vec_scal_t = pair< numvec, prec_t >
 Pair of a vector and a scalar.
 
using ind_vec_scal_t = tuple< prec_t, numvec, prec_t >
 Tuple of an index, a vector, and a scalar.
 
typedef GRMDP< RegularState > MDP
 Regular MDP with discrete actions and one outcome per action.
 
typedef GRMDP< WeightedRobustState > RMDP
 An uncertain MDP with outcomes and weights.
 
typedef SAState< RegularAction > RegularState
 Regular MDP state with no outcomes.
 
typedef SAState< WeightedOutcomeAction > WeightedRobustState
 State with uncertain outcomes with L1 constraints on the distribution.
 

Functions

template<class T >
std::ostream & operator<< (std::ostream &os, const std::vector< T > &vec)
 Prints a vector to an output stream, which is useful for debugging.
 
template<typename T >
vector< size_t > sort_indexes (vector< T > const &v)
 Sort indices by values in ascending order.
 
template<typename T >
vector< size_t > sort_indexes_desc (vector< T > const &v)
 Sort indices by values in descending order.
 
pair< numvec, prec_t > worstcase_l1 (numvec const &z, numvec const &q, prec_t t)
 Computes the solution of: min_p p^T * z s.t. ||p - q||_1 <= t, 1^T p = 1, p >= 0.
 
template<class Model >
void add_transition (Model &mdp, long fromid, long actionid, long outcomeid, long toid, prec_t probability, prec_t reward)
 Adds a transition probability and reward for a particular outcome.
 
template<class Model >
void add_transition (Model &mdp, long fromid, long actionid, long toid, prec_t probability, prec_t reward)
 Adds a transition probability and reward for a model with no outcomes.
 
template<class Model >
Model & from_csv (Model &mdp, istream &input, bool header=true)
 Loads a GRMDP definition from a simple CSV file.
 
template<class Model >
Model & from_csv_file (Model &mdp, const string &filename, bool header=true)
 Loads the transition probabilities and rewards from a CSV file.
 
template<class Model >
void set_uniform_outcome_dst (Model &mdp)
 Sets the distribution for outcomes for each state and action to be uniform.
 
template<class Model >
void set_outcome_dst (Model &mdp, size_t stateid, size_t actionid, const numvec &dist)
 Sets the distribution of outcomes for the given state and action.
 
template<class Model >
bool is_outcome_dst_normalized (const Model &mdp)
 Checks whether outcome distributions sum to 1 for all states and actions.
 
template<class Model >
void normalize_outcome_dst (Model &mdp)
 Normalizes outcome distributions for all states and actions.
 
RMDP robustify (const MDP &mdp, bool allowzeros=false)
 Adds uncertainty to a regular MDP.
 

Variables

constexpr prec_t SOLPREC = 0.0001
 Default solution precision.
 
constexpr unsigned long MAXITER = 100000
 Default maximum number of iterations.
 
constexpr prec_t THRESHOLD = 1e-5
 Numerical threshold.
 
const prec_t tolerance = 1e-5
 Tolerance for checking whether a transition probability is normalized.
 

Detailed Description

Main namespace which includes modeling and solving functionality.

Value-function-based algorithms in the style of value iteration and policy iteration.

Robust MDP methods for computing value functions.

Provides abstractions that allow generalization to both robust and regular MDPs.

Typedef Documentation

◆ prec_t

using craam::prec_t = typedef double

Default precision used throughout the code.

◆ RMDP

An uncertain MDP with outcomes and weights.

See craam::L1RobustState.

Function Documentation

◆ add_transition() [1/2]

template<class Model >
void craam::add_transition ( Model &  mdp,
long  fromid,
long  actionid,
long  outcomeid,
long  toid,
prec_t  probability,
prec_t  reward 
)
inline

Adds a transition probability and reward for a particular outcome.

Parameters
mdp  Model to add the transition to
fromid  Starting state ID
actionid  Action ID
outcomeid  Outcome ID (a single outcome corresponds to a regular MDP)
toid  Destination ID
probability  Probability of the transition (must be non-negative)
reward  The reward associated with the transition

◆ add_transition() [2/2]

template<class Model >
void craam::add_transition ( Model &  mdp,
long  fromid,
long  actionid,
long  toid,
prec_t  probability,
prec_t  reward 
)
inline

Adds a transition probability and reward for a model with no outcomes.

Parameters
mdp  Model to add the transition to
fromid  Starting state ID
actionid  Action ID
toid  Destination ID
probability  Probability of the transition (must be non-negative)
reward  The reward associated with the transition

◆ from_csv()

template<class Model >
Model& craam::from_csv ( Model &  mdp,
istream &  input,
bool  header = true 
)
inline

Loads a GRMDP definition from a simple CSV file.

States, actions, and outcomes are identified by 0-based ids. The columns are separated by commas, and rows by new lines.

The file is formatted with the following columns: idstatefrom, idaction, idoutcome, idstateto, probability, reward

Note that outcome distributions are not restored.

Parameters
mdp  Model output (also returned)
input  Source of the RMDP
header  Whether the first line of the file is a header. Neither the column names nor their number is checked!
Returns
The input model
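
The six-column row format described above can be illustrated with a small standalone parser. This is only a sketch of how such rows might be read, not the CRAAM implementation: the names CsvRow and parse_transition_rows are hypothetical, and the sketch collects rows into a vector instead of adding them to a model.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One parsed row: idstatefrom, idaction, idoutcome, idstateto, probability, reward.
// CsvRow and parse_transition_rows are hypothetical names, not CRAAM identifiers.
struct CsvRow {
    long from, action, outcome, to;
    double probability, reward;
};

// Reads comma-separated rows from `input`; skips the first line when `header` is true.
inline std::vector<CsvRow> parse_transition_rows(std::istream& input, bool header = true) {
    std::vector<CsvRow> rows;
    std::string line;
    if (header) std::getline(input, line);  // discard the header without checking it
    while (std::getline(input, line)) {
        if (line.empty()) continue;          // tolerate blank lines
        std::stringstream fields(line);
        std::string cell;
        std::vector<std::string> cells;
        while (std::getline(fields, cell, ',')) cells.push_back(cell);
        assert(cells.size() == 6);           // this sketch expects exactly six columns
        rows.push_back({std::stol(cells[0]), std::stol(cells[1]), std::stol(cells[2]),
                        std::stol(cells[3]), std::stod(cells[4]), std::stod(cells[5])});
    }
    return rows;
}
```

Each parsed row carries exactly the arguments that the outcome-aware overload of add_transition takes after the model itself.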

◆ from_csv_file()

template<class Model >
Model& craam::from_csv_file ( Model &  mdp,
const string &  filename,
bool  header = true 
)
inline

Loads the transition probabilities and rewards from a CSV file.

Parameters
mdp  Model output (also returned)
filename  Name of the file
header  Whether the first line of the file is a header
Returns
The input model

◆ is_outcome_dst_normalized()

template<class Model >
bool craam::is_outcome_dst_normalized ( const Model &  mdp)
inline

Checks whether outcome distributions sum to 1 for all states and actions.

This function only applies to models that have outcomes, such as ones using "WeightedOutcomeAction" or its derivatives.

◆ normalize_outcome_dst()

template<class Model >
void craam::normalize_outcome_dst ( Model &  mdp)
inline

Normalizes outcome distributions for all states and actions.

This function only applies to models that have outcomes, such as ones using "WeightedOutcomeAction" or its derivatives.
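
What the check and the normalization amount to for a single outcome distribution can be sketched on a plain vector. The helpers below are hypothetical and independent of CRAAM's model classes. The default tolerance of 1e-5 is an assumption, and so is the choice to reject an all-zero vector.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical helper: true when the weights sum to 1 within `tol` (assumed default).
inline bool sums_to_one(const std::vector<double>& dist, double tol = 1e-5) {
    double total = std::accumulate(dist.begin(), dist.end(), 0.0);
    return std::abs(total - 1.0) <= tol;
}

// Hypothetical helper: rescales the weights in place so that they sum to 1.
inline void normalize(std::vector<double>& dist) {
    double total = std::accumulate(dist.begin(), dist.end(), 0.0);
    assert(total > 0.0);  // an all-zero vector cannot be normalized (assumption of this sketch)
    for (double& w : dist) w /= total;
}
```

A model-wide version would apply the same two steps to the outcome distribution of every state and action.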

◆ operator<<()

template<class T >
std::ostream& craam::operator<< ( std::ostream &  os,
const std::vector< T > &  vec 
)

Prints a vector to an output stream, which is useful for debugging.

◆ robustify()

RMDP craam::robustify ( const MDP &  mdp,
bool  allowzeros = false 
)
inline

Adds uncertainty to a regular MDP.

Turns transition probabilities into uncertain outcomes and uses the transition probabilities as the nominal weights assigned to the outcomes.

The input is an MDP \( \mathcal{M} = (\mathcal{S},\mathcal{A},P,r) \), where the states are \( \mathcal{S} = \{ s_1, \ldots, s_n \} \). The output RMDP is \( \bar{\mathcal{M}} = (\mathcal{S},\mathcal{A},\mathcal{B}, \bar{P},\bar{r},d) \), where the states and actions are the same as in the original MDP and \( d : \mathcal{S} \times \mathcal{A} \rightarrow \Delta^{\mathcal{B}} \) is the nominal probability of outcomes. Outcomes, transition probabilities, and rewards depend on whether uncertain transitions to zero-probability states are allowed:

When allowzeros = true, then \( \bar{\mathcal{M}} \) will also allow uncertain transitions to states that have zero probabilities in \( \mathcal{M} \).

  • Outcomes are identical for all states and actions: \( \mathcal{B} = \{ b_1, \ldots, b_n \} \)
  • Transition probabilities are: \( \bar{P}(s_i,a,b_k,s_l) = 1 \text{ if } k = l, \text{ otherwise } 0 \)
  • Rewards are: \( \bar{r}(s_i,a,b_k,s_l) = r(s_i,a,s_k) \text{ if } k = l, \text{ otherwise } 0 \)
  • Nominal outcome probabilities are: \( d(s,a,b_k) = P(s,a,s_k) \)

When allowzeros = false, then \( \bar{\mathcal{M}} \) will only allow transitions to states that have non-zero transition probabilities in \( \mathcal{M} \). Let \( z_k(s,a) \) denote the \( k \)-th state with a non-zero transition probability from state \( s \) and action \( a \).

  • Outcomes for \( s,a \) are: \( \mathcal{B}(s,a) = \{ b_1, \ldots, b_{|z(s,a)|} \}, \) where \( |z(s,a)| \) is the number of positive transition probabilities in \( P \).
  • Transition probabilities are: \( \bar{P}(s_i,a,b_k,s_l) = 1 \text{ if } z_k(s_i,a) = l, \text{ otherwise } 0 \)
  • Rewards are: \( \bar{r}(s_i,a,b_k,s_k) = r(s_i,a,s_{z_k(s_i,a)}) \)
  • Nominal outcome probabilities are: \( d(s,a,b_k) = P(s,a,z_k(s,a)) \)

Parameters
mdp  MDP \( \mathcal{M} \) used as the input
allowzeros  Whether to allow outcomes to states with zero transition probability
Returns
RMDP with nominal probabilities
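
The construction above can be sketched for a single state-action pair, using dense vectors for \( P(s,a,\cdot) \) and \( r(s,a,\cdot) \). This illustrates the two cases only and is not CRAAM's implementation. OutcomeSet and robustify_row are hypothetical names, and states are indexed from 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Outcomes built for one state-action pair (hypothetical struct).
struct OutcomeSet {
    std::vector<std::size_t> target;  // target[k]: state reached with probability 1 under outcome k
    std::vector<double> reward;       // reward[k]: reward of that single transition
    std::vector<double> nominal;      // nominal[k]: nominal outcome probability d(s,a,b_k)
};

// p[l] and r[l] are the transition probability and reward for moving to state l.
inline OutcomeSet robustify_row(const std::vector<double>& p,
                                const std::vector<double>& r,
                                bool allowzeros = false) {
    assert(p.size() == r.size());
    OutcomeSet out;
    for (std::size_t l = 0; l < p.size(); ++l) {
        // allowzeros = true: one outcome per state.
        // allowzeros = false: outcomes only for states with positive probability.
        if (allowzeros || p[l] > 0.0) {
            out.target.push_back(l);
            out.reward.push_back(r[l]);
            out.nominal.push_back(p[l]);
        }
    }
    return out;
}
```

With allowzeros = false the outcome count equals the number of positive entries in the row, which keeps the model sparse. With allowzeros = true every state gets an outcome, and the zero-probability ones carry a nominal weight of 0.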

◆ sort_indexes()

template<typename T >
vector<size_t> craam::sort_indexes ( vector< T > const &  v)
inline

Sort indices by values in ascending order.

Parameters
v  List of values
Returns
Sorted indices

◆ sort_indexes_desc()

template<typename T >
vector<size_t> craam::sort_indexes_desc ( vector< T > const &  v)
inline

Sort indices by values in descending order.

Parameters
v  List of values
Returns
Sorted indices
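
Both index-sorting functions return a permutation of positions, not sorted values. A common way to get that result with the standard library is sketched below. The names argsort_asc and argsort_desc are hypothetical, and the library's own implementation may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the positions of v ordered so that v[idx[0]] <= v[idx[1]] <= ...
template <typename T>
std::vector<std::size_t> argsort_asc(const std::vector<T>& v) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1
    std::sort(idx.begin(), idx.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    return idx;
}

// Same idea with the comparison reversed: v[idx[0]] >= v[idx[1]] >= ...
template <typename T>
std::vector<std::size_t> argsort_desc(const std::vector<T>& v) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; });
    return idx;
}
```

Because std::sort is not stable, the relative order of indices with equal values is unspecified in this sketch.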

◆ worstcase_l1()

pair<numvec,prec_t> craam::worstcase_l1 ( numvec const &  z,
numvec const &  q,
prec_t  t 
)
inline

Computes the solution of:

\( \min_p \; p^T z \quad \text{s.t.} \quad \| p - q \|_1 \le t, \quad 1^T p = 1, \quad p \ge 0 \)

Notes

This implementation works in O(n log n) time because of the sort. Using quickselect to choose the right quantile would work in O(n) time.

This function does not check whether the probability distribution sums to 1.
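
A sort-based greedy approach to this problem moves probability mass from the entries with the largest \( z \) to the entry with the smallest \( z \). The sketch below follows that idea under the assumption that \( q \) is a valid distribution and \( 0 \le t \le 2 \). It is an illustration, not necessarily the code the library uses, and worstcase_l1_sketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns the minimizing distribution p and the objective value p^T z.
inline std::pair<std::vector<double>, double>
worstcase_l1_sketch(const std::vector<double>& z, const std::vector<double>& q, double t) {
    assert(!z.empty() && z.size() == q.size());
    assert(t >= 0.0 && t <= 2.0);

    // Indices of z in ascending order of value; this sort gives the O(n log n) cost.
    std::vector<std::size_t> order(z.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&z](std::size_t a, std::size_t b) { return z[a] < z[b]; });

    std::vector<double> p(q);
    const std::size_t best = order.front();
    // Moving mass eps between two entries changes the L1 distance by 2*eps, hence t/2.
    double eps = std::min(t / 2.0, 1.0 - q[best]);
    p[best] += eps;

    // Remove the same total mass from the entries with the largest z first.
    for (std::size_t i = z.size() - 1; eps > 0.0 && i > 0; --i) {
        const std::size_t j = order[i];
        const double diff = std::min(eps, p[j]);
        p[j] -= diff;
        eps -= diff;
    }
    double value = std::inner_product(p.begin(), p.end(), z.begin(), 0.0);
    return {std::move(p), value};
}
```

For example, with z = (1, 2, 3), q = (0.2, 0.3, 0.5) and t = 0.4, the sketch shifts 0.2 of mass from the last entry to the first, giving p = (0.4, 0.3, 0.3) and an objective value of 1.9, up to floating-point rounding.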