CRAAM
2.0.0
Robust and Approximate Markov Decision Processes
Main namespace that includes modeling and solving functionality.
Namespaces

algorithms
    Main namespace for algorithms that operate on MDPs and RMDPs.
impl
    A namespace with tools for implementable, interpretable, and aggregated MDPs.
msen
    A namespace for handling sampling and simulation.
Classes

class GRMDP
    A general robust Markov decision process.
class OutcomeManagement
    A class that manages creation and access to outcomes to be used by actions.
class RegularAction
    Action in a regular MDP.
class SAState
    State for sa-rectangular uncertainty (or no uncertainty) in an MDP.
class Transition
    Represents sparse transition probabilities and rewards from a single state.
class WeightedOutcomeAction
    An action in a robust MDP that allows for outcomes chosen by nature.
Typedefs

using prec_t = double
    Default precision used throughout the code.
using numvec = vector<prec_t>
    Default numerical vector.
using indvec = vector<long>
    Default index vector.
using vec_scal_t = pair<numvec, prec_t>
    Pair of a vector and a scalar.
using ind_vec_scal_t = tuple<prec_t, numvec, prec_t>
    Tuple of an index, a vector, and a scalar.
typedef GRMDP<RegularState> MDP
    Regular MDP with discrete actions and one outcome per action.
typedef GRMDP<WeightedRobustState> RMDP
    An uncertain MDP with outcomes and weights.
typedef SAState<RegularAction> RegularState
    Regular MDP state with no outcomes.
typedef SAState<WeightedOutcomeAction> WeightedRobustState
    State with uncertain outcomes with L1 constraints on the distribution.
Functions

template<class T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec)
    Prints a vector to a stream; useful for debugging.
template<typename T>
vector<size_t> sort_indexes(vector<T> const& v)
    Sort indices by values in ascending order.
template<typename T>
vector<size_t> sort_indexes_desc(vector<T> const& v)
    Sort indices by values in descending order.
pair<numvec, prec_t> worstcase_l1(numvec const& z, numvec const& q, prec_t t)
    Computes the solution of min_p p^T z s.t. ||p - q||_1 <= t, 1^T p = 1, p >= 0.
template<class Model>
void add_transition(Model& mdp, long fromid, long actionid, long outcomeid, long toid, prec_t probability, prec_t reward)
    Adds a transition probability and reward for a particular outcome.
template<class Model>
void add_transition(Model& mdp, long fromid, long actionid, long toid, prec_t probability, prec_t reward)
    Adds a transition probability and reward for a model with no outcomes.
template<class Model>
Model& from_csv(Model& mdp, istream& input, bool header = true)
    Loads a GRMDP definition from a simple CSV file.
template<class Model>
Model& from_csv_file(Model& mdp, const string& filename, bool header = true)
    Loads the transition probabilities and rewards from a CSV file.
template<class Model>
void set_uniform_outcome_dst(Model& mdp)
    Sets the outcome distribution for each state and action to be uniform.
template<class Model>
void set_outcome_dst(Model& mdp, size_t stateid, size_t actionid, const numvec& dist)
    Sets the distribution of outcomes for the given state and action.
template<class Model>
bool is_outcome_dst_normalized(const Model& mdp)
    Checks whether outcome distributions sum to 1 for all states and actions.
template<class Model>
void normalize_outcome_dst(Model& mdp)
    Normalizes outcome distributions for all states and actions.
RMDP robustify(const MDP& mdp, bool allowzeros = false)
    Adds uncertainty to a regular MDP.
Variables

constexpr prec_t SOLPREC = 0.0001
    Default solution precision.
constexpr unsigned long MAXITER = 100000
    Default maximum number of iterations.
constexpr prec_t THRESHOLD = 1e-5
    Numerical threshold.
const prec_t tolerance = 1e-5
    Tolerance for checking whether a transition probability is normalized.
Detailed Description

Main namespace that includes modeling and solving functionality. It provides value-function based methods (value iteration and policy iteration), robust MDP methods for computing value functions, and abstractions that generalize to both robust and regular MDPs.
using craam::prec_t = double

Default precision used throughout the code.
typedef GRMDP<WeightedRobustState> craam::RMDP

An uncertain MDP with outcomes and weights. See craam::L1RobustState.
template<class Model>
void craam::add_transition(Model& mdp, long fromid, long actionid, long outcomeid, long toid, prec_t probability, prec_t reward)  [inline]

Adds a transition probability and reward for a particular outcome.

Parameters:
    mdp          Model to add the transition to
    fromid       Starting state ID
    actionid     Action ID
    outcomeid    Outcome ID (a single outcome corresponds to a regular MDP)
    toid         Destination state ID
    probability  Probability of the transition (must be non-negative)
    reward       The reward associated with the transition
template<class Model>
void craam::add_transition(Model& mdp, long fromid, long actionid, long toid, prec_t probability, prec_t reward)  [inline]

Adds a transition probability and reward for a model with no outcomes.

Parameters:
    mdp          Model to add the transition to
    fromid       Starting state ID
    actionid     Action ID
    toid         Destination state ID
    probability  Probability of the transition (must be non-negative)
    reward       The reward associated with the transition
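A usage sketch of the two overloads follows; the header path, state/action/outcome IDs, probabilities, and rewards are illustrative assumptions, not taken from the library's examples:

    #include "craam/RMDP.hpp"  // assumed header path; adjust to the actual installation

    using namespace craam;

    // Regular MDP: the overload without an outcome ID.
    MDP mdp;
    add_transition(mdp, 0, 0, 1, 1.0, 1.0);  // state 0, action 0 -> state 1
    add_transition(mdp, 1, 0, 0, 1.0, 2.0);  // state 1, action 0 -> state 0

    // Robust MDP: nature chooses between two outcomes of action 0 in state 0.
    RMDP rmdp;
    add_transition(rmdp, 0, 0, 0, 1, 1.0, 1.0);  // outcome 0
    add_transition(rmdp, 0, 0, 1, 0, 1.0, 0.5);  // outcome 1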
template<class Model>
Model& craam::from_csv(Model& mdp, istream& input, bool header = true)  [inline]

Loads a GRMDP definition from a simple CSV file. States, actions, and outcomes are identified by 0-based IDs. The columns are separated by commas, and rows by new lines.

The file is formatted with the following columns: idstatefrom, idaction, idoutcome, idstateto, probability, reward.

Note that outcome distributions are not restored.

Parameters:
    mdp     Model output (also returned)
    input   Source of the RMDP
    header  Whether the first line of the file represents the header. The column names are not checked for correctness or number!
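For illustration, a minimal model can be parsed from an in-memory stream that follows the column layout above (the transition values are made up):

    #include <sstream>

    const char* csv =
        "idstatefrom,idaction,idoutcome,idstateto,probability,reward\n"
        "0,0,0,1,1.0,1.0\n"
        "1,0,0,0,1.0,2.0\n";

    craam::MDP mdp;
    std::istringstream input(csv);
    craam::from_csv(mdp, input);  // header = true by default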
template<class Model>
Model& craam::from_csv_file(Model& mdp, const string& filename, bool header = true)  [inline]

Loads the transition probabilities and rewards from a CSV file.

Parameters:
    mdp       Model output (also returned)
    filename  Name of the file
    header    Whether the first line of the file represents the header
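Usage is analogous to from_csv; the file name below is hypothetical:

    craam::MDP mdp;
    craam::from_csv_file(mdp, "transitions.csv");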
template<class Model>
bool craam::is_outcome_dst_normalized(const Model& mdp)  [inline]

Checks whether outcome distributions sum to 1 for all states and actions.

This function only applies to models that have outcomes, such as ones using "WeightedOutcomeAction" or its derivatives.
template<class Model>
void craam::normalize_outcome_dst(Model& mdp)  [inline]

Normalizes outcome distributions for all states and actions.

This function only applies to models that have outcomes, such as ones using "WeightedOutcomeAction" or its derivatives.
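A sketch of a typical validate-then-normalize pattern, using only functions documented on this page (the model values are illustrative):

    craam::RMDP rmdp;
    craam::add_transition(rmdp, 0, 0, 0, 1, 1.0, 1.0);
    craam::add_transition(rmdp, 0, 0, 1, 0, 1.0, 0.5);
    craam::set_uniform_outcome_dst(rmdp);  // both outcome weights become 1/2

    if (!craam::is_outcome_dst_normalized(rmdp))
        craam::normalize_outcome_dst(rmdp);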
template<class T>
std::ostream& craam::operator<<(std::ostream& os, const std::vector<T>& vec)

Prints a vector to a stream; useful for debugging.
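The body is not shown in this documentation; one plausible implementation consistent with the declared signature would be:

    template <class T>
    std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec) {
        os << "[";
        for (size_t i = 0; i < vec.size(); ++i) {
            if (i > 0) os << ", ";
            os << vec[i];  // requires operator<< for T
        }
        return os << "]";
    }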
RMDP craam::robustify(const MDP& mdp, bool allowzeros = false)

Adds uncertainty to a regular MDP. Turns transition probabilities into uncertain outcomes and uses the transition probabilities as the nominal weights assigned to the outcomes.

The input is an MDP \( \mathcal{M} = (\mathcal{S},\mathcal{A},P,r), \) where the states are \( \mathcal{S} = \{ s_1, \ldots, s_n \}. \) The output RMDP is \( \bar{\mathcal{M}} = (\mathcal{S},\mathcal{A},\mathcal{B},\bar{P},\bar{r},d), \) where the states and actions are the same as in the original MDP and \( d : \mathcal{S} \times \mathcal{A} \rightarrow \Delta^{\mathcal{B}} \) is the nominal probability of outcomes. Outcomes, transition probabilities, and rewards depend on whether uncertain transitions to zero-probability states are allowed:

When allowzeros = true, \( \bar{\mathcal{M}} \) also allows uncertain transitions to states that have zero probability in \( \mathcal{M} \).

When allowzeros = false, \( \bar{\mathcal{M}} \) only allows transitions to states that have non-zero transition probabilities in \( \mathcal{M} \). Let \( z_k(s,a) \) denote the \( k \)-th state with a non-zero transition probability from state \( s \) and action \( a \).

Parameters:
    mdp         MDP \( \mathcal{M} \) used as the input
    allowzeros  Whether to allow outcomes to states with zero transition probability
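A usage sketch with made-up transition values:

    craam::MDP mdp;
    craam::add_transition(mdp, 0, 0, 1, 0.6, 1.0);
    craam::add_transition(mdp, 0, 0, 0, 0.4, 0.0);

    // Each non-zero transition becomes an outcome, and 0.6 / 0.4 become the
    // nominal outcome weights d(s,a) assigned to those outcomes.
    craam::RMDP rmdp = craam::robustify(mdp, /* allowzeros = */ false);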
template<typename T>
vector<size_t> craam::sort_indexes(vector<T> const& v)  [inline]

Sort indices by values in ascending order.

Parameters:
    v  List of values
template<typename T>
vector<size_t> craam::sort_indexes_desc(vector<T> const& v)  [inline]

Sort indices by values in descending order.

Parameters:
    v  List of values
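The bodies are not reproduced here; the standard iota-and-sort idiom matches the declared behavior (sort_indexes_desc would differ only in the comparison):

    #include <algorithm>
    #include <numeric>
    #include <vector>

    template <typename T>
    std::vector<size_t> sort_indexes(std::vector<T> const& v) {
        std::vector<size_t> idx(v.size());
        std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1
        std::sort(idx.begin(), idx.end(),
                  [&v](size_t a, size_t b) { return v[a] < v[b]; });
        return idx;
    }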
pair<numvec, prec_t> craam::worstcase_l1(numvec const& z, numvec const& q, prec_t t)

Computes the solution of:

    min_p   p^T z
    s.t.    ||p - q||_1 <= t
            1^T p = 1
            p >= 0

This implementation works in O(n log n) time because of the sort. Using quickselect to choose the right quantile would reduce this to O(n).

This function does not check whether the input distribution q sums to 1.
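A worked sketch with made-up numbers. Since the L1 ball has radius t, at most t/2 of probability mass can be shifted from expensive states to the cheapest one:

    craam::numvec z{2.0, 1.0, 3.0};  // objective values per state
    craam::numvec q{0.2, 0.5, 0.3};  // nominal distribution
    craam::prec_t t = 0.4;           // L1 budget

    // result.first is the minimizing distribution p; result.second is p^T z.
    // Here t/2 = 0.2 of mass should move from the most expensive state
    // (z = 3.0) to the cheapest (z = 1.0), giving p = (0.2, 0.7, 0.1) and an
    // objective of 1.4.
    auto result = craam::worstcase_l1(z, q, t);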