CRAAM 2.0.0
Robust and Approximate Markov Decision Processes
craam::msen::SampledMDP Class Reference

Constructs an MDP from integer samples.

#include <Samples.hpp>

Public Member Functions

 SampledMDP ()
 Constructs an empty MDP from discrete samples.
 
void add_samples (const DiscreteSamples &samples)
 Constructs or adds states and actions based on the provided samples.
 
shared_ptr< const MDP > get_mdp () const
 
shared_ptr< MDP > get_mdp_mod ()
 
Transition get_initial () const
 
vector< vector< prec_t > > get_state_action_weights ()
 
long state_count ()
 Returns the number of states in the samples (the highest observed index; some may be missing).
 

Protected Attributes

shared_ptr< MDP > mdp
 Internal MDP representation.
 
Transition initial
 Initial distribution.
 
vector< vector< prec_t > > state_action_weights
 Sample counts.
 

Detailed Description

Constructs an MDP from integer samples.

Integer samples: Each decision state, expectation state, and action is identified by an integer.

Input: Sample set \( \Sigma = (s_i, a_i, s_i', r_i, w_i)_{i=0}^{m-1} \)
Output: An MDP in which the transition probabilities are the normalized sample weights and the rewards are the sample-weighted averages:

\[ P(s,a,s') = \frac{\sum_{i=0}^{m-1} w_i 1\{ s = s_i, a = a_i, s' = s_i' \}}{\sum_{i=0}^{m-1} w_i 1\{ s = s_i, a = a_i \}}, \qquad r(s,a,s') = \frac{\sum_{i=0}^{m-1} r_i w_i 1\{ s = s_i, a = a_i, s' = s_i' \}}{\sum_{i=0}^{m-1} w_i 1\{ s = s_i, a = a_i, s' = s_i' \}} \]

The class also tracks cumulative weights of state-action samples \( z \):

\[ z(s,a) = \sum_{i=0}^{m-1} w_i 1\{ s = s_i, a = a_i \} \]

If \( z(s,a) = 0 \) then the action \( a \) is marked as invalid. There is some extra memory penalty due to storing these weights.
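
For illustration (weights chosen arbitrarily): if the only two samples observed in state 0 both use action 1, with weights \( w_0 = 1 \) and \( w_1 = 2 \), then

\[ z(0,1) = w_0 + w_1 = 3, \qquad z(0,0) = 0 . \]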

Important: Actions that are never sampled (no samples for that state-action pair) are labeled as invalid and are excluded from the computation of the value function and the solution. For example, if state 0 has samples for action 1 but none for action 0, then action 0 is still created (to preserve the integer action indices) but is ignored when computing the value function.

When sample sets are added by multiple calls to SampledMDP::add_samples, the result is the same as if all the individual sample sets were combined and added together. See SampledMDP::add_samples for more details.
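
A minimal usage sketch in C++, using only the member functions documented on this page. How the DiscreteSamples object is populated is omitted (see Samples.hpp for the sample-collection interface), and the surrounding function is illustrative only:

    #include <Samples.hpp>
    #include <memory>
    #include <vector>

    // Illustrative only: build an MDP from an already-populated sample set and
    // query the results through the documented SampledMDP interface.
    void build_from_samples(const craam::msen::DiscreteSamples& samples) {
        using namespace craam;
        using namespace craam::msen;

        SampledMDP smdp;                // empty MDP, no states or actions yet
        smdp.add_samples(samples);      // add states, actions, P and r from the samples

        std::shared_ptr<const MDP> mdp = smdp.get_mdp();  // read-only internal MDP
        Transition initial = smdp.get_initial();          // empirical initial distribution
        long nstates = smdp.state_count();                // states observed in the samples

        // Cumulative weights z(s,a); a zero entry means the state-action pair was
        // never sampled and the action is treated as invalid.
        std::vector<std::vector<prec_t>> z = smdp.get_state_action_weights();

        (void)mdp; (void)initial; (void)nstates; (void)z; // silence unused-variable warnings
    }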

Member Function Documentation

◆ add_samples()

void craam::msen::SampledMDP::add_samples ( const DiscreteSamples & samples )
inline

Constructs or adds states and actions based on the provided samples.

Sample sets can be added iteratively. Assume that the current transition probabilities are constructed based on a sample set \( \Sigma = (s_i, a_i, s_i', r_i, w_i)_{i=0}^{m-1} \) and add_samples is called with sample set \( \Sigma' = (s_j, a_j, s_j', r_j, w_j)_{j=m}^{n-1} \). The result is the same as if all samples \( 0 \ldots (n-1) \) were added simultaneously.

New MDP values are updated as follows:

  • Cumulative state-action weights \( z'\):

    \[ z'(s,a) = z(s,a) + \sum_{j=m}^{n-1} w_j 1\{ s = s_j, a = a_j \} \]

  • Transition probabilities \( P \):

    \begin{align*} P'(s,a,s') &= \frac{z(s,a) P(s,a,s') + \sum_{j=m}^{n-1} w_j 1\{ s = s_j, a = a_j, s' = s_j' \}}{z'(s,a)} \\ &= \frac{P(s,a,s') + (1 / z(s,a)) \sum_{j=m}^{n-1} w_j 1\{ s = s_j, a = a_j, s' = s_j' \}}{z'(s,a) / z(s,a)} \end{align*}

    The denominator is computed implicitly by normalizing transition probabilities.
  • Rewards \( r' \):

    \begin{align*} r'(s,a,s') &= \frac{r(s,a,s') z(s,a) P(s,a,s') + \sum_{j=m}^{n-1} r_j w_j 1\{ s = s_j, a = a_j, s' = s_j' \}}{z'(s,a) P'(s,a,s')} \\ &= \frac{r(s,a,s') P(s,a,s') + \sum_{j=m}^{n-1} r_j (w_j/z(s,a)) 1\{ s = s_j, a = a_j, s' = s_j' \}}{z'(s,a) P'(s,a,s') / z(s,a)} \\ &= \frac{r(s,a,s') P(s,a,s') + \sum_{j=m}^{n-1} r_j (w_j/z(s,a)) 1\{ s = s_j, a = a_j, s' = s_j' \}}{P(s,a,s') + \sum_{j=m}^{n-1} (w_j/z(s,a)) 1\{ s = s_j, a = a_j, s' = s_j' \}} \end{align*}

    The last step follows from the update formula for \( P'(s,a,s') \) above. This corresponds to applying Transition::add_sample repeatedly for \( j = m \ldots (n-1) \) with

    \begin{align*} p &= (w_j/z(s,a)) 1\{ s = s_j, a = a_j, s' = s_j' \}\\ r &= r_j \end{align*}

    .
Parameters
samples: New sample set to add to transition probabilities and rewards
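
The update above is just an incrementally maintained weighted average. The following self-contained sketch (plain C++ with standard containers, not CRAAM code; the sample values are made up) tracks z(s,a), P(s,a,.) and r(s,a,.) for a single state-action pair and checks that adding two batches one after the other gives the same result as adding all samples at once:

    #include <cassert>
    #include <cmath>
    #include <map>
    #include <tuple>
    #include <vector>

    // Weighted-average bookkeeping for one fixed state-action pair (s,a).
    struct SampledSA {
        double z = 0.0;             // cumulative weight z(s,a)
        std::map<long, double> wP;  // s' -> sum of w_j over samples ending in s' (= z(s,a) P(s,a,s'))
        std::map<long, double> wR;  // s' -> sum of r_j * w_j over samples ending in s'

        void add(long sp, double r, double w) {  // one sample (s, a, sp, r, w)
            z += w;
            wP[sp] += w;
            wR[sp] += r * w;
        }
        double P(long sp) const { return z > 0 ? wP.at(sp) / z : 0.0; }
        double reward(long sp) const { return wP.at(sp) > 0 ? wR.at(sp) / wP.at(sp) : 0.0; }
    };

    int main() {
        // Two batches of samples for the same (s,a): tuples of (s', reward, weight).
        const std::vector<std::tuple<long, double, double>> batch1 = {{2, 1.0, 1.0}, {3, 0.0, 2.0}};
        const std::vector<std::tuple<long, double, double>> batch2 = {{2, 3.0, 1.0}};

        SampledSA iterative, combined;
        for (auto [sp, r, w] : batch1) iterative.add(sp, r, w);
        for (auto [sp, r, w] : batch2) iterative.add(sp, r, w);   // second "add_samples" call
        for (const auto& batch : {batch1, batch2})
            for (auto [sp, r, w] : batch) combined.add(sp, r, w); // all samples at once

        assert(std::abs(iterative.P(2) - combined.P(2)) < 1e-12);
        assert(std::abs(iterative.reward(2) - combined.reward(2)) < 1e-12);
        return 0;
    }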

◆ get_initial()

Transition craam::msen::SampledMDP::get_initial ( ) const
inline
Returns
Initial distribution based on empirical sample data

◆ get_mdp()

shared_ptr<const MDP> craam::msen::SampledMDP::get_mdp ( ) const
inline
Returns
A constant pointer to the internal MDP

◆ get_mdp_mod()

shared_ptr<MDP> craam::msen::SampledMDP::get_mdp_mod ( )
inline
Returns
A modifiable pointer to the internal MDP. Take care when modifying it.

◆ get_state_action_weights()

vector<vector<prec_t> > craam::msen::SampledMDP::get_state_action_weights ( )
inline
Returns
State-action cumulative weights \( z \). See class description for details.

◆ state_count()

long craam::msen::SampledMDP::state_count ( )
inline

Returns the number of states in the samples (the highest observed index; some state indices may be missing).

Returns
0 when there are no samples

The documentation for this class was generated from the following file: