CRAAM 2.0.0: Robust and Approximate Markov Decision Processes

SampledMDP Class Reference

Constructs an MDP from integer samples.
#include <Samples.hpp>
Public Member Functions

SampledMDP ()
    Constructs an empty MDP from discrete samples.
void add_samples (const DiscreteSamples &samples)
    Constructs or adds states and actions based on the provided samples.
shared_ptr< const MDP > get_mdp () const
shared_ptr< MDP > get_mdp_mod ()
Transition get_initial () const
vector< vector< prec_t > > get_state_action_weights ()
long state_count ()
    Returns the number of states in the samples (the highest observed index; some indices may be missing).
Protected Attributes

shared_ptr< MDP > mdp
    Internal MDP representation.
Transition initial
    Initial distribution.
vector< vector< prec_t > > state_action_weights
    Cumulative sample weights for each state and action.
Constructs an MDP from integer samples.
Integer samples: Each decision state, expectation state, and action is identified by an integer.
Input: Sample set \( \Sigma = (s_i, a_i, s_i', r_i, w_i)_{i=0}^{m-1} \)
Output: An MDP such that:
\[ P(s,a,s') = \frac{\sum_{i=0}^{m-1} w_i 1\{ s = s_i, a = a_i, s' = s_i' \} } { \sum_{i=0}^{m-1} w_i 1\{ s = s_i, a = a_i \} } \]
\[ r(s,a,s') = \frac{\sum_{i=0}^{m-1} r_i w_i 1\{ s = s_i, a = a_i, s' = s_i' \} } { \sum_{i=0}^{m-1} w_i 1\{ s = s_i, a = a_i, s' = s_i' \} } \]
The class also tracks cumulative weights of state-action samples \( z \):
\[ z(s,a) = \sum_{i=0}^{m-1} w_i 1\{ s = s_i, a = a_i \} \]
If \( z(s,a) = 0 \), then the action \( a \) is marked as invalid. Storing these weights incurs a small additional memory cost.
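As a concrete illustration of these formulas, here is a minimal self-contained sketch (not CRAAM's implementation; the Sample struct and map-based storage are assumptions made for brevity) that computes \( z \), \( P \), and \( r \) from a list of weighted samples:

```cpp
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical sample record (s, a, s', r, w); purely illustrative.
struct Sample { long s, a, snext; double r, w; };

using SA  = std::pair<long, long>;          // (s, a)
using SAS = std::tuple<long, long, long>;   // (s, a, s')

// Computes z(s,a), P(s,a,s'), and r(s,a,s') exactly as defined above.
void estimate(const std::vector<Sample>& samples,
              std::map<SA, double>& z,
              std::map<SAS, double>& P,
              std::map<SAS, double>& r) {
    std::map<SAS, double> wsum, rwsum;  // sums of w_i and r_i * w_i per (s,a,s')
    for (const auto& e : samples) {
        z[{e.s, e.a}]              += e.w;
        wsum[{e.s, e.a, e.snext}]  += e.w;
        rwsum[{e.s, e.a, e.snext}] += e.r * e.w;
    }
    for (const auto& [key, w] : wsum) {
        const auto [s, a, snext] = key;
        P[key] = w / z[{s, a}];   // normalize by cumulative state-action weight
        r[key] = rwsum[key] / w;  // weight-averaged reward
    }
    // State-action pairs that never occur have no entries here; in SampledMDP
    // the corresponding actions are created but marked invalid.
}
```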
Important: Actions that are never sampled (no samples for the given state-action pair) are labeled as invalid and are excluded from the computation of the value function and the solution. For example, if state 0 has samples for action 1 but none for action 0, then action 0 is still created, but it is ignored when computing the value function.
When sample sets are added by multiple calls of SampledMDP::add_samples, the result is the same as if all the individual sample sets were combined and added together. See SampledMDP::add_samples for more details.
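A minimal usage sketch of the public interface documented on this page follows. How a DiscreteSamples object is populated is an assumption here (the commented-out add_sample call is hypothetical), so consult Samples.hpp for the actual interface:

```cpp
#include "Samples.hpp"

using namespace craam;        // assumption: types live in these namespaces
using namespace craam::msen;

int main() {
    DiscreteSamples samples;
    // Hypothetical insertion call; the exact signature may differ:
    // samples.add_sample(/*from*/ 0, /*action*/ 0, /*to*/ 1,
    //                    /*reward*/ 1.0, /*weight*/ 1.0, /*step*/ 0, /*run*/ 0);

    SampledMDP smdp;                        // empty MDP
    smdp.add_samples(samples);              // build states and actions
    smdp.add_samples(samples);              // later calls merge incrementally

    shared_ptr<const MDP> mdp = smdp.get_mdp();
    Transition initial = smdp.get_initial();
    long nstates = smdp.state_count();      // highest observed state index
    return 0;
}
```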
Member Function Documentation

add_samples()

void add_samples (const DiscreteSamples &samples)    [inline]
Constructs or adds states and actions based on the provided samples.
Sample sets can be added iteratively. Assume that the current transition probabilities are constructed from a sample set \( \Sigma = (s_i, a_i, s_i', r_i, w_i)_{i=0}^{m-1} \) and that add_samples is called with a sample set \( \Sigma' = (s_j, a_j, s_j', r_j, w_j)_{j=m}^{n-1} \). The result is the same as if all samples \( 0 \ldots (n-1) \) were added simultaneously.
The MDP values are updated as follows:
\[ z'(s,a) = z(s,a) + \sum_{j=m}^{n-1} w_j 1\{ s = s_j, a = a_j \} \]
\begin{align*} P'(s,a,s') &= \frac{z(s,a)\, P(s,a,s') + \sum_{j=m}^{n-1} w_j 1\{ s = s_j, a = a_j, s' = s_j' \} }{ z'(s,a) } \\ &= \frac{P(s,a,s') + (1/z(s,a)) \sum_{j=m}^{n-1} w_j 1\{ s = s_j, a = a_j, s' = s_j' \} }{ z'(s,a) / z(s,a) } \end{align*}
The denominator is computed implicitly by normalizing the transition probabilities.
\begin{align*} r'(s,a,s') &= \frac{r(s,a,s')\, z(s,a)\, P(s,a,s') + \sum_{j=m}^{n-1} r_j w_j 1\{ s = s_j, a = a_j, s' = s_j' \}}{z'(s,a)\, P'(s,a,s')} \\ &= \frac{r(s,a,s')\, P(s,a,s') + \sum_{j=m}^{n-1} r_j (w_j/z(s,a)) 1\{ s = s_j, a = a_j, s' = s_j' \}}{z'(s,a)\, P'(s,a,s') / z(s,a)} \\ &= \frac{r(s,a,s')\, P(s,a,s') + \sum_{j=m}^{n-1} r_j (w_j/z(s,a)) 1\{ s = s_j, a = a_j, s' = s_j' \}}{P(s,a,s') + \sum_{j=m}^{n-1} (w_j/z(s,a)) 1\{ s = s_j, a = a_j, s' = s_j' \}} \end{align*}
The last line follows from the update formula for \( P'(s,a,s') \) above. This corresponds to calling Transition::add_sample repeatedly for \( j = m \ldots (n-1) \) with \begin{align*} p &= (w_j/z(s,a))\, 1\{ s = s_j, a = a_j, s' = s_j' \} \\ r &= r_j. \end{align*}
Parameters

    samples    New sample set to add to the transition probabilities and rewards.
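To make the update above concrete, here is an illustrative sketch (not CRAAM's code; the Row struct and function names are hypothetical) of how repeated add_sample-style merges with \( p = w_j / z(s,a) \), followed by a final normalization, reproduce \( P' \) and \( r' \):

```cpp
#include <tuple>
#include <vector>

// One (s,a) row of the MDP: transition probabilities and rewards indexed by
// the next state s'. A hypothetical stand-in for CRAAM's Transition class.
struct Row { std::vector<double> p, r; };

// Mimics the effect of Transition::add_sample: merge probability weight p and
// reward `reward` into target state snext, keeping r a p-weighted average.
void add_sample(Row& row, long snext, double p, double reward) {
    if (static_cast<long>(row.p.size()) <= snext) {
        row.p.resize(snext + 1, 0.0);
        row.r.resize(snext + 1, 0.0);
    }
    row.r[snext] = (row.r[snext] * row.p[snext] + reward * p)
                   / (row.p[snext] + p);
    row.p[snext] += p;
}

// Incremental update of one row with new samples (s'_j, r_j, w_j); assumes
// z > 0, i.e. the state-action pair has been seen before. The division by
// z'(s,a)/z(s,a) happens implicitly in the final normalization.
void update_row(Row& row, double& z,
                const std::vector<std::tuple<long, double, double>>& news) {
    const double zold = z;
    for (const auto& [snext, reward, w] : news) {
        add_sample(row, snext, w / zold, reward);
        z += w;                              // z'(s,a) = z(s,a) + sum_j w_j
    }
    double total = 0.0;                      // row now sums to z'(s,a)/z(s,a)
    for (double v : row.p) total += v;
    for (double& v : row.p) v /= total;      // normalize: yields P'(s,a,.)
}
```

Note that the rewards are untouched by the final normalization, which matches the last line of the derivation: \( r' \) is a ratio in which the scaling of the probabilities cancels.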
get_mdp()

shared_ptr< const MDP > get_mdp () const    [inline]

get_mdp_mod()

shared_ptr< MDP > get_mdp_mod ()    [inline]

get_initial()

Transition get_initial () const    [inline]

get_state_action_weights()

vector< vector< prec_t > > get_state_action_weights ()    [inline]

state_count()

long state_count ()    [inline]
Returns the number of states in the samples (the highest observed state index; some lower indices may not be observed).