
#reinforcement-learning

As mentioned in the introduction, in this paper we are interested in the multitask RL scenario, where the agent has to solve multiple tasks. Each task is defined by a reward function \(R_{\mathbf{w}}\); thus, instead of a single MDP \(M\), our environment is a set of MDPs that share the same structure except for the reward function. Following Barreto et al. (2017), we assume that the expected one-step reward associated with transition \(s \stackrel{a}{\rightarrow} s^{\prime}\) is given by \(\mathrm{E}\left[R_{\mathbf{w}}\left(s, a, s^{\prime}\right)\right]=r_{\mathbf{w}}\left(s, a, s^{\prime}\right)=\phi\left(s, a, s^{\prime}\right)^{\top} \mathbf{w}\), where \(\phi\left(s, a, s^{\prime}\right) \in \mathbb{R}^{d}\) are features of \(\left(s, a, s^{\prime}\right)\) and \(\mathbf{w} \in \mathbb{R}^{d}\) are weights.
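The linear-reward assumption above can be sketched in a few lines of NumPy: each task keeps the same transition features \(\phi\) and differs only in its weight vector \(\mathbf{w}\). The feature map and the two tasks below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def phi(s, a, s_next):
    """Hypothetical d=3 feature vector for the transition s -a-> s'.

    Components (assumed for illustration): a constant, an indicator
    for reaching the goal state, and an indicator for a risky action.
    """
    return np.array([1.0, float(s_next == "goal"), float(a == "risky")])

# Two tasks over the same MDP structure, differing only in w.
w_safe = np.array([0.0, 1.0, -0.5])  # reward the goal, penalise risky actions
w_bold = np.array([0.0, 1.0, 0.2])   # reward the goal, small bonus for risk

def r(w, s, a, s_next):
    """Expected one-step reward: r_w(s, a, s') = phi(s, a, s')^T w."""
    return phi(s, a, s_next) @ w

print(r(w_safe, "s0", "risky", "goal"))  # 1.0 - 0.5 = 0.5
print(r(w_bold, "s0", "risky", "goal"))  # 1.0 + 0.2 = 1.2
```

Because the tasks share \(\phi\), swapping \(\mathbf{w}\) changes the reward function without changing the environment's dynamics, which is what makes the family of MDPs a multitask setting.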
