This environment is part of the Toy Text environments. Please read that page first for general information.
This is a simple implementation of the Gridworld Cliff reinforcement learning task.
Adapted from Example 6.6 (page 106) of Reinforcement Learning: An Introduction by Sutton and Barto.
With inspiration from: https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py
The board is a 4x12 matrix, with (using NumPy matrix indexing):
[3, 0] as the start at bottom-left
[3, 11] as the goal at bottom-right
[3, 1..10] as the cliff at bottom-center
If the agent steps on the cliff, it returns to the start. An episode terminates when the agent reaches the goal.
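The board described above can be sketched as a NumPy boolean mask. This is an illustrative sketch only, not the environment's own code; the names `N_ROWS`, `N_COLS`, `START`, `GOAL`, `cliff`, and `render_layout` are assumptions introduced for this example.

```python
import numpy as np

# Hypothetical names for illustration; the values come from the board description.
N_ROWS, N_COLS = 4, 12
START = (3, 0)   # bottom-left
GOAL = (3, 11)   # bottom-right

# True wherever a cell belongs to the cliff: row 3, columns 1..10 inclusive.
cliff = np.zeros((N_ROWS, N_COLS), dtype=bool)
cliff[3, 1:11] = True


def render_layout():
    """Return the board as text: S = start, G = goal, C = cliff, . = free cell."""
    lines = []
    for row in range(N_ROWS):
        chars = []
        for col in range(N_COLS):
            if (row, col) == START:
                chars.append("S")
            elif (row, col) == GOAL:
                chars.append("G")
            elif cliff[row, col]:
                chars.append("C")
            else:
                chars.append(".")
        lines.append("".join(chars))
    return "\n".join(lines)
```

Calling `print(render_layout())` shows three rows of free cells above a bottom row that reads `SCCCCCCCCCCG`.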
There are 4 discrete deterministic actions:
0: move up
1: move right
2: move down
3: move left
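As a rough sketch, the four actions can be read as (row, column) offsets under the NumPy indexing used above. The names `ACTION_DELTAS` and `move` are hypothetical, and keeping the agent in place when a move would leave the grid is an assumption of this sketch, since the text does not say what happens at the borders.

```python
# Hypothetical mapping from action id to (row offset, column offset).
# Row 0 is the top of the board, so "up" decreases the row index.
ACTION_DELTAS = {
    0: (-1, 0),  # move up
    1: (0, 1),   # move right
    2: (1, 0),   # move down
    3: (0, -1),  # move left
}


def move(position, action, n_rows=4, n_cols=12):
    """Apply an action to a (row, col) position.

    Assumption: moves that would leave the grid are clipped, so the agent
    stays on the border cell.
    """
    row, col = position
    d_row, d_col = ACTION_DELTAS[action]
    new_row = min(max(row + d_row, 0), n_rows - 1)
    new_col = min(max(col + d_col, 0), n_cols - 1)
    return new_row, new_col
```

For example, `move((3, 0), 0)` gives `(2, 0)`, while `move((0, 0), 0)` stays at `(0, 0)` under the clipping assumption.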
There are 3x12 + 1 = 37 possible states. The agent cannot occupy a cliff cell, nor the goal cell (reaching the goal ends the episode). What remains are all positions in the first 3 rows plus the bottom-left cell. The observation is simply the current position encoded as a flattened index.
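To make the encoding concrete, here is a minimal sketch that assumes the flattened index is the row-major index of a 4x12 array, as given by NumPy's `ravel_multi_index`. The helper names `encode`, `decode`, and `reachable_states` are hypothetical and introduced only for this example.

```python
import numpy as np

SHAPE = (4, 12)  # rows, columns


def encode(row, col):
    """Flatten a (row, col) position into a single integer (row-major order)."""
    return int(np.ravel_multi_index((row, col), SHAPE))


def decode(state):
    """Recover the (row, col) position from a flattened index."""
    row, col = np.unravel_index(state, SHAPE)
    return int(row), int(col)


# All cells of the first 3 rows, plus the bottom-left start cell.
reachable_states = [encode(r, c) for r in range(3) for c in range(12)]
reachable_states.append(encode(3, 0))

print(len(reachable_states))  # 37
print(encode(3, 0))           # 36, the start cell
```

Under this assumption the start cell maps to 36, and the indices 37 to 47 (the cliff and the goal) never appear as observations.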
Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward.
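The reward rule can be illustrated with a small, self-contained transition sketch. This is not the environment's implementation: the names `step` and `rollout` are hypothetical, border moves are assumed to be clipped, and the -100 is assumed to replace (not add to) the usual -1 for that step.

```python
START, GOAL = (3, 0), (3, 11)
DELTAS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left


def is_cliff(row, col):
    """Cliff cells are row 3, columns 1..10."""
    return row == 3 and 1 <= col <= 10


def step(position, action):
    """Return (next_position, reward, terminated) for one move.

    Assumptions: moves are clipped at the borders, and falling into the
    cliff yields -100 in place of the usual -1 and resets the agent to START.
    """
    d_row, d_col = DELTAS[action]
    row = min(max(position[0] + d_row, 0), 3)
    col = min(max(position[1] + d_col, 0), 11)
    if is_cliff(row, col):
        return START, -100, False
    return (row, col), -1, (row, col) == GOAL


def rollout(actions):
    """Run a fixed action sequence from START and sum the rewards."""
    position, total, terminated = START, 0, False
    for action in actions:
        position, reward, terminated = step(position, action)
        total += reward
        if terminated:
            break
    return position, total, terminated


# Up once, right along row 2, then down into the goal: 13 steps.
shortest_safe_path = [0] + [1] * 11 + [2]
print(rollout(shortest_safe_path))  # ((3, 11), -13, True)
```

Stepping right from the start instead, `rollout([1])`, returns `((3, 0), -100, False)`: the agent is back at the start and the episode continues.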
v0: Initial version release