- Install dependencies: run `pip install .` to install all packages defined in setup.py
- Run unit tests: `py.test -vv -s`
- Execute an experiment with `./imrl/utils/cli.py --episodes 6`
- Discrete, tabular, deterministic 3x3 gridworld: `./imrl/utils/cli.py --environment=gridworld --gridworld_size=3 --num_vi=5 --episodes=200`
  - This learns the policy through value iteration for 100 episodes and then executes that policy
- Discrete, tabular, stochastic 3x3 gridworld: `./imrl/utils/cli.py --environment=gridworld --gridworld_size=3 --num_vi=5 --failure_rate=0.1 --episodes=200`
- Discrete, tabular, stochastic 5x5 gridworld testing theta convergence on a single value iteration: `./imrl/utils/cli.py --environment=gridworld --gridworld_size=5 --num_vi=9 --vi_interval=400 --failure_rate=0.1 --episodes=600`
- Continuous, tabular, deterministic gridworld: `./imrl/utils/cli.py --environment=gridworld_continuous --gridworld_size=3 --num_vi=5 --episodes=200 --results_interval=45`
- Naming convention: where variables represent the same quantity at different time steps, such as a state s and the next state s', the first is named `state` and the second `state_prime`
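
The naming convention and the value iteration mentioned above can be illustrated with a small standalone sketch. This is not the imrl implementation: the grid layout, the goal position, the reward, the discount factor, and the interpretation of the failure rate (the agent stays in place when an action fails) are all assumptions made for the example, and every function name here is hypothetical.

```python
import itertools

# All constants below are illustrative assumptions, not values taken from imrl.
SIZE = 3                      # 3x3 gridworld
GOAL = (SIZE - 1, SIZE - 1)   # assumed goal in the bottom-right corner
GAMMA = 0.9                   # assumed discount factor
FAILURE_RATE = 0.1            # assumed: a failed action leaves the agent in place
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return the intended next cell; moves into a wall are clamped to the grid."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    return (min(max(row + d_row, 0), SIZE - 1),
            min(max(col + d_col, 0), SIZE - 1))


def transitions(state, action):
    """Return (probability, state_prime) pairs for taking action in state."""
    return [(1.0 - FAILURE_RATE, step(state, action)), (FAILURE_RATE, state)]


def reward(state_prime):
    """Assumed reward: 1 for arriving at the goal, 0 otherwise."""
    return 1.0 if state_prime == GOAL else 0.0


def backup(values, state, action):
    """Expected one-step return of action in state, summed over state_prime."""
    return sum(prob * (reward(state_prime) + GAMMA * values[state_prime])
               for prob, state_prime in transitions(state, action))


def value_iteration(num_sweeps=50):
    """Run a fixed number of synchronous sweeps over all states."""
    values = {state: 0.0
              for state in itertools.product(range(SIZE), repeat=2)}
    for _ in range(num_sweeps):
        new_values = {}
        for state in values:
            if state == GOAL:
                new_values[state] = 0.0  # the goal is treated as terminal
            else:
                new_values[state] = max(backup(values, state, action)
                                        for action in ACTIONS)
        values = new_values
    return values


def greedy_action(values, state):
    """Pick the action with the highest one-step backup in state."""
    return max(ACTIONS, key=lambda action: backup(values, state, action))


if __name__ == "__main__":
    values = value_iteration()
    for row in range(SIZE):
        print(" ".join(f"{values[(row, col)]:.3f}" for col in range(SIZE)))
```

In the backup, `state` is always the cell the agent acts from and `state_prime` the cell it lands in, which keeps the two time steps distinguishable wherever both appear in one expression.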