World:
x: [0, 5]
y: [-5, 5]
heading: [0, 360] in 45-degree steps
The state is a tuple (x y heading).

Possible actions:
forward (move one step in the direction given by the current heading)
left (rotate 45 degrees, increasing the heading)
right (rotate 45 degrees, decreasing the heading)

Problem: starting from (0 0 0), reach (4 4 *) as fast as possible, where * means "don't care".
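
The world and actions described above could be sketched in Python roughly as follows. This is only an illustration, not the code used for the experiments. Several details are assumptions that the description does not settle: heading 0 is taken to point along +x, headings wrap around modulo 360, and a forward move that would leave the world is ignored. Only the heading-45 move of (+1, +1) is confirmed by the solution path shown further down. The names `step`, `is_goal` and `MOVES` are hypothetical.

```python
from typing import Tuple

State = Tuple[int, int, int]  # (x, y, heading in degrees)

X_MIN, X_MAX = 0, 5
Y_MIN, Y_MAX = -5, 5
TURN = 45  # rotation per left/right action, in degrees

# Assumed unit moves per heading. Only the 45-degree entry (+1, +1) is
# confirmed by the example path; the rest follow the same convention.
MOVES = {
    0: (1, 0), 45: (1, 1), 90: (0, 1), 135: (-1, 1),
    180: (-1, 0), 225: (-1, -1), 270: (0, -1), 315: (1, -1),
}


def step(state: State, action: str) -> State:
    """Return the state that follows `state` after taking `action`."""
    x, y, heading = state
    if action == "left":
        # Assumption: headings wrap around, so 315 + 45 becomes 0.
        return (x, y, (heading + TURN) % 360)
    if action == "right":
        return (x, y, (heading - TURN) % 360)
    if action == "forward":
        dx, dy = MOVES[heading % 360]
        nx, ny = x + dx, y + dy
        # Assumption: a move that would leave the world is ignored.
        if X_MIN <= nx <= X_MAX and Y_MIN <= ny <= Y_MAX:
            return (nx, ny, heading)
        return state
    raise ValueError(f"unknown action: {action}")


def is_goal(state: State) -> bool:
    """The goal is (4 4 *): the heading does not matter."""
    return state[0] == 4 and state[1] == 4
```

Under these assumptions, one left turn followed by four forward moves takes (0 0 0) to (4 4 45), five actions in total.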

Standard Q-Learn (without any additional tricks)

Worse, with the wrong parameters Q-Learn can become quite unstable.
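
For reference, the textbook tabular Q-learning update can be sketched as below. This is a generic illustration, not the exact code behind the results here. The parameters in question are presumably the learning rate and the discount factor of this rule. The function name `q_update`, the default values of `alpha` and `gamma`, and the reward of -1 per action in the usage line are all assumptions, since the setup above does not state them.

```python
from collections import defaultdict

ACTIONS = ("forward", "left", "right")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q          -- mapping from (state, action) to estimated value
    alpha      -- learning rate (assumed default, not from the experiments)
    gamma      -- discount factor (assumed default, not from the experiments)
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    # Move the old estimate a fraction alpha towards the bootstrapped target.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: an empty table, then one update for turning left at the start state.
# The reward of -1 per action is a hypothetical choice that favours short paths.
q = defaultdict(float)
q_update(q, (0, 0, 0), "left", -1.0, (0, 0, 45))
```

The table defaults to 0 for unseen state-action pairs, so no explicit initialisation over all states is needed.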

My algorithm based on mean values

My algorithm based on mean values with tolerance 0.1

The most frequent solution was:
(((0 0 0) LEFT) ((0 0 45) FORWARD) ((1 1 45) FORWARD) ((2 2 45) FORWARD) ((3 3 45) FORWARD) ((4 4 45) NIL))
My algorithm based on mean values with tolerance 0.1 and learning discount 0.75

My algorithm based on mean values with tolerance 0.1 and learning discount 0.5


a, f and g are steps per turn (how many actions were taken before the goal was reached).