OpenAI gym cartpole-v0 understanding observation and action relationship

I'd like to use OpenAI Gym to build a model that not only performs well but also keeps improving until it converges on the best moves.
This is how I initialize the env:

import gym
env = gym.make("CartPole-v0")
env.reset()

Each call to env.step returns four values: observation, reward, done and info. info is always empty, so I ignore it.

I'd hoped the reward would signal whether the action taken was good or bad, but it is always 1 until the game ends, so it is more of a counter of how long you've been playing.

The action can be sampled by

action = env.action_space.sample()

which in this case is either 0 or 1.
To put this into perspective for anyone who doesn't know the game, its objective is to balance a pole by moving left or right, i.e. by providing an input of 0 or 1.

The observation is the only real way to tell whether you're making a good or bad move.

obs, reward, done, info = env.step(action)

and the observation looks something like this

array([-0.02861881, 0.02662095, -0.01234258, 0.03900408])
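To make the step loop concrete, here is a minimal sketch of one episode. It assumes the classic Gym API used in this question, where step returns a 4-tuple. StubEnv, run_episode and random_policy are hypothetical names introduced only for illustration; StubEnv stands in for the real environment so the snippet runs without Gym installed.

```python
import random


class StubEnv:
    """Hypothetical stand-in that mimics the classic 4-tuple Gym step API."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.t += 1
        obs = [0.0, 0.0, 0.0, 0.0]
        done = self.t >= self.episode_length
        # Like CartPole, the stub pays a reward of 1 for every step taken.
        return obs, 1.0, done, {}


def run_episode(env, policy):
    """Play one episode; return the total reward and the (obs, action) pairs."""
    obs = env.reset()
    history = []
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done, _info = env.step(action)
        history.append((obs, action))
        total_reward += reward
        obs = next_obs
    return total_reward, history


def random_policy(_obs):
    """Pick 0 or 1 at random, like env.action_space.sample() for CartPole."""
    return random.randint(0, 1)


total, history = run_episode(StubEnv(episode_length=10), random_policy)
print(total, len(history))
```

With a reward of 1 per step, the total reward is simply the number of steps survived. To try it on the real environment, pass the env from gym.make("CartPole-v0") in place of StubEnv, assuming a Gym release that still uses the 4-tuple step signature.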

As I said, the reward is always 1, so it is not a good indicator of whether a move was good or bad given the observation. done means the game has come to an end, though I can't tell whether it means you lost or won.

According to the environment's page, the objective is to balance the pole for a total reward of +195 averaged over 100 games, and that is the measure of a successful game. I'm not sure whether that means you've balanced the pole completely or just lasted long. Still, I've followed a few examples and suggestions: generate a lot of random games, then use the ones that rank well to train a model.
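The "keep the random games that rank well" idea can be sketched as a simple filter. This is an illustration under assumptions, not a specific published recipe: each game is assumed to be stored as a list of (observation, action) pairs, its score is taken to be its length (since the reward is 1 per step), and select_training_pairs and min_score are hypothetical names.

```python
def select_training_pairs(episodes, min_score):
    """Keep the (observation, action) pairs of games that lasted at least min_score steps.

    episodes: list of games, each a list of (observation, action) tuples.
    The score of a game is its length, because every step earns a reward of 1.
    """
    pairs = []
    for episode in episodes:
        if len(episode) >= min_score:
            pairs.extend(episode)
    return pairs


# Two made-up games: one lasting 2 steps, one lasting 4 steps.
short_game = [([0.0, 0.0, 0.01, 0.0], 0), ([0.0, -0.2, 0.02, 0.3], 1)]
long_game = [([0.0, 0.0, -0.01, 0.0], 1)] * 4

training_pairs = select_training_pairs([short_game, long_game], min_score=3)
print(len(training_pairs))
```

The surviving pairs could then serve as inputs (observations) and labels (actions) for a supervised model. Note that this only imitates games that happened to last long; it carries no explicit notion of which individual move was good.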

But this approach feels sketchy, and it is not inherently aware of what a failing move is, i.e. when you're about to tip the pole more than 15 degrees or the cart moves 2.4 units from the center.
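The failure conditions above can be written as an explicit check. This sketch rests on two assumptions: that index 0 of the observation is the cart position and index 2 is the pole angle in radians, and that the angle limit is a parameter. The limit is left adjustable because 15 degrees is the figure quoted here, while the CartPole source code, as far as I know, uses 12 degrees. is_failing is a hypothetical helper name.

```python
import math


def is_failing(obs, x_limit=2.4, angle_limit_deg=12.0):
    """Return True if the observation is outside the allowed cart or pole range.

    Assumes obs[0] is the cart position and obs[2] is the pole angle in radians.
    angle_limit_deg defaults to 12 (believed to match the Gym source); pass 15.0
    to use the figure quoted in the question.
    """
    cart_position = obs[0]
    pole_angle = obs[2]
    too_far = abs(cart_position) > x_limit
    too_tilted = abs(pole_angle) > math.radians(angle_limit_deg)
    return too_far or too_tilted


sample = [-0.02861881, 0.02662095, -0.01234258, 0.03900408]
print(is_failing(sample))
print(is_failing([2.5, 0.0, 0.0, 0.0]))
```

A check like this could be used to label the last observation of each recorded game, which makes it easier to see which limit ended the game.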

I've gathered data from running the simulation over 200,000 times and found a good number of games that lasted more than 80 steps (the goal is 195). I graphed these games in an IPython notebook. Since I'm graphing each observation value individually per game, there are too many graphs to put here. I was hoping to see a link between the final observation and the game ending, but since the actions are randomly sampled, the moves are random.

What I thought I saw was that the game ends when the first observation value gets to 0, but I've also seen games that keep running with negative values. Basically, I can't make sense of the data even with the graphs.

What I'd really like to know, if possible, is what each value in the observation means, and also whether 0 means left or right, though the latter should be easier to deduce once I understand the former.

asked Aug 28 '18 at 15:08

Samuel M.

python openai-gym


1 Answer

It seems you asked this question quite some time ago. However, the answer is that the observation consists of the position of the cart, the angle of the pole and their derivatives (the cart's velocity and the pole's angular velocity). The middle position is 0, so negative is left and positive is right.
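As a sketch of how this maps onto the array from the question, the snippet below names the four values in the order I understand the CartPole environment to use (cart position, cart velocity, pole angle, pole angular velocity) and assumes that action 0 pushes the cart left and action 1 pushes it right. CartPoleObs, ACTIONS and lean_following_policy are hypothetical names, and the policy is only a naive baseline for illustration, not a solution to the task.

```python
from collections import namedtuple

# Assumed order of the four observation values.
CartPoleObs = namedtuple(
    "CartPoleObs",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)

# Assumed meaning of the two discrete actions.
ACTIONS = {0: "push cart left", 1: "push cart right"}


def lean_following_policy(obs):
    """Naive baseline: push the cart toward the side the pole is leaning to."""
    state = CartPoleObs(*obs)
    return 1 if state.pole_angle > 0 else 0


obs = [-0.02861881, 0.02662095, -0.01234258, 0.03900408]
state = CartPoleObs(*obs)
action = lean_following_policy(obs)
print(state)
print(action, ACTIONS[action])
```

For the sample observation the pole angle is slightly negative (leaning left), so the baseline picks action 0. Comparing a rule like this against random actions is a quick way to confirm the sign conventions on your own installation.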

answered Nov 14 '18 at 21:10

shunyo
