OpenAI gym cartpole-v0 understanding observation and action relationship
I'm interested in modelling a system that uses OpenAI Gym to build a model that not only performs well but, ideally, keeps improving until it converges on the best moves.
This is how I initialize the env
import gym
env = gym.make("CartPole-v0")
env.reset()
env.reset() returns the initial observation, while env.step() returns a tuple of observation, reward, done and info; info is always empty, so ignore that.
I'd hoped reward would signify whether the action taken was good or bad, but it always returns 1 until the game ends, so it's more of a counter of how long you've been playing.
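The reward/done flow can be seen in a minimal rollout loop. This is a sketch of the classic 4-tuple step API; `StubCartPole` is a made-up stand-in with the same surface as the real environment, so the loop itself runs without gym installed:

```python
import random

class StubCartPole:
    """Made-up stand-in with the same reset/step surface as CartPole-v0,
    so the loop below runs without gym installed."""
    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]   # initial observation only; no reward yet

    def step(self, action):
        self.t += 1
        done = self.t >= 10           # real env: pole-angle / cart-position limits
        return [0.0] * 4, 1.0, done, {}

def run_episode(env, policy):
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward               # reward is +1 per surviving step
    return total

# with real gym you'd use env = gym.make("CartPole-v0") in place of StubCartPole()
print(run_episode(StubCartPole(), lambda obs: random.choice([0, 1])))  # prints 10.0
```

Since reward is 1.0 on every step (including the last), the episode's total reward equals the number of steps survived, which is exactly the "counter" behaviour described above.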
The action can be sampled by
action = env.action_space.sample()
which in this case is either 0 or 1.
To put this into perspective for anyone who doesn't know the game: here's the link. Its objective is to balance a pole by moving the cart left or right, i.e. providing an input of 0 or 1.
The observation is the only real way to tell whether you're making a good or bad move.
obs, reward, done, info = env.step(action)
and the observation looks something like this
array([-0.02861881, 0.02662095, -0.01234258, 0.03900408])
As I said before, reward is always 1, so it's not a good indicator of a good or bad move. done means the game has come to an end, though I also can't tell whether it means you lost or won.
The objective, as you'll see from the linked page, is to keep the pole balanced for an average total reward of +195 over 100 games; that's the determining guide of a successful run (I'm not sure whether that means you've truly balanced it or just lasted long enough). Still, I've followed a few examples that suggest generating a lot of random games and using the ones that rank well to train a model.
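The "keep the good random games" approach can be sketched as a simple filter over recorded episodes. The episode layout and the `min_score` threshold here are hypothetical, purely to illustrate treating every move from a long-surviving game as a training label:

```python
def select_training_data(episodes, min_score=80):
    """episodes: list of (observation_sequence, action_sequence, total_reward)."""
    data = []
    for obs_seq, act_seq, score in episodes:
        if score >= min_score:
            # every (observation, action) pair from a surviving game
            # is kept as a (feature, label) training example
            data.extend(zip(obs_seq, act_seq))
    return data

good = ([[0.0] * 4, [0.1] * 4], [0, 1], 120)   # lasted 120 steps
bad = ([[0.0] * 4], [1], 12)                   # fell over quickly
print(len(select_training_data([good, bad])))  # prints 2: only the good game's moves
```

The weakness noted below is visible here: the filter only ranks whole games by length, so individual bad moves inside an otherwise long game still end up in the training set.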
But this approach feels sketchy and isn't inherently aware of what a failing move is, i.e. when you're about to tip the pole more than 15 degrees or move the cart more than 2.4 units from the center.
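Those failure conditions are checked inside the environment itself. Below is a pure-Python sketch of the cart-pole update using the constants published in gym's cartpole source, to the best of my knowledge: a 10 N push, 0.02 s Euler steps, and termination at |x| > 2.4 or |theta| > 12 degrees (the source uses 12 degrees, not the 15 the docs page mentions):

```python
import math

GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
LENGTH = 0.5                       # half the pole's length
POLEMASS_LENGTH = MASS_POLE * LENGTH
FORCE_MAG, TAU = 10.0, 0.02        # push strength, integration timestep
X_LIMIT = 2.4                      # cart distance from center before failure
THETA_LIMIT = 12 * math.pi / 180   # pole angle (rad) before failure

def step(state, action):
    """One Euler step of the cart-pole dynamics; action 1 pushes right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    state = (x + TAU * x_dot, x_dot + TAU * x_acc,
             theta + TAU * theta_dot, theta_dot + TAU * theta_acc)
    done = abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT
    return state, 1.0, done        # reward is 1.0 every step, even the last

# always pushing right tips the pole past the angle limit within a few steps
state, done, steps = (0.0, 0.0, 0.0, 0.0), False, 0
while not done and steps < 500:
    state, reward, done = step(state, 1)
    steps += 1
print(steps)
```

This makes the "not inherently aware" point concrete: the +1 reward carries no information about how close the state is to either limit, even on the very step that crosses one.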
I've gathered data from running the simulation over 200,000 times and found a good number of games that lasted more than 80 steps (the goal is 195). I graphed these games (< ipython notebook); since I'm graphing each observation component individually per game, there are too many graphs to post here. The hope was to spot a link between the final observation and the game ending, given that these are randomly sampled actions, i.e. random moves.
What I thought I saw was that the game ends when the first observation value reaches 0, but I've also seen games keep running with negative values. Basically, I can't make sense of the data even with graphing.
What I'd really like to know is what each value in the observation means, and whether action 0 means left or right; the latter would be easier to deduce once I understand the former.
python openai-gym
asked Aug 28 '18 at 15:08
Samuel M.
1 Answer
It seems you asked this question quite some time ago. The observation is [cart position, cart velocity, pole angle, pole angular velocity], i.e. the positions of the cart and pole together with their derivatives. A cart position of 0 is the middle of the track, so negative means left and positive means right.
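To make that layout concrete, here is a small sketch that decodes the four values and uses them in a simple balancing heuristic (push toward the side the pole is falling). The `describe` and `heuristic` helpers are illustrations added here, not part of the original answer; the convention that action 0 pushes left and 1 pushes right matches gym's cartpole source:

```python
def describe(obs):
    x, x_dot, theta, theta_dot = obs  # CartPole-v0 observation layout
    side = "left" if x < 0 else "right"
    lean = "left" if theta < 0 else "right"
    return f"cart {side} of center, pole leaning {lean}"

def heuristic(obs):
    # push the cart toward the side the pole is falling:
    # action 0 = push left, action 1 = push right
    theta, theta_dot = obs[2], obs[3]
    return 1 if theta + theta_dot > 0 else 0

sample = [-0.02861881, 0.02662095, -0.01234258, 0.03900408]
print(describe(sample))   # prints "cart left of center, pole leaning left"
print(heuristic(sample))  # prints 1: angular velocity outweighs the small lean
```

Using the angle plus its derivative, rather than the angle alone, lets the policy react before the pole has tipped far, which is exactly the early-warning signal the question was looking for in the graphs.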
answered Nov 14 '18 at 21:10
shunyo