How do the loss weights work in Tensorflow?
I am training a recurrent binary classifier on a significantly underrepresented target class. Let's say our target class 1 represents <1% of all the training data we have and class 0 >99%. In order to punish the model more for mispredicting the minority class, I'd like to use weights in the loss function. For each minibatch, I create a corresponding minibatch of weights where our target class gets a weight scalar >1.0 and our majority class <1.0 accordingly. For example, in the code below we used 2.0 for class 1 and 0.5 for class 0.
    loss_sum = 0.0
    for t, o, tw in zip(self._targets_uns, self._logits_uns, self._targets_weight_uns):
        # t -- targets tensor [batchsize x 1], tw -- weights tensor [batchsize x 1]
        # e.g. [0, 0, 0, 0, 1, 1, 0] -- [0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 0.5]
        _loss = tf.losses.sigmoid_cross_entropy(t, o, weights=tw, label_smoothing=0,
                                                scope="sigmoid_cross_entropy",
                                                loss_collection=tf.GraphKeys.LOSSES)
        loss_sum += _loss
Once the model is trained, I check the prediction accuracy and find that it is slightly lower than the accuracy without weights. I continued experimenting with weight pairs of [1.4, 0.8], [1.6, 0.4], [4.0, 0.1], [3.0, 1.0], and so on. However, I am not getting any improvement over the unweighted training; the results are marginally (2-3%) lower. OK, maybe I misunderstood the docs for the tf.losses.sigmoid_cross_entropy function:
weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.
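As a rough illustration of the quoted behaviour, here is a minimal pure-Python sketch of a per-sample weighted sigmoid cross-entropy. It is not TensorFlow's implementation: the helper names and the logits are made up for illustration, and it assumes the loss is reduced as the weighted sum divided by the number of non-zero weights.

```python
import math

def sigmoid_cross_entropy(label, logit):
    """Numerically stable sigmoid cross-entropy for one example."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def weighted_loss(labels, logits, weights):
    """Scale each example's loss by its weight, then reduce.

    Assumption: reduction is sum(weighted losses) / count(non-zero weights).
    """
    per_example = [w * sigmoid_cross_entropy(t, o)
                   for t, o, w in zip(labels, logits, weights)]
    nonzero = sum(1 for w in weights if w != 0.0)
    return sum(per_example) / nonzero if nonzero else 0.0

labels = [0, 0, 0, 0, 1, 1, 0]
logits = [-2.0, -1.5, -3.0, -0.5, -1.0, 0.5, -2.5]  # hypothetical model outputs
weights = [2.0 if t == 1 else 0.5 for t in labels]  # 2.0 for class 1, 0.5 for class 0

unweighted = weighted_loss(labels, logits, [1.0] * len(labels))
weighted = weighted_loss(labels, logits, weights)
print(unweighted, weighted)
```

With these made-up logits the minority examples are the poorly predicted ones, so the weighted loss comes out larger than the unweighted one. A single scalar weight, by contrast, only rescales the whole loss.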
I then reversed the pairs and used a higher weight for class 0 and a lower one for class 1: [0.5, 2.0], [0.8, 1.3], [0.2, 1.0], and so on. This also does not provide any improvement; it is slightly worse than the unweighted version.
Can somebody please explain to me the behaviour of a weighted loss? Am I doing it correctly and what should I do to upweight the minority class?
python tensorflow machine-learning
Maybe your weights are not big enough. Depending on the case, a more significant difference between the weights may be necessary. Try something exaggerated (like 1000 for the underrepresented class and 1 for the rest) and see if that actually biases the model.
– jdehesa
Nov 13 '18 at 14:38
asked Nov 13 '18 at 14:18 by minerals (edited Nov 13 '18 at 14:48)
1 Answer
Weighting is a general mathematical technique used for solving an over-specified system of equations of the form Wx = y, where x is the input vector, y is the output vector and W is the transformation matrix you wish to find. Often, these problems are solved using techniques such as SVD. SVD will find the solution for W by minimizing the least-squared error for the over-specified system. TensorFlow is basically solving a similar problem through its minimization process.
In your case, you have roughly 1 sample of class A for every 99 samples of class B. Because the solving process works to minimize the overall error, class B contributes to the solution by a factor of 99 to class A's 1. To fix this, adjust your weights so that classes A and B contribute evenly to the solution, i.e. weight class B down by a factor of 0.01.
More generally you can do:

    ratio = num_B / (num_A + num_B)
    weights = [ratio, 1.0 - ratio]  # [class A (minority) weight, class B (majority) weight]
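A small sketch of how that ratio could be turned into per-example weights, using the class counts mentioned in the comments (61,000 for class A, 6,000,000 for class B). The function names and the choice of label 1 for the minority class are assumptions for illustration only.

```python
import math

def class_weights(num_minority, num_majority):
    """Return (minority_weight, majority_weight) from the class counts."""
    ratio = num_majority / (num_minority + num_majority)
    return ratio, 1.0 - ratio

def per_example_weights(labels, minority_weight, majority_weight, minority_label=1):
    """Map each label in a minibatch to the weight of its class."""
    return [minority_weight if t == minority_label else majority_weight
            for t in labels]

num_A, num_B = 61000, 6000000            # class A is the minority
w_A, w_B = class_weights(num_A, num_B)   # roughly 0.99 and 0.01

batch_labels = [0, 0, 0, 0, 1, 1, 0]
batch_weights = per_example_weights(batch_labels, w_A, w_B)
print(w_A, w_B, batch_weights)
```

With these weights the two classes carry the same total weight over the whole dataset, since num_A * w_A equals num_B * w_B.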
So to make it clear, class B=6,000,000 and class A=61,000, 61000/(6000000+61000) and weights = [0.01,0.99]?
– minerals
Nov 13 '18 at 15:49
Looks backwards. I think you want [0.99, 0.01]
– bivouac0
Nov 13 '18 at 16:06
Even though I understand the intuition, setting weights for target classes as [0.99, 0.01] made the overall model worse by 3% and I couldn't beat the unweighted system.
– minerals
Nov 14 '18 at 16:23
This method "should" be equivalent to training with N extra copies of the class A samples. You could try making about 100x copies of those samples so that there is an equivalent amount of class A and B data. If that gives you about the same results, then I think you've verified that balancing the data isn't going to help.
– bivouac0
Nov 14 '18 at 17:26
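The equivalence mentioned in this comment can be checked on a toy example. The sketch below assumes a plain summed loss and made-up logits. It compares weighting each minority example by N against repeating each minority example N times in total. All names are hypothetical.

```python
import math

def bce(label, logit):
    """Numerically stable sigmoid cross-entropy for one example."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def summed_loss(labels, logits, weights):
    """Weighted sum of per-example losses (no averaging)."""
    return sum(w * bce(t, o) for t, o, w in zip(labels, logits, weights))

def oversample(labels, logits, minority_label, copies):
    """Repeat every minority example so that it appears `copies` times."""
    out_labels, out_logits = [], []
    for t, o in zip(labels, logits):
        repeat = copies if t == minority_label else 1
        out_labels.extend([t] * repeat)
        out_logits.extend([o] * repeat)
    return out_labels, out_logits

labels = [0, 0, 1, 0, 1]
logits = [-1.0, 0.3, -0.7, -2.0, 1.2]  # hypothetical model outputs
N = 3

weights = [float(N) if t == 1 else 1.0 for t in labels]
loss_weighted = summed_loss(labels, logits, weights)

dup_labels, dup_logits = oversample(labels, logits, minority_label=1, copies=N)
loss_duplicated = summed_loss(dup_labels, dup_logits, [1.0] * len(dup_labels))
print(loss_weighted, loss_duplicated)
```

The two totals match for a summed loss. With an averaged loss or minibatch sampling they differ by a scale factor and in gradient noise, so in practice the two approaches are only approximately interchangeable.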
answered Nov 13 '18 at 15:09 by bivouac0 (edited Nov 13 '18 at 15:30)