How do the loss weights work in Tensorflow?
I am training a recurrent binary classifier on a significantly underrepresented target class. Let's say our target class 1 represents <1% of all the training data we have and class 0 >99%. In order to punish the model more for mispredicting the minority class I'd like to use weights in the loss function. For each minibatch, I create a corresponding minibatch of weights where our target class gets a weight scalar >1.0 and our majority class <1.0 accordingly. For example, in the code below we use 2.0 for class 1 and 0.5 for class 0.



loss_sum = 0.0
for t, o, tw in zip(self._targets_uns, self._logits_uns, self._targets_weight_uns):
    # t -- targets tensor [batch_size x 1], tw -- weights tensor [batch_size x 1]
    # e.g. t = [0, 0, 0, 0, 1, 1, 0] -> tw = [0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 0.5]
    _loss = tf.losses.sigmoid_cross_entropy(t, o, weights=tw, label_smoothing=0,
                                            scope="sigmoid_cross_entropy",
                                            loss_collection=tf.GraphKeys.LOSSES)
    loss_sum += _loss


Once the model is trained, I check the prediction accuracy and find that it is slightly lower than the accuracy without weights. I continue experimenting, trying out weight pairs of [1.4, 0.8], [1.6, 0.4], [4.0, 0.1], [3.0, 1.0], and so on. However, I am not getting any improvement over the unweighted training, only marginal differences of 2-3% lower accuracy. OK, maybe I misunderstood the docs for the tf.losses.sigmoid_cross_entropy function:




weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.
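That documented behaviour can be checked outside TensorFlow with a small numpy stand-in: the per-sample weights simply multiply each sample's cross-entropy term before the reduction. The sketch below emulates the default `SUM_BY_NONZERO_WEIGHTS` reduction (sum of weighted losses divided by the number of nonzero weights); all arrays are made up for illustration and are not from the question.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_sigmoid_ce(labels, logits, weights):
    # Per-sample binary cross-entropy (numerically naive, for clarity only).
    p = sigmoid(logits)
    per_sample = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    # Emulate tf.losses' default SUM_BY_NONZERO_WEIGHTS reduction:
    # sum of weighted per-sample losses / number of nonzero weights.
    return (per_sample * weights).sum() / np.count_nonzero(weights)

labels  = np.array([0., 0., 0., 0., 1., 1., 0.])
logits  = np.array([-2., -1., -3., -2., 0.5, -0.5, -1.5])
w_equal = np.ones_like(labels)
w_up    = np.array([0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 0.5])

base = weighted_sigmoid_ce(labels, logits, w_equal)
up   = weighted_sigmoid_ce(labels, logits, w_up)
# Upweighting the two positive samples increases their share of the loss.
```

So the weights do exactly what the docs say: they rescale each sample's contribution, nothing more.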




So I reversed the pairs and used the higher weight for class 0 and the lower one for class 1: [0.5, 2.0], [0.8, 1.3], [0.2, 1.0], .... This also does not provide any improvement and is slightly worse than the unweighted version.



Can somebody please explain the behaviour of a weighted loss? Am I doing it correctly, and what should I do to upweight the minority class?










  • Maybe your weights are not big enough. Depending on the case, a more significant difference between the weights may be necessary. Try something exaggerated (like 1000 for the underrepresented class and 1 for the rest) and see if that actually biases the model.

    – jdehesa
    Nov 13 '18 at 14:38















python tensorflow machine-learning






asked Nov 13 '18 at 14:18 by minerals, edited Nov 13 '18 at 14:48
1 Answer
Weighting is a general mathematical technique for solving over-specified systems of equations of the form Wx=y, where x is the input vector, y is the output vector and W is the transformation matrix you wish to find. Such problems are often solved using techniques like SVD, which finds the solution for W by minimizing the least-squared error of the over-specified system. Tensorflow is basically solving a similar problem through its minimization process.

In your case, what is happening is that you have 1 sample of class A for every 99 samples of class B. Because the solving process works to minimize the overall error, class B contributes to the solution by a factor of 99 to class A's 1. To solve this, you should adjust your weights so that classes A and B have an even contribution to the solution, i.e. weight down class B by 0.01.

More generally you can do...

ratio = num_B / (num_A + num_B)
weights = [ratio, 1.0 - ratio]
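With the asker's counts from the comments below (61,000 minority vs 6,000,000 majority samples), this formula gives roughly [0.99, 0.01]. A minimal numpy sketch of expanding those class weights into a per-sample weight vector, as the question's loss loop expects (the toy `labels` array is illustrative, not from the question):

```python
import numpy as np

num_A, num_B = 61_000, 6_000_000     # minority (class 1) / majority (class 0) counts
ratio = num_B / (num_A + num_B)      # ~0.99

# Per the formula above: minority samples get `ratio`, majority samples `1 - ratio`.
labels = np.array([0, 0, 1, 0, 1])   # toy minibatch of targets
sample_weights = np.where(labels == 1, ratio, 1.0 - ratio)
```

The resulting `sample_weights` tensor plays the role of `tw` in the question's `tf.losses.sigmoid_cross_entropy` call.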





  • So to make it clear, class B=6,000,000 and class A=61,000, 61000/(6000000+61000) and weights = [0.01,0.99]?

    – minerals
    Nov 13 '18 at 15:49












  • Looks backwards. I think you want [0.99, 0.01]

    – bivouac0
    Nov 13 '18 at 16:06











  • Even though I understand the intuition, setting weights for target classes as [0.99, 0.01] made the overall model worse by 3% and I couldn't beat the unweighted system.

    – minerals
    Nov 14 '18 at 16:23











  • This method "should" be equivalent to training with N extra copies of the class A samples. You could try making about 100 copies of those samples so that there is an equivalent amount of class A and B data. If that gives you about the same results, then I think you've verified that balancing the data isn't going to help.

    – bivouac0
    Nov 14 '18 at 17:26
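The duplication check suggested in the comment above can be sketched with a toy numpy example (arrays are made up; a real pipeline would shuffle after duplicating and feed the result through the usual input pipeline):

```python
import numpy as np

X = np.arange(10).reshape(5, 2)    # toy features, 5 samples
y = np.array([0, 0, 0, 0, 1])      # one minority sample (class 1)

# Repeat each minority sample until the classes are roughly balanced.
n_copies = int(np.sum(y == 0) / np.sum(y == 1))   # 4 here
X_min, y_min = X[y == 1], y[y == 1]
X_bal = np.concatenate([X] + [X_min] * (n_copies - 1))
y_bal = np.concatenate([y] + [y_min] * (n_copies - 1))
# Both classes now contribute 4 samples each to the loss.
```

If training on the duplicated data behaves like training with the weighted loss, that supports the equivalence the comment describes.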










answered Nov 13 '18 at 15:09 by bivouac0, edited Nov 13 '18 at 15:30