Measure of goodness of 2D data points for classification

Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.

A Good dataset would look like this

And a bad dataset would look like this
enter image description here

Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.

asked Nov 14 '18 at 1:22

Backalla

618

3

As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

– Robert Dodier
Nov 14 '18 at 1:39

Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

– Backalla
Nov 14 '18 at 1:42

Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

– Robert Dodier
Nov 14 '18 at 1:50

add a comment |

asked Nov 14 '18 at 1:22

Backalla

618

3

As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

– Robert Dodier
Nov 14 '18 at 1:39

Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

– Backalla
Nov 14 '18 at 1:42

Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

– Robert Dodier
Nov 14 '18 at 1:50

add a comment |

asked Nov 14 '18 at 1:22

Backalla

618

numpy math machine-learning statistics classification

asked Nov 14 '18 at 1:22

Backalla

618

asked Nov 14 '18 at 1:22

Backalla

618

asked Nov 14 '18 at 1:22

Backalla

618

asked Nov 14 '18 at 1:22

Backalla

618

asked Nov 14 '18 at 1:22

Backalla

618

3

As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

– Robert Dodier
Nov 14 '18 at 1:39

Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

– Backalla
Nov 14 '18 at 1:42

Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

– Robert Dodier
Nov 14 '18 at 1:50

add a comment |

3

As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

– Robert Dodier
Nov 14 '18 at 1:39

Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

– Backalla
Nov 14 '18 at 1:42

Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

– Robert Dodier
Nov 14 '18 at 1:50

As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

– Robert Dodier
Nov 14 '18 at 1:39

Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

– Backalla
Nov 14 '18 at 1:42

Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

– Robert Dodier
Nov 14 '18 at 1:50

add a comment |

0

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53291854%2fmeasure-of-goodness-of-2d-data-points-for-classification%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

0

active

oldest

votes

0

active

oldest

votes

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

JWsPD YOwH O1sxALvF6xnUND QmiVHnr8n,OnyGdU 2AcYUhgllpi uJA2ZnUC h,gj3X8k,uNfmgPGvFx,iCXb3GZ V

搜尋此網誌

Odtnhj