Measure of goodness of 2D data points for classification










0















Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.

A Good dataset would look like this
A Good dataset
And a bad dataset would look like this
enter image description here



Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.










share|improve this question

















  • 3





    As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

    – Robert Dodier
    Nov 14 '18 at 1:39











  • Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

    – Backalla
    Nov 14 '18 at 1:42











  • Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

    – Robert Dodier
    Nov 14 '18 at 1:50















0















Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.

A Good dataset would look like this
A Good dataset
And a bad dataset would look like this
enter image description here



Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.










share|improve this question

















  • 3





    As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

    – Robert Dodier
    Nov 14 '18 at 1:39











  • Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

    – Backalla
    Nov 14 '18 at 1:42











  • Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

    – Robert Dodier
    Nov 14 '18 at 1:50













0












0








0








Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.

A Good dataset would look like this
A Good dataset
And a bad dataset would look like this
enter image description here



Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.










share|improve this question














Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.

A Good dataset would look like this
A Good dataset
And a bad dataset would look like this
enter image description here



Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.







numpy math machine-learning statistics classification






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 14 '18 at 1:22









BackallaBackalla

618




618







  • 3





    As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

    – Robert Dodier
    Nov 14 '18 at 1:39











  • Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

    – Backalla
    Nov 14 '18 at 1:42











  • Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

    – Robert Dodier
    Nov 14 '18 at 1:50












  • 3





    As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

    – Robert Dodier
    Nov 14 '18 at 1:39











  • Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

    – Backalla
    Nov 14 '18 at 1:42











  • Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

    – Robert Dodier
    Nov 14 '18 at 1:50







3




3





As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

– Robert Dodier
Nov 14 '18 at 1:39





As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.

– Robert Dodier
Nov 14 '18 at 1:39













Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

– Backalla
Nov 14 '18 at 1:42





Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)

– Backalla
Nov 14 '18 at 1:42













Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

– Robert Dodier
Nov 14 '18 at 1:50





Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.

– Robert Dodier
Nov 14 '18 at 1:50












0






active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53291854%2fmeasure-of-goodness-of-2d-data-points-for-classification%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53291854%2fmeasure-of-goodness-of-2d-data-points-for-classification%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







這個網誌中的熱門文章

How to read a connectionString WITH PROVIDER in .NET Core?

In R, how to develop a multiplot heatmap.2 figure showing key labels successfully

Museum of Modern and Contemporary Art of Trento and Rovereto