Measure of goodness of 2D data points for classification
Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.
A Good dataset would look like this
And a bad dataset would look like this
Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.
numpy math machine-learning statistics classification
add a comment |
Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.
A Good dataset would look like this
And a bad dataset would look like this
Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.
numpy math machine-learning statistics classification
3
As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.
– Robert Dodier
Nov 14 '18 at 1:39
Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)
– Backalla
Nov 14 '18 at 1:42
Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.
– Robert Dodier
Nov 14 '18 at 1:50
add a comment |
Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.
A Good dataset would look like this
And a bad dataset would look like this
Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.
numpy math machine-learning statistics classification
Is there a good measure of how good my dataset is for the task of classification. The ideal scenario for classification is that points for each class should be clustered closer and each cluster of different classes should be far apart. Is there a way to measure this goodness.
A Good dataset would look like this
And a bad dataset would look like this
Background:
I am trying to classify a few images from their 2048 dimensional embeddings generated by inception-v3 model. I tried visualizing the model by doing dimentionality reduction to 3 dimensions but it loses a lot of information. I am trying to figure out if the embeddings of my images with same class cluster together and different clusters are far apart.
numpy math machine-learning statistics classification
numpy math machine-learning statistics classification
asked Nov 14 '18 at 1:22
BackallaBackalla
618
618
3
As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.
– Robert Dodier
Nov 14 '18 at 1:39
Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)
– Backalla
Nov 14 '18 at 1:42
Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.
– Robert Dodier
Nov 14 '18 at 1:50
add a comment |
3
As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.
– Robert Dodier
Nov 14 '18 at 1:39
Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)
– Backalla
Nov 14 '18 at 1:42
Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.
– Robert Dodier
Nov 14 '18 at 1:50
3
3
As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.
– Robert Dodier
Nov 14 '18 at 1:39
As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.
– Robert Dodier
Nov 14 '18 at 1:39
Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)
– Backalla
Nov 14 '18 at 1:42
Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)
– Backalla
Nov 14 '18 at 1:42
Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.
– Robert Dodier
Nov 14 '18 at 1:50
Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.
– Robert Dodier
Nov 14 '18 at 1:50
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53291854%2fmeasure-of-goodness-of-2d-data-points-for-classification%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53291854%2fmeasure-of-goodness-of-2d-data-points-for-classification%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
As far as I know, there isn't any conclusive answer, but people try things like k nearest neighbors with a small number of neighbors, e.g. k = 3 to 10 or something like that. k-NN with small k gives a lower bound (i.e. optimistic) on classification error. So that can give you sense of what's the best you can hope for. Hope that helps, btw this question is really more suitable for stats.stackexchange.com.
– Robert Dodier
Nov 14 '18 at 1:39
Thanks @RobertDodier. I want to do this analysis before I start doing classification. I just want to judge my embedding generation process. I will try asking this in stats. :)
– Backalla
Nov 14 '18 at 1:42
Right -- this is something people do before they get underway with their method of choice. In your case, this will give an indication of how useful are the dimensions you've chosen. Good luck and have fun.
– Robert Dodier
Nov 14 '18 at 1:50