Validating Fuzzy Clustering

I would like to use fuzzy C-means clustering on a large unsupervised data set of 41 variables and 415 observations. However, I am stuck trying to validate the clusters. When I plot with an arbitrary number of clusters, I can explain a total of 54% of the variance, which is not great, and there are no really clean clusters as there would be with the iris data set, for example.

First I ran fcm on my scaled data with 3 clusters just to see, but since I am trying to find a way to search for the optimal number of clusters, I do not want to set an arbitrarily defined number of clusters.

So I turned to Google and searched for "validate fuzzy clustering in R." This link here was good, but I still have to try a bunch of different numbers of clusters. I looked at the advclust, ppclust, and clValid packages, but I could not find a walkthrough for the functions. I looked at the documentation of each package, but also could not discern what to do next.

I walked through some possible numbers of clusters and checked each one with the k.crisp object from fanny. I started with 100 and got down to 4. Based on the object's description in the documentation,

k.crisp = integer (≤ k) giving the number of crisp clusters; can be less than k, where it's recommended to decrease memb.exp.

it doesn't seem like a valid way because it is comparing the number of crisp clusters to our fuzzy clusters.

Is there a function with which I can check the validity of my clusters for 2:10 clusters? Also, is it worthwhile to check the validity of 1 cluster? I think that is a stupid question, but I have a strange feeling 1 optimal cluster might be what I get. (Any tips on what to do if I were to get 1 cluster besides cry a little on the inside?)

Code

library(cluster)
library(factoextra)
library(ppclust)
library(advclust)
library(clValid)
data(iris)
df<-sapply(iris[-5],scale)
res.fanny<-fanny(df,3,metric='SqEuclidean')
res.fanny$k.crisp
# When I try to use euclidean, I get the warning all memberships are very close to 1/l. Maybe increase memb.exp, which I don't fully understand
# From my understanding using the SqEuclidean is equivalent to Fuzzy C-means, use the website below. Ultimately I do want to use C-means, hence I use the SqEuclidean distance
fviz_cluster(res.fanny,ellipse.type='norm',palette='jco',ggtheme=theme_minimal(),legend='right')
fviz_silhouette(res.fanny,palette='jco',ggtheme=theme_minimal())

# With ppclust
set.seed(123)
res.fcm<-fcm(df,centers=3,nstart=10)

website as mentioned above.

edited Nov 13 '18 at 7:16

asked Nov 12 '18 at 23:09

Jack Armstrong

318519

r validation cluster-analysis

1 Answer

As far as I know, you need to go through different numbers of clusters and see how the percentage of variance explained changes with the number of clusters. This is called the elbow method.

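# Total within-cluster sum of squares for each k in 2:10
# (fcm() comes from ppclust; df is the scaled data from the question)
wss <- sapply(2:10,
  function(k) fcm(df, centers = k, nstart = 10)$sumsqrs$tot.within.ss)

# Plot the curve and look for the "elbow" where the decrease levels off
plot(2:10, wss,
  type = "b", pch = 19, frame = FALSE,
  xlab = "Number of clusters K",
  ylab = "Total within-clusters sum of squares")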

The resulting plot is:

[Plot: total within-cluster sum of squares against the number of clusters]

After k = 5, the total within-cluster sum of squares changes only slowly. So k = 5 is a good candidate for the optimal number of clusters according to the elbow method.
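Since the question asks about validity measures that are specific to fuzzy partitions, here is a minimal sketch of scanning k = 2:10 with two classical fuzzy indices, the partition coefficient and the partition entropy, computed by hand from the membership matrix. It uses fanny from the cluster package with the SqEuclidean metric, as in the question; the helper name fuzzy_indices and the table layout are my own illustrative choices, not part of any package.

```r
library(cluster)  # provides fanny()

# Scaled iris measurements, as in the question
df <- scale(iris[, -5])

# Hypothetical helper: partition coefficient (PC) and partition entropy (PE)
# of a membership matrix u (rows = observations, columns = clusters).
fuzzy_indices <- function(u) {
  n <- nrow(u)
  k <- ncol(u)
  pc <- sum(u^2) / n
  # Treat 0 * log(0) as 0 so that crisp memberships do not produce NaN
  pe <- -sum(ifelse(u > 0, u * log(u), 0)) / n
  c(pc = pc,
    pc_normalized = (k * pc - 1) / (k - 1),  # rescaled to [0, 1]
    pe = pe)
}

# Fit one fuzzy clustering per k and collect the indices in a table
ks <- 2:10
validity <- t(sapply(ks, function(k) {
  fit <- fanny(df, k, metric = "SqEuclidean")
  fuzzy_indices(fit$membership)
}))
validity <- data.frame(k = ks, validity)
print(round(validity, 3))
```

A higher partition coefficient and a lower partition entropy indicate a crisper partition. Both tend to drift with k on their own, so I would read them as a guide alongside the elbow plot rather than as a formal test. fanny may also emit warnings for larger k (for example when fewer crisp clusters than k are found), which is worth noting in itself.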

answered Nov 13 '18 at 9:16

boyaronur

16219

I am looking more for a formal method. But also isn't that using K-means clustering?
– Jack Armstrong
Nov 13 '18 at 10:19

The objective is similar, so I think that we can use this method. Please check this paper, researchgate.net/publication/… They use k = 1 as the null hypothesis, apply some kind of measure, and look for an "elbow" on a graph.
– boyaronur
Nov 13 '18 at 10:53
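I cannot tell from the truncated link which measure that paper uses, but one established method that compares each k against a single-cluster reference, and can therefore return k = 1, is the gap statistic. Below is a rough sketch using clusGap and maxSE from the cluster package. Two things here are my own assumptions: the wrapper name fanny_crisp is hypothetical, and the fuzzy result is hardened to its nearest crisp clustering because clusGap works on hard assignments, so this evaluates the crisp version of the fuzzy partition rather than the memberships themselves.

```r
library(cluster)  # provides fanny(), clusGap(), maxSE()

# Scaled iris measurements, as in the question
df <- scale(iris[, -5])

# clusGap() expects a function returning a list with a `cluster` component,
# so the fuzzy fit is hardened to its nearest crisp clustering here.
# Warnings from fanny() on the simulated reference data are suppressed.
fanny_crisp <- function(x, k) {
  fit <- suppressWarnings(fanny(x, k, metric = "SqEuclidean"))
  list(cluster = fit$clustering)
}

set.seed(123)  # the reference data sets are simulated
gap <- clusGap(df, FUNcluster = fanny_crisp, K.max = 6, B = 20,
               verbose = FALSE)

# Pick k with the rule from the original gap statistic paper;
# a result of 1 would mean no evidence for more than one cluster.
best_k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
                method = "Tibs2001SEmax")
print(gap$Tab)
best_k
```

B = 20 keeps the run short for illustration; a larger number of reference sets gives more stable standard errors.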
