Cluster analysis algorithm for identifying line clusters on a map
I have a reasonably large set of (r,g,b)-colored data points with (x,y)-coordinates that looks like this:
Before committing them to my database, I'd like to automatically identify all point clusters (most of which look like lines) and assign a category to each colored point according to the cluster it belongs to.
According to the scikit-learn roadmap I should be using either MeanShift or Gaussian mixture models, but I'd like to know whether there is a solution that also takes into account that nearby points with similar colors are more likely to belong to the same cluster.
I have access to a GPU so any kind of solution is welcome, even if it's based on deep learning.
I tried @mcdowella's answer and it worked surprisingly well. I ran it over the higher-dimensional version of these points (which were generated through t-SNE) using the HDBSCAN Robust Single Linkage implementation, and it approximated many lines without any parameter tuning.
python algorithm machine-learning scikit-learn deep-learning
I don't think this is the right place to ask these kinds of questions. Maybe the statistics Stack Exchange would be more appropriate?
– Mitchel Paulin
Nov 10 at 18:01
edited Nov 10 at 20:30
asked Nov 10 at 17:56
Ruan
1 Answer
I would try single-linkage clustering (https://en.wikipedia.org/wiki/Single-linkage_clustering). It has a tendency to follow lines, which is sometimes even a disadvantage for people who want nice, compact, rounded clusters and instead get straggling spaghetti (there is a nice picture on p. 7 of https://www.stat.cmu.edu/~cshalizi/350/lectures/08/lecture-08.pdf).
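To make the idea concrete, here is a minimal sketch in plain Python rather than a tuned implementation. It relies on the fact that cutting a single-linkage dendrogram at a given height gives the same clusters as joining every pair of points closer than that height. The function name `single_linkage_labels`, the `threshold` and `color_weight` parameters, and the (x, y, r, g, b) feature layout are my own assumptions for illustration; the feature layout is just one possible way to let similar colors pull nearby points together, and it assumes position and color have already been scaled to comparable ranges.

```python
import math
from itertools import combinations


def single_linkage_labels(points, threshold, color_weight=1.0):
    """Cut a single-linkage clustering at `threshold` (hypothetical helper).

    points: list of (x, y, r, g, b) tuples, assumed to be pre-scaled so that
    position and color live in comparable ranges (e.g. all in [0, 1]).
    """
    # Feature vector: position plus weighted color, so that nearby points
    # with similar colors end up close together.
    feats = [(x, y, color_weight * r, color_weight * g, color_weight * b)
             for x, y, r, g, b in points]

    parent = list(range(len(feats)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Joining every pair closer than the threshold yields the same clusters
    # as cutting the single-linkage dendrogram at that height.
    for i, j in combinations(range(len(feats)), 2):
        if math.dist(feats[i], feats[j]) <= threshold:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(feats))]


# Two short "lines" of points: a red one along y = 0, a blue one along y = 1.
red_line = [(0.1 * i, 0.0, 1.0, 0.0, 0.0) for i in range(5)]
blue_line = [(0.1 * i, 1.0, 0.0, 0.0, 1.0) for i in range(5)]
labels = single_linkage_labels(red_line + blue_line, threshold=0.15)
```

This compares all pairs, so it is quadratic in the number of points and only suitable for trying the idea on a sample. For a larger data set a library routine such as `scipy.cluster.hierarchy.linkage` with `method="single"` would be the more practical choice, and `threshold` and `color_weight` would need tuning against the real data.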
answered Nov 10 at 18:30
mcdowella