Cluster analysis algorithm for identifying line clusters on a map





















I have a reasonably large set of (r,g,b)-colored data points with (x,y)-coordinates that looks like this:



enter image description here



Before committing them to my database, I'd like to automatically identify all point clusters (most of which look like lines) and assign a category to each colored point according to the cluster it belongs to.



According to the scikit-learn roadmap, I should be using either MeanShift or Gaussian mixture models, but I'd like to know whether there is a solution that also takes into account that nearby points with similar colors are more likely to belong to the same cluster.



I have access to a GPU so any kind of solution is welcome, even if it's based on deep learning.




I tried @mcdowella's answer and it worked surprisingly well. I ran it over the higher-dimensional version of these points (which were generated through t-SNE) using the HDBSCAN Robust Single Linkage implementation, and it approximated many lines without any parameter tuning.



enter image description here














I don't think this is the right place to ask these kinds of questions. Maybe the Statistics Stack Exchange would be more appropriate?
– Mitchel Paulin
Nov 10 at 18:01





















python algorithm machine-learning scikit-learn deep-learning














edited Nov 10 at 20:30

























asked Nov 10 at 17:56 by Ruan













1 Answer













I would try single-linkage clustering (https://en.wikipedia.org/wiki/Single-linkage_clustering). It tends to follow lines, which is sometimes even a disadvantage for people who want nice, compact, rounded clusters and instead get straggling spaghetti (there is a nice picture on page 7 of https://www.stat.cmu.edu/~cshalizi/350/lectures/08/lecture-08.pdf).
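As a rough illustration of the idea, here is a minimal pure-Python sketch of single-linkage clustering cut at a fixed distance, with the color appended to the coordinates so that nearby points with similar colors are linked first. The function name `single_linkage_clusters`, the `threshold` and `color_weight` parameters, the toy data, and the assumption that colors are scaled to [0, 1] are all illustrative choices, not something taken from the question. It relies on the fact that cutting a single-linkage dendrogram at distance t yields the connected components of the graph joining every pair of points at distance <= t.

```python
import math


def single_linkage_clusters(points, colors, threshold, color_weight=1.0):
    """Label points by single-linkage clustering cut at `threshold`.

    points: list of (x, y); colors: list of (r, g, b), assumed in [0, 1].
    color_weight scales how much a color difference counts as distance
    (hypothetical knob; 0.0 ignores color entirely).
    """
    # Build 5-D feature vectors: position plus weighted color.
    feats = [
        (x, y, color_weight * r, color_weight * g, color_weight * b)
        for (x, y), (r, g, b) in zip(points, colors)
    ]
    n = len(feats)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Find the root of i, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair closer than the threshold (O(n^2), fine for a sketch).
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(feats[i], feats[j]) <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Toy data: a red horizontal line crossed by a blue vertical line at (2, 0).
points = [(0.5 * i, 0.0) for i in range(9)] + [(2.0, 0.5 * j - 2.0) for j in range(9)]
colors = [(1, 0, 0)] * 9 + [(0, 0, 1)] * 9

with_color = single_linkage_clusters(points, colors, threshold=0.6, color_weight=1.0)
without_color = single_linkage_clusters(points, colors, threshold=0.6, color_weight=0.0)
print(len(set(with_color)), len(set(without_color)))
```

On this toy data the two crossing lines merge into one cluster when color is ignored and stay separate when color is weighted in. For a data set of realistic size the quadratic loop would be too slow; SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"`, followed by `fcluster` with `criterion="distance"`, should give the same kind of partition on the same feature vectors.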
















answered Nov 10 at 18:30 by mcdowella
