MySQL: Efficient way computing set powers of Venn-Diagram









up vote
3
down vote

favorite












Given the 4 tables, each containing items and representing one set, how to get the count of the items in each compartment required to draw a Venn diagram as shown below. The calculation should take place in the MySQL server avoiding transmission of items to the application server.



Example tables:



s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |


Now, I think I would calculate some set powers. Some examples with I corresponding to s1, II to s2, III to s3 and IV to s4:



quadruple Venn Diagram - Venn diagram made from 4 sets using ellipses



If I reinterpret sx as being a set, I would write:




  1. |s1 ∩ s2 ∩ s3 ∩ s4| - the white 25 in the center


  2. |(s1 ∩ s2 ∩ s4) s3| - the white 15 below right in relation to the center


  3. |(s1 ∩ s4) (s2 ∪ s3)| - the white 5 on the bottom


  4. |s1 (s2 ∪ s3 ∪ s4)| - the dark blue 60 on the blue ground

  5. ... till 15.

How to calculate those powers efficiently on the MySQL server? Does MySQL provide a function aiding in the calculation?



A naive approach would be running a query for 1.



SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);


and another query for 2.



SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3);


and so on, resulting in 15 queries.










share|improve this question























  • If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
    – Rainer Rillke
    Nov 10 at 0:41







  • 1




    There is no INTERSECT and EXCEPT in MySQL. So, you could use other RDBMS, which provides these features.
    – Madhur Bhaiya
    Nov 10 at 9:26










  • @MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
    – Rainer Rillke
    Nov 10 at 11:04










  • Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
    – Rainer Rillke
    Nov 11 at 14:40














up vote
3
down vote

favorite












Given the 4 tables, each containing items and representing one set, how to get the count of the items in each compartment required to draw a Venn diagram as shown below. The calculation should take place in the MySQL server avoiding transmission of items to the application server.



Example tables:



s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |


Now, I think I would calculate some set powers. Some examples with I corresponding to s1, II to s2, III to s3 and IV to s4:



quadruple Venn Diagram - Venn diagram made from 4 sets using ellipses



If I reinterpret sx as being a set, I would write:




  1. |s1 ∩ s2 ∩ s3 ∩ s4| - the white 25 in the center


  2. |(s1 ∩ s2 ∩ s4) s3| - the white 15 below right in relation to the center


  3. |(s1 ∩ s4) (s2 ∪ s3)| - the white 5 on the bottom


  4. |s1 (s2 ∪ s3 ∪ s4)| - the dark blue 60 on the blue ground

  5. ... till 15.

How to calculate those powers efficiently on the MySQL server? Does MySQL provide a function aiding in the calculation?



A naive approach would be running a query for 1.



SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);


and another query for 2.



SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3);


and so on, resulting in 15 queries.










share|improve this question























  • If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
    – Rainer Rillke
    Nov 10 at 0:41







  • 1




    There is no INTERSECT and EXCEPT in MySQL. So, you could use other RDBMS, which provides these features.
    – Madhur Bhaiya
    Nov 10 at 9:26










  • @MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
    – Rainer Rillke
    Nov 10 at 11:04










  • Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
    – Rainer Rillke
    Nov 11 at 14:40












up vote
3
down vote

favorite









up vote
3
down vote

favorite











Given the 4 tables, each containing items and representing one set, how to get the count of the items in each compartment required to draw a Venn diagram as shown below. The calculation should take place in the MySQL server avoiding transmission of items to the application server.



Example tables:



s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |


Now, I think I would calculate some set powers. Some examples with I corresponding to s1, II to s2, III to s3 and IV to s4:



quadruple Venn Diagram - Venn diagram made from 4 sets using ellipses



If I reinterpret sx as being a set, I would write:




  1. |s1 ∩ s2 ∩ s3 ∩ s4| - the white 25 in the center


  2. |(s1 ∩ s2 ∩ s4) s3| - the white 15 below right in relation to the center


  3. |(s1 ∩ s4) (s2 ∪ s3)| - the white 5 on the bottom


  4. |s1 (s2 ∪ s3 ∪ s4)| - the dark blue 60 on the blue ground

  5. ... till 15.

How to calculate those powers efficiently on the MySQL server? Does MySQL provide a function aiding in the calculation?



A naive approach would be running a query for 1.



SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);


and another query for 2.



SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3);


and so on, resulting in 15 queries.










share|improve this question















Given the 4 tables, each containing items and representing one set, how to get the count of the items in each compartment required to draw a Venn diagram as shown below. The calculation should take place in the MySQL server avoiding transmission of items to the application server.



Example tables:



s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |


Now, I think I would calculate some set powers. Some examples with I corresponding to s1, II to s2, III to s3 and IV to s4:



quadruple Venn Diagram - Venn diagram made from 4 sets using ellipses



If I reinterpret sx as being a set, I would write:




  1. |s1 ∩ s2 ∩ s3 ∩ s4| - the white 25 in the center


  2. |(s1 ∩ s2 ∩ s4) s3| - the white 15 below right in relation to the center


  3. |(s1 ∩ s4) (s2 ∪ s3)| - the white 5 on the bottom


  4. |s1 (s2 ∪ s3 ∪ s4)| - the dark blue 60 on the blue ground

  5. ... till 15.

How to calculate those powers efficiently on the MySQL server? Does MySQL provide a function aiding in the calculation?



A naive approach would be running a query for 1.



SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);


and another query for 2.



SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3);


and so on, resulting in 15 queries.







mysql venn-diagram set-intersection






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 11 at 12:53

























asked Nov 10 at 0:27









Rainer Rillke

957819




957819











  • If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
    – Rainer Rillke
    Nov 10 at 0:41







  • 1




    There is no INTERSECT and EXCEPT in MySQL. So, you could use other RDBMS, which provides these features.
    – Madhur Bhaiya
    Nov 10 at 9:26










  • @MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
    – Rainer Rillke
    Nov 10 at 11:04










  • Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
    – Rainer Rillke
    Nov 11 at 14:40
















  • If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
    – Rainer Rillke
    Nov 10 at 0:41







  • 1




    There is no INTERSECT and EXCEPT in MySQL. So, you could use other RDBMS, which provides these features.
    – Madhur Bhaiya
    Nov 10 at 9:26










  • @MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
    – Rainer Rillke
    Nov 10 at 11:04










  • Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
    – Rainer Rillke
    Nov 11 at 14:40















If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
– Rainer Rillke
Nov 10 at 0:41





If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
– Rainer Rillke
Nov 10 at 0:41





1




1




There is no INTERSECT and EXCEPT in MySQL. So, you could use other RDBMS, which provides these features.
– Madhur Bhaiya
Nov 10 at 9:26




There is no INTERSECT and EXCEPT in MySQL. So, you could use other RDBMS, which provides these features.
– Madhur Bhaiya
Nov 10 at 9:26












@MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
– Rainer Rillke
Nov 10 at 11:04




@MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
– Rainer Rillke
Nov 10 at 11:04












Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
– Rainer Rillke
Nov 11 at 14:40




Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
– Rainer Rillke
Nov 11 at 14:40












3 Answers
3






active

oldest

votes

















up vote
1
down vote



accepted










Try something like this:



with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region


Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT' for the ANSI string concatenation to work in MySQL, or use the concat function instead. The WITH syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.



If the sets are very large you might want to index the item column before querying in case the SQL optimizer won't figure it out by itself.






share|improve this answer





























    up vote
    0
    down vote













    Following procedure:



    1. Created a stored procedure, which creates temporary in-memory tables containing the sets.

    2. Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.

    3. As noted, MySQL does not have an INTERSECT or EXCEPT. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified.

    4. Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.

    What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11






    share|improve this answer



























      up vote
      0
      down vote













      The question is a little complex so the answers are. Let me explain K.T.'s answer



      with universe as (
      select * from s1
      union
      select * from s2
      union
      select * from s3
      union
      select * from s4
      ),
      regions as (
      select
      case when s1.item is null then '0' else '1' end
      ||
      case when s2.item is null then '0' else '1' end
      ||
      case when s3.item is null then '0' else '1' end
      ||
      case when s4.item is null then '0' else '1' end as Region
      from universe u
      left join s1 on u.item = s1.item
      left join s2 on u.item = s2.item
      left join s3 on u.item = s3.item
      left join s4 on u.item = s4.item
      )
      select Region, count(*) from regions group by Region


      The universe results in the UNION of all tables (duplicates eliminated), something like



      +------+
      | item |
      +------+
      | a |
      +------+
      | b |
      +------+
      | c |
      +------+
      | d |
      +------+
      | e |
      +------+
      | ... |
      +------+


      Then, s1, s2, s3 and s4 are joined



      +------+---------+---------+---------+---------+
      | item | s1.item | s2.item | s3.item | s4.item |
      +------+---------+---------+---------+---------+
      | a | a | a | a | a |
      +------+---------+---------+---------+---------+
      | b | b | b | b | NULL |
      +------+---------+---------+---------+---------+
      | c | c | c | NULL | c |
      +------+---------+---------+---------+---------+
      | d | d | NULL | d | d |
      +------+---------+---------+---------+---------+
      | e | NULL | e | e | e |
      +------+---------+---------+---------+---------+
      | ... | ... | ... | ... | ... |
      +------+---------+---------+---------+---------+


      and converted to a binary string (0: if cell is NULL; 1: else) called Region where the first digit corresponds to s1, the second to s2 and so on



      +------+--------+
      | item | Region |
      +------+--------+
      | a | 1111 |
      +------+--------+
      | b | 1110 |
      +------+--------+
      | c | 1101 |
      +------+--------+
      | d | 1011 |
      +------+--------+
      | e | 0111 |
      +------+--------+
      | ... | ... |
      +------+--------+


      and finally aggregated and grouped by Region



      +--------+-------+
      | Region | count |
      +--------+-------+
      | 1111 | 1 |
      +--------+-------+
      | 1110 | 1 |
      +--------+-------+
      | 1101 | 1 |
      +--------+-------+
      | 1011 | 1 |
      +--------+-------+
      | 0111 | 1 |
      +--------+-------+
      | ... | |
      +--------+-------+


      Note that regions having 0 set elements in them do not show up in the results and 0000 never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.



      4-set venn diagram with regions in binary representation






      share|improve this answer






















        Your Answer






        StackExchange.ifUsing("editor", function ()
        StackExchange.using("externalEditor", function ()
        StackExchange.using("snippets", function ()
        StackExchange.snippets.init();
        );
        );
        , "code-snippets");

        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "1"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader:
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        ,
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );













        draft saved

        draft discarded


















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53234943%2fmysql-efficient-way-computing-set-powers-of-venn-diagram%23new-answer', 'question_page');

        );

        Post as a guest















        Required, but never shown

























        3 Answers
        3






        active

        oldest

        votes








        3 Answers
        3






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes








        up vote
        1
        down vote



        accepted










        Try something like this:



        with universe as (
        select * from s1
        union
        select * from s2
        union
        select * from s3
        union
        select * from s4
        ),
        regions as (
        select
        case when s1.item is null then '0' else '1' end
        ||
        case when s2.item is null then '0' else '1' end
        ||
        case when s3.item is null then '0' else '1' end
        ||
        case when s4.item is null then '0' else '1' end as Region
        from universe u
        left join s1 on u.item = s1.item
        left join s2 on u.item = s2.item
        left join s3 on u.item = s3.item
        left join s4 on u.item = s4.item
        )
        select Region, count(*) from regions group by Region


        Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT' for the ANSI string concatenation to work in MySQL, or use the concat function instead. The WITH syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.



        If the sets are very large you might want to index the item column before querying in case the SQL optimizer won't figure it out by itself.






        share|improve this answer


























          up vote
          1
          down vote



          accepted










          Try something like this:



          with universe as (
          select * from s1
          union
          select * from s2
          union
          select * from s3
          union
          select * from s4
          ),
          regions as (
          select
          case when s1.item is null then '0' else '1' end
          ||
          case when s2.item is null then '0' else '1' end
          ||
          case when s3.item is null then '0' else '1' end
          ||
          case when s4.item is null then '0' else '1' end as Region
          from universe u
          left join s1 on u.item = s1.item
          left join s2 on u.item = s2.item
          left join s3 on u.item = s3.item
          left join s4 on u.item = s4.item
          )
          select Region, count(*) from regions group by Region


          Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT' for the ANSI string concatenation to work in MySQL, or use the concat function instead. The WITH syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.



          If the sets are very large you might want to index the item column before querying in case the SQL optimizer won't figure it out by itself.






          share|improve this answer
























            up vote
            1
            down vote



            accepted







            up vote
            1
            down vote



            accepted






            Try something like this:



            with universe as (
            select * from s1
            union
            select * from s2
            union
            select * from s3
            union
            select * from s4
            ),
            regions as (
            select
            case when s1.item is null then '0' else '1' end
            ||
            case when s2.item is null then '0' else '1' end
            ||
            case when s3.item is null then '0' else '1' end
            ||
            case when s4.item is null then '0' else '1' end as Region
            from universe u
            left join s1 on u.item = s1.item
            left join s2 on u.item = s2.item
            left join s3 on u.item = s3.item
            left join s4 on u.item = s4.item
            )
            select Region, count(*) from regions group by Region


            Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT' for the ANSI string concatenation to work in MySQL, or use the concat function instead. The WITH syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.



            If the sets are very large you might want to index the item column before querying in case the SQL optimizer won't figure it out by itself.






            share|improve this answer














            Try something like this:



            with universe as (
            select * from s1
            union
            select * from s2
            union
            select * from s3
            union
            select * from s4
            ),
            regions as (
            select
            case when s1.item is null then '0' else '1' end
            ||
            case when s2.item is null then '0' else '1' end
            ||
            case when s3.item is null then '0' else '1' end
            ||
            case when s4.item is null then '0' else '1' end as Region
            from universe u
            left join s1 on u.item = s1.item
            left join s2 on u.item = s2.item
            left join s3 on u.item = s3.item
            left join s4 on u.item = s4.item
            )
            select Region, count(*) from regions group by Region


            Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT' for the ANSI string concatenation to work in MySQL, or use the concat function instead. The WITH syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.



            If the sets are very large you might want to index the item column before querying in case the SQL optimizer won't figure it out by itself.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 17 at 14:24

























            answered Nov 16 at 23:07









            KT.

            5,18222455




            5,18222455






















                up vote
                0
                down vote













                Following procedure:



                1. Created a stored procedure, which creates temporary in-memory tables containing the sets.

                2. Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.

                3. As noted, MySQL does not have an INTERSECT or EXCEPT. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified.

                4. Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.

                What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11






                share|improve this answer
























                  up vote
                  0
                  down vote













                  Following procedure:



                  1. Created a stored procedure, which creates temporary in-memory tables containing the sets.

                  2. Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.

                  3. As noted, MySQL does not have an INTERSECT or EXCEPT. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified.

                  4. Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.

                  What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11






                  share|improve this answer






















                    up vote
                    0
                    down vote










                    up vote
                    0
                    down vote









                    Following procedure:



                    1. Created a stored procedure, which creates temporary in-memory tables containing the sets.

                    2. Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.

                    3. As noted, MySQL does not have an INTERSECT or EXCEPT. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified.

                    4. Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.

                    What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11






                    share|improve this answer












                    Following procedure:



                    1. Created a stored procedure, which creates temporary in-memory tables containing the sets.

                    2. Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.

                    3. As noted, MySQL does not have an INTERSECT or EXCEPT. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified.

                    4. Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.

                    What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Nov 14 at 10:26









                    Rainer Rillke

                    957819




                    957819




















                        up vote
                        0
                        down vote













                        The question is a little complex so the answers are. Let me explain K.T.'s answer



                        with universe as (
                        select * from s1
                        union
                        select * from s2
                        union
                        select * from s3
                        union
                        select * from s4
                        ),
                        regions as (
                        select
                        case when s1.item is null then '0' else '1' end
                        ||
                        case when s2.item is null then '0' else '1' end
                        ||
                        case when s3.item is null then '0' else '1' end
                        ||
                        case when s4.item is null then '0' else '1' end as Region
                        from universe u
                        left join s1 on u.item = s1.item
                        left join s2 on u.item = s2.item
                        left join s3 on u.item = s3.item
                        left join s4 on u.item = s4.item
                        )
                        select Region, count(*) from regions group by Region


                        The universe results in the UNION of all tables (duplicates eliminated), something like



                        +------+
                        | item |
                        +------+
                        | a |
                        +------+
                        | b |
                        +------+
                        | c |
                        +------+
                        | d |
                        +------+
                        | e |
                        +------+
                        | ... |
                        +------+


                        Then, s1, s2, s3 and s4 are joined



                        +------+---------+---------+---------+---------+
                        | item | s1.item | s2.item | s3.item | s4.item |
                        +------+---------+---------+---------+---------+
                        | a | a | a | a | a |
                        +------+---------+---------+---------+---------+
                        | b | b | b | b | NULL |
                        +------+---------+---------+---------+---------+
                        | c | c | c | NULL | c |
                        +------+---------+---------+---------+---------+
                        | d | d | NULL | d | d |
                        +------+---------+---------+---------+---------+
                        | e | NULL | e | e | e |
                        +------+---------+---------+---------+---------+
                        | ... | ... | ... | ... | ... |
                        +------+---------+---------+---------+---------+


                        and converted to a binary string (0: if cell is NULL; 1: else) called Region where the first digit corresponds to s1, the second to s2 and so on



                        +------+--------+
                        | item | Region |
                        +------+--------+
                        | a | 1111 |
                        +------+--------+
                        | b | 1110 |
                        +------+--------+
                        | c | 1101 |
                        +------+--------+
                        | d | 1011 |
                        +------+--------+
                        | e | 0111 |
                        +------+--------+
                        | ... | ... |
                        +------+--------+


                        and finally aggregated and grouped by Region



                        +--------+-------+
                        | Region | count |
                        +--------+-------+
                        | 1111 | 1 |
                        +--------+-------+
                        | 1110 | 1 |
                        +--------+-------+
                        | 1101 | 1 |
                        +--------+-------+
                        | 1011 | 1 |
                        +--------+-------+
                        | 0111 | 1 |
                        +--------+-------+
                        | ... | |
                        +--------+-------+


                        Note that regions having 0 set elements in them do not show up in the results and 0000 never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.



                        4-set venn diagram with regions in binary representation






                        share|improve this answer


























                          up vote
                          0
                          down vote













                          The question is a little complex so the answers are. Let me explain K.T.'s answer



                          with universe as (
                          select * from s1
                          union
                          select * from s2
                          union
                          select * from s3
                          union
                          select * from s4
                          ),
                          regions as (
                          select
                          case when s1.item is null then '0' else '1' end
                          ||
                          case when s2.item is null then '0' else '1' end
                          ||
                          case when s3.item is null then '0' else '1' end
                          ||
                          case when s4.item is null then '0' else '1' end as Region
                          from universe u
                          left join s1 on u.item = s1.item
                          left join s2 on u.item = s2.item
                          left join s3 on u.item = s3.item
                          left join s4 on u.item = s4.item
                          )
                          select Region, count(*) from regions group by Region


                          The universe results in the UNION of all tables (duplicates eliminated), something like



                          +------+
                          | item |
                          +------+
                          | a |
                          +------+
                          | b |
                          +------+
                          | c |
                          +------+
                          | d |
                          +------+
                          | e |
                          +------+
                          | ... |
                          +------+


                          Then, s1, s2, s3 and s4 are joined



                          +------+---------+---------+---------+---------+
                          | item | s1.item | s2.item | s3.item | s4.item |
                          +------+---------+---------+---------+---------+
                          | a | a | a | a | a |
                          +------+---------+---------+---------+---------+
                          | b | b | b | b | NULL |
                          +------+---------+---------+---------+---------+
                          | c | c | c | NULL | c |
                          +------+---------+---------+---------+---------+
                          | d | d | NULL | d | d |
                          +------+---------+---------+---------+---------+
                          | e | NULL | e | e | e |
                          +------+---------+---------+---------+---------+
                          | ... | ... | ... | ... | ... |
                          +------+---------+---------+---------+---------+


                          and converted to a binary string (0: if cell is NULL; 1: else) called Region where the first digit corresponds to s1, the second to s2 and so on



                          +------+--------+
                          | item | Region |
                          +------+--------+
                          | a | 1111 |
                          +------+--------+
                          | b | 1110 |
                          +------+--------+
                          | c | 1101 |
                          +------+--------+
                          | d | 1011 |
                          +------+--------+
                          | e | 0111 |
                          +------+--------+
                          | ... | ... |
                          +------+--------+


                          and finally aggregated and grouped by Region



                          +--------+-------+
                          | Region | count |
                          +--------+-------+
                          | 1111 | 1 |
                          +--------+-------+
                          | 1110 | 1 |
                          +--------+-------+
                          | 1101 | 1 |
                          +--------+-------+
                          | 1011 | 1 |
                          +--------+-------+
                          | 0111 | 1 |
                          +--------+-------+
                          | ... | |
                          +--------+-------+


                          Note that regions having 0 set elements in them do not show up in the results and 0000 never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.



                          4-set venn diagram with regions in binary representation






                          share|improve this answer
























                            up vote
                            0
                            down vote










                            up vote
                            0
                            down vote









                            The question is a little complex so the answers are. Let me explain K.T.'s answer



                            with universe as (
                            select * from s1
                            union
                            select * from s2
                            union
                            select * from s3
                            union
                            select * from s4
                            ),
                            regions as (
                            select
                            case when s1.item is null then '0' else '1' end
                            ||
                            case when s2.item is null then '0' else '1' end
                            ||
                            case when s3.item is null then '0' else '1' end
                            ||
                            case when s4.item is null then '0' else '1' end as Region
                            from universe u
                            left join s1 on u.item = s1.item
                            left join s2 on u.item = s2.item
                            left join s3 on u.item = s3.item
                            left join s4 on u.item = s4.item
                            )
                            select Region, count(*) from regions group by Region


                            The universe results in the UNION of all tables (duplicates eliminated), something like



                            +------+
                            | item |
                            +------+
                            | a |
                            +------+
                            | b |
                            +------+
                            | c |
                            +------+
                            | d |
                            +------+
                            | e |
                            +------+
                            | ... |
                            +------+


                            Then, s1, s2, s3 and s4 are joined



                            +------+---------+---------+---------+---------+
                            | item | s1.item | s2.item | s3.item | s4.item |
                            +------+---------+---------+---------+---------+
                            | a | a | a | a | a |
                            +------+---------+---------+---------+---------+
                            | b | b | b | b | NULL |
                            +------+---------+---------+---------+---------+
                            | c | c | c | NULL | c |
                            +------+---------+---------+---------+---------+
                            | d | d | NULL | d | d |
                            +------+---------+---------+---------+---------+
                            | e | NULL | e | e | e |
                            +------+---------+---------+---------+---------+
                            | ... | ... | ... | ... | ... |
                            +------+---------+---------+---------+---------+


                            and converted to a binary string (0: if cell is NULL; 1: else) called Region where the first digit corresponds to s1, the second to s2 and so on



                            +------+--------+
                            | item | Region |
                            +------+--------+
                            | a | 1111 |
                            +------+--------+
                            | b | 1110 |
                            +------+--------+
                            | c | 1101 |
                            +------+--------+
                            | d | 1011 |
                            +------+--------+
                            | e | 0111 |
                            +------+--------+
                            | ... | ... |
                            +------+--------+


                            and finally aggregated and grouped by Region



                            +--------+-------+
                            | Region | count |
                            +--------+-------+
                            | 1111 | 1 |
                            +--------+-------+
                            | 1110 | 1 |
                            +--------+-------+
                            | 1101 | 1 |
                            +--------+-------+
                            | 1011 | 1 |
                            +--------+-------+
                            | 0111 | 1 |
                            +--------+-------+
                            | ... | |
                            +--------+-------+


                            Note that regions having 0 set elements in them do not show up in the results and 0000 never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.



                            4-set venn diagram with regions in binary representation






                            share|improve this answer














                            The question is a little complex so the answers are. Let me explain K.T.'s answer



                            with universe as (
                            select * from s1
                            union
                            select * from s2
                            union
                            select * from s3
                            union
                            select * from s4
                            ),
                            regions as (
                            select
                            case when s1.item is null then '0' else '1' end
                            ||
                            case when s2.item is null then '0' else '1' end
                            ||
                            case when s3.item is null then '0' else '1' end
                            ||
                            case when s4.item is null then '0' else '1' end as Region
                            from universe u
                            left join s1 on u.item = s1.item
                            left join s2 on u.item = s2.item
                            left join s3 on u.item = s3.item
                            left join s4 on u.item = s4.item
                            )
                            select Region, count(*) from regions group by Region


                            The universe results in the UNION of all tables (duplicates eliminated), something like



                            +------+
                            | item |
                            +------+
                            | a |
                            +------+
                            | b |
                            +------+
                            | c |
                            +------+
                            | d |
                            +------+
                            | e |
                            +------+
                            | ... |
                            +------+


                            Then, s1, s2, s3 and s4 are joined



                            +------+---------+---------+---------+---------+
                            | item | s1.item | s2.item | s3.item | s4.item |
                            +------+---------+---------+---------+---------+
                            | a | a | a | a | a |
                            +------+---------+---------+---------+---------+
                            | b | b | b | b | NULL |
                            +------+---------+---------+---------+---------+
                            | c | c | c | NULL | c |
                            +------+---------+---------+---------+---------+
                            | d | d | NULL | d | d |
                            +------+---------+---------+---------+---------+
                            | e | NULL | e | e | e |
                            +------+---------+---------+---------+---------+
                            | ... | ... | ... | ... | ... |
                            +------+---------+---------+---------+---------+


                            and converted to a binary string (0: if cell is NULL; 1: else) called Region where the first digit corresponds to s1, the second to s2 and so on



                            +------+--------+
                            | item | Region |
                            +------+--------+
                            | a | 1111 |
                            +------+--------+
                            | b | 1110 |
                            +------+--------+
                            | c | 1101 |
                            +------+--------+
                            | d | 1011 |
                            +------+--------+
                            | e | 0111 |
                            +------+--------+
                            | ... | ... |
                            +------+--------+


                            and finally aggregated and grouped by Region



                            +--------+-------+
                            | Region | count |
                            +--------+-------+
                            | 1111 | 1 |
                            +--------+-------+
                            | 1110 | 1 |
                            +--------+-------+
                            | 1101 | 1 |
                            +--------+-------+
                            | 1011 | 1 |
                            +--------+-------+
                            | 0111 | 1 |
                            +--------+-------+
                            | ... | |
                            +--------+-------+


                            Note that regions having 0 set elements in them do not show up in the results and 0000 never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.



                            4-set venn diagram with regions in binary representation







                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Nov 25 at 20:36

























                            answered Nov 25 at 20:26









                            Rainer Rillke

                            957819




                            957819



























                                draft saved

                                draft discarded
















































                                Thanks for contributing an answer to Stack Overflow!


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid


                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.

                                To learn more, see our tips on writing great answers.





                                Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                                Please pay close attention to the following guidance:


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid


                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.

                                To learn more, see our tips on writing great answers.




                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function ()
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53234943%2fmysql-efficient-way-computing-set-powers-of-venn-diagram%23new-answer', 'question_page');

                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                這個網誌中的熱門文章

                                How to read a connectionString WITH PROVIDER in .NET Core?

                                In R, how to develop a multiplot heatmap.2 figure showing key labels successfully

                                Museum of Modern and Contemporary Art of Trento and Rovereto