MySQL: Efficient way computing set powers of Venn-Diagram
up vote
3
down vote
favorite
Given the 4 tables, each containing items and representing one set, how to get the count of the items in each compartment required to draw a Venn diagram as shown below. The calculation should take place in the MySQL server avoiding transmission of items to the application server.
Example tables:
s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |
Now, I think I would calculate some set powers. Some examples with I
corresponding to s1
, II
to s2
, III
to s3
and IV
to s4
:
If I reinterpret sx
as being a set, I would write:
|s1 ∩ s2 ∩ s3 ∩ s4|
- the white 25 in the center|(s1 ∩ s2 ∩ s4) s3|
- the white 15 below right in relation to the center|(s1 ∩ s4) (s2 ∪ s3)|
- the white 5 on the bottom|s1 (s2 ∪ s3 ∪ s4)|
- the dark blue 60 on the blue ground- ... till 15.
How to calculate those powers efficiently on the MySQL server? Does MySQL provide a function aiding in the calculation?
A naive approach would be running a query for 1.
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);
and another query for 2.
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3);
and so on, resulting in 15 queries.
mysql venn-diagram set-intersection
add a comment |
up vote
3
down vote
favorite
Given the 4 tables, each containing items and representing one set, how to get the count of the items in each compartment required to draw a Venn diagram as shown below. The calculation should take place in the MySQL server avoiding transmission of items to the application server.
Example tables:
s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |
Now, I think I would calculate some set powers. Some examples with I
corresponding to s1
, II
to s2
, III
to s3
and IV
to s4
:
If I reinterpret sx
as being a set, I would write:
|s1 ∩ s2 ∩ s3 ∩ s4|
- the white 25 in the center|(s1 ∩ s2 ∩ s4) s3|
- the white 15 below right in relation to the center|(s1 ∩ s4) (s2 ∪ s3)|
- the white 5 on the bottom|s1 (s2 ∪ s3 ∪ s4)|
- the dark blue 60 on the blue ground- ... till 15.
How to calculate those powers efficiently on the MySQL server? Does MySQL provide a function aiding in the calculation?
A naive approach would be running a query for 1.
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);
and another query for 2.
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3);
and so on, resulting in 15 queries.
mysql venn-diagram set-intersection
If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
– Rainer Rillke
Nov 10 at 0:41
1
There is noINTERSECT
andEXCEPT
in MySQL. So, you could use other RDBMS, which provides these features.
– Madhur Bhaiya
Nov 10 at 9:26
@MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
– Rainer Rillke
Nov 10 at 11:04
Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
– Rainer Rillke
Nov 11 at 14:40
add a comment |
up vote
3
down vote
favorite
up vote
3
down vote
favorite
Given the 4 tables, each containing items and representing one set, how to get the count of the items in each compartment required to draw a Venn diagram as shown below. The calculation should take place in the MySQL server avoiding transmission of items to the application server.
Example tables:
s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |
Now, I think I would calculate some set powers. Some examples with I
corresponding to s1
, II
to s2
, III
to s3
and IV
to s4
:
If I reinterpret sx
as being a set, I would write:
|s1 ∩ s2 ∩ s3 ∩ s4|
- the white 25 in the center|(s1 ∩ s2 ∩ s4) s3|
- the white 15 below right in relation to the center|(s1 ∩ s4) (s2 ∪ s3)|
- the white 5 on the bottom|s1 (s2 ∪ s3 ∪ s4)|
- the dark blue 60 on the blue ground- ... till 15.
How to calculate those powers efficiently on the MySQL server? Does MySQL provide a function aiding in the calculation?
A naive approach would be running a query for 1.
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);
and another query for 2.
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3);
and so on, resulting in 15 queries.
mysql venn-diagram set-intersection
Given the 4 tables, each containing items and representing one set, how to get the count of the items in each compartment required to draw a Venn diagram as shown below. The calculation should take place in the MySQL server avoiding transmission of items to the application server.
Example tables:
s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |
Now, I think I would calculate some set powers. Some examples with I
corresponding to s1
, II
to s2
, III
to s3
and IV
to s4
:
If I reinterpret sx
as being a set, I would write:
|s1 ∩ s2 ∩ s3 ∩ s4|
- the white 25 in the center|(s1 ∩ s2 ∩ s4) s3|
- the white 15 below right in relation to the center|(s1 ∩ s4) (s2 ∪ s3)|
- the white 5 on the bottom|s1 (s2 ∪ s3 ∪ s4)|
- the dark blue 60 on the blue ground- ... till 15.
How to calculate those powers efficiently on the MySQL server? Does MySQL provide a function aiding in the calculation?
A naive approach would be running a query for 1.
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);
and another query for 2.
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3);
and so on, resulting in 15 queries.
mysql venn-diagram set-intersection
mysql venn-diagram set-intersection
edited Nov 11 at 12:53
asked Nov 10 at 0:27
Rainer Rillke
957819
957819
If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
– Rainer Rillke
Nov 10 at 0:41
1
There is noINTERSECT
andEXCEPT
in MySQL. So, you could use other RDBMS, which provides these features.
– Madhur Bhaiya
Nov 10 at 9:26
@MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
– Rainer Rillke
Nov 10 at 11:04
Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
– Rainer Rillke
Nov 11 at 14:40
add a comment |
If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
– Rainer Rillke
Nov 10 at 0:41
1
There is noINTERSECT
andEXCEPT
in MySQL. So, you could use other RDBMS, which provides these features.
– Madhur Bhaiya
Nov 10 at 9:26
@MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
– Rainer Rillke
Nov 10 at 11:04
Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
– Rainer Rillke
Nov 11 at 14:40
If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
– Rainer Rillke
Nov 10 at 0:41
If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
– Rainer Rillke
Nov 10 at 0:41
1
1
There is no
INTERSECT
and EXCEPT
in MySQL. So, you could use other RDBMS, which provides these features.– Madhur Bhaiya
Nov 10 at 9:26
There is no
INTERSECT
and EXCEPT
in MySQL. So, you could use other RDBMS, which provides these features.– Madhur Bhaiya
Nov 10 at 9:26
@MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
– Rainer Rillke
Nov 10 at 11:04
@MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
– Rainer Rillke
Nov 10 at 11:04
Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
– Rainer Rillke
Nov 11 at 14:40
Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
– Rainer Rillke
Nov 11 at 14:40
add a comment |
3 Answers
3
active
oldest
votes
up vote
1
down vote
accepted
Try something like this:
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT'
for the ANSI string concatenation to work in MySQL, or use the concat
function instead. The WITH
syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.
If the sets are very large you might want to index the item
column before querying in case the SQL optimizer won't figure it out by itself.
add a comment |
up vote
0
down vote
Following procedure:
- Created a stored procedure, which creates temporary in-memory tables containing the sets.
- Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.
- As noted, MySQL does not have an
INTERSECT
orEXCEPT
. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified. - Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.
What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
add a comment |
up vote
0
down vote
The question is a little complex so the answers are. Let me explain K.T.'s answer
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
The universe
results in the UNION of all tables (duplicates eliminated), something like
+------+
| item |
+------+
| a |
+------+
| b |
+------+
| c |
+------+
| d |
+------+
| e |
+------+
| ... |
+------+
Then, s1, s2, s3 and s4 are joined
+------+---------+---------+---------+---------+
| item | s1.item | s2.item | s3.item | s4.item |
+------+---------+---------+---------+---------+
| a | a | a | a | a |
+------+---------+---------+---------+---------+
| b | b | b | b | NULL |
+------+---------+---------+---------+---------+
| c | c | c | NULL | c |
+------+---------+---------+---------+---------+
| d | d | NULL | d | d |
+------+---------+---------+---------+---------+
| e | NULL | e | e | e |
+------+---------+---------+---------+---------+
| ... | ... | ... | ... | ... |
+------+---------+---------+---------+---------+
and converted to a binary string (0: if cell is NULL; 1: else) called Region
where the first digit corresponds to s1, the second to s2 and so on
+------+--------+
| item | Region |
+------+--------+
| a | 1111 |
+------+--------+
| b | 1110 |
+------+--------+
| c | 1101 |
+------+--------+
| d | 1011 |
+------+--------+
| e | 0111 |
+------+--------+
| ... | ... |
+------+--------+
and finally aggregated and grouped by Region
+--------+-------+
| Region | count |
+--------+-------+
| 1111 | 1 |
+--------+-------+
| 1110 | 1 |
+--------+-------+
| 1101 | 1 |
+--------+-------+
| 1011 | 1 |
+--------+-------+
| 0111 | 1 |
+--------+-------+
| ... | |
+--------+-------+
Note that regions having 0 set elements in them do not show up in the results and 0000
never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.
add a comment |
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
Try something like this:
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT'
for the ANSI string concatenation to work in MySQL, or use the concat
function instead. The WITH
syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.
If the sets are very large you might want to index the item
column before querying in case the SQL optimizer won't figure it out by itself.
add a comment |
up vote
1
down vote
accepted
Try something like this:
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT'
for the ANSI string concatenation to work in MySQL, or use the concat
function instead. The WITH
syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.
If the sets are very large you might want to index the item
column before querying in case the SQL optimizer won't figure it out by itself.
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
Try something like this:
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT'
for the ANSI string concatenation to work in MySQL, or use the concat
function instead. The WITH
syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.
If the sets are very large you might want to index the item
column before querying in case the SQL optimizer won't figure it out by itself.
Try something like this:
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
Disclaimer: I only tested this in SQLite. You might need to SET sql_mode='PIPES_AS_CONCAT'
for the ANSI string concatenation to work in MySQL, or use the concat
function instead. The WITH
syntax is only supported starting from version 8.0 of MySQL, but you can use temporary tables or nested queries appropriately instead.
If the sets are very large you might want to index the item
column before querying in case the SQL optimizer won't figure it out by itself.
edited Nov 17 at 14:24
answered Nov 16 at 23:07
KT.
5,18222455
5,18222455
add a comment |
add a comment |
up vote
0
down vote
Following procedure:
- Created a stored procedure, which creates temporary in-memory tables containing the sets.
- Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.
- As noted, MySQL does not have an
INTERSECT
orEXCEPT
. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified. - Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.
What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
add a comment |
up vote
0
down vote
Following procedure:
- Created a stored procedure, which creates temporary in-memory tables containing the sets.
- Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.
- As noted, MySQL does not have an
INTERSECT
orEXCEPT
. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified. - Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.
What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
add a comment |
up vote
0
down vote
up vote
0
down vote
Following procedure:
- Created a stored procedure, which creates temporary in-memory tables containing the sets.
- Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.
- As noted, MySQL does not have an
INTERSECT
orEXCEPT
. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified. - Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.
What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
Following procedure:
- Created a stored procedure, which creates temporary in-memory tables containing the sets.
- Mind that MySQL does not allow you refer to a temporary in-memory table more than one time in a query.
- As noted, MySQL does not have an
INTERSECT
orEXCEPT
. But you can emulate them. By removing duplicates from your raw data/ raw sets, emulation can be even more simplified. - Decided to store the computed value into a variable each and output a table consisting of all 15 of those values corresponding to components.
What I came up with is currently https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
answered Nov 14 at 10:26
Rainer Rillke
957819
957819
add a comment |
add a comment |
up vote
0
down vote
The question is a little complex so the answers are. Let me explain K.T.'s answer
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
The universe
results in the UNION of all tables (duplicates eliminated), something like
+------+
| item |
+------+
| a |
+------+
| b |
+------+
| c |
+------+
| d |
+------+
| e |
+------+
| ... |
+------+
Then, s1, s2, s3 and s4 are joined
+------+---------+---------+---------+---------+
| item | s1.item | s2.item | s3.item | s4.item |
+------+---------+---------+---------+---------+
| a | a | a | a | a |
+------+---------+---------+---------+---------+
| b | b | b | b | NULL |
+------+---------+---------+---------+---------+
| c | c | c | NULL | c |
+------+---------+---------+---------+---------+
| d | d | NULL | d | d |
+------+---------+---------+---------+---------+
| e | NULL | e | e | e |
+------+---------+---------+---------+---------+
| ... | ... | ... | ... | ... |
+------+---------+---------+---------+---------+
and converted to a binary string (0: if cell is NULL; 1: else) called Region
where the first digit corresponds to s1, the second to s2 and so on
+------+--------+
| item | Region |
+------+--------+
| a | 1111 |
+------+--------+
| b | 1110 |
+------+--------+
| c | 1101 |
+------+--------+
| d | 1011 |
+------+--------+
| e | 0111 |
+------+--------+
| ... | ... |
+------+--------+
and finally aggregated and grouped by Region
+--------+-------+
| Region | count |
+--------+-------+
| 1111 | 1 |
+--------+-------+
| 1110 | 1 |
+--------+-------+
| 1101 | 1 |
+--------+-------+
| 1011 | 1 |
+--------+-------+
| 0111 | 1 |
+--------+-------+
| ... | |
+--------+-------+
Note that regions having 0 set elements in them do not show up in the results and 0000
never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.
add a comment |
up vote
0
down vote
The question is a little complex so the answers are. Let me explain K.T.'s answer
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
The universe
results in the UNION of all tables (duplicates eliminated), something like
+------+
| item |
+------+
| a |
+------+
| b |
+------+
| c |
+------+
| d |
+------+
| e |
+------+
| ... |
+------+
Then, s1, s2, s3 and s4 are joined
+------+---------+---------+---------+---------+
| item | s1.item | s2.item | s3.item | s4.item |
+------+---------+---------+---------+---------+
| a | a | a | a | a |
+------+---------+---------+---------+---------+
| b | b | b | b | NULL |
+------+---------+---------+---------+---------+
| c | c | c | NULL | c |
+------+---------+---------+---------+---------+
| d | d | NULL | d | d |
+------+---------+---------+---------+---------+
| e | NULL | e | e | e |
+------+---------+---------+---------+---------+
| ... | ... | ... | ... | ... |
+------+---------+---------+---------+---------+
and converted to a binary string (0: if cell is NULL; 1: else) called Region
where the first digit corresponds to s1, the second to s2 and so on
+------+--------+
| item | Region |
+------+--------+
| a | 1111 |
+------+--------+
| b | 1110 |
+------+--------+
| c | 1101 |
+------+--------+
| d | 1011 |
+------+--------+
| e | 0111 |
+------+--------+
| ... | ... |
+------+--------+
and finally aggregated and grouped by Region
+--------+-------+
| Region | count |
+--------+-------+
| 1111 | 1 |
+--------+-------+
| 1110 | 1 |
+--------+-------+
| 1101 | 1 |
+--------+-------+
| 1011 | 1 |
+--------+-------+
| 0111 | 1 |
+--------+-------+
| ... | |
+--------+-------+
Note that regions having 0 set elements in them do not show up in the results and 0000
never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.
add a comment |
up vote
0
down vote
up vote
0
down vote
The question is a little complex so the answers are. Let me explain K.T.'s answer
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
The universe
results in the UNION of all tables (duplicates eliminated), something like
+------+
| item |
+------+
| a |
+------+
| b |
+------+
| c |
+------+
| d |
+------+
| e |
+------+
| ... |
+------+
Then, s1, s2, s3 and s4 are joined
+------+---------+---------+---------+---------+
| item | s1.item | s2.item | s3.item | s4.item |
+------+---------+---------+---------+---------+
| a | a | a | a | a |
+------+---------+---------+---------+---------+
| b | b | b | b | NULL |
+------+---------+---------+---------+---------+
| c | c | c | NULL | c |
+------+---------+---------+---------+---------+
| d | d | NULL | d | d |
+------+---------+---------+---------+---------+
| e | NULL | e | e | e |
+------+---------+---------+---------+---------+
| ... | ... | ... | ... | ... |
+------+---------+---------+---------+---------+
and converted to a binary string (0: if cell is NULL; 1: else) called Region
where the first digit corresponds to s1, the second to s2 and so on
+------+--------+
| item | Region |
+------+--------+
| a | 1111 |
+------+--------+
| b | 1110 |
+------+--------+
| c | 1101 |
+------+--------+
| d | 1011 |
+------+--------+
| e | 0111 |
+------+--------+
| ... | ... |
+------+--------+
and finally aggregated and grouped by Region
+--------+-------+
| Region | count |
+--------+-------+
| 1111 | 1 |
+--------+-------+
| 1110 | 1 |
+--------+-------+
| 1101 | 1 |
+--------+-------+
| 1011 | 1 |
+--------+-------+
| 0111 | 1 |
+--------+-------+
| ... | |
+--------+-------+
Note that regions having 0 set elements in them do not show up in the results and 0000
never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.
The question is a little complex so the answers are. Let me explain K.T.'s answer
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
The universe
results in the UNION of all tables (duplicates eliminated), something like
+------+
| item |
+------+
| a |
+------+
| b |
+------+
| c |
+------+
| d |
+------+
| e |
+------+
| ... |
+------+
Then, s1, s2, s3 and s4 are joined
+------+---------+---------+---------+---------+
| item | s1.item | s2.item | s3.item | s4.item |
+------+---------+---------+---------+---------+
| a | a | a | a | a |
+------+---------+---------+---------+---------+
| b | b | b | b | NULL |
+------+---------+---------+---------+---------+
| c | c | c | NULL | c |
+------+---------+---------+---------+---------+
| d | d | NULL | d | d |
+------+---------+---------+---------+---------+
| e | NULL | e | e | e |
+------+---------+---------+---------+---------+
| ... | ... | ... | ... | ... |
+------+---------+---------+---------+---------+
and converted to a binary string (0: if cell is NULL; 1: else) called Region
where the first digit corresponds to s1, the second to s2 and so on
+------+--------+
| item | Region |
+------+--------+
| a | 1111 |
+------+--------+
| b | 1110 |
+------+--------+
| c | 1101 |
+------+--------+
| d | 1011 |
+------+--------+
| e | 0111 |
+------+--------+
| ... | ... |
+------+--------+
and finally aggregated and grouped by Region
+--------+-------+
| Region | count |
+--------+-------+
| 1111 | 1 |
+--------+-------+
| 1110 | 1 |
+--------+-------+
| 1101 | 1 |
+--------+-------+
| 1011 | 1 |
+--------+-------+
| 0111 | 1 |
+--------+-------+
| ... | |
+--------+-------+
Note that regions having 0 set elements in them do not show up in the results and 0000
never will (=item not part of any set s1, s2, s3, s4) so there are 15 regions.
edited Nov 25 at 20:36
answered Nov 25 at 20:26
Rainer Rillke
957819
957819
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53234943%2fmysql-efficient-way-computing-set-powers-of-venn-diagram%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
If someone tells me convincingly it would be a lot easier to do it with Postgres, I would change the question accordingly. It should probably read "Open Source DBMS: ..." but that's too broad for SO.
– Rainer Rillke
Nov 10 at 0:41
1
There is no
INTERSECT
andEXCEPT
in MySQL. So, you could use other RDBMS, which provides these features.– Madhur Bhaiya
Nov 10 at 9:26
@MadhurBhaiya Wasn't aware of that. MariaDB introduced set operations with 10.3.
– Rainer Rillke
Nov 10 at 11:04
Current solution: gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
– Rainer Rillke
Nov 11 at 14:40