How does Spark handle aggregate max for non-numeric values? [duplicate]

This question already has an answer here:

how to get max(date) from given set of data grouped by some fields using pyspark?

1 answer

I have a DataFrame with the following data:

DF1

+-----+---------+
|value|condition|
+-----+---------+
|    1|        Y|
|    2|        Y|
|    3|        Y|
|    3|        N|
|    3|        N|
+-----+---------+

I want to understand what the result will be if I apply max in an aggregation on the string column:

DF1.groupby(DF1).max(condition)

Does it return the string that occurs most often (here Y)? If so, how do I get the max value according to alphabetical order instead?

Edit:

This is not about dates or any other data type; I want it exclusively for strings.

edited Nov 15 '18 at 13:52

asked Nov 15 '18 at 13:10

Sundeep Pidugu


marked as duplicate by eliasah apache-spark
Nov 15 '18 at 14:06

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

I want this exclusively for strings, whereas the linked question is about dates. @user10465355

– Sundeep Pidugu
Nov 15 '18 at 13:53


scala apache-spark apache-spark-sql

1 Answer

Try this:

scala> val df1 = Seq((1,"Y"),(2,"Y"),(3,"N"),(3,"Z")).toDF("value","condition")
df1: org.apache.spark.sql.DataFrame = [value: int, condition: string]

scala> df1.show
+-----+---------+
|value|condition|
+-----+---------+
|    1|        Y|
|    2|        Y|
|    3|        N|
|    3|        Z|
+-----+---------+


scala> df1.agg(max("condition")).show
+--------------+
|max(condition)|
+--------------+
|             Z|
+--------------+
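
For the grouped case in the question, a minimal sketch along the same lines follows. It assumes a local SparkSession and reuses the question's data; the aliases `max_condition` and `min_condition` are illustrative names, not from the original post. To my knowledge the shortcut `groupBy(...).max("condition")` only accepts numeric columns, so the sketch uses `agg(max(...))` from `org.apache.spark.sql.functions` instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min}

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("string-max-sketch").getOrCreate()
import spark.implicits._

// Data from the question.
val df = Seq((1, "Y"), (2, "Y"), (3, "Y"), (3, "N"), (3, "N")).toDF("value", "condition")

// max/min on a string column compare the strings themselves, not how often they occur.
val perValue = df.groupBy("value")
  .agg(max("condition").as("max_condition"), min("condition").as("min_condition"))
  .orderBy("value")

val rows = perValue.as[(Int, String, String)].collect()
// For value 3 the maximum is "Y" even though "N" occurs more often in that group.
rows.foreach(println)
```

So for the question's data, the group with value 3 yields Y as the max and N as the min, regardless of the occurrence counts.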

edited Nov 15 '18 at 15:13

Sundeep Pidugu


answered Nov 15 '18 at 13:32

Sathiyan S


So when max is applied, it automatically gives the alphabetically highest value, rather than the most frequently occurring one?

– Sundeep Pidugu
Nov 15 '18 at 13:55

Yes! Is this not what you wanted?

– Sathiyan S
Nov 16 '18 at 6:18
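
One caveat on "alphabetical": by default Spark compares strings by their binary (UTF-8) value, so the order is case-sensitive and digit strings are not compared numerically. A small sketch, assuming a local SparkSession; the sample values and the column name `s` are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, upper}

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("string-order-sketch").getOrCreate()
import spark.implicits._

val words = Seq("apple", "Zebra", "mango").toDF("s")

// Lowercase letters sort after uppercase ones, so "mango" ranks above "Zebra".
val rawMax = words.agg(max("s")).as[String].head()

// Comparing upper-cased copies gives a case-insensitive maximum.
val upperMax = words.agg(max(upper(col("s")))).as[String].head()

val numbers = Seq("9", "10", "100").toDF("s")

// Strings are compared character by character, so "9" ranks above "10" and "100".
val stringMax = numbers.agg(max("s")).as[String].head()

// Casting first makes the comparison numeric.
val numericMax = numbers.agg(max(col("s").cast("int"))).as[Int].head()
```

Here `rawMax` is "mango", `upperMax` is "ZEBRA", `stringMax` is "9", and `numericMax` is 100.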

Yes! What if I want to count the number of occurrences of each value?

– Sundeep Pidugu
Nov 16 '18 at 8:30


df1.groupBy("condition").agg(count("condition")).show

– Sathiyan S
Nov 16 '18 at 8:48
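
Building on the count suggested in the comment above, a minimal sketch that also picks the most frequent value. It assumes a local SparkSession and the `df1` data from the answer; breaking ties alphabetically is an arbitrary choice made for the illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, desc}

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("string-count-sketch").getOrCreate()
import spark.implicits._

val df1 = Seq((1, "Y"), (2, "Y"), (3, "N"), (3, "Z")).toDF("value", "condition")

// One row per distinct condition, with a Long column named "count".
val counts = df1.groupBy("condition").count()

// Most frequent value first; ties are broken alphabetically.
val mostFrequent = counts.orderBy(desc("count"), asc("condition")).first()

val topValue = mostFrequent.getString(0)
val topCount = mostFrequent.getLong(1)
```

With this data `topValue` is "Y" and `topCount` is 2, whereas `max("condition")` on the same column returns "Z".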
