How does Spark handle aggregate max for non-numeric values? [duplicate]

This question already has an answer here:



  • how to get max(date) from given set of data grouped by some fields using pyspark?

    1 answer



I have a dataframe with the following data:

DF1

+-----+---------+
|value|condition|
+-----+---------+
|    1|        Y|
|    2|        Y|
|    3|        Y|
|    3|        N|
|    3|        N|
+-----+---------+

I want to understand what the result will be if I apply max in an aggregation on this dataframe.

Does DF1.groupby(DF1).max(condition) give the string with the highest count, which is Y? If so, how do I get the max value according to alphabetical order?

Edit:

This is not about dates or any other datatype; I want this exclusively for strings.

marked as duplicate by eliasah apache-spark
Nov 15 '18 at 14:06


  • @user10465355 I want this exclusively for strings, whereas the linked question is about dates.

    – Sundeep Pidugu
    Nov 15 '18 at 13:53

scala apache-spark apache-spark-sql






edited Nov 15 '18 at 13:52

asked Nov 15 '18 at 13:10 by Sundeep Pidugu

1 Answer
Try this:

scala> val df1 = Seq((1,"Y"),(2,"Y"),(3,"N"),(3,"Z")).toDF("value","condition")
df1: org.apache.spark.sql.DataFrame = [value: int, condition: string]

scala> df1.show
+-----+---------+
|value|condition|
+-----+---------+
|    1|        Y|
|    2|        Y|
|    3|        N|
|    3|        Z|
+-----+---------+

scala> df1.agg(max("condition")).show
+--------------+
|max(condition)|
+--------------+
|             Z|
+--------------+
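
For the grouped case the question asks about, here is a minimal sketch, written as a standalone script rather than a spark-shell session. It assumes a local SparkSession; the names `perGroup` and `maxByValue` are only for illustration and do not come from the answer. As far as I know, the `max(colNames)` shortcut on grouped data only accepts numeric columns, so for a string column the aggregate goes through `agg(max(...))` instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Assumption: a local session for illustration. In spark-shell, `spark` already
// exists and the next two lines can be skipped.
val spark = SparkSession.builder().master("local[1]").appName("string-max-sketch").getOrCreate()
import spark.implicits._

// Same sample data as in the answer above.
val df1 = Seq((1, "Y"), (2, "Y"), (3, "N"), (3, "Z")).toDF("value", "condition")

// max on a string column inside agg: one row per "value" group,
// holding the string that sorts last within that group.
val perGroup = df1.groupBy("value").agg(max("condition").as("max_condition"))

// Collect into a Map so the result does not depend on row order.
val maxByValue: Map[Int, String] =
  perGroup.collect().map(row => row.getInt(0) -> row.getString(1)).toMap

println(maxByValue)
```

Group 3 contains both N and Z, and the sketch keeps Z, so the choice follows string ordering rather than frequency. With default settings I would expect the comparison to be case-sensitive, so it is worth checking on your own data if the column mixes upper and lower case.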





  • So applying max automatically gives the letter that is highest in alphabetical order, rather than the one that occurs most often?

    – Sundeep Pidugu
    Nov 15 '18 at 13:55

  • Yes! Is this not what you wanted?

    – Sathiyan S
    Nov 16 '18 at 6:18

  • Yeah! What if I want to calculate the number of occurrences of each letter?

    – Sundeep Pidugu
    Nov 16 '18 at 8:30

  • df1.groupBy("condition").agg(count("condition")).show

    – Sathiyan S
    Nov 16 '18 at 8:48
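
Building on that last comment, here is a rough sketch of how the most frequent string could be picked out once the counts are computed. It assumes a local SparkSession and the same sample data as the answer; the names `counts`, `top`, `mostFrequent` and `occurrences` are hypothetical, and the alphabetical tie-break is my own choice, not something the thread specifies.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, count, desc}

// Assumption: a local session for illustration. In spark-shell, `spark` already
// exists and the next two lines can be skipped.
val spark = SparkSession.builder().master("local[1]").appName("string-count-sketch").getOrCreate()
import spark.implicits._

// Same sample data as in the answer.
val df1 = Seq((1, "Y"), (2, "Y"), (3, "N"), (3, "Z")).toDF("value", "condition")

// Number of occurrences of each distinct string, as in the comment above.
val counts = df1.groupBy("condition").agg(count("condition").as("n"))

// Most frequent first; ties are broken alphabetically so the pick is deterministic.
val top = counts.orderBy(desc("n"), asc("condition")).first()
val mostFrequent: String = top.getString(0)
val occurrences: Long = top.getLong(1)

println(s"$mostFrequent occurs $occurrences times")
```

This is the "max by count" reading of the question, as opposed to `max("condition")`, which compares the strings themselves.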

















edited Nov 15 '18 at 15:13 by Sundeep Pidugu

answered Nov 15 '18 at 13:32 by Sathiyan S
