Grouping rows into lists in pandas groupby
I have a pandas data frame like:
A 1
A 2
B 5
B 5
B 4
C 6
I want to group by the first column and get the second column as lists in rows:
A [1,2]
B [5,5,4]
C [6]
Is it possible to do something like this using pandas groupby?
Tags: python, pandas
Storing lists in dataframes is inefficient, any reason why you want to do this?
– EdChum
Mar 6 '14 at 10:35
list is an example, could be anything where I can access all entries from the same group in one row
– Abhishek Thakur
Mar 6 '14 at 10:41
I think if you just grouped by the columns and accessed the data corresponding to that group, it would save having to generate a list; what is returned is a pandas DataFrame/Series for that group
– EdChum
Mar 6 '14 at 10:52
Is there a way to group multiple columns and return an array of tuples?
– Akshay L Aradhya
Oct 25 at 14:11
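A sketch of what the last comment asks for, using the question's sample data plus a second value column of my own: group on one column and collect the remaining columns as a list of tuples per group.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6],
                   'c': [3, 3, 3, 4, 4, 4]})

# For each group, zip the remaining columns into (b, c) tuples.
pairs = df.groupby('a').apply(lambda g: list(zip(g['b'], g['c'])))
# pairs['A'] == [(1, 3), (2, 3)]
```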
asked Mar 6 '14 at 8:31 by Abhishek Thakur
5 Answers
Accepted answer (score 167)
You can do this using groupby to group on the column of interest and then apply list to every group:
In [1]:
# create the dataframe
import pandas as pd
df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'], 'b': [1, 2, 5, 5, 4, 6]})
df
Out[1]:
a b
0 A 1
1 A 2
2 B 5
3 B 5
4 B 4
5 C 6
[6 rows x 2 columns]
In [76]:
df.groupby('a')['b'].apply(list)
Out[76]:
a
A [1, 2]
B [5, 5, 4]
C [6]
Name: b, dtype: object
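The result is a Series indexed by 'a'; if a plain two-column frame is preferred, reset_index converts it (a small sketch on the same data):

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# reset_index turns the group keys back into an ordinary column.
out = df.groupby('a')['b'].apply(list).reset_index()
# out.loc[1, 'b'] == [5, 5, 4]
```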
This takes a lot of time if the dataset is huge, say 10 million rows. Is there any faster way to do this? The number of uniques in 'a' is, however, around 500k
– Abhishek Thakur
Mar 6 '14 at 11:12
groupby is notoriously slow and memory hungry; what you could do is sort by column A, then find the idxmin and idxmax (probably store this in a dict) and use this to slice your dataframe. That would be faster, I think
– EdChum
Mar 6 '14 at 11:32
@AbhishekThakur actually that won't work, as idxmin will not work for strings; you would need to store the beginning and end index values
– EdChum
Mar 6 '14 at 11:40
Unless I'm missing something (no morning coffee yet) you're doing a separate groupby for each row.
– DSM
Mar 6 '14 at 12:21
When I tried this solution with my problem (having multiple columns to group by and to group), it didn't work: pandas raised 'Function does not reduce'. Then I used tuple, following the second answer here: stackoverflow.com/questions/19530568/… . See the second answer in stackoverflow.com/questions/27439023/… for an explanation.
– Andarin
Jun 24 '16 at 10:54
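A sketch of the workaround Andarin describes (sample data mine): aggregating with tuple instead of list, which avoided the 'Function does not reduce' error on older pandas. The tuples can be converted back to lists afterwards if needed.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6],
                   'c': [3, 3, 3, 4, 4, 4]})

# tuple is accepted as a reducing aggregation even where list once was not.
out = df.groupby('a').agg(tuple)
# out.loc['B', 'b'] == (5, 5, 4)
```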
Answer (score 21)
If performance is important, go down to the numpy level:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randint(0, 60, 600), 'b': [1, 2, 5, 5, 4, 6] * 100})

def f(df):
    keys, values = df.sort_values('a').values.T
    ukeys, index = np.unique(keys, True)
    arrays = np.split(values, index[1:])
    df2 = pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})
    return df2
Tests:
In [301]: %timeit f(df)
1000 loops, best of 3: 1.64 ms per loop
In [302]: %timeit df.groupby('a')['b'].apply(list)
100 loops, best of 3: 5.26 ms per loop
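As a quick sanity check that the numpy route agrees with groupby on the question's sample data (the function is restated here so the snippet is self-contained):

```python
import numpy as np
import pandas as pd

def f(df):
    # Sort by key, then split the value column at each new-key boundary.
    keys, values = df.sort_values('a').values.T
    ukeys, index = np.unique(keys, return_index=True)
    arrays = np.split(values, index[1:])
    return pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})
expected = df.groupby('a')['b'].apply(list)
got = f(df)
assert list(got['b']) == list(expected)  # same lists, same key order
```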
Answer (score 11)
As you were saying, the groupby method of a pd.DataFrame object can do the job.
Example

L = ['A','A','B','B','B','C']
N = [1,2,5,5,4,6]

import pandas as pd
df = pd.DataFrame(list(zip(L, N)), columns=list('LN'))
groups = df.groupby(df.L)

groups.groups
{'A': [0, 1], 'B': [2, 3, 4], 'C': [5]}
which gives an index-wise description of the groups.
To get the elements of a single group, you can do, for instance
groups.get_group('A')
L N
0 A 1
1 A 2
groups.get_group('B')
L N
2 B 5
3 B 5
4 B 4
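Besides get_group, the GroupBy object is also iterable; each step yields the key and the corresponding sub-frame (a sketch on the same L/N data):

```python
import pandas as pd

L = ['A', 'A', 'B', 'B', 'B', 'C']
N = [1, 2, 5, 5, 4, 6]
df = pd.DataFrame(list(zip(L, N)), columns=list('LN'))

# Iterating a GroupBy yields (key, sub-DataFrame) pairs.
collected = {key: group['N'].tolist() for key, group in df.groupby('L')}
# collected == {'A': [1, 2], 'B': [5, 5, 4], 'C': [6]}
```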
Answer (score 3)
A handy way to achieve this would be:

df.groupby('a').agg({'b': lambda x: list(x)})

Look into writing custom aggregations: https://www.kaggle.com/akshaysehgal/how-to-group-by-aggregate-using-py
lambda args: f(args) is equivalent to f
– BallpointBen
Oct 11 at 17:43
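Following BallpointBen's point, the lambda wrapper can be dropped and the builtin passed directly (a sketch on the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# Passing list directly is equivalent to lambda x: list(x).
out = df.groupby('a').agg({'b': list})
# out.loc['B', 'b'] == [5, 5, 4]
```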
Answer (score 1)
To solve this for several columns of a dataframe:

In [5]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'], 'b': [1,2,5,5,4,6],
   ...:                    'c': [3,3,3,4,4,4]})
In [6]: df
Out[6]:
a b c
0 A 1 3
1 A 2 3
2 B 5 3
3 B 5 4
4 B 4 4
5 C 6 4
In [7]: df.groupby('a').agg(lambda x: list(x))
Out[7]:
b c
a
A [1, 2] [3, 3]
B [5, 5, 4] [3, 4, 4]
C [6] [4]
This answer was inspired by Anamika Modi's answer. Thank you!
Accepted answer: answered Mar 6 '14 at 10:28 by EdChum (edited Sep 28 '16 at 12:09)
Answer by B. M.: answered Mar 2 '17 at 8:42 (edited Aug 27 at 16:13 by Seanny123)
Answer by Acorbe: answered Mar 6 '14 at 10:12 (edited Mar 6 '14 at 10:17)
Answer by Anamika Modi: answered Sep 27 at 6:28
Answer by Markus Dutschke: answered Oct 31 at 16:25