Grouping rows into lists in pandas groupby
I have a pandas data frame like:
A 1
A 2
B 5
B 5
B 4
C 6
I want to group by the first column and get the second column as lists in rows:
A [1,2]
B [5,5,4]
C [6]
Is it possible to do something like this using pandas groupby?
Tags: python, pandas
Storing lists in dataframes is inefficient, any reason why you want to do this?
– EdChum
Mar 6 '14 at 10:35
list is an example, could be anything where I can access all entries from the same group in one row
– Abhishek Thakur
Mar 6 '14 at 10:41
I think if you just grouped by the columns and accessed the data corresponding to that group, it would save having to generate a list; what is returned is a pandas DataFrame/Series for that group
– EdChum
Mar 6 '14 at 10:52
Is there a way to group multiple columns and return an array of tuples?
– Akshay L Aradhya
Oct 25 at 14:11
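A sketch of what the last comment asks for, using the question's sample data plus a second value column of my own: group on one column and collect the remaining columns as a list of tuples per group.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6],
                   'c': [3, 3, 3, 4, 4, 4]})

# For each group, zip the remaining columns into (b, c) tuples.
pairs = df.groupby('a').apply(lambda g: list(zip(g['b'], g['c'])))
# pairs['A'] == [(1, 3), (2, 3)]
```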
asked Mar 6 '14 at 8:31 by Abhishek Thakur
5 Answers
Accepted answer (score 167)
You can do this using groupby to group on the column of interest and then apply list to every group:
In [1]:
# create the dataframe
import pandas as pd
df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'], 'b': [1, 2, 5, 5, 4, 6]})
df
Out[1]:
a b
0 A 1
1 A 2
2 B 5
3 B 5
4 B 4
5 C 6
[6 rows x 2 columns]
In [76]:
df.groupby('a')['b'].apply(list)
Out[76]:
a
A [1, 2]
B [5, 5, 4]
C [6]
Name: b, dtype: object
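The result is a Series indexed by 'a'; if a plain two-column frame is preferred, reset_index converts it (a small sketch on the same data):

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# reset_index turns the group keys back into an ordinary column.
out = df.groupby('a')['b'].apply(list).reset_index()
# out.loc[1, 'b'] == [5, 5, 4]
```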
This takes a lot of time if the dataset is huge, say 10 million rows. Is there any faster way to do this? The number of uniques in 'a' is, however, around 500k
– Abhishek Thakur
Mar 6 '14 at 11:12
groupby is notoriously slow and memory hungry; what you could do is sort by column A, then find the idxmin and idxmax (probably store this in a dict) and use this to slice your dataframe. That would be faster, I think
– EdChum
Mar 6 '14 at 11:32
@AbhishekThakur actually that won't work, as idxmin will not work for strings; you would need to store the beginning and end index values
– EdChum
Mar 6 '14 at 11:40
Unless I'm missing something (no morning coffee yet) you're doing a separate groupby for each row.
– DSM
Mar 6 '14 at 12:21
When I tried this solution with my problem (having multiple columns to group by and to group), it didn't work: pandas raised 'Function does not reduce'. Then I used tuple, following the second answer here: stackoverflow.com/questions/19530568/… . See the second answer in stackoverflow.com/questions/27439023/… for an explanation.
– Andarin
Jun 24 '16 at 10:54
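A sketch of the workaround Andarin describes (sample data mine): aggregating with tuple instead of list, which avoided the 'Function does not reduce' error on older pandas. The tuples can be converted back to lists afterwards if needed.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6],
                   'c': [3, 3, 3, 4, 4, 4]})

# tuple is accepted as a reducing aggregation even where list once was not.
out = df.groupby('a').agg(tuple)
# out.loc['B', 'b'] == (5, 5, 4)
```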
Answer (score 21)
If performance is important, go down to the numpy level:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randint(0, 60, 600), 'b': [1, 2, 5, 5, 4, 6] * 100})

def f(df):
    keys, values = df.sort_values('a').values.T
    ukeys, index = np.unique(keys, True)
    arrays = np.split(values, index[1:])
    df2 = pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})
    return df2
Tests:
In [301]: %timeit f(df)
1000 loops, best of 3: 1.64 ms per loop
In [302]: %timeit df.groupby('a')['b'].apply(list)
100 loops, best of 3: 5.26 ms per loop
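As a quick sanity check that the numpy route agrees with groupby on the question's sample data (the function is restated here so the snippet is self-contained):

```python
import numpy as np
import pandas as pd

def f(df):
    # Sort by key, then split the value column at each new-key boundary.
    keys, values = df.sort_values('a').values.T
    ukeys, index = np.unique(keys, return_index=True)
    arrays = np.split(values, index[1:])
    return pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})
expected = df.groupby('a')['b'].apply(list)
got = f(df)
assert list(got['b']) == list(expected)  # same lists, same key order
```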
Answer (score 11)
As you were saying, the groupby method of a pd.DataFrame object can do the job.
Example

L = ['A','A','B','B','B','C']
N = [1,2,5,5,4,6]

import pandas as pd
df = pd.DataFrame(list(zip(L, N)), columns=list('LN'))
groups = df.groupby(df.L)

groups.groups
{'A': [0, 1], 'B': [2, 3, 4], 'C': [5]}
which gives an index-wise description of the groups.
To get the elements of a single group, you can do, for instance
groups.get_group('A')
L N
0 A 1
1 A 2
groups.get_group('B')
L N
2 B 5
3 B 5
4 B 4
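Besides get_group, the GroupBy object is also iterable; each step yields the key and the corresponding sub-frame (a sketch on the same L/N data):

```python
import pandas as pd

L = ['A', 'A', 'B', 'B', 'B', 'C']
N = [1, 2, 5, 5, 4, 6]
df = pd.DataFrame(list(zip(L, N)), columns=list('LN'))

# Iterating a GroupBy yields (key, sub-DataFrame) pairs.
collected = {key: group['N'].tolist() for key, group in df.groupby('L')}
# collected == {'A': [1, 2], 'B': [5, 5, 4], 'C': [6]}
```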
Answer (score 3)
A handy way to achieve this would be:

df.groupby('a').agg({'b': lambda x: list(x)})

Look into writing custom aggregations: https://www.kaggle.com/akshaysehgal/how-to-group-by-aggregate-using-py
lambda args: f(args) is equivalent to f
– BallpointBen
Oct 11 at 17:43
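Following BallpointBen's point, the lambda wrapper can be dropped and the builtin passed directly (a sketch on the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# Passing list directly is equivalent to lambda x: list(x).
out = df.groupby('a').agg({'b': list})
# out.loc['B', 'b'] == [5, 5, 4]
```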
Answer (score 1)
To solve this for several columns of a dataframe:

In [5]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'], 'b': [1,2,5,5,4,6],
   ...:                    'c': [3,3,3,4,4,4]})
In [6]: df
Out[6]:
a b c
0 A 1 3
1 A 2 3
2 B 5 3
3 B 5 4
4 B 4 4
5 C 6 4
In [7]: df.groupby('a').agg(lambda x: list(x))
Out[7]:
b c
a
A [1, 2] [3, 3]
B [5, 5, 4] [3, 4, 4]
C [6] [4]
This answer was inspired by Anamika Modi's answer. Thank you!
Accepted answer: answered Mar 6 '14 at 10:28 by EdChum (edited Sep 28 '16 at 12:09)
Answer by B. M.: answered Mar 2 '17 at 8:42 (edited Aug 27 at 16:13 by Seanny123)
Answer by Acorbe: answered Mar 6 '14 at 10:12 (edited Mar 6 '14 at 10:17)
Answer by Anamika Modi: answered Sep 27 at 6:28
Answer by Markus Dutschke: answered Oct 31 at 16:25