Collating multiple rows of a column in a pandas DataFrame into one row while maintaining the column's data type
I have a pandas DataFrame with a few columns like this:
username A time place
AAA B 1 YYY
AAA C 2 YYY
AAA D 1 YYY
AAA B 3 ZZZ
AAA C 4 ZZZ
AAA B 3 ZZZ
BBB B 1 YYY
BBB C 2 YYY
BBB D 1 YYY
BBB B 7 ZZZ
BBB C 8 ZZZ
BBB B 9 ZZZ
CCC B 6 YYY
CCC C 5 YYY
CCC D 8 YYY
CCC B 7 ZZZ
CCC C 8 ZZZ
CCC B 9 ZZZ
In the DataFrame above, all the columns except time are strings; time is a float column.
For every username, I am trying to collate all of that user's rows into a single row. The output DataFrame should look like this:
username A time place
AAA B+C+D+B+C+B 1+2+1+3+4+3 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
BBB B+C+D+B+C+B 1+2+1+7+8+9 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
CCC B+C+D+B+C+B 6+5+8+7+8+9 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
I am using '+' as the separator, but it can be any character commonly used as a separator (such as , or /).
I have been able to do that for each column using
df.groupby('username')['A'].apply('+'.join).reset_index()
and the same for all the other columns. Finally, I merge all the individual DataFrames to get the form I want.
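A minimal sketch of the per-column approach described above, assuming the sample data is rebuilt by hand from the table (the DataFrame construction and the `reduce`/`merge` step that combines the pieces are assumptions, not the asker's actual code). Note that `'+'.join` only accepts strings, so the float time column has to be cast with `astype(str)` first; joining the raw floats raises a TypeError.

```python
from functools import reduce

import pandas as pd

# Sample data reconstructed from the table in the question (time stored as float)
df = pd.DataFrame({
    'username': ['AAA'] * 6 + ['BBB'] * 6 + ['CCC'] * 6,
    'A': ['B', 'C', 'D', 'B', 'C', 'B'] * 3,
    'time': [1.0, 2.0, 1.0, 3.0, 4.0, 3.0,
             1.0, 2.0, 1.0, 7.0, 8.0, 9.0,
             6.0, 5.0, 8.0, 7.0, 8.0, 9.0],
    'place': (['YYY'] * 3 + ['ZZZ'] * 3) * 3,
})

sep = '+'
# One small DataFrame per column: username plus that column joined with sep
parts = [
    df.groupby('username')[col]
      .apply(lambda s: sep.join(s.astype(str)))  # cast to str first: join rejects floats
      .reset_index()
    for col in ['A', 'time', 'place']
]
# Merge the per-column results back together on username
merged = reduce(lambda left, right: left.merge(right, on='username'), parts)
print(merged)
```

The result has the requested shape, but every collated cell, including time, is a string.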
I can do this for the time column too, but I want the result to be a column of floats, and I am having difficulty with that. I'm hoping somebody more knowledgeable can guide me here.
I have even tried changing the output column after the fact with
df['time'].astype(float)
but I get all NaNs.
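Whatever conversion is attempted, a collated cell such as '1.0+2.0+1.0+3.0+4.0+3.0' is one string, not a single number, so it cannot be turned into one float. If the individual times need to stay numeric, one option (an assumption about the desired output; the question does not ask for lists) is to collect them into a per-user list of floats with `agg(list)`. The sketch below rebuilds the sample data by hand.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    'username': ['AAA'] * 6 + ['BBB'] * 6 + ['CCC'] * 6,
    'A': ['B', 'C', 'D', 'B', 'C', 'B'] * 3,
    'time': [1.0, 2.0, 1.0, 3.0, 4.0, 3.0,
             1.0, 2.0, 1.0, 7.0, 8.0, 9.0,
             6.0, 5.0, 8.0, 7.0, 8.0, 9.0],
    'place': (['YYY'] * 3 + ['ZZZ'] * 3) * 3,
})

# A joined cell is a single string, so float() rejects it
try:
    float('1.0+2.0+1.0+3.0+4.0+3.0')
    parsed = True
except ValueError:
    parsed = False

# Keep the individual values numeric by collecting them into a list per user
time_lists = df.groupby('username')['time'].agg(list)
print(time_lists)
```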
python pandas
asked Nov 10 at 21:08
Acinonyx
327
1 Answer
Accepted answer (1 vote):
I believe you need to convert all columns to strings and then aggregate with agg:
df = df.astype(str).groupby('username', as_index=False).agg('+'.join)
print(df)
username A time place
0 AAA B+C+D+B+C+B 1.0+2.0+1.0+3.0+4.0+3.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
1 BBB B+C+D+B+C+B 1.0+2.0+1.0+7.0+8.0+9.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
2 CCC B+C+D+B+C+B 6.0+5.0+8.0+7.0+8.0+9.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
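Because the question allows any separator, the snippet above can be wrapped in a small helper that takes the separator as a parameter. This is a sketch: `collate`, `key` and `sep` are hypothetical names, and the logic is the same `astype(str)` followed by `agg(sep.join)`. Every column comes out as strings, so time becomes text such as '1.0/2.0/...' rather than a float column.

```python
import pandas as pd


def collate(frame, key='username', sep='+'):
    # Hypothetical helper: cast everything to str, then join each group's values with sep
    return frame.astype(str).groupby(key, as_index=False).agg(sep.join)


# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    'username': ['AAA'] * 6 + ['BBB'] * 6 + ['CCC'] * 6,
    'A': ['B', 'C', 'D', 'B', 'C', 'B'] * 3,
    'time': [1.0, 2.0, 1.0, 3.0, 4.0, 3.0,
             1.0, 2.0, 1.0, 7.0, 8.0, 9.0,
             6.0, 5.0, 8.0, 7.0, 8.0, 9.0],
    'place': (['YYY'] * 3 + ['ZZZ'] * 3) * 3,
})

slashed = collate(df, sep='/')
print(slashed)
```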
If you need to sum the numeric columns and join the string columns with +:
import numpy as np

df = (df.groupby('username', as_index=False)
        .agg(lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else '+'.join(x)))
print(df)
username A time place
0 AAA B+C+D+B+C+B 14.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
1 BBB B+C+D+B+C+B 28.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
2 CCC B+C+D+B+C+B 43.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
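To check that this keeps time numeric, here is a sketch of the same sum-or-join idea using pandas' own `is_numeric_dtype` in place of `np.issubdtype`; `sum_or_join` is a hypothetical name and the sample data is rebuilt by hand. The summed time column comes back as float64. One difference to be aware of: `is_numeric_dtype` also treats boolean columns as numeric, whereas `np.issubdtype(bool_dtype, np.number)` does not.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    'username': ['AAA'] * 6 + ['BBB'] * 6 + ['CCC'] * 6,
    'A': ['B', 'C', 'D', 'B', 'C', 'B'] * 3,
    'time': [1.0, 2.0, 1.0, 3.0, 4.0, 3.0,
             1.0, 2.0, 1.0, 7.0, 8.0, 9.0,
             6.0, 5.0, 8.0, 7.0, 8.0, 9.0],
    'place': (['YYY'] * 3 + ['ZZZ'] * 3) * 3,
})


def sum_or_join(col, sep='+'):
    # Numeric columns are summed (so the result stays float); everything else is joined as text
    if is_numeric_dtype(col):
        return col.sum()
    return sep.join(col.astype(str))


out = df.groupby('username', as_index=False).agg(sum_or_join)
print(out.dtypes)
```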
edited Nov 10 at 23:32
answered Nov 10 at 21:10
jezrael
308k20244319
I am trying to get the time column to be a float in the final output. If that is not possible, I don't mind getting tips on how to make the time column a float after the agg. thx
– Acinonyx
Nov 10 at 23:15
So for AAA you need 14.0 for time?
– jezrael
Nov 10 at 23:16
@Acinonyx - Please check edited answer.
– jezrael
Nov 11 at 3:07
Cannot vote due to lack of reputation points. My Q is answered.
– Acinonyx
Nov 11 at 6:23
@Acinonyx - You can upvote now ;)
– jezrael
Nov 11 at 6:24