Converting a Dataframe into a Series with cells containing arrays in Pandas

Sorry if this has been answered before, but I'm having trouble with the solution.

I have a 2D DataFrame with column names, where the elements contain both non-null and null values. I'd like to 'flatten' the 2D DataFrame to a 1D Series, where I preserve only the non-null data as a list in the series cell with the corresponding header.

That is, going from this (type pandas.DataFrame):

| asset | name | id  |
---------------------
| a     | john | 001 |
| a     | NaN  | 002 |
| NaN   | dave | 003 |

To (type pandas.Series):

| asset | name         | id              |
------------------------------------------
| [a]   | [john, dave] | [001, 002, 003] |

Thank you!

EDIT: Why I would need this:

I am starting with a large DataFrame that has multiple duplicated attributes with timestamped 'rows'. At any given timestamp, the information in the rows could be added to or deleted. I have used df.where() to return a DataFrame of the unique values, and am attempting to flatten it down to one attribute collection of 'ids' per row.

In practice, the example table is from a single GroupBy object.

edited Nov 10 at 23:26

asked Nov 10 at 23:15

Paul Choi

6018

1

Can you please explain why you would want this?
– coldspeed
Nov 10 at 23:16

python pandas dataframe series
1 Answer
accepted

Instantiate a new Series from a dict comprehension (this should be faster than an apply-based solution).

pd.Series({c: df[c].dropna().unique().tolist() for c in df.columns})

asset             [a]
name     [john, dave]
id          [1, 2, 3]
dtype: object
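To try the dict comprehension above end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt from the table in the question, and the ids are assumed to be stored as integers (which matches the output shown above, not the zero-padded strings in the question).

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; integer ids are an assumption.
df = pd.DataFrame({
    "asset": ["a", "a", np.nan],
    "name": ["john", np.nan, "dave"],
    "id": [1, 2, 3],
})

# One entry per column: drop nulls, de-duplicate, and store the rest as a list.
result = pd.Series({c: df[c].dropna().unique().tolist() for c in df.columns})
print(result)
```

Note that unique() also collapses repeated values, which is why asset ends up as [a] rather than [a, a].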

If you want a single-row DataFrame instead, use

pd.Series(
    {c: df[c].dropna().unique().tolist() for c in df.columns}
).to_frame().T

  asset          name         id
0   [a]  [john, dave]  [1, 2, 3]
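Since the question mentions that the table comes from a single GroupBy object, the same comprehension can be run once per group to get one row of lists per group. This is only a sketch under assumptions: the "group" column, its values, and the helper name collect_unique are hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the "group" column and the extra "g2" row are invented.
df = pd.DataFrame({
    "group": ["g1", "g1", "g1", "g2"],
    "asset": ["a", "a", np.nan, "b"],
    "name": ["john", np.nan, "dave", np.nan],
    "id": [1, 2, 3, 4],
})

def collect_unique(frame):
    """Return a Series with each column's unique non-null values as a list."""
    return pd.Series({c: frame[c].dropna().unique().tolist() for c in frame.columns})

value_columns = ["asset", "name", "id"]

# Build one flattened Series per group, keyed by the group label.
rows = {
    key: collect_unique(group[value_columns])
    for key, group in df.groupby("group")
}

# Each Series becomes a column, so transpose to get one row per group.
per_group = pd.DataFrame(rows).T
print(per_group)
```

A column with no non-null values in a group comes back as an empty list, so downstream code may need to handle that case.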

answered Nov 10 at 23:18

coldspeed

111k17101169
