How to bin data in a pandas DataFrame

I have time-series data, for example machine readings like these:

df['machine_r'] = [1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2.....]
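A runnable setup, assuming only the 16 readings shown above (the trailing `.....` is left out, so the percentiles are those of this sample only):

```python
import pandas as pd

# Only the 16 readings listed in the question; the elided tail is not included
df = pd.DataFrame({'machine_r': [1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2]})

# Default linear interpolation gives 1.75, 3.0 and 5.0 for this sample
print(df['machine_r'].quantile([0.25, 0.5, 0.75]))
```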

How can I transform the column according to these rules?

if data <= 25th percentile, value = 0.25
if 25th percentile < data <= 50th percentile, value = 0.50
if 50th percentile < data <= 75th percentile, value = 0.75
if data > 75th percentile, value = 1
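The four rules can also be encoded directly, without a categorical step. This is a sketch using `numpy.select` (a swapped-in technique, not the asker's attempt), assuming the same 16 sample readings; the new column name `level` is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'machine_r': [1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2]})
s = df['machine_r']
p25, p50, p75 = s.quantile([0.25, 0.5, 0.75])

# np.select picks the first condition that is True, so checking the upper
# bounds in increasing order reproduces the half-open ranges of the rules
conditions = [s <= p25, s <= p50, s <= p75]
choices = [0.25, 0.50, 0.75]
df['level'] = np.select(conditions, choices, default=1.0)  # data > p75 -> 1.0

print(df['level'].dtype)  # float64
```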

I have tried:

import pandas as pd

p25 = df['machine_r'].quantile(0.25)  # 25th percentile
p50 = df['machine_r'].quantile(0.5)
p75 = df['machine_r'].quantile(0.75)
p100 = df['machine_r'].quantile(1)    # maximum
bins = [float('-inf'), p25, p50, p75, p100]  # open lower edge
labels = [0.25, 0.5, 0.75, 1]
df['machine_r'] = pd.cut(df['machine_r'], bins=bins, labels=labels)

but it returns the labels as categorical values, and I need them as floats for further analysis. How can I do that?

asked 21 hours ago

Ranjan Mondal


python pandas dataframe statistics


1 Answer (accepted)

You can cast the categorical result to float with astype:

df['new'] = pd.cut(df['machine_r'], bins=bins, labels=labels).astype(float)
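A self-contained version of the cut-then-cast approach, assuming the 16 sample readings from the question and an open lower edge of `-inf` so the minimum lands in the first bin:

```python
import pandas as pd

df = pd.DataFrame({'machine_r': [1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2]})
s = df['machine_r']

# Edges: open lower edge, then the 25th/50th/75th percentiles and the maximum
bins = [float('-inf'), s.quantile(0.25), s.quantile(0.5), s.quantile(0.75), s.quantile(1)]
labels = [0.25, 0.5, 0.75, 1]

binned = pd.cut(s, bins=bins, labels=labels)
print(binned.dtype)       # category: the labels become categories

df['new'] = binned.astype(float)
print(df['new'].dtype)    # float64
```

The cast works because the categories themselves are numbers; `astype(float)` replaces each category code with its numeric label.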

Better still, use qcut, as Sandeep Kadapa suggested; it computes the quantile edges for you:

df['new'] = pd.qcut(x=df['machine_r'], q=[0, .25, .5, .75, 1.], labels=labels).astype(float)
print(df)
    machine_r   new
0           1  0.25
1           2  0.50
2           1  0.25
3           5  0.75
4           3  0.50
5           4  0.75
6           5  0.75
7           1  0.25
8           2  0.50
9           3  0.50
10          4  0.75
11          5  0.75
12          7  1.00
13          8  1.00
14          1  0.25
15          2  0.50

print(df.dtypes)
machine_r      int64
new          float64
dtype: object
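A variation not taken from the answer: `qcut` with `labels=False` returns integer bin codes 0 to 3 instead of a categorical, and with `q=4` (quartiles) the requested values follow from `(code + 1) / 4`. A sketch, assuming the same 16 sample readings:

```python
import pandas as pd

df = pd.DataFrame({'machine_r': [1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2]})

# labels=False yields integer bin codes 0..3 rather than a categorical
codes = pd.qcut(df['machine_r'], q=4, labels=False)

# Map code 0 -> 0.25, 1 -> 0.5, 2 -> 0.75, 3 -> 1.0
df['new'] = (codes + 1) / 4

print(df['new'].dtype)  # float64
```

If the data contain many repeated values, two quantile edges can coincide and `qcut` raises a `ValueError`; passing `duplicates='drop'` merges those bins, but then the codes no longer map cleanly onto four levels.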

edited 20 hours ago

answered 21 hours ago

jezrael



@RanjanMondal - You are welcome! If my answer was helpful, don't forget to accept it: click the check mark beside the answer to toggle it from greyed out to filled in. Thanks.
– jezrael
20 hours ago


@jezrael Better to use pd.qcut(x=df.machine_r, q=[0, .25, .5, .8, 1.], labels=labels).astype(float) than to calculate each quantile separately and then bin.
– Sandeep Kadapa
20 hours ago

Thanks Sandeep Kadapa. This code made it a lot easier.
– Ranjan Mondal
19 hours ago
