How to bin data in data frame in pandas
I have time series data, say machine readings, as follows:
df['machine_r'] = [1,2,1,5,3,4,5,1,2,3,4,5,7,8,1,2.....]
How can I transform the column as follows?
If value <= 25th percentile, new value = 0.25,
if 25th percentile < value <= 50th percentile, new value = 0.50,
if 50th percentile < value <= 75th percentile, new value = 0.75,
if value > 75th percentile, new value = 1.
I have tried:
p25 = df['machine_r'].quantile(0.25)  # 25th percentile
p50 = df['machine_r'].quantile(0.5)   # 50th percentile (median)
p75 = df['machine_r'].quantile(0.75)  # 75th percentile
p100 = df['machine_r'].quantile(1)    # maximum
bins = [-100, p25, p50, p75, p100]    # -100 is an arbitrary lower bound below the data
labels = [0.25, 0.5, 0.75, 1]
df['machine_r'] = pd.cut(df['machine_r'], bins=bins, labels=labels)
but it returns the labels 0.25, 0.5, 0.75, 1 as categorical values, and I need them as floats for further analysis. How can this be done?
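The behaviour described in the question can be reproduced on the visible part of the sample data. This is a minimal sketch, assuming the 16 listed readings are the whole column. It writes the result to a hypothetical column named binned instead of overwriting machine_r, and the name edges is likewise illustrative:

```python
import pandas as pd

df = pd.DataFrame({"machine_r": [1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2]})

# Quantile edges; -100 is an arbitrary lower bound that must lie below the data.
edges = [-100] + [df["machine_r"].quantile(q) for q in (0.25, 0.5, 0.75, 1)]
labels = [0.25, 0.5, 0.75, 1]

# "binned" is a hypothetical column name used only for this illustration.
df["binned"] = pd.cut(df["machine_r"], bins=edges, labels=labels)

# The labels look numeric, but the column itself has a categorical dtype.
print(df["binned"].dtype)
```

The numeric labels become the categories of the column, which is why numeric operations on it do not behave like operations on a float column.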
python pandas dataframe statistics
asked 21 hours ago
Ranjan Mondal
1 Answer
accepted
You can cast the result to float with astype:
df['new'] = pd.cut(df['machine_r'], bins=bins, labels=labels).astype(float)
Better still, use qcut, as Sandeep Kadapa mentioned; it computes the quantile edges for you:
df['new'] = pd.qcut(x=df.machine_r, q=[0, .25, .5, .75, 1.], labels=labels).astype(float)
print(df)
    machine_r   new
0           1  0.25
1           2  0.50
2           1  0.25
3           5  0.75
4           3  0.50
5           4  0.75
6           5  0.75
7           1  0.25
8           2  0.50
9           3  0.50
10          4  0.75
11          5  0.75
12          7  1.00
13          8  1.00
14          1  0.25
15          2  0.50
print(df.dtypes)
machine_r      int64
new          float64
dtype: object
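For convenience, the qcut approach can be put together as one self-contained script. This is only a sketch: it assumes the 16 readings shown in the question are the entire column and uses 0.75 for the third quantile, as in the question's description. The intermediate name binned is illustrative.

```python
import pandas as pd

# Sample data: the visible readings from the question (assumed complete here).
df = pd.DataFrame({"machine_r": [1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2]})

labels = [0.25, 0.5, 0.75, 1.0]

# pd.qcut computes the quantile edges itself and returns a categorical Series.
binned = pd.qcut(df["machine_r"], q=[0, 0.25, 0.5, 0.75, 1.0], labels=labels)

# The categories are numeric, so the Series can be cast to float.
df["new"] = binned.astype(float)

print(df)
print(df.dtypes)
```

If several quantiles of the data coincide, pd.qcut raises an error about non-unique bin edges unless duplicates='drop' is passed, in which case the number of labels has to match the number of bins that remain.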
@RanjanMondal - You are welcome! If my answer was helpful, don't forget to accept it: click on the check mark beside the answer to toggle it from greyed out to filled in. Thanks.
– jezrael
20 hours ago
@jezrael Better to use pd.qcut(x=df.machine_r, q=[0, .25, .5, .75, 1.], labels=labels).astype(float) than calculating each quantile separately and binning.
– Sandeep Kadapa
20 hours ago
Thanks, Sandeep Kadapa. This code made it a lot easier.
– Ranjan Mondal
19 hours ago
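The two routes discussed in the comments can be compared directly on the sample data. The sketch below assumes the 16 visible readings are the whole column and uses 0.75 for the third quantile; the names s, qs, manual and direct are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2], name="machine_r")
labels = [0.25, 0.5, 0.75, 1.0]
qs = [0.25, 0.5, 0.75, 1.0]

# Manual route: compute each quantile, then bin with pd.cut.
# -100 is the arbitrary lower bound from the question; it must lie below the data.
bins = [-100] + [s.quantile(q) for q in qs]
manual = pd.cut(s, bins=bins, labels=labels).astype(float)

# Direct route: let pd.qcut compute the quantile edges itself.
direct = pd.qcut(s, q=[0] + qs, labels=labels).astype(float)

# Compare the two results element by element.
print(manual.equals(direct))
```

The manual route depends on choosing a lower bound below every value in the column, whereas pd.qcut takes the minimum of the data as its lowest edge, so there is one less thing to get wrong.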
edited 20 hours ago
answered 21 hours ago
jezrael