How to bin data in a data frame in pandas









I have time series data, say machine readings, as follows:



df['machine_r'] = [1,2,1,5,3,4,5,1,2,3,4,5,7,8,1,2.....] 


How can I transform the column as follows?



If data <= 25th percentile, value = 0.25,
if 25p < data <= 50p, value = 0.50,
if 50p < data <= 75p, value = 0.75,
if data > 75p, value = 1


I have tried:



p25 = df['machine_r'].quantile(0.25)  # p25 is the 25th percentile
p50 = df['machine_r'].quantile(0.5)
p75 = df['machine_r'].quantile(0.75)
p100 = df['machine_r'].quantile(1)
bins = [-100, p25, p50, p75, p100]
labels = [0.25, 0.5, 0.75, 1]
df['machine_r'] = pd.cut(df['machine_r'], bins=bins, labels=labels)


but it returns the labels as categorical values, and I need them as floats for further analysis. How can this be done?










      python pandas dataframe statistics






      asked 21 hours ago









      Ranjan Mondal

          1 Answer
You can cast the result to float with astype:



df['new'] = pd.cut(df['machine_r'], bins=bins, labels=labels).astype(float)
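
To make the cast concrete, here is a minimal, self-contained sketch of the cut-then-astype approach. It assumes the first 16 readings shown in the question as sample data (the elided tail is left out), and it puts the lowest bin edge just below the minimum rather than at -100; the names edges and binned are only illustrative.

```python
import pandas as pd

# Sample readings copied from the question (the elided tail is left out).
df = pd.DataFrame({"machine_r": [1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2]})

# Quartile edges. The lowest edge sits below the minimum so that the smallest
# reading falls inside the first bin (pd.cut bins are right-closed by default).
edges = df["machine_r"].quantile([0.25, 0.5, 0.75, 1.0]).tolist()
bins = [df["machine_r"].min() - 1] + edges
labels = [0.25, 0.5, 0.75, 1.0]

# pd.cut with labels returns a categorical Series ...
binned = pd.cut(df["machine_r"], bins=bins, labels=labels)
print(binned.dtype)

# ... which astype(float) converts to plain floats for further analysis.
df["new"] = binned.astype(float)
print(df["new"].dtype)
```

The first print should report a category dtype and the second float64, which is the difference the question runs into.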


A better option, as Sandeep Kadapa mentioned, is to use qcut, which computes the quantile edges for you:



df['new'] = pd.qcut(x=df.machine_r, q=[0, .25, .5, .75, 1.], labels=labels).astype(float)
print(df)
    machine_r   new
0           1  0.25
1           2  0.50
2           1  0.25
3           5  0.75
4           3  0.50
5           4  0.75
6           5  0.75
7           1  0.25
8           2  0.50
9           3  0.50
10          4  0.75
11          5  0.75
12          7  1.00
13          8  1.00
14          1  0.25
15          2  0.50

print(df.dtypes)
machine_r      int64
new          float64
dtype: object
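
If the same binning is needed in several places, the qcut approach can be wrapped in a small function. The sketch below is only an illustration: quartile_score is a hypothetical helper name, and the sample data is again the first 16 readings from the question.

```python
import pandas as pd


def quartile_score(values: pd.Series) -> pd.Series:
    """Map each value to 0.25, 0.5, 0.75 or 1.0 by quartile (hypothetical helper)."""
    labels = [0.25, 0.5, 0.75, 1.0]
    # qcut computes the quantile edges itself and includes the minimum
    # in the first bin, so no artificial lower edge is needed.
    binned = pd.qcut(values, q=[0, 0.25, 0.5, 0.75, 1.0], labels=labels)
    return binned.astype(float)


readings = pd.Series([1, 2, 1, 5, 3, 4, 5, 1, 2, 3, 4, 5, 7, 8, 1, 2], name="machine_r")
scores = quartile_score(readings)

# Count how many readings landed in each bin.
print(scores.value_counts().sort_index())
```

One caveat to keep in mind: when a column has many repeated values, two quantile edges can coincide, and qcut then raises a ValueError by default. Its duplicates='drop' option removes the repeated edges, but the number of labels would then no longer match the number of bins.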





• @RanjanMondal - You are welcome! If my answer was helpful, don't forget to accept it: click the check mark beside the answer to toggle it from greyed out to filled in. Thanks.
  – jezrael
  20 hours ago

• @jezrael Better to use pd.qcut(x=df.machine_r, q=[0, .25, .5, .75, 1.], labels=labels).astype(float) than calculating each quantile separately and binning.
  – Sandeep Kadapa
  20 hours ago

• Thanks, Sandeep Kadapa. This code made it a lot easier.
  – Ranjan Mondal
  19 hours ago











answered 21 hours ago by jezrael (edited 20 hours ago)






