Operating on histogram bins Python

























I am trying to find the median of the values within each bin range generated by the np.histogram function. How would I select only the values within a bin range and operate on those specific values? Below is an example of my data and what I am trying to do:



x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]


y values can have any sort of x value associated with them, for example:



hist, bins = np.histogram(x)
hist = [129, 126, 94, 133, 179, 206, 142, 147, 90, 185]
bins = [0., 0.09999926, 0.19999853, 0.29999779, 0.39999706,
0.49999632, 0.59999559, 0.69999485, 0.79999412, 0.8999933,
0.99999265]


So, I am trying to find the median y value of the 129 values in the first bin generated, etc.










python numpy histogram median






edited Nov 14 '18 at 4:43
Mad Physicist

asked Nov 14 '18 at 3:14
hlku2334

















I'm having a bit of trouble believing your histogram, but I understand your point.

– Mad Physicist
Nov 14 '18 at 4:25












3 Answers








































One way is with pandas.cut():

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(444)

>>> x = np.random.randint(0, 25, size=100)
>>> _, bins = np.histogram(x)
>>> pd.Series(x).groupby(pd.cut(x, bins)).median()
(0.0, 2.4] 2.0
(2.4, 4.8] 3.0
(4.8, 7.2] 6.0
(7.2, 9.6] 8.5
(9.6, 12.0] 10.5
(12.0, 14.4] 13.0
(14.4, 16.8] 15.5
(16.8, 19.2] 18.0
(19.2, 21.6] 20.5
(21.6, 24.0] 23.0
dtype: float64

If you want to stay in NumPy, you might want to check out np.digitize().
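
For the NumPy-only route, here is a minimal sketch of how np.digitize() could be used. It is only an illustration: the data, the seed, and the names labels and medians are assumptions, not part of the answer above. One thing to keep in mind is that pd.cut() closes intervals on the right by default, while np.histogram closes them on the left (except the last bin), so values that fall exactly on an edge can be grouped differently by the two approaches.

```python
import numpy as np

rng = np.random.default_rng(444)      # arbitrary seed, for reproducibility only
x = rng.uniform(0, 24, size=100)      # hypothetical data
counts, bins = np.histogram(x)

# np.digitize(x, bins) returns i such that bins[i-1] <= x < bins[i],
# so subtract 1 to get zero-based bin labels.
labels = np.digitize(x, bins) - 1
# np.histogram closes the last bin on the right; fold the maximum into it.
labels = np.minimum(labels, len(bins) - 2)

# Median per bin; empty bins get NaN instead of triggering a warning.
medians = [np.median(x[labels == k]) if counts[k] else np.nan
           for k in range(len(counts))]
```

With the labels built this way, the number of points per label should agree with the counts returned by np.histogram.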







answered Nov 14 '18 at 3:25
Brad Solomon























You can do this by slicing a sorted version of your data using the counts as indices:

import numpy as np

x = np.random.rand(1000)
hist, bins = np.histogram(x)

# slice boundaries: cumulative bin counts with a leading 0
ix = [0] + hist.cumsum().tolist()
# if you don't mind sorting your original data, call x.sort() and slice x instead
xsorted = np.sort(x)
[np.median(xsorted[i:j]) for i, j in zip(ix[:-1], ix[1:])]

which will output the medians as a standard Python list.
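
A pitfall worth knowing when building slice boundaries from cumulative counts: adding a Python list to a NumPy array broadcasts instead of concatenating. The tiny sketch below uses made-up counts (hist here is a hypothetical stand-in for real histogram output) to show the difference.

```python
import numpy as np

hist = np.array([3, 1, 2])   # hypothetical bin counts

# list + ndarray broadcasts: 0 is added to every element, nothing is prepended
broadcast = [0] + hist.cumsum()

# converting to a list first turns + into a concatenation
prepended = [0] + hist.cumsum().tolist()

# an all-NumPy alternative that keeps the result as an array
prepended_np = np.concatenate(([0], hist.cumsum()))
```

Only the last two forms give one more boundary than there are bins, which is what the pairwise slicing needs.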







answered Nov 14 '18 at 4:13
tel












Take a look at np.split

– Mad Physicist
Nov 14 '18 at 4:45











np.digitize and np.searchsorted will match your data with bins. The latter is preferable in this situation because it does fewer unnecessary checks (your bins can safely be assumed to be sorted).

If you look at the documentation of np.histogram (Notes section), you will notice that the bins are all half-open on the right (except the last one, which is closed). This means that you can do the following:

import numpy as np

x = np.abs(np.random.normal(loc=0.75, scale=0.75, size=10000))
h, b = np.histogram(x)
ind = np.searchsorted(b, x, side='right') - 1  # zero-based: b[ind] <= x < b[ind + 1]
ind = np.minimum(ind, b.size - 2)  # the maximum sits on the closed right edge of the last bin

Now ind contains a zero-based label for each number indicating which bin it belongs to. You can compute medians:

m = [np.median(x[ind == label]) for label in range(b.size - 1)]

If you are able to sort the input data, your job becomes easier because you can use views instead of extracting the data for each bin using masking. np.split is a good choice in this case:

x.sort()
sections = np.split(x, np.cumsum(h[:-1]))
m = [np.median(arr) for arr in sections]
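
Since the question asks for the median of the y values that belong to each x bin, the same kind of labels can be used to index a second array. The following is a rough sketch under stated assumptions: x and y are hypothetical paired arrays of equal length, and the names edges, labels and y_medians are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
x = rng.random(1000)                      # hypothetical x data
y = 2.0 * x + rng.normal(size=x.size)     # hypothetical y data paired with x

counts, edges = np.histogram(x)

# Searching only the interior edges yields labels 0..len(counts)-1 directly;
# the maximum of x lands in the last bin, matching np.histogram.
labels = np.searchsorted(edges[1:-1], x, side='right')

# Median of y within each x bin; empty bins get NaN.
y_medians = np.array([
    np.median(y[labels == k]) if counts[k] else np.nan
    for k in range(counts.size)
])
```

The masking form is used here rather than sorting because sorting x in place would break its pairing with y unless y is reordered with the same permutation (for example via np.argsort).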






answered Nov 14 '18 at 4:41
Mad Physicist


























