Operating on histogram bins Python
I am trying to find the median of values within a bin range generated by the np.histogram function. How would I select only the values within each bin range and operate on those specific values? Below is an example of my data and what I am trying to do:
x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
Each y value can have any x value associated with it, for example:
hist, bins = np.histogram(x)
hist = [129, 126, 94, 133, 179, 206, 142, 147, 90, 185]
bins = [0., 0.09999926, 0.19999853, 0.29999779, 0.39999706,
0.49999632, 0.59999559, 0.69999485, 0.79999412, 0.8999933,
0.99999265]
So I am trying to find the median y value of the 129 values in the first bin, and likewise for each of the other bins.
python numpy histogram median
edited Nov 14 '18 at 4:43 by Mad Physicist
asked Nov 14 '18 at 3:14 by hlku2334
I'm having a bit of trouble believing your histogram, but I understand your point.
– Mad Physicist
Nov 14 '18 at 4:25
3 Answers
One way is with pandas.cut():
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(444)
>>> x = np.random.randint(0, 25, size=100)
>>> _, bins = np.histogram(x)
>>> pd.Series(x).groupby(pd.cut(x, bins)).median()
(0.0, 2.4] 2.0
(2.4, 4.8] 3.0
(4.8, 7.2] 6.0
(7.2, 9.6] 8.5
(9.6, 12.0] 10.5
(12.0, 14.4] 13.0
(14.4, 16.8] 15.5
(16.8, 19.2] 18.0
(19.2, 21.6] 20.5
(21.6, 24.0] 23.0
dtype: float64
If you want to stay in NumPy, you might want to check out np.digitize().
answered Nov 14 '18 at 3:25 by Brad Solomon
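Following up on the np.digitize() pointer, here is a minimal NumPy-only sketch. One detail worth knowing: pd.cut builds right-closed intervals such as (0.0, 2.4] by default, while np.histogram's bins are closed on the left (only the last bin is closed on both ends). So with the pd.cut call above, values equal to the lowest edge fall outside every interval and are left out of the groupby. The helper name median_per_histogram_bin is hypothetical; it assumes a 1-D numeric x and returns nan for empty bins.

```python
import numpy as np


def median_per_histogram_bin(x, bins):
    """Median of the values of x that fall in each np.histogram bin.

    Hypothetical helper: bins are [bins[k], bins[k+1]) except the last,
    which is closed; empty bins give nan.
    """
    x = np.asarray(x)
    bins = np.asarray(bins)
    nbins = bins.size - 1
    # np.digitize (right=False) returns i with bins[i-1] <= v < bins[i]; shift to 0-based
    idx = np.digitize(x, bins) - 1
    # np.histogram's last bin is closed, so v == bins[-1] belongs to it
    idx[x == bins[-1]] = nbins - 1
    return [float(np.median(x[idx == k])) if np.any(idx == k) else float("nan")
            for k in range(nbins)]


x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(median_per_histogram_bin(x, [0.0, 1.0, 2.0]))  # [0.25, 1.5]
```

Here 0.0 lands in the first bin and 2.0 in the last one, matching what np.histogram(x, [0.0, 1.0, 2.0]) would count.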
You can do this by slicing a sorted version of your data using the counts as indices:
import numpy as np

x = np.random.rand(1000)
hist, bins = np.histogram(x)
# start/stop index of each bin's run in the sorted data: [0, c0, c0+c1, ...]
ix = [0] + hist.cumsum().tolist()
# if you don't mind sorting your original data in place, use x.sort() instead
xsorted = np.sort(x)
[np.median(xsorted[i:j]) for i, j in zip(ix[:-1], ix[1:])]
which will output the medians as a standard Python list.
answered Nov 14 '18 at 4:13 by tel
Take a look at np.split
– Mad Physicist
Nov 14 '18 at 4:45
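The answers above compute medians of x itself, while the question asks for the median of the y values paired with each x. Below is a minimal sketch of the same sorted-slicing idea applied to that case. It assumes x and y are equal-length 1-D arrays and that the bin count is an integer, so np.histogram's range covers every x and the cumulative counts line up with the sorted order. The name median_y_per_x_bin is hypothetical. Sorting with np.argsort and applying the same permutation to y keeps each (x, y) pair together.

```python
import numpy as np


def median_y_per_x_bin(x, y, nbins=10):
    """Median of the y values whose x falls in each np.histogram bin.

    Hypothetical helper: assumes equal-length 1-D x and y and an integer
    nbins, so every x lands in some bin.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    hist, edges = np.histogram(x, bins=nbins)
    # reorder y so that points sharing an x bin are contiguous
    y_by_x = y[np.argsort(x, kind="stable")]
    # [0, c0, c0+c1, ...]: start/stop of each bin's run in the sorted order
    bounds = np.concatenate(([0], np.cumsum(hist)))
    medians = [float(np.median(y_by_x[i:j])) if j > i else float("nan")
               for i, j in zip(bounds[:-1], bounds[1:])]
    return medians, hist, edges


medians, hist, edges = median_y_per_x_bin([0.1, 0.9, 0.2, 0.8], [1, 10, 3, 20], nbins=2)
print(medians)  # [2.0, 15.0]
```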
np.digitize and np.searchsorted will match your data with bins. The latter is preferable in this situation because it does fewer unnecessary checks (your bins can safely be assumed to be sorted).
If you look at the documentation of np.histogram (Notes section), you will notice that the bins are all half-open on the right (except the last one). This means that you can do the following:
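import numpy as np
x = np.abs(np.random.normal(loc=0.75, scale=0.75, size=10000))
h, b = np.histogram(x)
# side='right' gives i + 1 for b[i] <= v < b[i + 1]; subtract 1 for a 0-based label,
# then pull v == b[-1] back into the last bin, which np.histogram closes on the right
ind = np.minimum(np.searchsorted(b, x, side='right') - 1, b.size - 2)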
Now ind contains a label for each number indicating which bin it belongs to. You can compute medians:
m = [np.median(x[ind == label]) for label in range(b.size - 1)]
If you are able to sort the input data, your job becomes easier because you can use views instead of extracting the data for each bin using masking. np.split is a good choice in this case:
x.sort()
sections = np.split(x, np.cumsum(h[:-1]))
m = [np.median(arr) for arr in sections]
answered Nov 14 '18 at 4:41 by Mad Physicist
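The label approach extends to the y values paired with each x, and it also copes with explicit bin edges where some x values fall outside the range, which the sort-and-split approach does not handle as written. A minimal sketch, assuming x and y are equal-length 1-D arrays and edges is sorted like the second return value of np.histogram; median_y_by_edges is a hypothetical name. Counting the labels with np.bincount should reproduce np.histogram's counts, which gives a quick check that the bins were matched correctly.

```python
import numpy as np


def median_y_by_edges(x, y, edges):
    """Median of y for each bin of x defined by np.histogram-style edges.

    Hypothetical helper: bins are [edges[k], edges[k+1]) except the last,
    which is closed; x values outside [edges[0], edges[-1]] are ignored.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    edges = np.asarray(edges)
    nbins = edges.size - 1
    labels = np.searchsorted(edges, x, side="right") - 1
    labels[x == edges[-1]] = nbins - 1  # the last bin is closed on the right
    inside = (labels >= 0) & (labels < nbins)  # drop out-of-range x
    labels, y = labels[inside], y[inside]
    counts = np.bincount(labels, minlength=nbins)
    medians = np.array([np.median(y[labels == k]) if counts[k] else np.nan
                        for k in range(nbins)])
    return medians, counts


x = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
y = [100, 1, 2, 3, 4, 5, 100]
medians, counts = median_y_by_edges(x, y, [0.0, 1.0, 2.0])
print(medians.tolist(), counts.tolist())  # [1.5, 4.0] [2, 3]
```

The two points with y = 100 lie outside [0, 2], so they are ignored, just as np.histogram(x, [0.0, 1.0, 2.0]) ignores them.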