Python Numpy: Structured Arrays vs Same Datatype Array Operation Cost
I want to create an array of arrays of the structure:
[line_number,count,temperature,humidity,sensor1_on,sensor2_on]
Where the first two need to be uint32, while temperature and humidity can be uint8, and the sensor_ons can be of type bool.
I later need to sort the 2d array based on the combination of line_number and then count. I also need to perform averages and other statistical computation on lists of all the temperature and humidity data (separately).
I found structured arrays which are convenient for data storage and retrieval:
import numpy as np

np_data = np.zeros([num_lines],
                   dtype='uint32,'  # Line No
                         'uint32,'  # Count
                         'uint8,'   # TEMP
                         'uint8,'   # HUMID
                         'bool,'    # S1 On
                         'bool'     # S2 On
                   )
versus an array with a single data type:
np_data = np.zeros([num_lines, 5], dtype='uint32')
# I would pack my bools into the last uint32 and then unpack later,
# but it seems like a waste of space
Do I lose anything (numpy processing power, vectorized processing, sorting speed, etc) by creating the structured array vs the one with all the same data types? Is there another solution one would recommend?
python arrays numpy
I think you just need to do some timings on realistic data. We can make guesses from experience, but they'll be just that - guesses.
– hpaulj
Nov 15 '18 at 0:34
asked Nov 14 '18 at 23:19 by azazelspeaks
edited Nov 14 '18 at 23:47 by Joel
1 Answer
I did some performance testing on several array types. My test results are posted as an answer to this question:
is ndarray faster than recarray access?
(Ignore the downvote on my question. Apparently someone didn't like how I asked it.)
The short version: extracting data from a masked array was much slower than the same operation on an ndarray. Access times for a structured array and a recarray were also slower than for a plain ndarray, but all were fractions of a second. Clearly there is overhead when using masked arrays (maybe similar to a record array?). There is a good discussion of the differences between array types here:
numpy-discussion:structured-arrays-recarrays-and-record-arrays
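As a rough way to check this on your own data, here is a minimal timing sketch. It assumes the field layout from the question; the field names, `num_lines` value, and random test data are made up for illustration, and the numbers it prints will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

num_lines = 100_000  # hypothetical size; substitute something realistic
rng = np.random.default_rng(0)

# Structured array using the question's layout (field names are assumed).
dt = np.dtype([("line_number", "u4"), ("count", "u4"),
               ("temperature", "u1"), ("humidity", "u1"),
               ("sensor1_on", "?"), ("sensor2_on", "?")])
structured = np.zeros(num_lines, dtype=dt)
structured["temperature"] = rng.integers(0, 100, num_lines, dtype=np.uint8)

# Plain 2D uint32 array holding the same temperatures in column 2.
plain = np.zeros((num_lines, 5), dtype="uint32")
plain[:, 2] = structured["temperature"]

# Time the same statistic on a structured field and on a plain column.
t_struct = timeit.timeit(lambda: structured["temperature"].mean(), number=100)
t_plain = timeit.timeit(lambda: plain[:, 2].mean(), number=100)
print(f"structured field: {t_struct:.4f} s, plain column: {t_plain:.4f} s")
```

Both accesses return strided views rather than copies, so any difference you measure comes mostly from item size and memory layout, not from copying.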
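There are other limitations. For example, many (most/all) of the numpy matrix and math operations require an array with a single numeric data type, so they cannot be applied to a structured array as a whole. I don't think this affects your case, since you are using the structured array like a table: each individual field (for example the temperature column) is an ordinary single-dtype array, and the math functions work on it directly.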
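To illustrate the table-like usage, here is a small sketch of the two operations mentioned in the question: sorting by line_number then count, and computing per-field averages. The field names and sample rows are invented for the example.

```python
import numpy as np

# Field names are assumed; the question's dtype string would give f0..f5.
dt = np.dtype([("line_number", "u4"), ("count", "u4"),
               ("temperature", "u1"), ("humidity", "u1"),
               ("sensor1_on", "?"), ("sensor2_on", "?")])

np_data = np.array([(2, 1, 20, 50, True, False),
                    (1, 2, 22, 55, False, False),
                    (1, 1, 24, 60, True, True)], dtype=dt)

# Sort in place by line_number, breaking ties with count.
np_data.sort(order=["line_number", "count"])

# Each field is a regular single-dtype array, so statistics work directly.
mean_temp = np_data["temperature"].mean()
mean_humid = np_data["humidity"].mean()

# Equivalent ordering for a plain 2D array: lexsort treats the LAST key as primary.
plain = np.array([[2, 1, 20, 50], [1, 2, 22, 55], [1, 1, 24, 60]], dtype="uint32")
plain_sorted = plain[np.lexsort((plain[:, 1], plain[:, 0]))]
```

With named fields the sort reads more clearly than the `lexsort` version, which is one practical argument for the structured array when the data is used as a table.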
answered Nov 15 '18 at 15:13 by kcw78
edited Nov 15 '18 at 16:46