Numpy view contiguous part of non-contiguous array as dtype of bigger size

























I was trying to generate an array of trigrams (i.e. contiguous three-letter combinations) from a very long char array:



import numpy as np

# the data is actually loaded from a source file
a = np.random.randint(0, 256, 2**28, 'B').view('c')


Since making a copy is inefficient (and causes problems such as cache misses), I generated the trigrams directly using stride tricks:



tri = np.lib.stride_tricks.as_strided(a, (len(a)-2,3), a.strides*2)


This generates a trigram array with shape (2**28-2, 3), where each row is a trigram. Now I want to convert the trigrams to an array of strings (i.e. dtype S3) so that numpy displays them more "reasonably" (instead of as individual chars).



tri = tri.view('S3')


It gives the exception:



ValueError: To change to a dtype of a different size, the array must be C-contiguous


I understand that, in general, data must be contiguous in order to create a meaningful view, but this data is contiguous "where it should be": every group of three elements is contiguous.
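To make the layout concrete, here is a minimal sketch on a tiny stand-in array (the 8-byte buffer `b"abcdefgh"` is an assumption for illustration, not the real data). It shows that the strided array as a whole is not C-contiguous, while each individual row is:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Small stand-in for the large char array (illustrative data only).
a = np.frombuffer(b"abcdefgh", dtype="S1")

# Overlapping windows: row i starts at byte i, so both strides are 1 byte.
tri = as_strided(a, shape=(len(a) - 2, 3), strides=a.strides * 2)

print(tri.strides)                   # strides of the window array
print(tri.flags["C_CONTIGUOUS"])     # whole array: not C-contiguous
print(tri[1].flags["C_CONTIGUOUS"])  # a single row: contiguous
print(np.shares_memory(a, tri))      # no copy was made
print(tri[1].tobytes())              # bytes of the second trigram
```

A C-contiguous array of shape (6, 3) with 1-byte items would need strides (3, 1); the windows here overlap, so the strides are (1, 1) and the contiguity check fails.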



So I'm wondering how to view the contiguous parts of a non-contiguous np.ndarray as a dtype of bigger size. A more "standard" way would be better, but hackish ways are also welcome. It seems that I can set shape and strides freely with np.lib.stride_tricks.as_strided, but I can't force the dtype, which is the problem here.



EDIT



A non-contiguous array can also be made by simple slicing. For example:



np.empty((8, 4), 'uint32')[:, :2].view('uint64')


will throw the same exception as above (although, from a memory point of view, I should be able to do this). This case is much more common than my example above.
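The memory layout of the sliced case can be inspected with a short sketch (the array contents from `np.arange` are an assumption chosen so the values are predictable). Each row of the slice covers 8 adjacent bytes, exactly one uint64, even though the array as a whole has gaps between rows:

```python
import numpy as np

# Predictable contents instead of np.empty (illustrative only).
base = np.arange(32, dtype=np.uint32).reshape(8, 4)
X = base[:, :2]

print(X.strides)                    # row stride is 16 bytes, element stride 4
print(X.flags["C_CONTIGUOUS"])      # the slice as a whole is not contiguous
print(X[0].flags["C_CONTIGUOUS"])   # but each row is

# One row spans 2 * 4 bytes, which matches the size of a single uint64.
row_bytes = X.shape[1] * X.itemsize
print(row_bytes == np.dtype(np.uint64).itemsize)
```

As far as I know, NumPy 1.23 and later relaxed this check so that only the last axis has to be contiguous, in which case the direct `.view('uint64')` on the slice may simply succeed. Treat the exact version as something to verify against the `ndarray.view` documentation.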


































  • What about np.ascontiguousarray(tri).view('S3') ?

    – AndyK
    Nov 14 '18 at 9:44











  • @AndyK I believe OP wants to avoid the copy that this forces.

    – Paul Panzer
    Nov 14 '18 at 9:55











  • The data buffer for any array is contiguous: one long low-level array of bytes. But a view of that buffer might not be 'C' contiguous. In the [:,:2] case there are 2 elements, then a gap, 2 more elements, etc. Look at the flags. Evidently view isn't going the extra step of verifying that the 8 bytes it needs for each uint64 are contiguous.

    – hpaulj
    Nov 14 '18 at 17:43
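For reference, the copy-based route from the first comment can be sketched as follows, again on a tiny stand-in buffer (an assumption for illustration). It gives the right result but materializes every window, so the copy is roughly three times the size of the input:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.frombuffer(b"abcdefgh", dtype="S1")   # illustrative data only
tri = as_strided(a, shape=(len(a) - 2, 3), strides=a.strides * 2)

# ascontiguousarray has to copy here, because tri is not C-contiguous.
copied = np.ascontiguousarray(tri)
s3 = copied.view("S3").ravel()

print(np.shares_memory(a, copied))   # the copy no longer aliases the input
print(copied.nbytes, a.nbytes)       # size of the copy vs. size of the input
print(s3)
```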















python arrays numpy memory-layout














asked Nov 14 '18 at 9:03 by ZisIsNotZis (edited Nov 14 '18 at 9:11)
























1 Answer














If you have access to a contiguous array from which your non-contiguous one is derived, it should typically be possible to work around this limitation.



For example, your trigrams can be obtained like so:



>>> import numpy as np
>>> a = np.random.randint(0, 256, 2**28, 'B').view('c')
>>> a
array([b')', b'\xf2', b'\xf7', ..., b'\xf4', b'\xf1', b'z'], dtype='|S1')
>>> np.lib.stride_tricks.as_strided(a[:0].view('S3'), ((2**28)-2,), (1,))
array([b')\xf2\xf7', b'\xf2\xf7\x14', b'\xf7\x14\x1b', ...,
       b'\xc9\x14\xf4', b'\x14\xf4\xf1', b'\xf4\xf1z'], dtype='|S3')


In fact, this example demonstrates that all we need for view casting is a contiguous "stub" at the base of the memory buffer. Afterwards, because as_strided performs very few checks, we are essentially free to choose whatever shape and strides we like (and are responsible for keeping them inside the buffer).
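The two steps can be separated out in a small sketch. The 8-byte buffer and the names `stub` and `n` are assumptions for illustration, and the bounds check is an extra precaution that is not part of the session above:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.frombuffer(b"abcdefgh", dtype="S1")   # illustrative data only
n = len(a) - 2                               # number of trigrams

# Step 1: an empty slice passes the contiguity check, so the dtype can change.
stub = a[:0].view("S3")
print(stub.shape, stub.dtype)

# Step 2: as_strided does not validate shape/strides, so check the bounds
# ourselves: the last window starts at byte n - 1 and is itemsize bytes long.
assert (n - 1) * 1 + stub.itemsize <= a.nbytes
tri = as_strided(stub, shape=(n,), strides=(1,), writeable=False)

print(tri)
print(np.shares_memory(a, tri))              # still a view on a's buffer
```

Passing `writeable=False` is a defensive choice here: the windows overlap, so writing through such a view would touch several trigrams at once.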



It seems we can always get such a stub by slicing down to a size-0 array. For your second example:



>>> X = np.empty((8, 4), 'uint32')[:, :2]
>>> np.lib.stride_tricks.as_strided(X[:0].view(np.uint64), (8, 1), X.strides)
array([[140133325248280],
       [             32],
       [       32083728],
       [       31978800],
       [              0],
       [       29686448],
       [             32],
       [       32362720]], dtype=uint64)
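Because `np.empty` returns arbitrary values, the output above is hard to check by eye. A sketch with known contents (the `np.arange` data and the names `stub`, `viewed` and `expected` are assumptions for illustration) compares the stub-based view against the copy-based result:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(32, dtype=np.uint32).reshape(8, 4)
X = base[:, :2]

# Zero-row slice: empty, so the dtype change is accepted; shape becomes (0, 1).
stub = X[:0].view(np.uint64)

# Reuse X's strides: each row starts 16 bytes after the previous one.
viewed = as_strided(stub, shape=(8, 1), strides=X.strides, writeable=False)

# Reference result obtained the slow way, through a contiguous copy.
expected = np.ascontiguousarray(X).view(np.uint64)

print(np.array_equal(viewed, expected))   # same values as the copy
print(np.shares_memory(base, viewed))     # but still aliasing the original buffer
```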






























  • That's interesting, although quite difficult to understand why it works. +1

    – AndyK
    Nov 14 '18 at 10:06












  • Viewing a size-zero array is interesting! I was thinking about somehow creating an array of the correct dtype (like a size-one array obtained by viewing bytes), but a size-zero view is definitely more useful!

    – ZisIsNotZis
    Nov 15 '18 at 1:34










answered Nov 14 '18 at 9:45 by Paul Panzer (edited Nov 14 '18 at 10:02)











