Workaround for Python MemoryError





















How can I change this function to make it more memory-efficient? I keep getting a MemoryError.



import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # One row per sequence, one column per word index; float64 by default
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


I call the function here:



x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)


The train and test data come from the IMDB dataset for sentiment analysis, i.e.



(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)


EDIT: I am running this on a 64-bit Ubuntu system with 4 GB of RAM.



Here is the Traceback:



Traceback (most recent call last):
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 29, in <module>
    x_test = vectorize_sequences(test_data)
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 20, in vectorize_sequences
    results = np.zeros((len(sequences), dimension))
MemoryError




























  • 1




    Looks like 2x 763 MB of data which is not gigantic. Please post the full error message including the traceback showing the line where it happened. Please also post the details of the hardware and OS where you're running this.
    – John Zwinck
    Nov 11 at 14:27







  • 1




    Basically you have two options: use less memory or make more memory available.
    – Klaus D.
    Nov 11 at 14:54










  • @JohnZwinck I have edited the question accordingly. Thanks
    – BlueMango
    Nov 11 at 15:00














python keras sentiment-analysis














edited Nov 11 at 14:59
asked Nov 11 at 14:20
BlueMango



















1 Answer



accepted










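A 10k x 10k array is 100 million elements of 64 bits each (because the default dtype is float64), so that's 800 million bytes, aka 763 MiB. Note that the first dimension is len(sequences), the number of reviews, not 10,000: the Keras IMDB dataset ships 25,000 training and 25,000 test reviews, so each array is really 25,000 x 10,000, which is 2 billion bytes (about 1.86 GiB). Two of those will not fit comfortably in 4 GB of RAM, which matches the traceback failing on the second call.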
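The arithmetic can be checked without allocating anything. This is a minimal sketch; the helper name dense_nbytes is hypothetical and only multiplies the shape by the dtype's item size. Pass your real len(sequences) as the first argument to size your own array.

```python
import numpy as np

def dense_nbytes(n_rows, n_cols, dtype):
    """Bytes needed for a dense n_rows x n_cols array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize

# Compare the footprint of a 10k x 10k array for three candidate dtypes.
for dtype in (np.float64, np.float32, np.int8):
    size = dense_nbytes(10_000, 10_000, dtype)
    print(f"{np.dtype(dtype).name:>8}: {size:,} bytes = {size / 2**20:.0f} MiB")
```

Nothing is allocated here, so it is safe to run even on a machine that cannot hold the array itself.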



If you use float32 it will cut the memory usage in half:



np.zeros((len(sequences), dimension), dtype=np.float32)


Or if you only care about 0 and 1, an 8-bit integer uses one byte per element instead of eight, which cuts memory usage by 87.5%:



np.zeros((len(sequences), dimension), dtype=np.int8)
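Put together, the function from the question might look like the sketch below. The dtype parameter is an addition for illustration, not part of the original code, and int8 is assumed to be acceptable because the array only ever holds 0 and 1.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000, dtype=np.int8):
    """Multi-hot encode integer sequences into a (len(sequences), dimension) array."""
    results = np.zeros((len(sequences), dimension), dtype=dtype)
    for i, sequence in enumerate(sequences):
        # Integer-list indexing sets every listed column of row i at once.
        results[i, sequence] = 1
    return results

# Tiny stand-in for the IMDB data: two "reviews" over a 5-word vocabulary.
sample = [[1, 3, 3], [0, 4]]
encoded = vectorize_sequences(sample, dimension=5)
print(encoded)
print(encoded.dtype, encoded.nbytes)
```

Depending on your framework and version, you may need to cast the array (or each batch) to a float dtype before training, so check that for your setup.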





answered Nov 12 at 4:34
John Zwinck
