Slow simple update query on PostgreSQL database with 3 million rows










26















I am trying a simple UPDATE table SET column1 = 0 on a table with ~3 million rows on Postegres 8.4 but it is taking forever to finish. It has been running for more than 10 min. now in my last attempt.



Before, I tried to run a VACUUM and ANALYZE commands on that table and I also tried to create some indexes (although I doubt this will make any difference in this case) but none seems to help.



Any other ideas?



Thanks,
Ricardo



Update:



This is the table structure:



CREATE TABLE myTable
(
id bigserial NOT NULL,
title text,
description text,
link text,
"type" character varying(255),
generalFreq real,
generalWeight real,
author_id bigint,
status_id bigint,
CONSTRAINT resources_pkey PRIMARY KEY (id),
CONSTRAINT author_pkey FOREIGN KEY (author_id)
REFERENCES users (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT c_unique_status_id UNIQUE (status_id)
);


I am trying to run UPDATE myTable SET generalFreq = 0;










share|improve this question
























  • It would help to see the full table definition and the full query you are running. Also, the output of "explain analyze <query>" would be very helpful.

    – kasperjj
    Jul 29 '10 at 10:07











  • Added the table structure. The command explain analyze UPDATE myTable SET generalFreq = 0; also take a very long time to complete.

    – Ricardo Lage
    Jul 29 '10 at 10:33











  • do you by any chance have an index on generalFreq?

    – kasperjj
    Jul 29 '10 at 11:01











  • Oh.. and sorry, my mistake... you should run just explain, not explain analyze. That should return almost instantly.

    – kasperjj
    Jul 29 '10 at 11:04











  • ok, the explain returns the following: "Seq Scan on myTable (cost=0.00..181915.37 rows=5156537 width=1287)" What does it mean?

    – Ricardo Lage
    Jul 29 '10 at 11:20















26















I am trying a simple UPDATE table SET column1 = 0 on a table with ~3 million rows on Postegres 8.4 but it is taking forever to finish. It has been running for more than 10 min. now in my last attempt.



Before, I tried to run a VACUUM and ANALYZE commands on that table and I also tried to create some indexes (although I doubt this will make any difference in this case) but none seems to help.



Any other ideas?



Thanks,
Ricardo



Update:



This is the table structure:



CREATE TABLE myTable
(
id bigserial NOT NULL,
title text,
description text,
link text,
"type" character varying(255),
generalFreq real,
generalWeight real,
author_id bigint,
status_id bigint,
CONSTRAINT resources_pkey PRIMARY KEY (id),
CONSTRAINT author_pkey FOREIGN KEY (author_id)
REFERENCES users (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT c_unique_status_id UNIQUE (status_id)
);


I am trying to run UPDATE myTable SET generalFreq = 0;










share|improve this question
























  • It would help to see the full table definition and the full query you are running. Also, the output of "explain analyze <query>" would be very helpful.

    – kasperjj
    Jul 29 '10 at 10:07











  • Added the table structure. The command explain analyze UPDATE myTable SET generalFreq = 0; also take a very long time to complete.

    – Ricardo Lage
    Jul 29 '10 at 10:33











  • do you by any chance have an index on generalFreq?

    – kasperjj
    Jul 29 '10 at 11:01











  • Oh.. and sorry, my mistake... you should run just explain, not explain analyze. That should return almost instantly.

    – kasperjj
    Jul 29 '10 at 11:04











  • ok, the explain returns the following: "Seq Scan on myTable (cost=0.00..181915.37 rows=5156537 width=1287)" What does it mean?

    – Ricardo Lage
    Jul 29 '10 at 11:20













26












26








26


19






I am trying a simple UPDATE table SET column1 = 0 on a table with ~3 million rows on Postegres 8.4 but it is taking forever to finish. It has been running for more than 10 min. now in my last attempt.



Before, I tried to run a VACUUM and ANALYZE commands on that table and I also tried to create some indexes (although I doubt this will make any difference in this case) but none seems to help.



Any other ideas?



Thanks,
Ricardo



Update:



This is the table structure:



CREATE TABLE myTable
(
id bigserial NOT NULL,
title text,
description text,
link text,
"type" character varying(255),
generalFreq real,
generalWeight real,
author_id bigint,
status_id bigint,
CONSTRAINT resources_pkey PRIMARY KEY (id),
CONSTRAINT author_pkey FOREIGN KEY (author_id)
REFERENCES users (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT c_unique_status_id UNIQUE (status_id)
);


I am trying to run UPDATE myTable SET generalFreq = 0;










share|improve this question
















I am trying a simple UPDATE table SET column1 = 0 on a table with ~3 million rows on Postegres 8.4 but it is taking forever to finish. It has been running for more than 10 min. now in my last attempt.



Before, I tried to run a VACUUM and ANALYZE commands on that table and I also tried to create some indexes (although I doubt this will make any difference in this case) but none seems to help.



Any other ideas?



Thanks,
Ricardo



Update:



This is the table structure:



CREATE TABLE myTable
(
id bigserial NOT NULL,
title text,
description text,
link text,
"type" character varying(255),
generalFreq real,
generalWeight real,
author_id bigint,
status_id bigint,
CONSTRAINT resources_pkey PRIMARY KEY (id),
CONSTRAINT author_pkey FOREIGN KEY (author_id)
REFERENCES users (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT c_unique_status_id UNIQUE (status_id)
);


I am trying to run UPDATE myTable SET generalFreq = 0;







sql postgresql sql-update






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited May 1 '12 at 16:15









Juan Mellado

14.1k54051




14.1k54051










asked Jul 29 '10 at 10:01









Ricardo LageRicardo Lage

5,37152855




5,37152855












  • It would help to see the full table definition and the full query you are running. Also, the output of "explain analyze <query>" would be very helpful.

    – kasperjj
    Jul 29 '10 at 10:07











  • Added the table structure. The command explain analyze UPDATE myTable SET generalFreq = 0; also take a very long time to complete.

    – Ricardo Lage
    Jul 29 '10 at 10:33











  • do you by any chance have an index on generalFreq?

    – kasperjj
    Jul 29 '10 at 11:01











  • Oh.. and sorry, my mistake... you should run just explain, not explain analyze. That should return almost instantly.

    – kasperjj
    Jul 29 '10 at 11:04











  • ok, the explain returns the following: "Seq Scan on myTable (cost=0.00..181915.37 rows=5156537 width=1287)" What does it mean?

    – Ricardo Lage
    Jul 29 '10 at 11:20

















  • It would help to see the full table definition and the full query you are running. Also, the output of "explain analyze <query>" would be very helpful.

    – kasperjj
    Jul 29 '10 at 10:07











  • Added the table structure. The command explain analyze UPDATE myTable SET generalFreq = 0; also take a very long time to complete.

    – Ricardo Lage
    Jul 29 '10 at 10:33











  • do you by any chance have an index on generalFreq?

    – kasperjj
    Jul 29 '10 at 11:01











  • Oh.. and sorry, my mistake... you should run just explain, not explain analyze. That should return almost instantly.

    – kasperjj
    Jul 29 '10 at 11:04











  • ok, the explain returns the following: "Seq Scan on myTable (cost=0.00..181915.37 rows=5156537 width=1287)" What does it mean?

    – Ricardo Lage
    Jul 29 '10 at 11:20
















It would help to see the full table definition and the full query you are running. Also, the output of "explain analyze <query>" would be very helpful.

– kasperjj
Jul 29 '10 at 10:07





It would help to see the full table definition and the full query you are running. Also, the output of "explain analyze <query>" would be very helpful.

– kasperjj
Jul 29 '10 at 10:07













Added the table structure. The command explain analyze UPDATE myTable SET generalFreq = 0; also take a very long time to complete.

– Ricardo Lage
Jul 29 '10 at 10:33





Added the table structure. The command explain analyze UPDATE myTable SET generalFreq = 0; also take a very long time to complete.

– Ricardo Lage
Jul 29 '10 at 10:33













do you by any chance have an index on generalFreq?

– kasperjj
Jul 29 '10 at 11:01





do you by any chance have an index on generalFreq?

– kasperjj
Jul 29 '10 at 11:01













Oh.. and sorry, my mistake... you should run just explain, not explain analyze. That should return almost instantly.

– kasperjj
Jul 29 '10 at 11:04





Oh.. and sorry, my mistake... you should run just explain, not explain analyze. That should return almost instantly.

– kasperjj
Jul 29 '10 at 11:04













ok, the explain returns the following: "Seq Scan on myTable (cost=0.00..181915.37 rows=5156537 width=1287)" What does it mean?

– Ricardo Lage
Jul 29 '10 at 11:20





ok, the explain returns the following: "Seq Scan on myTable (cost=0.00..181915.37 rows=5156537 width=1287)" What does it mean?

– Ricardo Lage
Jul 29 '10 at 11:20












9 Answers
9






active

oldest

votes


















13














Take a look at this answer: PostgreSQL slow on a large table with arrays and lots of updates



First start with a better FILLFACTOR, do a VACUUM FULL to force table rewrite and check the HOT-updates after your UPDATE-query:



SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'myTable';


HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.



Ps. You need version 8.3 or better.






share|improve this answer

























  • Thanks! This clears things up.

    – Ricardo Lage
    Aug 4 '10 at 12:34


















28














I have to update tables of 1 or 2 billion rows with various values for each rows. Each run makes ~100 millions changes (10%).
My first try was to group them in transaction of 300K updates directly on a specific partition as Postgresql not always optimize prepared queries if you use partitions.



  1. Transactions of bunch of "UPDATE myTable SET myField=value WHERE
    myId=id"
    Gives 1,500 updates/sec. which means each run would
    take at least 18 hours.

  2. HOT updates solution as described here with FILLFACTOR=50. Gives
    1,600 updates/sec. I uses SSD's so it's a costly improvement as it
    doubles the storage size.

  3. Insert in a temporary table of updated value and merge them after
    with UPDATE...FROM Gives 18,000 updates/sec. if I do a VACUUM
    for each partition; 100,000 up/s otherwise. Cooool.
    Here is the
    sequence of operations:


CREATE TEMP TABLE tempTable (id BIGINT NOT NULL, field(s) to be updated,
CONSTRAINT tempTable_pkey PRIMARY KEY (id));


Accumulate a bunch of updates in a buffer depending of available RAM
When it's filled, or need to change of table/partition, or completed:



COPY tempTable FROM buffer;
UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;
COMMIT;
TRUNCATE TABLE tempTable;
VACUUM FULL ANALYZE myTable;


That means a run now takes 1.5h instead of 18h for 100 millions updates, vacuum included.






share|improve this answer


















  • 3





    I faced a similar problem today and by doing the change you describe (accumulate changes in temp table, then update from it), we saw a 100x increase in performance. 2.5 hours to about 2 minutes.

    – Ron Dahlgren
    Oct 25 '15 at 1:41











  • Sorry - i am not quite getting the buffer part. Could you add more info on how to "Accumulate a bunch of updates in a buffer"?

    – n1000
    Apr 6 '16 at 7:04











  • @n1000 The buffer could be a CSV file or a StringIO object in Python. Each line of buffer will contains the same data structure than your "tempTable" to let the Postgresql COPY command works.

    – Le Droid
    Apr 12 '16 at 21:37











  • Thank you man. I tried this solution and it was supposed to cost nearly 30 hours and now it only need 77.1 s for the update command. Cheers!

    – whyisyoung
    Jan 6 '17 at 1:50











  • Did you verify that it was using HOT? You have to be careful because the modified column can't be a part of any index FWIW... :) See also blog.codacy.com/…

    – rogerdpack
    Nov 23 '17 at 20:36


















7














After waiting 35 min. for my UPDATE query to finish (and still didn't) I decided to try something different. So what I did was a command:



CREATE TABLE table2 AS 
SELECT
all the fields of table1 except the one I wanted to update, 0 as theFieldToUpdate
from myTable


Then add indexes, then drop the old table and rename the new one to take its place. That took only 1.7 min. to process plus some extra time to recreate the indexes and constraints. But it did help! :)



Of course that did work only because nobody else was using the database. I would need to lock the table first if this was in a production environment.






share|improve this answer




















  • 4





    Postgresql's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version, and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). not much you can do about it, just a performance characteristic to be aware of.

    – araqnid
    Jul 29 '10 at 17:40











  • Thanks for the explanation, araqnid. I didn't know postgresql did implement updates like that.

    – Ricardo Lage
    Jul 29 '10 at 18:43


















2














Today I've spent many hours with similar issue. I've found a solution: drop all the constraints/indices before the update. No matter whether the column being updated is indexed or not, it seems like psql updates all the indices for all the updated rows. After the update is finished, add the constraints/indices back.






share|improve this answer

























  • If your data is "sparse" (lots of churn/free space within blocks) and the column being updated is not part of an index, then Postgres can use HOT and speed up by not needing to update indices. So this might help that way. See also dba.stackexchange.com/questions/15720/… ... Maybe it helps because an update in Postgres is equivalent to DELETE + INSERT and so it has to update all indexes for that row? Or maybe Postgres does bizarre things like rewriting full blocks worth of the index ("your personal copy of the index") until you commit :

    – rogerdpack
    Nov 24 '17 at 17:23


















2














Try this (note that generalFreq starts as type REAL, and stays the same):



ALTER TABLE myTable ALTER COLUMN generalFreq TYPE REAL USING 0;


This will rewrite the table, similar to a DROP + CREATE, and rebuild all indices. But all in one command. Much faster (about 2x) and you don't have to deal with dependencies and recreating indexes and other stuff, though it does lock the table (access exclusive--i.e. full lock) for the duration. Or maybe that's what you want if you want everything else to queue up behind it. If you aren't updating "too many" rows this way is slower than just an update.






share|improve this answer
































    0














    How are you running it? If you are looping each row and performing an update statement, you are running potentially millions of individual updates which is why it will perform incredibly slowly.



    If you are running a single update statement for all records in one statement it would run a lot faster, and if this process is slow then it's probably down to your hardware more than anything else. 3 million is a lot of records.






    share|improve this answer


















    • 1





      Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot but in my experience with other databases, it shouldn't take more than 10 min. to run a single update in one numeric column.

      – Ricardo Lage
      Jul 29 '10 at 11:00











    • I wouldn't of expected it to take so long either, especially with a constant assignment (setting all fields to 0), memory wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing it to see how long you can expect the 3 million to run, it might just be the case Postgres isn't very good at this unusual operation.

      – Tom Gullen
      Jul 29 '10 at 11:06


















    0














    The first thing I'd suggest (from https://dba.stackexchange.com/questions/118178/does-updating-a-row-with-the-same-value-actually-update-the-row) is to only update rows that "need" it, ex:



     UPDATE myTable SET generalFreq = 0 where generalFreq != 0;


    (might also need an index on generalFreq). Then you'll update fewer rows. Though not if the values are all non zero already, but updating fewer rows "can help" since otherwise it updates them and all indexes regardless of whether the value changed or not.



    Another option: if the stars align in terms of defaults and not-null constraints, you can drop the old column and create another by just adjusting metadata, instant time.






    share|improve this answer
































      0














      In my tests I noticed that a big update, more than 200 000 rows, is slower than 2 updates of 100 000 rows, even with a temporary table.



      My solution is to loop, in each loop create a temporary table of 200 000 rows, in this table I compute my values, then update my main table with the new values aso...



      Every 2 000 000 rows, I manually "VACUUM ANALYSE mytable", I noticed that the auto vacuum doesn't do its job for such updates.






      share|improve this answer






























        -2














        try



        UPDATE myTable SET generalFreq = 0.0;


        Maybe it is a casting issue






        share|improve this answer
























          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f3361291%2fslow-simple-update-query-on-postgresql-database-with-3-million-rows%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          9 Answers
          9






          active

          oldest

          votes








          9 Answers
          9






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          13














          Take a look at this answer: PostgreSQL slow on a large table with arrays and lots of updates



          First start with a better FILLFACTOR, do a VACUUM FULL to force table rewrite and check the HOT-updates after your UPDATE-query:



          SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'myTable';


          HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.



          Ps. You need version 8.3 or better.






          share|improve this answer

























          • Thanks! This clears things up.

            – Ricardo Lage
            Aug 4 '10 at 12:34















          13














          Take a look at this answer: PostgreSQL slow on a large table with arrays and lots of updates



          First start with a better FILLFACTOR, do a VACUUM FULL to force table rewrite and check the HOT-updates after your UPDATE-query:



          SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'myTable';


          HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.



          Ps. You need version 8.3 or better.






          share|improve this answer

























          • Thanks! This clears things up.

            – Ricardo Lage
            Aug 4 '10 at 12:34













          13












          13








          13







          Take a look at this answer: PostgreSQL slow on a large table with arrays and lots of updates



          First start with a better FILLFACTOR, do a VACUUM FULL to force table rewrite and check the HOT-updates after your UPDATE-query:



          SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'myTable';


          HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.



          Ps. You need version 8.3 or better.






          share|improve this answer















          Take a look at this answer: PostgreSQL slow on a large table with arrays and lots of updates



          First start with a better FILLFACTOR, do a VACUUM FULL to force table rewrite and check the HOT-updates after your UPDATE-query:



          SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'myTable';


          HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.



          Ps. You need version 8.3 or better.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 23 '17 at 20:09









          rogerdpack

          35.1k17136256




          35.1k17136256










          answered Jul 30 '10 at 8:55









          Frank HeikensFrank Heikens

          73.8k22107113




          73.8k22107113












          • Thanks! This clears things up.

            – Ricardo Lage
            Aug 4 '10 at 12:34

















          • Thanks! This clears things up.

            – Ricardo Lage
            Aug 4 '10 at 12:34
















          Thanks! This clears things up.

          – Ricardo Lage
          Aug 4 '10 at 12:34





          Thanks! This clears things up.

          – Ricardo Lage
          Aug 4 '10 at 12:34













          28














          I have to update tables of 1 or 2 billion rows with various values for each rows. Each run makes ~100 millions changes (10%).
          My first try was to group them in transaction of 300K updates directly on a specific partition as Postgresql not always optimize prepared queries if you use partitions.



          1. Transactions of bunch of "UPDATE myTable SET myField=value WHERE
            myId=id"
            Gives 1,500 updates/sec. which means each run would
            take at least 18 hours.

          2. HOT updates solution as described here with FILLFACTOR=50. Gives
            1,600 updates/sec. I uses SSD's so it's a costly improvement as it
            doubles the storage size.

          3. Insert in a temporary table of updated value and merge them after
            with UPDATE...FROM Gives 18,000 updates/sec. if I do a VACUUM
            for each partition; 100,000 up/s otherwise. Cooool.
            Here is the
            sequence of operations:


          CREATE TEMP TABLE tempTable (id BIGINT NOT NULL, field(s) to be updated,
          CONSTRAINT tempTable_pkey PRIMARY KEY (id));


          Accumulate a bunch of updates in a buffer depending of available RAM
          When it's filled, or need to change of table/partition, or completed:



          COPY tempTable FROM buffer;
          UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;
          COMMIT;
          TRUNCATE TABLE tempTable;
          VACUUM FULL ANALYZE myTable;


          That means a run now takes 1.5h instead of 18h for 100 millions updates, vacuum included.






          share|improve this answer


















          • 3





            I faced a similar problem today and by doing the change you describe (accumulate changes in temp table, then update from it), we saw a 100x increase in performance. 2.5 hours to about 2 minutes.

            – Ron Dahlgren
            Oct 25 '15 at 1:41











          • Sorry - i am not quite getting the buffer part. Could you add more info on how to "Accumulate a bunch of updates in a buffer"?

            – n1000
            Apr 6 '16 at 7:04











          • @n1000 The buffer could be a CSV file or a StringIO object in Python. Each line of buffer will contains the same data structure than your "tempTable" to let the Postgresql COPY command works.

            – Le Droid
            Apr 12 '16 at 21:37











          • Thank you man. I tried this solution and it was supposed to cost nearly 30 hours and now it only need 77.1 s for the update command. Cheers!

            – whyisyoung
            Jan 6 '17 at 1:50











          • Did you verify that it was using HOT? You have to be careful because the modified column can't be a part of any index FWIW... :) See also blog.codacy.com/…

            – rogerdpack
            Nov 23 '17 at 20:36















          28














          I have to update tables of 1 or 2 billion rows with various values for each rows. Each run makes ~100 millions changes (10%).
          My first try was to group them in transaction of 300K updates directly on a specific partition as Postgresql not always optimize prepared queries if you use partitions.



          1. Transactions of bunch of "UPDATE myTable SET myField=value WHERE
            myId=id"
            Gives 1,500 updates/sec. which means each run would
            take at least 18 hours.

          2. HOT updates solution as described here with FILLFACTOR=50. Gives
            1,600 updates/sec. I uses SSD's so it's a costly improvement as it
            doubles the storage size.

          3. Insert in a temporary table of updated value and merge them after
            with UPDATE...FROM Gives 18,000 updates/sec. if I do a VACUUM
            for each partition; 100,000 up/s otherwise. Cooool.
            Here is the
            sequence of operations:


          CREATE TEMP TABLE tempTable (id BIGINT NOT NULL, field(s) to be updated,
          CONSTRAINT tempTable_pkey PRIMARY KEY (id));


          Accumulate a bunch of updates in a buffer depending of available RAM
          When it's filled, or need to change of table/partition, or completed:



          COPY tempTable FROM buffer;
          UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;
          COMMIT;
          TRUNCATE TABLE tempTable;
          VACUUM FULL ANALYZE myTable;


          That means a run now takes 1.5h instead of 18h for 100 millions updates, vacuum included.






          share|improve this answer


















          • 3





            I faced a similar problem today and by doing the change you describe (accumulate changes in temp table, then update from it), we saw a 100x increase in performance. 2.5 hours to about 2 minutes.

            – Ron Dahlgren
            Oct 25 '15 at 1:41











          • Sorry - i am not quite getting the buffer part. Could you add more info on how to "Accumulate a bunch of updates in a buffer"?

            – n1000
            Apr 6 '16 at 7:04











          • @n1000 The buffer could be a CSV file or a StringIO object in Python. Each line of buffer will contains the same data structure than your "tempTable" to let the Postgresql COPY command works.

            – Le Droid
            Apr 12 '16 at 21:37











          • Thank you man. I tried this solution and it was supposed to cost nearly 30 hours and now it only need 77.1 s for the update command. Cheers!

            – whyisyoung
            Jan 6 '17 at 1:50











          • Did you verify that it was using HOT? You have to be careful because the modified column can't be a part of any index FWIW... :) See also blog.codacy.com/…

            – rogerdpack
            Nov 23 '17 at 20:36













          28












          28








          28







          I have to update tables of 1 or 2 billion rows with various values for each rows. Each run makes ~100 millions changes (10%).
          My first try was to group them in transaction of 300K updates directly on a specific partition as Postgresql not always optimize prepared queries if you use partitions.



          1. Transactions of bunch of "UPDATE myTable SET myField=value WHERE
            myId=id"
            Gives 1,500 updates/sec. which means each run would
            take at least 18 hours.

          2. HOT updates solution as described here with FILLFACTOR=50. Gives
            1,600 updates/sec. I uses SSD's so it's a costly improvement as it
            doubles the storage size.

          3. Insert in a temporary table of updated value and merge them after
            with UPDATE...FROM Gives 18,000 updates/sec. if I do a VACUUM
            for each partition; 100,000 up/s otherwise. Cooool.
            Here is the
            sequence of operations:


          CREATE TEMP TABLE tempTable (id BIGINT NOT NULL, field(s) to be updated,
          CONSTRAINT tempTable_pkey PRIMARY KEY (id));


          Accumulate a bunch of updates in a buffer depending of available RAM
          When it's filled, or need to change of table/partition, or completed:



          COPY tempTable FROM buffer;
          UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;
          COMMIT;
          TRUNCATE TABLE tempTable;
          VACUUM FULL ANALYZE myTable;


          That means a run now takes 1.5h instead of 18h for 100 millions updates, vacuum included.






          share|improve this answer













          I have to update tables of 1 or 2 billion rows with various values for each rows. Each run makes ~100 millions changes (10%).
          My first try was to group them in transaction of 300K updates directly on a specific partition as Postgresql not always optimize prepared queries if you use partitions.



          1. Transactions of bunch of "UPDATE myTable SET myField=value WHERE
            myId=id"
            Gives 1,500 updates/sec. which means each run would
            take at least 18 hours.

          2. HOT updates solution as described here with FILLFACTOR=50. Gives
            1,600 updates/sec. I uses SSD's so it's a costly improvement as it
            doubles the storage size.

          3. Insert in a temporary table of updated value and merge them after
            with UPDATE...FROM Gives 18,000 updates/sec. if I do a VACUUM
            for each partition; 100,000 up/s otherwise. Cooool.
            Here is the
            sequence of operations:


          CREATE TEMP TABLE tempTable (id BIGINT NOT NULL, field(s) to be updated,
          CONSTRAINT tempTable_pkey PRIMARY KEY (id));


          Accumulate a bunch of updates in a buffer depending of available RAM
          When it's filled, or need to change of table/partition, or completed:



          COPY tempTable FROM buffer;
          UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;
          COMMIT;
          TRUNCATE TABLE tempTable;
          VACUUM FULL ANALYZE myTable;


          That means a run now takes 1.5h instead of 18h for 100 millions updates, vacuum included.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Jul 17 '14 at 18:36









          Le DroidLe Droid

          2,50312426




          2,50312426







          • 3





            I faced a similar problem today and by doing the change you describe (accumulate changes in temp table, then update from it), we saw a 100x increase in performance. 2.5 hours to about 2 minutes.

            – Ron Dahlgren
            Oct 25 '15 at 1:41











          • Sorry - i am not quite getting the buffer part. Could you add more info on how to "Accumulate a bunch of updates in a buffer"?

            – n1000
            Apr 6 '16 at 7:04











          • @n1000 The buffer could be a CSV file or a StringIO object in Python. Each line of buffer will contains the same data structure than your "tempTable" to let the Postgresql COPY command works.

            – Le Droid
            Apr 12 '16 at 21:37











          • Thank you man. I tried this solution and it was supposed to cost nearly 30 hours and now it only need 77.1 s for the update command. Cheers!

            – whyisyoung
            Jan 6 '17 at 1:50











          • Did you verify that it was using HOT? You have to be careful because the modified column can't be a part of any index FWIW... :) See also blog.codacy.com/…

            – rogerdpack
            Nov 23 '17 at 20:36












          • 3





            I faced a similar problem today and by doing the change you describe (accumulate changes in temp table, then update from it), we saw a 100x increase in performance. 2.5 hours to about 2 minutes.

            – Ron Dahlgren
            Oct 25 '15 at 1:41











          • Sorry - i am not quite getting the buffer part. Could you add more info on how to "Accumulate a bunch of updates in a buffer"?

            – n1000
            Apr 6 '16 at 7:04











          • @n1000 The buffer could be a CSV file or a StringIO object in Python. Each line of buffer will contains the same data structure than your "tempTable" to let the Postgresql COPY command works.

            – Le Droid
            Apr 12 '16 at 21:37











          • Thank you man. I tried this solution and it was supposed to cost nearly 30 hours and now it only need 77.1 s for the update command. Cheers!

            – whyisyoung
            Jan 6 '17 at 1:50











          • Did you verify that it was using HOT? You have to be careful because the modified column can't be a part of any index FWIW... :) See also blog.codacy.com/…

            – rogerdpack
            Nov 23 '17 at 20:36







          3




          3





          I faced a similar problem today and by doing the change you describe (accumulate changes in temp table, then update from it), we saw a 100x increase in performance. 2.5 hours to about 2 minutes.

          – Ron Dahlgren
          Oct 25 '15 at 1:41





          I faced a similar problem today and by doing the change you describe (accumulate changes in temp table, then update from it), we saw a 100x increase in performance. 2.5 hours to about 2 minutes.

          – Ron Dahlgren
          Oct 25 '15 at 1:41













          Sorry - i am not quite getting the buffer part. Could you add more info on how to "Accumulate a bunch of updates in a buffer"?

          – n1000
          Apr 6 '16 at 7:04





          Sorry - i am not quite getting the buffer part. Could you add more info on how to "Accumulate a bunch of updates in a buffer"?

          – n1000
          Apr 6 '16 at 7:04













          @n1000 The buffer could be a CSV file or a StringIO object in Python. Each line of buffer will contains the same data structure than your "tempTable" to let the Postgresql COPY command works.

          – Le Droid
          Apr 12 '16 at 21:37





          @n1000 The buffer could be a CSV file or a StringIO object in Python. Each line of buffer will contains the same data structure than your "tempTable" to let the Postgresql COPY command works.

          – Le Droid
          Apr 12 '16 at 21:37













          Thank you man. I tried this solution and it was supposed to cost nearly 30 hours and now it only need 77.1 s for the update command. Cheers!

          – whyisyoung
          Jan 6 '17 at 1:50





          Thank you man. I tried this solution and it was supposed to cost nearly 30 hours and now it only need 77.1 s for the update command. Cheers!

          – whyisyoung
          Jan 6 '17 at 1:50













          Did you verify that it was using HOT? You have to be careful because the modified column can't be a part of any index FWIW... :) See also blog.codacy.com/…

          – rogerdpack
          Nov 23 '17 at 20:36





          Did you verify that it was using HOT? You have to be careful because the modified column can't be a part of any index FWIW... :) See also blog.codacy.com/…

          – rogerdpack
          Nov 23 '17 at 20:36











          7














          After waiting 35 min. for my UPDATE query to finish (and still didn't) I decided to try something different. So what I did was a command:



          CREATE TABLE table2 AS 
          SELECT
          all the fields of table1 except the one I wanted to update, 0 as theFieldToUpdate
          from myTable


          Then add indexes, then drop the old table and rename the new one to take its place. That took only 1.7 min. to process plus some extra time to recreate the indexes and constraints. But it did help! :)



          Of course that did work only because nobody else was using the database. I would need to lock the table first if this was in a production environment.






          share|improve this answer




















          • 4





            Postgresql's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version, and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). not much you can do about it, just a performance characteristic to be aware of.

            – araqnid
            Jul 29 '10 at 17:40











          • Thanks for the explanation, araqnid. I didn't know postgresql did implement updates like that.

            – Ricardo Lage
            Jul 29 '10 at 18:43















          7














          After waiting 35 min. for my UPDATE query to finish (and still didn't) I decided to try something different. So what I did was a command:



          CREATE TABLE table2 AS 
          SELECT
          all the fields of table1 except the one I wanted to update, 0 as theFieldToUpdate
          from myTable


          Then add indexes, then drop the old table and rename the new one to take its place. That took only 1.7 min. to process plus some extra time to recreate the indexes and constraints. But it did help! :)



          Of course that did work only because nobody else was using the database. I would need to lock the table first if this was in a production environment.






          share|improve this answer




















          • 4





            Postgresql's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version, and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). not much you can do about it, just a performance characteristic to be aware of.

            – araqnid
            Jul 29 '10 at 17:40











          • Thanks for the explanation, araqnid. I didn't know postgresql did implement updates like that.

            – Ricardo Lage
            Jul 29 '10 at 18:43













          7












          7








          7







          After waiting 35 min. for my UPDATE query to finish (and still didn't) I decided to try something different. So what I did was a command:



          CREATE TABLE table2 AS 
          SELECT
          all the fields of table1 except the one I wanted to update, 0 as theFieldToUpdate
          from myTable


          Then add indexes, then drop the old table and rename the new one to take its place. That took only 1.7 min. to process plus some extra time to recreate the indexes and constraints. But it did help! :)



          Of course that did work only because nobody else was using the database. I would need to lock the table first if this was in a production environment.






          share|improve this answer















          After waiting 35 min. for my UPDATE query to finish (and still didn't) I decided to try something different. So what I did was a command:



          CREATE TABLE table2 AS 
          SELECT
          all the fields of table1 except the one I wanted to update, 0 as theFieldToUpdate
          from myTable


          Then add indexes, then drop the old table and rename the new one to take its place. That took only 1.7 min. to process plus some extra time to recreate the indexes and constraints. But it did help! :)



          Of course that did work only because nobody else was using the database. I would need to lock the table first if this was in a production environment.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 24 '17 at 17:47









          rogerdpack

          35.1k17136256




          35.1k17136256










          answered Jul 29 '10 at 11:18









          Ricardo LageRicardo Lage

          5,37152855




          5,37152855







          • 4





            Postgresql's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version, and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). not much you can do about it, just a performance characteristic to be aware of.

            – araqnid
            Jul 29 '10 at 17:40











          • Thanks for the explanation, araqnid. I didn't know postgresql did implement updates like that.

            – Ricardo Lage
            Jul 29 '10 at 18:43












          • 4





            Postgresql's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version, and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). not much you can do about it, just a performance characteristic to be aware of.

            – araqnid
            Jul 29 '10 at 17:40











          • Thanks for the explanation, araqnid. I didn't know postgresql did implement updates like that.

            – Ricardo Lage
            Jul 29 '10 at 18:43







          4




          4





          Postgresql's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version, and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). not much you can do about it, just a performance characteristic to be aware of.

          – araqnid
          Jul 29 '10 at 17:40





          Postgresql's MVCC implementation makes updates expensive. If you're updating every row in the table, each row needs to be copied as a new version, and the old version marked as deleted. So it's not surprising that rewriting the table is faster (which is what altering the type of a column does automatically, for instance). not much you can do about it, just a performance characteristic to be aware of.

          – araqnid
          Jul 29 '10 at 17:40













          Thanks for the explanation, araqnid. I didn't know postgresql did implement updates like that.

          – Ricardo Lage
          Jul 29 '10 at 18:43





          Thanks for the explanation, araqnid. I didn't know postgresql did implement updates like that.

          – Ricardo Lage
          Jul 29 '10 at 18:43











          2














          Today I've spent many hours with similar issue. I've found a solution: drop all the constraints/indices before the update. No matter whether the column being updated is indexed or not, it seems like psql updates all the indices for all the updated rows. After the update is finished, add the constraints/indices back.






          share|improve this answer

























          • If your data is "sparse" (lots of churn/free space within blocks) and the column being updated is not part of an index, then Postgres can use HOT and speed up by not needing to update indices. So this might help that way. See also dba.stackexchange.com/questions/15720/… ... Maybe it helps because an update in Postgres is equivalent to DELETE + INSERT and so it has to update all indexes for that row? Or maybe Postgres does bizarre things like rewriting full blocks worth of the index ("your personal copy of the index") until you commit :

            – rogerdpack
            Nov 24 '17 at 17:23















          2














          Today I've spent many hours with similar issue. I've found a solution: drop all the constraints/indices before the update. No matter whether the column being updated is indexed or not, it seems like psql updates all the indices for all the updated rows. After the update is finished, add the constraints/indices back.






          share|improve this answer

























          • If your data is "sparse" (lots of churn/free space within blocks) and the column being updated is not part of an index, then Postgres can use HOT and speed up by not needing to update indices. So this might help that way. See also dba.stackexchange.com/questions/15720/… ... Maybe it helps because an update in Postgres is equivalent to DELETE + INSERT and so it has to update all indexes for that row? Or maybe Postgres does bizarre things like rewriting full blocks worth of the index ("your personal copy of the index") until you commit :

            – rogerdpack
            Nov 24 '17 at 17:23













          2












          2








          2







          Today I've spent many hours with similar issue. I've found a solution: drop all the constraints/indices before the update. No matter whether the column being updated is indexed or not, it seems like psql updates all the indices for all the updated rows. After the update is finished, add the constraints/indices back.






          share|improve this answer















          Today I've spent many hours with similar issue. I've found a solution: drop all the constraints/indices before the update. No matter whether the column being updated is indexed or not, it seems like psql updates all the indices for all the updated rows. After the update is finished, add the constraints/indices back.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 23 '17 at 20:15









          rogerdpack

          35.1k17136256




          35.1k17136256










          answered Feb 26 '16 at 19:57









          TregoregTregoreg

          4,50573354




          4,50573354












          • If your data is "sparse" (lots of churn/free space within blocks) and the column being updated is not part of an index, then Postgres can use HOT and speed up by not needing to update indices. So this might help that way. See also dba.stackexchange.com/questions/15720/… ... Maybe it helps because an update in Postgres is equivalent to DELETE + INSERT and so it has to update all indexes for that row? Or maybe Postgres does bizarre things like rewriting full blocks worth of the index ("your personal copy of the index") until you commit :

            – rogerdpack
            Nov 24 '17 at 17:23

















          • If your data is "sparse" (lots of churn/free space within blocks) and the column being updated is not part of an index, then Postgres can use HOT and speed up by not needing to update indices. So this might help that way. See also dba.stackexchange.com/questions/15720/… ... Maybe it helps because an update in Postgres is equivalent to DELETE + INSERT and so it has to update all indexes for that row? Or maybe Postgres does bizarre things like rewriting full blocks worth of the index ("your personal copy of the index") until you commit :

            – rogerdpack
            Nov 24 '17 at 17:23
















          If your data is "sparse" (lots of churn/free space within blocks) and the column being updated is not part of an index, then Postgres can use HOT and speed up by not needing to update indices. So this might help that way. See also dba.stackexchange.com/questions/15720/… ... Maybe it helps because an update in Postgres is equivalent to DELETE + INSERT and so it has to update all indexes for that row? Or maybe Postgres does bizarre things like rewriting full blocks worth of the index ("your personal copy of the index") until you commit :

          – rogerdpack
          Nov 24 '17 at 17:23





          If your data is "sparse" (lots of churn/free space within blocks) and the column being updated is not part of an index, then Postgres can use HOT and speed up by not needing to update indices. So this might help that way. See also dba.stackexchange.com/questions/15720/… ... Maybe it helps because an update in Postgres is equivalent to DELETE + INSERT and so it has to update all indexes for that row? Or maybe Postgres does bizarre things like rewriting full blocks worth of the index ("your personal copy of the index") until you commit :

          – rogerdpack
          Nov 24 '17 at 17:23











          2














          Try this (note that generalFreq starts as type REAL, and stays the same):



          ALTER TABLE myTable ALTER COLUMN generalFreq TYPE REAL USING 0;


          This will rewrite the table, similar to a DROP + CREATE, and rebuild all indices. But all in one command. Much faster (about 2x) and you don't have to deal with dependencies and recreating indexes and other stuff, though it does lock the table (access exclusive--i.e. full lock) for the duration. Or maybe that's what you want if you want everything else to queue up behind it. If you aren't updating "too many" rows this way is slower than just an update.






          share|improve this answer





























            2














            Try this (note that generalFreq starts as type REAL, and stays the same):



            ALTER TABLE myTable ALTER COLUMN generalFreq TYPE REAL USING 0;


            This will rewrite the table, similar to a DROP + CREATE, and rebuild all indices. But all in one command. Much faster (about 2x) and you don't have to deal with dependencies and recreating indexes and other stuff, though it does lock the table (access exclusive--i.e. full lock) for the duration. Or maybe that's what you want if you want everything else to queue up behind it. If you aren't updating "too many" rows this way is slower than just an update.






            share|improve this answer



























              2












              2








              2







              Try this (note that generalFreq starts as type REAL, and stays the same):



              ALTER TABLE myTable ALTER COLUMN generalFreq TYPE REAL USING 0;


              This will rewrite the table, similar to a DROP + CREATE, and rebuild all indices. But all in one command. Much faster (about 2x) and you don't have to deal with dependencies and recreating indexes and other stuff, though it does lock the table (access exclusive--i.e. full lock) for the duration. Or maybe that's what you want if you want everything else to queue up behind it. If you aren't updating "too many" rows this way is slower than just an update.






              share|improve this answer















              Try this (note that generalFreq starts as type REAL, and stays the same):



              ALTER TABLE myTable ALTER COLUMN generalFreq TYPE REAL USING 0;


              This will rewrite the table, similar to a DROP + CREATE, and rebuild all indices. But all in one command. Much faster (about 2x) and you don't have to deal with dependencies and recreating indexes and other stuff, though it does lock the table (access exclusive--i.e. full lock) for the duration. Or maybe that's what you want if you want everything else to queue up behind it. If you aren't updating "too many" rows this way is slower than just an update.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 25 '17 at 0:13









              rogerdpack

              35.1k17136256




              35.1k17136256










              answered Apr 19 '16 at 22:28









              Fabiano BoninFabiano Bonin

              213




              213





















                  0














                  How are you running it? If you are looping each row and performing an update statement, you are running potentially millions of individual updates which is why it will perform incredibly slowly.



                  If you are running a single update statement for all records in one statement it would run a lot faster, and if this process is slow then it's probably down to your hardware more than anything else. 3 million is a lot of records.






                  share|improve this answer


















                  • 1





                    Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot but in my experience with other databases, it shouldn't take more than 10 min. to run a single update in one numeric column.

                    – Ricardo Lage
                    Jul 29 '10 at 11:00











                  • I wouldn't of expected it to take so long either, especially with a constant assignment (setting all fields to 0), memory wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing it to see how long you can expect the 3 million to run, it might just be the case Postgres isn't very good at this unusual operation.

                    – Tom Gullen
                    Jul 29 '10 at 11:06















                  0














                  How are you running it? If you are looping each row and performing an update statement, you are running potentially millions of individual updates which is why it will perform incredibly slowly.



                  If you are running a single update statement for all records in one statement it would run a lot faster, and if this process is slow then it's probably down to your hardware more than anything else. 3 million is a lot of records.






                  share|improve this answer


















                  • 1





                    Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot but in my experience with other databases, it shouldn't take more than 10 min. to run a single update in one numeric column.

                    – Ricardo Lage
                    Jul 29 '10 at 11:00











                  • I wouldn't of expected it to take so long either, especially with a constant assignment (setting all fields to 0), memory wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing it to see how long you can expect the 3 million to run, it might just be the case Postgres isn't very good at this unusual operation.

                    – Tom Gullen
                    Jul 29 '10 at 11:06













                  0












                  0








                  0







                  How are you running it? If you are looping each row and performing an update statement, you are running potentially millions of individual updates which is why it will perform incredibly slowly.



                  If you are running a single update statement for all records in one statement it would run a lot faster, and if this process is slow then it's probably down to your hardware more than anything else. 3 million is a lot of records.






                  share|improve this answer













                  How are you running it? If you are looping each row and performing an update statement, you are running potentially millions of individual updates which is why it will perform incredibly slowly.



                  If you are running a single update statement for all records in one statement it would run a lot faster, and if this process is slow then it's probably down to your hardware more than anything else. 3 million is a lot of records.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Jul 29 '10 at 10:06









                  Tom GullenTom Gullen

                  33.9k70241407




                  33.9k70241407







                  • 1





                    Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot but in my experience with other databases, it shouldn't take more than 10 min. to run a single update in one numeric column.

                    – Ricardo Lage
                    Jul 29 '10 at 11:00











                  • I wouldn't of expected it to take so long either, especially with a constant assignment (setting all fields to 0), memory wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing it to see how long you can expect the 3 million to run, it might just be the case Postgres isn't very good at this unusual operation.

                    – Tom Gullen
                    Jul 29 '10 at 11:06












                  • 1





                    Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot but in my experience with other databases, it shouldn't take more than 10 min. to run a single update in one numeric column.

                    – Ricardo Lage
                    Jul 29 '10 at 11:00











                  • I wouldn't of expected it to take so long either, especially with a constant assignment (setting all fields to 0), memory wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing it to see how long you can expect the 3 million to run, it might just be the case Postgres isn't very good at this unusual operation.

                    – Tom Gullen
                    Jul 29 '10 at 11:06







                  1




                  1





                  Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot but in my experience with other databases, it shouldn't take more than 10 min. to run a single update in one numeric column.

                  – Ricardo Lage
                  Jul 29 '10 at 11:00





                  Hi Tom, thanks. I am running a single update from the psql command line. I understand 3 million is a lot but in my experience with other databases, it shouldn't take more than 10 min. to run a single update in one numeric column.

                  – Ricardo Lage
                  Jul 29 '10 at 11:00













                  I wouldn't of expected it to take so long either, especially with a constant assignment (setting all fields to 0), memory wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing it to see how long you can expect the 3 million to run, it might just be the case Postgres isn't very good at this unusual operation.

                  – Tom Gullen
                  Jul 29 '10 at 11:06





                  I wouldn't of expected it to take so long either, especially with a constant assignment (setting all fields to 0), memory wise this should be pretty fast for a DB to handle. I've only limited experience with Postgres, but you could try doing it in batches of 100k and timing it to see how long you can expect the 3 million to run, it might just be the case Postgres isn't very good at this unusual operation.

                  – Tom Gullen
                  Jul 29 '10 at 11:06











                  0














                  The first thing I'd suggest (from https://dba.stackexchange.com/questions/118178/does-updating-a-row-with-the-same-value-actually-update-the-row) is to only update rows that "need" it, ex:



                   UPDATE myTable SET generalFreq = 0 where generalFreq != 0;


                  (might also need an index on generalFreq). Then you'll update fewer rows. Though not if the values are all non zero already, but updating fewer rows "can help" since otherwise it updates them and all indexes regardless of whether the value changed or not.



                  Another option: if the stars align in terms of defaults and not-null constraints, you can drop the old column and create another by just adjusting metadata, instant time.






                  share|improve this answer





























                    0














                    The first thing I'd suggest (from https://dba.stackexchange.com/questions/118178/does-updating-a-row-with-the-same-value-actually-update-the-row) is to only update rows that "need" it, ex:



                     UPDATE myTable SET generalFreq = 0 where generalFreq != 0;


                    (might also need an index on generalFreq). Then you'll update fewer rows. Though not if the values are all non zero already, but updating fewer rows "can help" since otherwise it updates them and all indexes regardless of whether the value changed or not.



                    Another option: if the stars align in terms of defaults and not-null constraints, you can drop the old column and create another by just adjusting metadata, instant time.






                    share|improve this answer



























                      0












                      0








                      0







                      The first thing I'd suggest (from https://dba.stackexchange.com/questions/118178/does-updating-a-row-with-the-same-value-actually-update-the-row) is to only update rows that "need" it, ex:



                       UPDATE myTable SET generalFreq = 0 where generalFreq != 0;


                      (might also need an index on generalFreq). Then you'll update fewer rows. Though not if the values are all non zero already, but updating fewer rows "can help" since otherwise it updates them and all indexes regardless of whether the value changed or not.



                      Another option: if the stars align in terms of defaults and not-null constraints, you can drop the old column and create another by just adjusting metadata, instant time.






                      share|improve this answer















                      The first thing I'd suggest (from https://dba.stackexchange.com/questions/118178/does-updating-a-row-with-the-same-value-actually-update-the-row) is to only update rows that "need" it, ex:



                       UPDATE myTable SET generalFreq = 0 where generalFreq != 0;


                      (might also need an index on generalFreq). Then you'll update fewer rows. Though not if the values are all non zero already, but updating fewer rows "can help" since otherwise it updates them and all indexes regardless of whether the value changed or not.



                      Another option: if the stars align in terms of defaults and not-null constraints, you can drop the old column and create another by just adjusting metadata, instant time.







                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Nov 24 '17 at 17:49

























                      answered Nov 23 '17 at 19:55









                      rogerdpackrogerdpack

                      35.1k17136256




                      35.1k17136256





















                          0














                          In my tests I noticed that a big update, more than 200 000 rows, is slower than 2 updates of 100 000 rows, even with a temporary table.



                          My solution is to loop, in each loop create a temporary table of 200 000 rows, in this table I compute my values, then update my main table with the new values aso...



                          Every 2 000 000 rows, I manually "VACUUM ANALYSE mytable", I noticed that the auto vacuum doesn't do its job for such updates.






                          share|improve this answer



























                            0














                            In my tests I noticed that a big update, more than 200 000 rows, is slower than 2 updates of 100 000 rows, even with a temporary table.



                            My solution is to loop, in each loop create a temporary table of 200 000 rows, in this table I compute my values, then update my main table with the new values aso...



                            Every 2 000 000 rows, I manually "VACUUM ANALYSE mytable", I noticed that the auto vacuum doesn't do its job for such updates.






                            share|improve this answer

























                              0












                              0








                              0







                              In my tests I noticed that a big update, more than 200 000 rows, is slower than 2 updates of 100 000 rows, even with a temporary table.



                              My solution is to loop, in each loop create a temporary table of 200 000 rows, in this table I compute my values, then update my main table with the new values aso...



                              Every 2 000 000 rows, I manually "VACUUM ANALYSE mytable", I noticed that the auto vacuum doesn't do its job for such updates.






                              share|improve this answer













                              In my tests I noticed that a big update, more than 200 000 rows, is slower than 2 updates of 100 000 rows, even with a temporary table.



                              My solution is to loop, in each loop create a temporary table of 200 000 rows, in this table I compute my values, then update my main table with the new values aso...



                              Every 2 000 000 rows, I manually "VACUUM ANALYSE mytable", I noticed that the auto vacuum doesn't do its job for such updates.







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Nov 15 '18 at 17:23









                              RolintocourRolintocour

                              77011229




                              77011229





















                                  -2














                                  try



                                  UPDATE myTable SET generalFreq = 0.0;


                                  Maybe it is a casting issue






                                  share|improve this answer





























                                    -2














                                    try



                                    UPDATE myTable SET generalFreq = 0.0;


                                    Maybe it is a casting issue






                                    share|improve this answer



























                                      -2












                                      -2








                                      -2







                                      try



                                      UPDATE myTable SET generalFreq = 0.0;


                                      Maybe it is a casting issue






                                      share|improve this answer















                                      try



                                      UPDATE myTable SET generalFreq = 0.0;


                                      Maybe it is a casting issue







                                      share|improve this answer














                                      share|improve this answer



                                      share|improve this answer








                                      edited May 9 '13 at 3:30









                                      Mudassir Hasan

                                      20.7k1373112




                                      20.7k1373112










                                      answered Aug 10 '10 at 21:08









                                      ChocolimChocolim

                                      5212




                                      5212



























                                          draft saved

                                          draft discarded
















































                                          Thanks for contributing an answer to Stack Overflow!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid


                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.

                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function ()
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f3361291%2fslow-simple-update-query-on-postgresql-database-with-3-million-rows%23new-answer', 'question_page');

                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          這個網誌中的熱門文章

                                          How to read a connectionString WITH PROVIDER in .NET Core?

                                          Guadeloupe

                                          Node.js Script on GitHub Pages or Amazon S3