bigquery loadJob of a json - forcing a field to be String in the schema auto-detect










0















if in the beginning the json contains




"label": "foo"




and later it is




"label": "123"




bigquery returns



Invalid schema update. Field label has changed type from STRING to INTEGER 


although it is "123" and not 123.



file is being loaded with




autodetect: true




is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?










share|improve this question


























    0















    if in the beginning the json contains




    "label": "foo"




    and later it is




    "label": "123"




    bigquery returns



    Invalid schema update. Field label has changed type from STRING to INTEGER 


    although it is "123" and not 123.



    file is being loaded with




    autodetect: true




    is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?










    share|improve this question
























      0












      0








      0








      if in the beginning the json contains




      "label": "foo"




      and later it is




      "label": "123"




      bigquery returns



      Invalid schema update. Field label has changed type from STRING to INTEGER 


      although it is "123" and not 123.



      file is being loaded with




      autodetect: true




      is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?










      share|improve this question














      if in the beginning the json contains




      "label": "foo"




      and later it is




      "label": "123"




      bigquery returns



      Invalid schema update. Field label has changed type from STRING to INTEGER 


      although it is "123" and not 123.



      file is being loaded with




      autodetect: true




      is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?







      google-bigquery






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 15 '18 at 10:32









      ShaharTShaharT

      20925




      20925






















          1 Answer
          1






          active

          oldest

          votes


















          0














          The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.



          UDATE:



          I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:




          Note: You can view the schema of an existing table in JSON format by
          entering the following command: bq show --format=prettyjson
          [DATASET].[TABLE].




          and use it for further dynamic loads






          share|improve this answer

























          • thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

            – ShaharT
            Nov 15 '18 at 20:20











          • @ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

            – enle lin
            Nov 16 '18 at 12:55











          • the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

            – ShaharT
            Nov 18 '18 at 6:54











          • I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

            – enle lin
            Nov 20 '18 at 13:13










          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53317427%2fbigquery-loadjob-of-a-json-forcing-a-field-to-be-string-in-the-schema-auto-det%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          0














          The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.



          UDATE:



          I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:




          Note: You can view the schema of an existing table in JSON format by
          entering the following command: bq show --format=prettyjson
          [DATASET].[TABLE].




          and use it for further dynamic loads






          share|improve this answer

























          • thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

            – ShaharT
            Nov 15 '18 at 20:20











          • @ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

            – enle lin
            Nov 16 '18 at 12:55











          • the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

            – ShaharT
            Nov 18 '18 at 6:54











          • I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

            – enle lin
            Nov 20 '18 at 13:13















          0














          The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.



          UDATE:



          I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:




          Note: You can view the schema of an existing table in JSON format by
          entering the following command: bq show --format=prettyjson
          [DATASET].[TABLE].




          and use it for further dynamic loads






          share|improve this answer

























          • thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

            – ShaharT
            Nov 15 '18 at 20:20











          • @ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

            – enle lin
            Nov 16 '18 at 12:55











          • the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

            – ShaharT
            Nov 18 '18 at 6:54











          • I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

            – enle lin
            Nov 20 '18 at 13:13













          0












          0








          0







          The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.



          UDATE:



          I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:




          Note: You can view the schema of an existing table in JSON format by
          entering the following command: bq show --format=prettyjson
          [DATASET].[TABLE].




          and use it for further dynamic loads






          share|improve this answer















          The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.



          UDATE:



          I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:




          Note: You can view the schema of an existing table in JSON format by
          entering the following command: bq show --format=prettyjson
          [DATASET].[TABLE].




          and use it for further dynamic loads







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 16 '18 at 12:50

























          answered Nov 15 '18 at 17:32









          enle linenle lin

          927110




          927110












          • thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

            – ShaharT
            Nov 15 '18 at 20:20











          • @ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

            – enle lin
            Nov 16 '18 at 12:55











          • the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

            – ShaharT
            Nov 18 '18 at 6:54











          • I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

            – enle lin
            Nov 20 '18 at 13:13

















          • thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

            – ShaharT
            Nov 15 '18 at 20:20











          • @ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

            – enle lin
            Nov 16 '18 at 12:55











          • the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

            – ShaharT
            Nov 18 '18 at 6:54











          • I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

            – enle lin
            Nov 20 '18 at 13:13
















          thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

          – ShaharT
          Nov 15 '18 at 20:20





          thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

          – ShaharT
          Nov 15 '18 at 20:20













          @ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

          – enle lin
          Nov 16 '18 at 12:55





          @ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

          – enle lin
          Nov 16 '18 at 12:55













          the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

          – ShaharT
          Nov 18 '18 at 6:54





          the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

          – ShaharT
          Nov 18 '18 at 6:54













          I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

          – enle lin
          Nov 20 '18 at 13:13





          I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

          – enle lin
          Nov 20 '18 at 13:13



















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53317427%2fbigquery-loadjob-of-a-json-forcing-a-field-to-be-string-in-the-schema-auto-det%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          這個網誌中的熱門文章

          What does pagestruct do in Eviews?

          Dutch intervention in Lombok and Karangasem

          Channel Islands