bigquery loadJob of a json - forcing a field to be String in the schema auto-detect
if in the beginning the json contains
"label": "foo"
and later it is
"label": "123"
bigquery returns
Invalid schema update. Field label has changed type from STRING to INTEGER
although it is "123" and not 123.
file is being loaded with
autodetect: true
is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?
google-bigquery
add a comment |
if in the beginning the json contains
"label": "foo"
and later it is
"label": "123"
bigquery returns
Invalid schema update. Field label has changed type from STRING to INTEGER
although it is "123" and not 123.
file is being loaded with
autodetect: true
is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?
google-bigquery
add a comment |
if in the beginning the json contains
"label": "foo"
and later it is
"label": "123"
bigquery returns
Invalid schema update. Field label has changed type from STRING to INTEGER
although it is "123" and not 123.
file is being loaded with
autodetect: true
is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?
google-bigquery
if in the beginning the json contains
"label": "foo"
and later it is
"label": "123"
bigquery returns
Invalid schema update. Field label has changed type from STRING to INTEGER
although it is "123" and not 123.
file is being loaded with
autodetect: true
is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?
google-bigquery
google-bigquery
asked Nov 15 '18 at 10:32
ShaharTShaharT
20925
20925
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.
UDATE:
I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:
Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].
and use it for further dynamic loads
thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(
– ShaharT
Nov 15 '18 at 20:20
@ShaharT Auto detection interprets"123"asINTEGERand you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.
– enle lin
Nov 16 '18 at 12:55
the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)
– ShaharT
Nov 18 '18 at 6:54
I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field asINTEGER(no matter if it was detected asSTRINGin the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.
– enle lin
Nov 20 '18 at 13:13
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53317427%2fbigquery-loadjob-of-a-json-forcing-a-field-to-be-string-in-the-schema-auto-det%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.
UDATE:
I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:
Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].
and use it for further dynamic loads
thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(
– ShaharT
Nov 15 '18 at 20:20
@ShaharT Auto detection interprets"123"asINTEGERand you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.
– enle lin
Nov 16 '18 at 12:55
the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)
– ShaharT
Nov 18 '18 at 6:54
I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field asINTEGER(no matter if it was detected asSTRINGin the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.
– enle lin
Nov 20 '18 at 13:13
add a comment |
The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.
UDATE:
I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:
Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].
and use it for further dynamic loads
thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(
– ShaharT
Nov 15 '18 at 20:20
@ShaharT Auto detection interprets"123"asINTEGERand you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.
– enle lin
Nov 16 '18 at 12:55
the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)
– ShaharT
Nov 18 '18 at 6:54
I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field asINTEGER(no matter if it was detected asSTRINGin the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.
– enle lin
Nov 20 '18 at 13:13
add a comment |
The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.
UDATE:
I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:
Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].
and use it for further dynamic loads
The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.
UDATE:
I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:
Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].
and use it for further dynamic loads
edited Nov 16 '18 at 12:50
answered Nov 15 '18 at 17:32
enle linenle lin
927110
927110
thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(
– ShaharT
Nov 15 '18 at 20:20
@ShaharT Auto detection interprets"123"asINTEGERand you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.
– enle lin
Nov 16 '18 at 12:55
the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)
– ShaharT
Nov 18 '18 at 6:54
I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field asINTEGER(no matter if it was detected asSTRINGin the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.
– enle lin
Nov 20 '18 at 13:13
add a comment |
thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(
– ShaharT
Nov 15 '18 at 20:20
@ShaharT Auto detection interprets"123"asINTEGERand you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.
– enle lin
Nov 16 '18 at 12:55
the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)
– ShaharT
Nov 18 '18 at 6:54
I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field asINTEGER(no matter if it was detected asSTRINGin the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.
– enle lin
Nov 20 '18 at 13:13
thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(
– ShaharT
Nov 15 '18 at 20:20
thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(
– ShaharT
Nov 15 '18 at 20:20
@ShaharT Auto detection interprets
"123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.– enle lin
Nov 16 '18 at 12:55
@ShaharT Auto detection interprets
"123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.– enle lin
Nov 16 '18 at 12:55
the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)
– ShaharT
Nov 18 '18 at 6:54
the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)
– ShaharT
Nov 18 '18 at 6:54
I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as
INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.– enle lin
Nov 20 '18 at 13:13
I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as
INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.– enle lin
Nov 20 '18 at 13:13
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53317427%2fbigquery-loadjob-of-a-json-forcing-a-field-to-be-string-in-the-schema-auto-det%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown