bigquery loadJob of a json - forcing a field to be String in the schema auto-detect

if in the beginning the json contains

"label": "foo"

and later it is

"label": "123"

bigquery returns

Invalid schema update. Field label has changed type from STRING to INTEGER

although it is "123" and not 123.

file is being loaded with

autodetect: true

is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?

asked Nov 15 '18 at 10:32

ShaharT

20925

add a comment |

if in the beginning the json contains

"label": "foo"

and later it is

"label": "123"

bigquery returns

Invalid schema update. Field label has changed type from STRING to INTEGER

although it is "123" and not 123.

file is being loaded with

autodetect: true

is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?

asked Nov 15 '18 at 10:32

ShaharT

20925

add a comment |

if in the beginning the json contains

"label": "foo"

and later it is

"label": "123"

bigquery returns

Invalid schema update. Field label has changed type from STRING to INTEGER

although it is "123" and not 123.

file is being loaded with

autodetect: true

is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?

asked Nov 15 '18 at 10:32

ShaharT

20925

if in the beginning the json contains

"label": "foo"

and later it is

"label": "123"

bigquery returns

Invalid schema update. Field label has changed type from STRING to INTEGER

although it is "123" and not 123.

file is being loaded with

autodetect: true

is there a way to force bigquery to make any field as string when it applies its auto-detect, or the only way is using csv instead ?

google-bigquery

asked Nov 15 '18 at 10:32

ShaharT

20925

asked Nov 15 '18 at 10:32

ShaharT

20925

asked Nov 15 '18 at 10:32

ShaharT

20925

asked Nov 15 '18 at 10:32

ShaharT

20925

asked Nov 15 '18 at 10:32

ShaharT

20925

add a comment |

1 Answer
1

active

oldest

votes

The auto-detection is based on the best effort to recognize the data type by scanning up to 100 rows of data to use as a representative sample. There is no way to give insight about which kind of type it is. You may consider to specify manually the schema for your use case.

UDATE:

I have tested to load a file with only "label" : "123" and it is recognized as INTEGER. Therefore, the auto detection recognizes "123" as INGETER no matter if there are quotes or not. For your case, you may consider to export the schema from the existent table as explained in the documentation:

Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].

and use it for further dynamic loads

edited Nov 16 '18 at 12:50

answered Nov 15 '18 at 17:32

enle lin

927110

thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

– ShaharT
Nov 15 '18 at 20:20

@ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

– enle lin
Nov 16 '18 at 12:55

the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

– ShaharT
Nov 18 '18 at 6:54

I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

– enle lin
Nov 20 '18 at 13:13

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53317427%2fbigquery-loadjob-of-a-json-forcing-a-field-to-be-string-in-the-schema-auto-det%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

UDATE:

Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].

and use it for further dynamic loads

edited Nov 16 '18 at 12:50

answered Nov 15 '18 at 17:32

enle lin

927110

thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

– ShaharT
Nov 15 '18 at 20:20

@ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

– enle lin
Nov 16 '18 at 12:55

the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

– ShaharT
Nov 18 '18 at 6:54

I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

– enle lin
Nov 20 '18 at 13:13

add a comment |

UDATE:

Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].

and use it for further dynamic loads

edited Nov 16 '18 at 12:50

answered Nov 15 '18 at 17:32

enle lin

927110

thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

– ShaharT
Nov 15 '18 at 20:20

@ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

– enle lin
Nov 16 '18 at 12:55

the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

– ShaharT
Nov 18 '18 at 6:54

I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

– enle lin
Nov 20 '18 at 13:13

add a comment |

UDATE:

Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].

and use it for further dynamic loads

edited Nov 16 '18 at 12:50

answered Nov 15 '18 at 17:32

enle lin

927110

UDATE:

Note: You can view the schema of an existing table in JSON format by
entering the following command: bq show --format=prettyjson
[DATASET].[TABLE].

and use it for further dynamic loads

edited Nov 16 '18 at 12:50

answered Nov 15 '18 at 17:32

enle lin

927110

edited Nov 16 '18 at 12:50

answered Nov 15 '18 at 17:32

enle lin

927110

answered Nov 15 '18 at 17:32

enle lin

927110

answered Nov 15 '18 at 17:32

enle lin

927110

thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

– ShaharT
Nov 15 '18 at 20:20

@ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

– enle lin
Nov 16 '18 at 12:55

the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

– ShaharT
Nov 18 '18 at 6:54

I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

– enle lin
Nov 20 '18 at 13:13

add a comment |

thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

– ShaharT
Nov 15 '18 at 20:20

@ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

– enle lin
Nov 16 '18 at 12:55

the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

– ShaharT
Nov 18 '18 at 6:54

I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

– enle lin
Nov 20 '18 at 13:13

thank you. Yes, the issue is that each json is dynamic and might contain new elements, that's the reason for the auto-detect config. Once bigquery already determined the type of an element from the 'history' as a string, why it can't treat a new value as a string anymore, even though it looks like a integer. Moreover, the value contains quotes :(

– ShaharT
Nov 15 '18 at 20:20

@ShaharT Auto detection interprets "123" as INTEGER and you should specify schema for JSON file when loading it to BigQuery. Notice that you can export schema from an existent table. See my updated answer for more details.

– enle lin
Nov 16 '18 at 12:55

the issue is that in the first value, bigquery loaded it as string, and in the next json load, I expect bigquery to force it to be string (as it already auto-detected it before)

– ShaharT
Nov 18 '18 at 6:54

I think each load job is treated independently so the auto-detect will apply again for each load. Therefore, if some load has only numbers in the 100 rows sample to determinate the schema, it will detect the field as INTEGER (no matter if it was detected as STRING in the previous load). The auto detection is done by best effort and you may specify the schema manually in your case.

– enle lin
Nov 20 '18 at 13:13

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Odtnhj