python3, UnicodeDecodeError when reading contents of unzipped file
I have some code that downloads zipped CSV files, unzips them, and concatenates the data into a single DataFrame:

    import pandas as pd
    import requests
    from io import BytesIO
    from zipfile import ZipFile
    from bs4 import BeautifulSoup

    def findZipLinks(url):
        r = requests.get(url)
        bs = BeautifulSoup(r.content, features="html.parser")
        # agecaredata_url is assumed to be defined elsewhere (not shown in the post)
        links = [
            agecaredata_url + a.get('data-link')
            for a in bs.findAll('a', {"class": "downloadhrefp_lt_WebPartZone6_znMC_pageplaceholder_p_lt_WebPartZone2_ZoneA_znPublicationFooterItem_znPublicationFooterItem_zone_Stacker_MultiColumns u-dtb u-w100p u-bgc-primary u-c-fff c-publication__download u-mb-gutter0p25x"})
            if "zip" in a.get("data-link")
        ]
        return links

    exits = findZipLinks('https://www.gen-agedcaredata.gov.au/Resources/Access-data/2018/June/GEN-data-People-leaving-aged-care')

    dfs = []
    for exit_url in exits:
        r = requests.get(exit_url)
        zipfile = ZipFile(BytesIO(r.content))
        # This is the line that raises the UnicodeDecodeError
        dfs.append(pd.read_csv(zipfile.open(zipfile.namelist()[0]), dtype=str))

    pd.concat(df for df in dfs).reset_index(drop=True)

The problem is that I get the following error on the append line:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte

I have tried calling .decode('utf-8') and .decode('windows-1252'), but I receive similar errors. Can anyone help me figure out what is wrong?
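
For context on the message itself: byte 0x96 is a continuation byte in UTF-8, so it can never start a character, while in Windows-1252 the same byte is the en dash (U+2013). The sketch below only illustrates that difference on a single byte. Whether the CSV files here are actually Windows-1252 is an assumption, not something the post confirms.

```python
# The byte reported in the traceback.
raw = b"\x96"

# In UTF-8, 0x96 has the bit pattern 10xxxxxx (a continuation byte),
# so decoding it on its own fails with "invalid start byte".
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In Windows-1252 (cp1252) the same byte maps to U+2013, the en dash.
as_cp1252 = raw.decode("cp1252")

print(utf8_ok)                   # False
print(as_cp1252 == "\u2013")     # True
```
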
python pandas zip
edited Nov 14 '18 at 10:43 by kcorlidy
asked Nov 13 '18 at 16:14 by oska boska
What line are you referring to that gives you the error?
– ritlew
Nov 13 '18 at 16:17
The error is on dfs.append(pd.read_csv(zipfile.open(zipfile.namelist()[0]), dtype=str))
– oska boska
Nov 14 '18 at 8:57
1 Answer
When you read the file, specify the read mode as wb:

    zipfile.open(zipfile.namelist()[0], 'wb')

answered Nov 13 '18 at 16:28 by F.Lavenir
When I do this, I get the error ValueError: open() requires mode "r" or "w". Do I need to add anything else to make this work?
– oska boska
Nov 14 '18 at 9:03
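
As the ValueError indicates, ZipFile.open() accepts only mode "r" or "w" and always returns a binary stream, so the mode argument cannot influence how the bytes are decoded. A minimal sketch of one alternative, assuming the CSV is encoded as Windows-1252 (a guess based on byte 0x96, not confirmed in the thread), is to wrap the archive member in io.TextIOWrapper with an explicit encoding. The in-memory archive, the file name sample.csv, and its contents are made up for illustration.

```python
import csv
import io
from zipfile import ZipFile

# Build a small in-memory archive as a stand-in for the downloaded file.
# The name "sample.csv" and the data are hypothetical.
payload = "year,range\n2018,10\u201320\n".encode("cp1252")
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", payload)
buffer.seek(0)

with ZipFile(buffer) as archive:
    first_name = archive.namelist()[0]
    # ZipFile.open() yields bytes; decoding happens in the text wrapper.
    with archive.open(first_name, "r") as member:
        text = io.TextIOWrapper(member, encoding="cp1252", newline="")
        rows = list(csv.reader(text))

print(rows)
```

pandas.read_csv also takes an encoding argument, so in the question's code passing encoding="cp1252" alongside dtype=str may be enough. This has not been checked against the actual files.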