zipfile header language encoding bit set differently between Python2 and Python3





















I would like this code to produce the same output when run with Python 2 or Python 3:



from zipfile import ZipFile, ZipInfo

with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    info.file_size = len(content)
    zf.writestr(info, content)


However, under Python 2 out.zip starts:



50 4b 03 04 14 00 00 08


Under Python 3, it starts:



50 4b 03 04 14 00 00 00


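The differing part is flag_bits: 0x800 under Python 2, 0x00 under Python 3. The field is stored little-endian in bytes 6-7 of the local file header, so the bytes 00 08 are the value 0x0800. That's BIT11: language encoding. BIT11 seems to get set if filename.encode("ascii") throws.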
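To make the byte layout concrete, here is a minimal sketch (Python 3 only; the helper name local_header_flags is made up for illustration and is not part of zipfile). It unpacks the start of the first local file header: 4 signature bytes, a 2-byte version, then the 2-byte little-endian flag field. It uses a non-ASCII file name, which as far as I can tell is the case where zipfile sets bit 11 by itself.

```python
import io
import struct
import zipfile


def local_header_flags(data):
    """Return the general-purpose flag field of the first local file header."""
    # Layout: 4-byte signature, 2-byte "version needed", 2-byte flags (little-endian).
    signature, _version, flags = struct.unpack_from("<4sHH", data, 0)
    if signature != b"PK\x03\x04":
        raise ValueError("data does not start with a local file header")
    return flags


# Build a one-entry archive in memory; the name cannot be encoded as ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(u"sch\u00f6n.txt", b"content")

flags = local_header_flags(buf.getvalue())
print("flags = 0x{:04x}, bit 11 set: {}".format(flags, bool(flags & 0x800)))
```

Running the same helper on out.zip from the question (read in binary mode) should show whether the bit made it into the file.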



I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().



I wonder if anyone here has a good solution. Ideally I'd like both outputs to have the flag set, because that mirrors what the jar utility does.



EDIT: Updated to add the info.flag_bits = 0x800 line just to spell out what I'm trying to achieve. I've reproduced this on Windows:
ActivePython 3.6.0.3600, vs ActivePython 2.7.14.2717, Windows 10.
And on Linux:
Python 3.6.6 vs Python 2.7.11
In case it matters, I am running this exactly as my example, no hashbang, invoking the interpreter directly:



pythonX test.py
































  • Perhaps I am mistaken but I seem to get the output 50 4b 03 04 14 00 00 00 for both Python 2 and Python 3 on my Debian machine under Python 3.5.3 and Python 2.7.13
    – Algorithmic Canary
    Nov 12 at 1:06










  • Likewise, it's the same output on Windows with Python 2 and 3 for me (as what you show for Python 3 in your question). Sounds like something OS-dependent. What are you running?
    – martineau
    Nov 12 at 2:50











  • @martineau that's still not what I want, I want the bit set for both, I've changed my question as it wasn't so clear before. Thanks for testing this, it's useful feedback, perhaps you can post your versions.
    – Keeely
    Nov 12 at 9:36










  • Keeely: Got it. Your code creates a file. You want to make sure a bit is set at a certain offset in that file. Seems like if nothing else you could modify the file manually after it's created using binary file I/O.
    – martineau
    Nov 12 at 9:42










  • @martineau, indeed that is my last resort, but it's a pretty horrible solution.
    – Keeely
    Nov 12 at 9:47
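For reference, the "modify the file afterwards" route from the comments above could look roughly like the sketch below. It is written for Python 3 and is only an illustration under stated assumptions: a plain single-disk archive, no ZIP64, and no stray end-of-central-directory signature inside the archive comment. The names set_utf8_flag and _or_flag are hypothetical helpers, not zipfile APIs. It sets bit 11 in each central directory entry and in the matching local file header.

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11 (language encoding)


def _or_flag(buf, offset):
    """OR the UTF-8 bit into the 2-byte little-endian flag field at offset."""
    (flags,) = struct.unpack_from("<H", buf, offset)
    struct.pack_into("<H", buf, offset, flags | UTF8_FLAG)


def set_utf8_flag(data):
    """Return a copy of the archive bytes with bit 11 set on every entry."""
    buf = bytearray(data)
    eocd = buf.rfind(b"PK\x05\x06")  # end-of-central-directory record
    if eocd < 0:
        raise ValueError("end-of-central-directory record not found")
    # EOCD: entry count at +10, directory size at +12, directory offset at +16.
    total_entries, _cd_size, pos = struct.unpack_from("<HLL", buf, eocd + 10)
    for _ in range(total_entries):
        if buf[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("bad central directory entry")
        _or_flag(buf, pos + 8)  # flags field of the central directory entry
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", buf, pos + 28)
        (local_offset,) = struct.unpack_from("<L", buf, pos + 42)
        if buf[local_offset:local_offset + 4] != b"PK\x03\x04":
            raise ValueError("bad local file header")
        _or_flag(buf, local_offset + 6)  # flags field of the local header
        pos += 46 + name_len + extra_len + comment_len
    return bytes(buf)


# Usage: build an archive with an ASCII name, then patch it in memory.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as zf:
    zf.writestr("file.txt", b"content")
patched = set_utf8_flag(original.getvalue())
```

The flag field is not covered by the entry's CRC, so nothing else needs to be recomputed; the patched bytes can be written straight back to out.zip.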














python python-2.7 zipfile python-3.7














edited Nov 12 at 9:52

























asked Nov 12 at 0:29









Keeely
























2 Answers













Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



$ cat zipf.py
from __future__ import print_function

from zipfile import ZipFile, ZipInfo

with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
        if isinstance(i, str): i = ord(i)
        print('{:02x}'.format(i), end=' ')
    print()


Run as:



$ python2.7 zipf.py
50 4b 03 04 14 00 00 08


but:



$ python3.6 zipf.py
50 4b 03 04 14 00 00 00


It's certainly possible to make it work, by setting flag_bits only after the entry has been opened for writing. However, then you must avoid writestr, and this only works with Python 3.6, since ZipFile.open() in 2.7 has no write mode (and it seems rather abusive):



from __future__ import print_function

from zipfile import ZipFile, ZipInfo

with ZipFile("out.zip", 'w') as zf:
    info = ZipInfo()
    info.filename = "file.txt"
    content = "content"
    if not isinstance(content, bytes):
        content = content.encode('utf8')
    info.file_size = len(content)
    with zf.open(info, 'w') as stream:
        info.flag_bits = 0x800
        stream.write(content)

with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
        if isinstance(i, str): i = ord(i)
        print('{:02x}'.format(i), end=' ')
    print()


It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



Original answer below



I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



def _encodeFilenameFlags(self):
    if isinstance(self.filename, unicode):
        try:
            return self.filename.encode('ascii'), self.flag_bits
        except UnicodeEncodeError:
            return self.filename.encode('utf-8'), self.flag_bits | 0x800
    else:
        return self.filename, self.flag_bits


(Python 2.7 zipfile.py source) or:



def _encodeFilenameFlags(self):
    try:
        return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
        return self.filename.encode('utf-8'), self.flag_bits | 0x800


(Python 3.6 zipfile.py source).



To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



info.filename = u"sch\N{latin small letter o with diaeresis}n" # "file.txt"


(this notation works with both Python 2.7 and 3.6).
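A quick way to check this for any given name is sketched below (Python 3 only; utf8_flag_for is a made-up helper name, not part of zipfile). It writes a one-entry archive into memory, reads it back, and reports whether bit 11 is set in the flag_bits that zipfile recorded for the entry. I have only asserted the non-ASCII case to myself; what the ASCII case prints may depend on the Python version.

```python
import io
import zipfile


def utf8_flag_for(filename):
    """Write a one-entry archive in memory and report whether bit 11 ended up set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(filename, b"content")
    # Re-open the bytes and look at the flags stored for the entry.
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        return bool(zf.infolist()[0].flag_bits & 0x800)


for name in (u"file.txt", u"sch\u00f6n.txt"):
    # ascii() keeps the output printable on consoles without UTF-8 support.
    print(ascii(name), utf8_flag_for(name))
```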




I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().




If I add:



info.filename = "file.txt"
info.flag_bits |= 0x0800


(just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).




























  • Can you post your full code if you got the bit set for filename==file.txt with Python3?
    – Keeely
    Nov 12 at 9:34











  • @Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
    – torek
    Nov 12 at 9:55










  • Thanks, but can I have your precise major and minor versions for all Pythons used? I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions), so I cannot accept it.
    – Keeely
    Nov 12 at 10:33










  • One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
    – torek
    Nov 12 at 11:26































I am using something like this for the time being:



from zipfile import ZipFile, ZipInfo
import struct

orig_function = ZipInfo.FileHeader

def new_function(self, zip64=None):
    header = orig_function(self, zip64)
    fmt = "B"*len(header)
    blist = list(struct.unpack(fmt, header))
    blist[7] |= 0x8
    return struct.pack(fmt, *blist)

setattr(ZipInfo, "FileHeader", new_function)

with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.file_size = len(content)
    zf.writestr(info, content)


Hopefully it won't break too soon; FileHeader() seems like something that won't be changing in the future.
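A variation on the same idea, as a rough sketch for Python 3: wrap the patch in a context manager so that ZipInfo.FileHeader is restored afterwards and other zipfile users in the process are not affected. The name forced_utf8_flag is hypothetical. Note that, as far as I can tell, this only changes the local file header; the central directory entry is written through a different code path and keeps whatever flags zipfile computed.

```python
import contextlib
import io
import struct
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11 (language encoding)


@contextlib.contextmanager
def forced_utf8_flag():
    """Temporarily make ZipInfo.FileHeader() emit headers with bit 11 set."""
    original = zipfile.ZipInfo.FileHeader

    def patched(self, zip64=None):
        header = bytearray(original(self, zip64))
        # The flag field is the little-endian 16-bit value at offset 6.
        (flags,) = struct.unpack_from("<H", header, 6)
        struct.pack_into("<H", header, 6, flags | UTF8_FLAG)
        return bytes(header)

    zipfile.ZipInfo.FileHeader = patched
    try:
        yield
    finally:
        # Always restore the library method, even if writing fails.
        zipfile.ZipInfo.FileHeader = original


buf = io.BytesIO()
with forced_utf8_flag():
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("file.txt", b"content")

# Print the first 8 bytes in the same format used elsewhere on this page.
print(" ".join("{:02x}".format(b) for b in buf.getvalue()[:8]))
```

The ZipFile is closed inside the patched region on purpose, because the header is rewritten when the entry is finished.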



















































    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    1
    down vote













    Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



    $ cat zipf.py
    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    Run as:



    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08


    but:



    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00


    It's certainly possible to make it work, by making sure the file is opened before creating the info entry. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):



    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    info = ZipInfo()
    info.filename = "file.txt"
    content = "content"
    if not isinstance(content, bytes):
    content = content.encode('utf8')
    info.file_size = len(content)
    with zf.open(info, 'w') as stream:
    info.flag_bits = 0x800
    stream.write(content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



    Original answer below



    I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



    def _encodeFilenameFlags(self):
    if isinstance(self.filename, unicode):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
    else:
    return self.filename, self.flag_bits


    (Python 2.7 zipfile.py source) or:



    def _encodeFilenameFlags(self):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800


    (Python 3.6 zipfile.py source).



    To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



    info.filename = u"schNlatin small letter o with diaeresisn" # "file.txt"


    (this notation works with both Python 2.7 and 3.6).




    I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().




    If I add:



    info.filename = "file.txt"
    info.flag_bits |= 0x0800


    (just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).






    share|improve this answer






















    • Can you post your full code if you got the bit set for filename==file.txt with Python3?
      – Keeely
      Nov 12 at 9:34











    • @Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
      – torek
      Nov 12 at 9:55










    • thanks, but can I have your precise major+ minor versions for all Pythons used. I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions) so cannot accept.
      – Keeely
      Nov 12 at 10:33










    • One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
      – torek
      Nov 12 at 11:26















    up vote
    1
    down vote













    Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



    $ cat zipf.py
    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    Run as:



    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08


    but:



    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00


    It's certainly possible to make it work, by making sure the file is opened before creating the info entry. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):



    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    info = ZipInfo()
    info.filename = "file.txt"
    content = "content"
    if not isinstance(content, bytes):
    content = content.encode('utf8')
    info.file_size = len(content)
    with zf.open(info, 'w') as stream:
    info.flag_bits = 0x800
    stream.write(content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



    Original answer below



    I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



    def _encodeFilenameFlags(self):
    if isinstance(self.filename, unicode):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
    else:
    return self.filename, self.flag_bits


    (Python 2.7 zipfile.py source) or:



    def _encodeFilenameFlags(self):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800


    (Python 3.6 zipfile.py source).



    To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



    info.filename = u"schNlatin small letter o with diaeresisn" # "file.txt"


    (this notation works with both Python 2.7 and 3.6).




    I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().




    If I add:



    info.filename = "file.txt"
    info.flag_bits |= 0x0800


    (just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).






    share|improve this answer






















    • Can you post your full code if you got the bit set for filename==file.txt with Python3?
      – Keeely
      Nov 12 at 9:34











    • @Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
      – torek
      Nov 12 at 9:55










    • thanks, but can I have your precise major+ minor versions for all Pythons used. I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions) so cannot accept.
      – Keeely
      Nov 12 at 10:33










    • One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
      – torek
      Nov 12 at 11:26













    up vote
    1
    down vote










    up vote
    1
    down vote









    Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



    $ cat zipf.py
    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    Run as:



    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08


    but:



    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00


    It's certainly possible to make it work, by making sure the file is opened before creating the info entry. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):



    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    info = ZipInfo()
    info.filename = "file.txt"
    content = "content"
    if not isinstance(content, bytes):
    content = content.encode('utf8')
    info.file_size = len(content)
    with zf.open(info, 'w') as stream:
    info.flag_bits = 0x800
    stream.write(content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



    Original answer below



    I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



    def _encodeFilenameFlags(self):
    if isinstance(self.filename, unicode):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
    else:
    return self.filename, self.flag_bits


    (Python 2.7 zipfile.py source) or:



    def _encodeFilenameFlags(self):
    try:
    return self.filename.encode('ascii'), self.flag_bits
    except UnicodeEncodeError:
    return self.filename.encode('utf-8'), self.flag_bits | 0x800


    (Python 3.6 zipfile.py source).



    To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



    info.filename = u"schNlatin small letter o with diaeresisn" # "file.txt"


    (this notation works with both Python 2.7 and 3.6).




    I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().




    If I add:



    info.filename = "file.txt"
    info.flag_bits |= 0x0800


    (just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).






    share|improve this answer














    Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):



    $ cat zipf.py
    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
    content = "content"
    info = ZipInfo()
    info.filename = "file.txt"
    info.flag_bits = 0x800
    # don't set info.file_size here: zf.writestr() does that
    zf.writestr(info, content)

    with open('out.zip', 'rb') as stream:
    byteseq = stream.read(8)
    for i in byteseq:
    if isinstance(i, str): i = ord(i)
    print(':02x'.format(i), end=' ')
    print()


    Run as:



    $ python2.7 zipf.py
    50 4b 03 04 14 00 00 08


    but:



    $ python3.6 zipf.py
    50 4b 03 04 14 00 00 00


    It's certainly possible to make it work, by making sure the file is opened before creating the info entry. However, then you must avoid writestr, and this only works with Python 3.6 (and seems rather abusive):



    from __future__ import print_function

    from zipfile import ZipFile, ZipInfo

    with ZipFile("out.zip", 'w') as zf:
        info = ZipInfo()
        info.filename = "file.txt"
        content = "content"
        if not isinstance(content, bytes):
            content = content.encode('utf8')
        info.file_size = len(content)
        with zf.open(info, 'w') as stream:
            info.flag_bits = 0x800
            stream.write(content)

    with open('out.zip', 'rb') as stream:
        byteseq = stream.read(8)
        for i in byteseq:
            if isinstance(i, str): i = ord(i)
            print('{:02x}'.format(i), end=' ')
        print()


    It's probably the case that 3.6 resetting all the info.flag_bits (through the internal open that it does) is just incorrect, although it's not really clear to me.



    Original answer below



    I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:



    def _encodeFilenameFlags(self):
        if isinstance(self.filename, unicode):
            try:
                return self.filename.encode('ascii'), self.flag_bits
            except UnicodeEncodeError:
                return self.filename.encode('utf-8'), self.flag_bits | 0x800
        else:
            return self.filename, self.flag_bits


    (Python 2.7 zipfile.py source) or:



    def _encodeFilenameFlags(self):
        try:
            return self.filename.encode('ascii'), self.flag_bits
        except UnicodeEncodeError:
            return self.filename.encode('utf-8'), self.flag_bits | 0x800


    (Python 3.6 zipfile.py source).



    To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:



    info.filename = u"sch\N{LATIN SMALL LETTER O WITH DIAERESIS}n" # "file.txt"


    (this notation works with both Python 2.7 and 3.6).
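
    As a quick check of the rule above, here is a minimal sketch (my own illustration, not taken from the zipfile source) that writes a single entry into an in-memory archive and reads back the two flag bytes stored at offset 6 of the local file header. The helper name local_header_flags and the sample file names are made up for this example, and it assumes Python 3, where writestr accepts bytes:

```python
import io
import struct
from zipfile import ZipFile, ZipInfo


def local_header_flags(filename):
    """Write one entry named `filename` and return its local-header flag bits.

    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zf:
        zf.writestr(ZipInfo(filename), b"content")
    # Local file header layout: 4-byte signature, 2-byte version,
    # then the 2-byte little-endian general purpose bit flag at offset 6.
    return struct.unpack_from("<H", buf.getvalue(), 6)[0]


ascii_flags = local_header_flags(u"file.txt")
utf8_flags = local_header_flags(u"sch\N{LATIN SMALL LETTER O WITH DIAERESIS}n.txt")

print("ASCII name sets bit 11:    ", bool(ascii_flags & 0x800))
print("non-ASCII name sets bit 11:", bool(utf8_flags & 0x800))
```

    Only the non-ASCII name should end up with bit 11 (0x800) set, because that is the only case in which _encodeFilenameFlags falls through to the UTF-8 branch.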




    You wrote: "I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write()."




    If I add:



    info.filename = "file.txt"
    info.flag_bits |= 0x0800


    (just after setting the filename to u"schön") and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt).















    edited Nov 12 at 11:59

























    answered Nov 12 at 1:09









    torek












    • Can you post your full code if you got the bit set for filename==file.txt with Python3?
      – Keeely
      Nov 12 at 9:34











    • @Keeely: I deleted it after posting, but I started by copying your sample from before the last edit. It essentially matched your current sample. I ran it on FreeBSD but the behavior should be the same as long as the zipfile library code is the same...
      – torek
      Nov 12 at 9:55










    • Thanks, but can I have your precise major and minor versions for all Pythons used? I have up-voted the post, but at the moment it doesn't exactly give a solution (bit set for both Python versions), so I cannot accept it.
      – Keeely
      Nov 12 at 10:33










    • One is sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0), the other is sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0). Let me try re-creating the test, too.
      – torek
      Nov 12 at 11:26











































    I am using something like this for the time being:



    from zipfile import ZipFile, ZipInfo
    import struct

    orig_function = ZipInfo.FileHeader

    def new_function(self, zip64=None):
        # Build the normal local file header, then patch it byte by byte.
        header = orig_function(self, zip64)
        fmt = "B"*len(header)
        blist = list(struct.unpack(fmt, header))
        # Byte 7 is the high byte of the flag field; 0x08 there is bit 11 (0x800).
        blist[7] |= 0x8
        return struct.pack(fmt, *blist)

    setattr(ZipInfo, "FileHeader", new_function)

    with ZipFile("out.zip", 'w') as zf:
        content = "content"
        info = ZipInfo()
        info.filename = "file.txt"
        info.file_size = len(content)
        zf.writestr(info, content)


    Hopefully it won't break too soon; FileHeader() seems like something that won't be changing in the future.
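
    For anyone who wants to check the effect without touching the disk, here is a variation on the same idea, offered as a sketch rather than a recommended approach. It patches ZipInfo.FileHeader in the same way but works on a bytearray, writes to an in-memory buffer, and then reads the flag field back. The names UTF8_FLAG and file_header_with_utf8_flag are mine, and it assumes Python 3, where writestr accepts bytes:

```python
import io
import struct
from zipfile import ZipFile, ZipInfo

UTF8_FLAG = 0x800  # general purpose bit 11 ("language encoding")

_original_file_header = ZipInfo.FileHeader


def file_header_with_utf8_flag(self, *args, **kwargs):
    """Return the usual local file header with bit 11 forced on."""
    header = bytearray(_original_file_header(self, *args, **kwargs))
    # The flag field is a little-endian 16-bit value at offset 6.
    flags, = struct.unpack_from("<H", header, 6)
    struct.pack_into("<H", header, 6, flags | UTF8_FLAG)
    return bytes(header)


# Monkeypatch: affects every ZipInfo in this process from here on.
ZipInfo.FileHeader = file_header_with_utf8_flag

buf = io.BytesIO()
with ZipFile(buf, "w") as zf:
    zf.writestr(ZipInfo("file.txt"), b"content")

local_flags, = struct.unpack_from("<H", buf.getvalue(), 6)
print("local header flags: 0x{:04x}".format(local_flags))
```

    Note that this only changes the local file header. As far as I can tell, the central directory record is built separately from info.flag_bits, so the flag may still be clear there.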
















        answered Nov 12 at 13:48









        Keeely



























