Finding a bit pattern in a binary file using Python and memory map










I am processing a binary file that is not byte aligned at the start. A short way into the file there is a 24-bit pattern, 0xfaf330, which is a sync marker; the data that follows it is byte aligned. I am using Python mmap on the file and would like to use a Python memoryview, once the marker is found, to process the remaining part of the file. So, how do I find the 24-bit pattern and then use mmap and memoryview from that point forward?







python-3.x binaryfiles














edited Nov 15 '18 at 12:52 by GAF
asked Nov 15 '18 at 12:49 by GAF












  • Is there a reason why you mmap the file and don't just open and stream it?

    – MisterMiyagi
    Nov 15 '18 at 12:51











  • The file is very large and memory mapping helps to manage it.

    – GAF
    Nov 15 '18 at 12:55











  • Using open will only buffer a portion of the file at any time. Do you need random access? Your description sounds ideal for stream processing.

    – MisterMiyagi
    Nov 15 '18 at 12:57











  • Subsequently, memoryview helps to process the remaining byte aligned data in chunks based on the file format specification.

    – GAF
    Nov 15 '18 at 12:57











  • The data read is subject to Python's regular garbage collection. Unless you hang on to it, it is reclaimed.

    – MisterMiyagi
    Nov 15 '18 at 13:22

















2 Answers














    If you do not need random access, you can use open to stream the file. Using file.read, you can get consecutive bytes from the file. If your file were byte-aligned, you could directly search through it:



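    in_stream = open('/dev/urandom', 'rb')  # stand-in for the real file
    # read single bytes, keeping the last three in a rolling window, until
    # the window equals the marker (peek may return fewer bytes than requested)
    window = b''
    while window != b'\xfa\xf3\x30':
        byte = in_stream.read(1)
        if not byte:
            raise EOFError('marker not found')
        window = (window + byte)[-3:]
    # in_stream is now positioned directly after the marker
    print(in_stream.tell())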


    By default, open uses a small read buffer but never loads the entire file. You can stream through the file using further in_stream.read calls.



    Alternatively, you can use the result of in_stream.tell() to jump to the correct position in an mmap'ed file.
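For the byte-aligned case, the search can also run on the memory map itself, because mmap objects provide a find method. The following is a minimal sketch under stated assumptions: the helper name view_after_marker and the temporary sample file are made up for illustration, and the marker is assumed to start on a byte boundary.

```python
import mmap
import os
import tempfile

MARKER = bytes.fromhex('faf330')  # the 24-bit sync marker from the question


def view_after_marker(mapped: mmap.mmap) -> memoryview:
    """Return a zero-copy view of everything after a byte-aligned marker."""
    position = mapped.find(MARKER)
    if position < 0:
        raise ValueError('marker not found')
    return memoryview(mapped)[position + len(MARKER):]


# Hypothetical sample file: two junk bytes, the marker, then the payload.
with tempfile.NamedTemporaryFile(delete=False) as sample:
    sample.write(b'\x12\x34' + MARKER + b'payload')
    sample_path = sample.name

with open(sample_path, 'rb') as in_file, \
        mmap.mmap(in_file.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    # The view must be released before the map is closed, hence the with block.
    with view_after_marker(mapped) as payload:
        payload_bytes = bytes(payload)
        print(payload_bytes)

os.unlink(sample_path)
```

The memoryview slices the map without copying, so only the pages that are actually touched get loaded.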




    Searching non-aligned bits



    To handle data that is not byte aligned, you must sift through the bytes manually: bit-shifting lets you inspect sub-ranges of bytes. Note that Python only supports bit-shifting on int, not on bytes.



    >>> pattern = 0xfaf330
    >>> bin((pattern << 4) + 0b1011)  # pattern shifted by 4 plus garbage
    '0b1111101011110011001100001011'


    You can use this to scan a window of bytes:



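    def find_bits(pattern: int, window: int, n: int):
        """Find an n-byte bit pattern in an n+1-byte window and return the offset

        The offset is the right shift that aligns the pattern, that is, the
        number of window bits that follow the pattern.
        """
        for offset in range(8):
            # shift the window right, then keep only the lowest n bytes
            window_slice = (window >> offset) & (2 ** (n*8) - 1)
            if pattern == window_slice:
                return offset
        raise IndexError('pattern not in window')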


    You can again use this to scan the file stream:



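    in_stream = open('/dev/urandom', 'rb')  # stand-in for the real file
    # slide a 4-byte window over the stream, one byte at a time
    # (read is used because peek may return fewer bytes than requested)
    window = in_stream.read(4)
    while True:
        try:
            offset = find_bits(0xfaf330, int.from_bytes(window, 'big'), 3)
        except IndexError:
            next_byte = in_stream.read(1)
            if not next_byte:
                raise EOFError('marker not found')
            window = window[1:] + next_byte
        else:
            break
    # for a non-zero bit-offset, the last window byte also holds the first
    # bits after the marker, so step back to it
    if offset:
        in_stream.seek(-1, 1)
    # in_stream is now positioned at the byte holding the first bit after the marker
    print('byte-offset:', in_stream.tell(), 'bit-offset:', offset)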
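The same sliding-window idea can be applied to a memory map instead of a stream, since slicing an mmap returns bytes. This is only a sketch: the function name find_bit_pattern and the sample data are assumptions for illustration, and a pure-Python loop like this will be slow on very large inputs.

```python
import mmap


def find_bit_pattern(data, pattern: int, num_bytes: int) -> int:
    """Return the absolute bit position of the first match, or -1.

    data is any sliceable bytes-like object (bytes, bytearray, mmap).
    """
    mask = (1 << (8 * num_bytes)) - 1
    for index in range(len(data) - num_bytes + 1):
        chunk = data[index:index + num_bytes + 1]
        # Normally 8 spare bits; 0 when the chunk is cut short at the end.
        spare_bits = 8 * (len(chunk) - num_bytes)
        window = int.from_bytes(chunk, 'big')
        # skip = number of bits in this chunk that precede the pattern
        for skip in range(min(8, spare_bits + 1)):
            if (window >> (spare_bits - skip)) & mask == pattern:
                return 8 * index + skip
    return -1


# Hypothetical sample: 3 junk bits, the marker, 0xbeef, then 5 padding bits.
sample = (((0b101 << 40) | (0xfaf330 << 16) | 0xbeef) << 5).to_bytes(6, 'big')
mapped = mmap.mmap(-1, len(sample))  # anonymous map standing in for a mapped file
mapped.write(sample)
marker_bit = find_bit_pattern(mapped, 0xfaf330, 3)
print('marker starts at bit', marker_bit)
mapped.close()
```

Dividing the result by 8 with divmod gives the byte index into the map and the bit position inside that byte.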


    Alternatively, you can convert the window to its binary string representation and search for the pattern as a substring. Note that you then have to take care of the zero bits that bin drops from the front of the number, so it is about the same amount of work.
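A minimal sketch of the string-based variant, assuming fixed widths of 32 bits for the window and 24 bits for the pattern. The helper name find_bits_in_text is hypothetical, and unlike find_bits it counts the bits that precede the pattern rather than the right shift.

```python
def find_bits_in_text(pattern: int, window: int,
                      pattern_bits: int, window_bits: int) -> int:
    """Return how many bits precede the pattern in the window, or -1."""
    # format with an explicit width keeps the leading zero bits that bin() drops
    window_text = format(window, f'0{window_bits}b')
    pattern_text = format(pattern, f'0{pattern_bits}b')
    return window_text.find(pattern_text)


window = (0xfaf330 << 4) + 0b1011  # pattern shifted by 4 plus garbage
leading_bits = find_bits_in_text(0xfaf330, window, 24, 32)
# Without the explicit width, the four leading zero bits are lost:
unpadded_position = bin(window)[2:].find(format(0xfaf330, '024b'))
print(leading_bits, unpadded_position)
```

The two results differ by exactly the number of dropped zero bits, which is the padding pitfall mentioned above.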




    Reading non-aligned bits



    Once you have the bit-offset, you can read-and-align data from the file. Basically, read one byte more than you need, then shift as needed:



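    def align_read(file, num_bytes: int, bit_offset: int):
        """Read num_bytes of data; bit_offset is the offset found by find_bits"""
        if bit_offset == 0:
            return file.read(num_bytes)
        # read one extra byte, then step back so the next call sees it again
        # (read is used because peek may return fewer bytes than requested)
        window = file.read(num_bytes + 1)
        file.seek(-1, 1)
        data = (int.from_bytes(window, 'big') >> bit_offset) & (2 ** (num_bytes*8) - 1)
        return data.to_bytes(num_bytes, 'big')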
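The same realignment can be done on a bytes-like object such as an mmap, which ties back to the memoryview part of the question: a memoryview cannot start in the middle of a byte, so misaligned data has to be copied into aligned bytes first. The sketch below is an illustration only; read_aligned and the sample data are hypothetical, and positions are counted in bits from the start of the data.

```python
def read_aligned(data, bit_position: int, num_bytes: int) -> bytes:
    """Copy num_bytes that start at an arbitrary bit position into bytes."""
    first_byte, skip = divmod(bit_position, 8)
    extra = 1 if skip else 0  # misaligned data spills into one more byte
    chunk = data[first_byte:first_byte + num_bytes + extra]
    if len(chunk) < num_bytes + extra:
        raise ValueError('not enough data')
    if skip == 0:
        return bytes(chunk)
    # Drop the bits after the wanted range, then mask off the bits before it.
    value = int.from_bytes(chunk, 'big') >> (8 - skip)
    value &= (1 << (8 * num_bytes)) - 1
    return value.to_bytes(num_bytes, 'big')


# Hypothetical sample: 3 junk bits, the marker, 0xbeef, then 5 padding bits.
sample = (((0b101 << 40) | (0xfaf330 << 16) | 0xbeef) << 5).to_bytes(6, 'big')
payload = read_aligned(sample, 3 + 24, 2)  # data starts right after the marker
view = memoryview(payload)  # the realigned copy can now be viewed byte-wise
print(payload.hex(), view[0])
```

If the marker turns out to sit on a byte boundary, no copy is needed and the memoryview can slice the map directly.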














    edited Nov 15 '18 at 14:27
    answered Nov 15 '18 at 13:17 by MisterMiyagi












    • This will not work because the beginning of the file is not byte aligned. Meaning that I could read several bytes and come to the sync marker but read just a few bits of it. A subsequent read of one byte would read another misaligned part of the sync marker. Therefore, the marker could be read and not recognized. Thanks for your suggestion.

      – GAF
      Nov 15 '18 at 13:23












    • @GAF Sorry, missed that one. AFAIK Python does not support a resolution smaller than bytes - neither for open nor mmap nor other means. You will have to bit-shift each chunk.

      – MisterMiyagi
      Nov 15 '18 at 13:33











    • Thanks. Appreciate the follow up.

      – GAF
      Nov 15 '18 at 14:15











    • @GAF Added a (working) draft of how to handle the shifting to find the offset and re-align data. This is probably worth doing in Cython if your file is large and you read only small chunks at a time.

      – MisterMiyagi
      Nov 15 '18 at 14:29

















        MisterMiyagi's answer is a good solution. Another solution uses the bitstring module.



        import bitstring

        aFile = open(someFilePath, 'rb')
        aBinaryStream = bitstring.ConstBitStream(aFile)
        # returns a tuple with the bit position of the match, or an empty tuple
        aTuple = aBinaryStream.find('0xfaf330')  # the sync marker


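        If found, the stream position is moved to the start of the match, with bit resolution. After skipping the 24 marker bits (for example with aBinaryStream.read(24)), you can read the data that follows as whole bytes.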



        aBuffer = aBinaryStream.read('bytes:1024') # to read 1024 bytes
















        answered Nov 15 '18 at 20:16 by GAF


























