How to find utf8_string in another utf8_string using tinyutf8 in C++11?

























I am using the tinyutf8 C++ UTF-8 string library from
https://github.com/DuffsDevice/tinyutf8



I'm trying to call utf8_string::find_first_of passing a utf8_string as the first parameter.



This generates the following error:



error: no matching function for call to ‘utf8_string::find_first_of(utf8_string&, int&)’
int found_pos = haystack.find_first_of(needle, at_pos);
^
In file included from Phonemizer.cpp:8:0:
tinyutf8.h:1728:12: note: candidate: utf8_string::size_type utf8_string::find_first_of(const value_type*, utf8_string::size_type) const
size_type find_first_of( const value_type* str , size_type start_codepoint = 0 ) const ;
^~~~~~~~~~~~~
tinyutf8.h:1728:12: note: no known conversion for argument 1 from ‘utf8_string’ to ‘const value_type* {aka const char32_t*}’


How can I get a char32_t* from my utf8_string?
Alternatively, what other mechanism is there to find a utf8_string within another utf8_string?



Thanks!
Shawn
































  • If you're not particular you can just search for the byte sequence. The standard library has lots of find functions. If you're particular you'll have to use a library to convert both search string and text to search in to a canonical form for Unicode, to ensure that characters like "é" (for example) are represented as the same sequence of code points.

    – Cheers and hth. - Alf
    Sep 15 '18 at 3:07












  • Thanks @Alf for the helpful comment. I started down the path of doing the byte sequence search, getting a raw iterator and working back to a codepoint index, but then I realized I could use find, which accepts a utf8_string parameter, instead of find_first_of.

    – Shawn McMurdo
    Sep 19 '18 at 6:41
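
For readers who want the byte-sequence route mentioned in the comments above, here is a minimal sketch using only the standard library. It assumes both strings hold valid UTF-8 in a std::string and does no Unicode normalization, so "é" must be encoded the same way in both. The helper names byte_offset, codepoint_index and find_utf8 are made up for this illustration and are not part of tinyutf8.

```cpp
#include <cstddef>
#include <string>

// True for a byte that starts a code point (anything except 10xxxxxx).
static bool is_lead_byte(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) != 0x80;
}

// Byte offset of the code point with index cp_index,
// or utf8.size() if the string has fewer code points.
std::size_t byte_offset(const std::string& utf8, std::size_t cp_index) {
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (is_lead_byte(utf8[i])) {
            if (cp_index == 0) return i;
            --cp_index;
        }
    }
    return utf8.size();
}

// Number of code points that start within the first byte_count bytes.
std::size_t codepoint_index(const std::string& utf8, std::size_t byte_count) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < byte_count && i < utf8.size(); ++i) {
        if (is_lead_byte(utf8[i])) ++count;
    }
    return count;
}

// Code point index of the first occurrence of needle at or after
// code point start_cp, or std::string::npos if there is none.
std::size_t find_utf8(const std::string& haystack,
                      const std::string& needle,
                      std::size_t start_cp = 0) {
    std::size_t pos = haystack.find(needle, byte_offset(haystack, start_cp));
    if (pos == std::string::npos) return std::string::npos;
    return codepoint_index(haystack, pos);
}
```

A plain byte search is safe here because a valid UTF-8 needle begins with a lead byte, so it cannot match starting in the middle of a code point in a valid haystack.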
















c++ string c++11 unicode utf-8
















asked Sep 15 '18 at 1:46
Shawn McMurdo














1 Answer














Shawn, currently tiny_utf8 does not support find_first_of with a utf8_string as an argument. However, to answer your second question: You can convert a utf8_string to a char32_t string using utf8_string::to_wide_literal( &char32_buffer ).
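
One caveat worth spelling out: in the standard library, find_first_of looks for the first character that matches any single character of its argument, while find looks for the whole sequence. Assuming tinyutf8 mirrors those semantics (an assumption, not checked against its source), passing a converted char32_t buffer to find_first_of would still not perform a substring search. The sketch below shows the difference with std::u32string as a stand-in; first_of_any and first_substring are illustrative names only.

```cpp
#include <cstddef>
#include <string>

// Index of the first code point in haystack that equals ANY code point in set.
std::size_t first_of_any(const std::u32string& haystack,
                         const std::u32string& set) {
    return haystack.find_first_of(set);
}

// Index of the first occurrence of the whole sequence needle.
std::size_t first_substring(const std::u32string& haystack,
                            const std::u32string& needle) {
    return haystack.find(needle);
}
```

With haystack U"phonemizer" and argument U"mize", first_of_any stops at the first 'e' (index 4), whereas first_substring reports where "mize" begins (index 5).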



I hope this helps at least a little bit (even though you said you fixed the problem yourself already, which I am glad to hear :D).



All the best,
Jakob
















answered Nov 14 '18 at 21:11
Jakob Riedle
