PowerShell and UTF-8










I have an HTML file, test.html, created with Atom, which contains:

    Testé encoding utf-8

When I read it in the PowerShell console (I'm using French Windows):

    Get-Content -Raw test.html

I get back this:

    TestÃ© encoding utf-8

Why is the accented character not printing correctly?










      powershell utf-8 utf






asked Mar 1 '17 at 21:38 by user310291
edited Mar 2 '17 at 1:26 by AP.

          2 Answers

• The Atom editor creates UTF-8 files without a pseudo-BOM by default (which is the right thing to do, from a cross-platform perspective).

    • Other popular cross-platform editors, such as Visual Studio Code and Sublime Text, behave the same way.

• Windows PowerShell[1] only recognizes UTF-8 files with a pseudo-BOM.

    • In the absence of the pseudo-BOM, Windows PowerShell interprets files as being encoded in the system's legacy ("ANSI") code page, such as Windows-1252 on US-English and Western European (including French) systems. That is why the two UTF-8 bytes of é (0xC3 0xA9) show up as the two characters Ã©.

      (This is also the default encoding used by Notepad, which it calls "ANSI", not just when reading files, but also when creating them. By contrast, Windows PowerShell's > and Out-File create UTF-16LE-encoded files by default.)

Therefore, for Get-Content to read a BOM-less UTF-8 file correctly in Windows PowerShell, you must pass -Encoding utf8, e.g. Get-Content -Raw -Encoding utf8 test.html.

[1] By contrast, the cross-platform PowerShell Core edition commendably defaults to UTF-8, both on reading and writing, so it does interpret UTF-8-encoded files correctly even without a BOM and by default also creates files without a BOM.
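
As a minimal sketch of the mechanism described above, the following builds the UTF-8 bytes of the sample text, writes them to a BOM-less file, and compares a Windows-1252 decoding of those bytes with what Get-Content returns when told the file is UTF-8. The file name bomless-test.html and the variable names are made up for illustration. The sketch also assumes that code page 1252 is available to .NET, which is the case in Windows PowerShell and, to my knowledge, in PowerShell Core as well.

```powershell
# Build "Test<e-acute> encoding utf-8" without depending on how this script file is encoded.
$text  = "Test$([char]0xE9) encoding utf-8"
$bytes = [System.Text.Encoding]::UTF8.GetBytes($text)    # the accented e is stored as 0xC3 0xA9

# Write the raw bytes: no BOM is added, which mimics what Atom produces by default.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bomless-test.html'   # hypothetical file name
[System.IO.File]::WriteAllBytes($path, $bytes)

# Decoding the UTF-8 bytes as Windows-1252 reproduces the garbled text.
$misread = [System.Text.Encoding]::GetEncoding(1252).GetString($bytes)

# Telling Get-Content that the file is UTF-8 yields the intended text.
$fixed = Get-Content -Raw -Encoding utf8 -LiteralPath $path

$misread
$fixed
```

Here $misread holds TestÃ© encoding utf-8, while $fixed holds the original text. How the two strings are displayed still depends on the console's own font and output encoding.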







answered Mar 1 '17 at 23:59 by mklement0 (edited Nov 15 '18 at 14:20)


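    # Created a UTF-8 Sig File
    notepad .\test.html

    # Get file contents with/without -Raw
    cat .\test.html; Get-Content -Raw .\test.html
    Testé encoding utf-8
    Testé encoding utf-8

    # Check encoding to make sure
    Get-FileEncoding .\test.html
    utf8

As you can see, it definitely works in PowerShell v5 on Windows 10. I'd double-check the encoding and the contents of the file you created, as there may be characters in it that your editor does not show.

If you do not have Get-FileEncoding available in your PowerShell session (it is not a built-in cmdlet), here is an implementation you can run:

    # Guesses a file's encoding from its BOM only; 'ascii' really means "no BOM found".
    # Note: -Encoding Byte works in Windows PowerShell only (PowerShell Core uses -AsByteStream).
    function Get-FileEncoding([Parameter(Mandatory=$True)]$Path) {
        # Read at most the first 4 bytes of the file.
        $bytes = [byte[]](Get-Content $Path -Encoding Byte -ReadCount 4 -TotalCount 4)

        # Empty file: nothing to inspect.
        if (!$bytes) { return 'utf8' }

        # Format the bytes as lowercase hex and compare them against the known BOMs.
        switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
            '^efbbbf'   { return 'utf8' }
            '^2b2f76'   { return 'utf7' }
            '^fffe0000' { return 'utf32' }             # UTF-32 LE; must be tested before UTF-16 LE
            '^fffe'     { return 'unicode' }           # UTF-16 LE
            '^feff'     { return 'bigendianunicode' }  # UTF-16 BE
            '^0000feff' { return 'bigendianutf32' }    # UTF-32 BE
            default     { return 'ascii' }
        }
    }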








answered Mar 1 '17 at 22:16 by AP. (edited Mar 3 '17 at 2:23)

• Get-FileEncoding is not recognized in my PowerShell, even though I'm on Windows 10?
  – user310291, Mar 1 '17 at 22:22

• The OP created their file with GitHub's Atom editor, which creates UTF-8 files without a pseudo-BOM by default, and that's the cause of the problem. Notepad does not create UTF-8 files by default; it uses your system's legacy code page (e.g., Windows-1252 on English-language systems), and so does PowerShell when reading a file without a BOM, which is why you didn't see the problem. As an aside: cat is just an alias for Get-Content on Windows, so there's no point in contrasting the two commands.
  – mklement0, Mar 2 '17 at 0:40

• Get-FileEncoding is not a standard cmdlet. The best way to examine the file is to use the standard cmdlet Format-Hex (PSv5+) and study the raw bytes. I found two likely Get-FileEncoding sources: one at poshcode.org and one in the PowerShellCookbook module in the PowerShell Gallery. Neither version reports UTF-8 for me (Windows 10, PSv5.1): the former only looks for a BOM and reports ASCII if there's none (which is true for test.html); similarly, the latter falls back to UTF-7.
  – mklement0, Mar 2 '17 at 4:18

• Thanks for providing the Get-FileEncoding function. However, like the versions I linked to, it only looks at BOMs, and when it reports ascii, that really means "I don't know what the encoding is, because the file has no BOM" (and I'm slightly curious why a zero-byte file is utf8). However, it is sufficient to verify your claim that Notepad creates UTF-8 files by default: if I do what you state in your answer, using your function, having made sure that there's no preexisting file .\test.html and pasting the text Testé encoding utf-8, I get ascii, not utf8. What do you get?
  – mklement0, Mar 3 '17 at 22:39

• I use Notepad2 and was therefore able to change the file encoding to "UTF-8 Signature". Yes, you are correct: when I use standard UTF-8 without a signature, I get ascii from the function as well.
  – AP., Mar 4 '17 at 22:43
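
The comments above point out that a BOM-only check cannot tell a BOM-less UTF-8 file apart from an "ANSI" one. A rough sketch of one way to go a step further is to look for the UTF-8 BOM and, if there is none, attempt a strict UTF-8 decode, which fails on byte sequences that are not valid UTF-8. The function name Get-Utf8Status, its return values, and the demo file names are invented for this example. It only separates UTF-8 from "something else" and is not a general encoding detector; pure ASCII content, for instance, also counts as valid UTF-8.

```powershell
function Get-Utf8Status {
    param([Parameter(Mandatory = $true)][string]$Path)

    # Resolve to a full path first: .NET's current directory can differ from PowerShell's.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # UTF-8 BOM ("signature"): EF BB BF
    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        return 'utf8-bom'
    }

    # No BOM: a strict decoder (throwOnInvalidBytes = $true) throws on invalid UTF-8.
    $strictUtf8 = New-Object System.Text.UTF8Encoding($false, $true)
    try {
        [void]$strictUtf8.GetString($bytes)
        return 'utf8-nobom'
    }
    catch {
        return 'not-utf8'
    }
}

# Demo files (hypothetical names) in the temp directory.
$dir     = [System.IO.Path]::GetTempPath()
$text    = "Test$([char]0xE9) encoding utf-8"
$withBom = Join-Path $dir 'demo-utf8-bom.html'
$noBom   = Join-Path $dir 'demo-utf8-nobom.html'
$ansi    = Join-Path $dir 'demo-ansi.html'

[System.IO.File]::WriteAllText($withBom, $text, (New-Object System.Text.UTF8Encoding($true)))
[System.IO.File]::WriteAllText($noBom, $text, (New-Object System.Text.UTF8Encoding($false)))
# "Test" plus a single byte 0xE9, which is how Windows-1252 stores the accented e.
[System.IO.File]::WriteAllBytes($ansi, [byte[]](0x54, 0x65, 0x73, 0x74, 0xE9))

Get-Utf8Status $withBom
Get-Utf8Status $noBom
Get-Utf8Status $ansi

# Format-Hex (PSv5+) shows the raw bytes for manual inspection.
Format-Hex -Path $noBom
```

Reading the whole file is fine for small files like the one in the question; for large files, a streaming check would be the better design.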












                    • 1





                      Get-FileEncoding is not recognized on my powershell though I'm on windows 10 ?

                      – user310291
                      Mar 1 '17 at 22:22











                    • The OP created their file with GitHub's Atom editor, which creates UTF-8 files without a pseudo-BOM by default, and that's the cause of the problem. Notepad does not create UTF-8 files by default - it uses your system's legacy codepage by default (e.g, Windows-1252 on English-language systems), and so does PowerShell when reading a file without a BOM, that's why you didn't see the problem. As an aside: cat is just an alias for Get-Content on Windows, so there's no point in contrasting the two commands.

                      – mklement0
                      Mar 2 '17 at 0:40











                    • Get-FileEncoding is not a standard cmdlet. The best way to examine the file is to use standard cmdlet Format-Hex (PSv5+) and study the raw bytes. I found two likely Get-FileEncoding sources: from here at poshcode.org or as part of the PowerShellCookbook module in the PowerShell Gallery. Neither version reports UTF-8 for me (Windows 10, PSv5.1): the former only looks for a BOM and reports ASCII if there's none (which is true for test.html); similarly, the latter falls back to UTF-7.

                      – mklement0
                      Mar 2 '17 at 4:18











                    • Thanks for providing the Get-FileEncoding function. However, like the versions I linked to, it only looks at BOMs, and when it reports ascii, that really means "I don't know what the encoding is, because the file has no BOM" (and I'm slightly curious why a zero-byte file is utf8). However, it is sufficient to verify your claim that Notepad creates UTF-8 files by default: If I do what you state in your answer, using your function - having made sure that there's no preexisting file .test.html and pasting text Testé encoding utf-8, I get ascii, not utf8. What do you get?

                      – mklement0
                      Mar 3 '17 at 22:39






                    • 1





                      So I use Notepad2 and thus was able to change the file encoding to: UTF-8 Signature. Yes you are correct, since when I use the standard UTF-8 w/o signature, I get ascii from the function as well

                      – AP.
                      Mar 4 '17 at 22:43























