awk: Why are spaces delimiting, instead of FPAT regexp










I'm attempting to split strings delimited by ',' except where the ',' is inside a substring enclosed by brackets. Modifying other solutions here and examples in the docs, I tried this test:



awk -v FPAT='([^,]+)|(\([^\))+\))' '{
    for (i=1; i<=NF; i++)
        printf("%s\n", $i)
}
' <<< 'one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
one
two
(1one),
three
four
(3three,
4four),
five
six,
seven
eight,
nine
ten
eleven
(8ten)


The FPAT isn't overriding the default delimiter as I expected, so clearly I'm missing something.



The output I want is:



one two (1one),
three four (3three, 4four),
five six,
seven eight,
nine ten eleven (8ten)









Tags: regex, awk

asked Nov 13 '18 at 7:17 by dls49 (edited Nov 13 '18 at 7:37 by Inian)






















          2 Answers
Using GNU grep:

s='one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
grep -oP '\s*\K([^,(]*\([^)]*\))*[^,]*(,|$)' <<< "$s"

one two (1one),
three four (3three, 4four),
five six,
seven eight,
nine ten eleven (8ten)

If you don't have GNU grep, then you may use

grep -oE '([^,(]*\([^)]*\))*[^,]*(,\s*|$)' <<< "$s"

which leaves a trailing space after each comma.

For regex explanation see this demo.
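
The trailing spaces left by the grep -oE variant can be stripped with a follow-up sed. The sketch below is an illustration, not part of the original answer: the function name split_outside_parens is made up, it assumes a grep that supports -o, and it uses the POSIX class [[:space:]] in place of \s on the assumption that this is more portable.

```shell
# Hypothetical helper: print each comma-terminated chunk of "$1" on its own line,
# ignoring commas that sit inside a (...) group.
split_outside_parens() {
    printf '%s\n' "$1" |
        grep -oE '([^,(]*\([^)]*\))*[^,]*(,[[:space:]]*|$)' |
        sed 's/[[:space:]]*$//'    # drop the spaces grep leaves after each comma
}

s='one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
split_outside_parens "$s"
```

Like the patterns above, this assumes the parentheses are not nested.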







answered Nov 13 '18 at 7:41 by anubhava (edited Nov 13 '18 at 7:46)












• Thanks, this gives me a usable work around. However I'd also like to know how I could have used awk.
  – dls49, Nov 13 '18 at 8:01

• That regex demo looks very useful. Thanks.
  – dls49, Nov 14 '18 at 7:20

















Your code does not work because:

1. ([^,]+)|(\([^\))+\)) is an invalid regex; it has an unmatched [ in it.

2. You say you're using mawk, but mawk doesn't support FPAT.
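
A quick way to tell whether a given awk honors FPAT is to count fields on a record where FPAT-based and whitespace-based splitting disagree. This is a minimal sketch, not part of the original answer; the helper name awk_has_fpat is hypothetical.

```shell
# Hypothetical helper: succeeds if the given awk command honors FPAT.
# With FPAT support the record "a b,c d" has 2 fields ("a b" and "c d");
# without it, FPAT is an ordinary variable and whitespace splitting gives 3.
awk_has_fpat() {
    [ "$(printf 'a b,c d\n' | "${1:-awk}" -v FPAT='[^,]+' '{ print NF }')" = "2" ]
}

if awk_has_fpat awk; then
    echo "awk honors FPAT"
else
    echo "awk ignores FPAT (fields are split on whitespace)"
fi
```

An awk that ignores FPAT raises no error, which is why the test in the question silently fell back to splitting on spaces.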


Here is the FPAT solution I've come up with (GNU awk):

$ cat file
one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)
$
$ awk -v FPAT='[^,(]*(\\([^)]*\\))?(, |$)' '{ for (i=1; i<=NF; ++i) print $i }' file
one two (1one),
three four (3three, 4four),
five six,
seven eight,
nine ten eleven (8ten)

Explanation of FPAT variable:

• [^,(]* matches any number of chars that are neither a comma nor an opening parenthesis,

• \\([^)]*\\) matches any number of chars other than a closing parenthesis, surrounded by parentheses (the backslashes are doubled because awk processes escape sequences in a -v assignment),

  • putting this in (...)? makes this match optional,

• (, |$) means the matched field should end with a comma followed by a space, or it should be the last field in the line.
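
Because the pattern ends in (, |$), each field that FPAT produces still carries its trailing comma and space. A minimal sketch of stripping them, assuming GNU awk is installed as gawk; the function name split_fpat and the numbered output format are illustrative only and not part of the original answer.

```shell
# Hypothetical helper: split "$1" with the FPAT above and print the fields
# numbered, without the ", " separator that the pattern includes in each field.
# Assumes GNU awk is available as `gawk` (FPAT is a gawk extension).
split_fpat() {
    printf '%s\n' "$1" | gawk -v FPAT='[^,(]*(\\([^)]*\\))?(, |$)' '{
        for (i = 1; i <= NF; i++) {
            field = $i
            sub(/, $/, "", field)   # remove the trailing comma and space, if any
            print i ": " field
        }
    }'
}

if command -v gawk >/dev/null 2>&1; then
    split_fpat 'one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
else
    echo "gawk not found; FPAT needs GNU awk" >&2
fi
```

The copy into a separate variable keeps $i itself untouched, so the record is not rebuilt as a side effect.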


And here is how to do it in mawk:

mawk '{ gsub(/[^,(]*(\([^)]*\))?, /, "&\n") } 1' file

sed could be used as well for this particular case (GNU sed syntax):

sed 's/[^,(]*\(([^)]*)\)\?, /&\n/g' file
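
The patterns above expect at most one non-nested (...) group at the end of each chunk. For an awk without FPAT, such as mawk, another option is to walk the line character by character and track the parenthesis depth. This is a sketch under that assumption, not part of the original answer; the function name split_by_depth is made up.

```shell
# Hypothetical helper: split "$1" at commas that are outside parentheses,
# using only POSIX awk features (no FPAT), so it also runs under mawk.
split_by_depth() {
    printf '%s\n' "$1" | awk '{
        depth = 0; field = ""; n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "(") depth++
            else if (c == ")" && depth > 0) depth--
            if (c == "," && depth == 0) {
                print field ","                           # end of a chunk: emit it with its comma
                field = ""
                while (substr($0, i + 1, 1) == " ") i++   # skip spaces after the comma
            } else {
                field = field c
            }
        }
        if (field != "") print field                      # last chunk has no trailing comma
    }'
}

split_by_depth 'one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
```

Since it counts depth rather than matching a fixed shape, the same loop also copes with nested groups such as 'a (b, (c, d)), e'.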






answered Nov 13 '18 at 7:40 by oguzismail (edited Nov 13 '18 at 8:27)












• This does the same as my original output on my system (mawk 1.3.3). What version are you on?
  – dls49, Nov 13 '18 at 7:53

• gawk 4.2.1, I'm gonna check out mawk now
  – oguzismail, Nov 13 '18 at 7:55

• @dls49 updated my answer, check it out.
  – oguzismail, Nov 13 '18 at 8:04

• Oh! mawk doesn't support FPAT ok - that explains why even fixing my regex didn't work. Thank you! Your sed solution gives the required result, but not your mawk solution. The latter is also splitting when the comma occurs inside brackets.
  – dls49, Nov 13 '18 at 8:32

• Yep, you're welcome. I've noticed it and fixed 5 mins ago, it still doesn't work??
  – oguzismail, Nov 13 '18 at 8:34
















