Conditional filtering using grepl and relative row position in group









up vote
1
down vote

favorite












I have a dataset similar to the following:



Journal_ref <- c("1111","2222","2222","2222","3333","3333","4444","4444")
Journal_type <- c("Adj","Rev","Adj","Rev","Rev","Rev","Adj","Adj")
Journal_value <- c(90,10000,12000,80,9000,500,65,2500)
Dataset <- data.frame(Journal_ref,Journal_type,Journal_value)


For each Journal_ref group I am seeking to filter/select rows based on the following conditions:



  • Where "Adj" is included within Journal_type, filter/select to return the last "Adj" row in the Journal_ref group, and

  • Where "Adj" is not included within Journal_type, filter/select to return the last "Rev" in the Journal_ref group

Based on the example above, the final output required would be:



Journal_ref Journal_type Journal_value
1111 Adj 90
2222 Adj 12000
3333 Rev 500
4444 Adj 2500


I have attempted using various combinations of group_by, filter, if, ifelse, grepl, select and slice with no success.



Any help would be appreciated, particularly using dplyr.










share|improve this question



























    up vote
    1
    down vote

    favorite












    I have a dataset similar to the following:



    Journal_ref <- c("1111","2222","2222","2222","3333","3333","4444","4444")
    Journal_type <- c("Adj","Rev","Adj","Rev","Rev","Rev","Adj","Adj")
    Journal_value <- c(90,10000,12000,80,9000,500,65,2500)
    Dataset <- data.frame(Journal_ref,Journal_type,Journal_value)


    For each Journal_ref group I am seeking to filter/select rows based on the following conditions:



    • Where "Adj" is included within Journal_type, filter/select to return the last "Adj" row in the Journal_ref group, and

    • Where "Adj" is not included within Journal_type, filter/select to return the last "Rev" in the Journal_ref group

    Based on the example above, the final output required would be:



    Journal_ref Journal_type Journal_value
    1111 Adj 90
    2222 Adj 12000
    3333 Rev 500
    4444 Adj 2500


    I have attempted using various combinations of group_by, filter, if, ifelse, grepl, select and slice with no success.



    Any help would be appreciated, particularly using dplyr.










    share|improve this question

























      up vote
      1
      down vote

      favorite









      up vote
      1
      down vote

      favorite











      I have a dataset similar to the following:



      Journal_ref <- c("1111","2222","2222","2222","3333","3333","4444","4444")
      Journal_type <- c("Adj","Rev","Adj","Rev","Rev","Rev","Adj","Adj")
      Journal_value <- c(90,10000,12000,80,9000,500,65,2500)
      Dataset <- data.frame(Journal_ref,Journal_type,Journal_value)


      For each Journal_ref group I am seeking to filter/select rows based on the following conditions:



      • Where "Adj" is included within Journal_type, filter/select to return the last "Adj" row in the Journal_ref group, and

      • Where "Adj" is not included within Journal_type, filter/select to return the last "Rev" in the Journal_ref group

      Based on the example above, the final output required would be:



      Journal_ref Journal_type Journal_value
      1111 Adj 90
      2222 Adj 12000
      3333 Rev 500
      4444 Adj 2500


      I have attempted using various combinations of group_by, filter, if, ifelse, grepl, select and slice with no success.



      Any help would be appreciated, particularly using dplyr.










      share|improve this question















      I have a dataset similar to the following:



      Journal_ref <- c("1111","2222","2222","2222","3333","3333","4444","4444")
      Journal_type <- c("Adj","Rev","Adj","Rev","Rev","Rev","Adj","Adj")
      Journal_value <- c(90,10000,12000,80,9000,500,65,2500)
      Dataset <- data.frame(Journal_ref,Journal_type,Journal_value)


      For each Journal_ref group I am seeking to filter/select rows based on the following conditions:



      • Where "Adj" is included within Journal_type, filter/select to return the last "Adj" row in the Journal_ref group, and

      • Where "Adj" is not included within Journal_type, filter/select to return the last "Rev" in the Journal_ref group

      Based on the example above, the final output required would be:



      Journal_ref Journal_type Journal_value
      1111 Adj 90
      2222 Adj 12000
      3333 Rev 500
      4444 Adj 2500


      I have attempted using various combinations of group_by, filter, if, ifelse, grepl, select and slice with no success.



      Any help would be appreciated, particularly using dplyr.







      r if-statement dplyr slice grepl






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 11 at 11:45









      Jaap

      54.4k20116129




      54.4k20116129










      asked Nov 11 at 11:13









      ScottCR1

      83




      83






















          3 Answers
          3






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          Try this:



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref, Journal_type) %>%
          summarise(Journal_value = last(Journal_value)) %>%
          ungroup() %>% group_by(Journal_ref) %>%
          filter(!(n() > 1 & Journal_type == "Rev"))


          Output:



           Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500





          share|improve this answer
















          • 1




            Many thanks for this
            – ScottCR1
            Nov 11 at 11:54

















          up vote
          3
          down vote













          Another possible solution:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == c("Rev","Adj")[1 + any(Journal_type == "Adj")]) %>%
          slice(n())


          which gives:




          # A tibble: 4 x 3
          # Groups: Journal_ref [4]
          Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500



          What this does:



          • You group by Journal_ref

          • Then you filter Journal_type for only Adj when present and Rev when there is no Adj in a group. Using c("Rev","Adj")[1 + any(Journal_type == "Adj")] gives you Adj when there is at least one present in a group and it gives Rev when there is no Adj present in a group.

          • Finally use use slice(n()) tot select the last row of each group.


          You could also do this with if_else:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == if_else(any(Journal_type == "Adj"), "Adj", "Rev")) %>%
          slice(n())





          share|improve this answer


















          • 2




            This is brilliant. Many thanks for the explanation also
            – ScottCR1
            Nov 11 at 11:55

















          up vote
          0
          down vote













          A dplyr way of doing it is as follows.



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref) %>%
          mutate(Adj = any(Journal_type == "Adj"),
          i = ifelse(Adj, last(which(Journal_type == "Adj")), last(which(Journal_type == "Rev")))) %>%
          filter(row_number() == i) %>%
          select(-Adj, -i)
          ## A tibble: 4 x 3
          ## Groups: Journal_ref [4]
          # Journal_ref Journal_type Journal_value
          # <fct> <fct> <dbl>
          #1 1111 Adj 90
          #2 2222 Adj 12000
          #3 3333 Rev 500
          #4 4444 Adj 2500





          share|improve this answer




















          • Many thanks for this
            – ScottCR1
            Nov 11 at 11:55










          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53248147%2fconditional-filtering-using-grepl-and-relative-row-position-in-group%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          1
          down vote



          accepted










          Try this:



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref, Journal_type) %>%
          summarise(Journal_value = last(Journal_value)) %>%
          ungroup() %>% group_by(Journal_ref) %>%
          filter(!(n() > 1 & Journal_type == "Rev"))


          Output:



           Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500





          share|improve this answer
















          • 1




            Many thanks for this
            – ScottCR1
            Nov 11 at 11:54














          up vote
          1
          down vote



          accepted










          Try this:



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref, Journal_type) %>%
          summarise(Journal_value = last(Journal_value)) %>%
          ungroup() %>% group_by(Journal_ref) %>%
          filter(!(n() > 1 & Journal_type == "Rev"))


          Output:



           Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500





          share|improve this answer
















          • 1




            Many thanks for this
            – ScottCR1
            Nov 11 at 11:54












          up vote
          1
          down vote



          accepted







          up vote
          1
          down vote



          accepted






          Try this:



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref, Journal_type) %>%
          summarise(Journal_value = last(Journal_value)) %>%
          ungroup() %>% group_by(Journal_ref) %>%
          filter(!(n() > 1 & Journal_type == "Rev"))


          Output:



           Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500





          share|improve this answer












          Try this:



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref, Journal_type) %>%
          summarise(Journal_value = last(Journal_value)) %>%
          ungroup() %>% group_by(Journal_ref) %>%
          filter(!(n() > 1 & Journal_type == "Rev"))


          Output:



           Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 11 at 11:29









          arg0naut

          1,657312




          1,657312







          • 1




            Many thanks for this
            – ScottCR1
            Nov 11 at 11:54












          • 1




            Many thanks for this
            – ScottCR1
            Nov 11 at 11:54







          1




          1




          Many thanks for this
          – ScottCR1
          Nov 11 at 11:54




          Many thanks for this
          – ScottCR1
          Nov 11 at 11:54












          up vote
          3
          down vote













          Another possible solution:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == c("Rev","Adj")[1 + any(Journal_type == "Adj")]) %>%
          slice(n())


          which gives:




          # A tibble: 4 x 3
          # Groups: Journal_ref [4]
          Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500



          What this does:



          • You group by Journal_ref

          • Then you filter Journal_type for only Adj when present and Rev when there is no Adj in a group. Using c("Rev","Adj")[1 + any(Journal_type == "Adj")] gives you Adj when there is at least one present in a group and it gives Rev when there is no Adj present in a group.

          • Finally use use slice(n()) tot select the last row of each group.


          You could also do this with if_else:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == if_else(any(Journal_type == "Adj"), "Adj", "Rev")) %>%
          slice(n())





          share|improve this answer


















          • 2




            This is brilliant. Many thanks for the explanation also
            – ScottCR1
            Nov 11 at 11:55














          up vote
          3
          down vote













          Another possible solution:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == c("Rev","Adj")[1 + any(Journal_type == "Adj")]) %>%
          slice(n())


          which gives:




          # A tibble: 4 x 3
          # Groups: Journal_ref [4]
          Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500



          What this does:



          • You group by Journal_ref

          • Then you filter Journal_type for only Adj when present and Rev when there is no Adj in a group. Using c("Rev","Adj")[1 + any(Journal_type == "Adj")] gives you Adj when there is at least one present in a group and it gives Rev when there is no Adj present in a group.

          • Finally use use slice(n()) tot select the last row of each group.


          You could also do this with if_else:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == if_else(any(Journal_type == "Adj"), "Adj", "Rev")) %>%
          slice(n())





          share|improve this answer


















          • 2




            This is brilliant. Many thanks for the explanation also
            – ScottCR1
            Nov 11 at 11:55












          up vote
          3
          down vote










          up vote
          3
          down vote









          Another possible solution:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == c("Rev","Adj")[1 + any(Journal_type == "Adj")]) %>%
          slice(n())


          which gives:




          # A tibble: 4 x 3
          # Groups: Journal_ref [4]
          Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500



          What this does:



          • You group by Journal_ref

          • Then you filter Journal_type for only Adj when present and Rev when there is no Adj in a group. Using c("Rev","Adj")[1 + any(Journal_type == "Adj")] gives you Adj when there is at least one present in a group and it gives Rev when there is no Adj present in a group.

          • Finally use use slice(n()) tot select the last row of each group.


          You could also do this with if_else:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == if_else(any(Journal_type == "Adj"), "Adj", "Rev")) %>%
          slice(n())





          share|improve this answer














          Another possible solution:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == c("Rev","Adj")[1 + any(Journal_type == "Adj")]) %>%
          slice(n())


          which gives:




          # A tibble: 4 x 3
          # Groups: Journal_ref [4]
          Journal_ref Journal_type Journal_value
          <fct> <fct> <dbl>
          1 1111 Adj 90
          2 2222 Adj 12000
          3 3333 Rev 500
          4 4444 Adj 2500



          What this does:



          • You group by Journal_ref

          • Then you filter Journal_type for only Adj when present and Rev when there is no Adj in a group. Using c("Rev","Adj")[1 + any(Journal_type == "Adj")] gives you Adj when there is at least one present in a group and it gives Rev when there is no Adj present in a group.

          • Finally use use slice(n()) tot select the last row of each group.


          You could also do this with if_else:



          Dataset %>% 
          group_by(Journal_ref) %>%
          filter(Journal_type == if_else(any(Journal_type == "Adj"), "Adj", "Rev")) %>%
          slice(n())






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 11 at 11:43

























          answered Nov 11 at 11:32









          Jaap

          54.4k20116129




          54.4k20116129







          • 2




            This is brilliant. Many thanks for the explanation also
            – ScottCR1
            Nov 11 at 11:55












          • 2




            This is brilliant. Many thanks for the explanation also
            – ScottCR1
            Nov 11 at 11:55







          2




          2




          This is brilliant. Many thanks for the explanation also
          – ScottCR1
          Nov 11 at 11:55




          This is brilliant. Many thanks for the explanation also
          – ScottCR1
          Nov 11 at 11:55










          up vote
          0
          down vote













          A dplyr way of doing it is as follows.



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref) %>%
          mutate(Adj = any(Journal_type == "Adj"),
          i = ifelse(Adj, last(which(Journal_type == "Adj")), last(which(Journal_type == "Rev")))) %>%
          filter(row_number() == i) %>%
          select(-Adj, -i)
          ## A tibble: 4 x 3
          ## Groups: Journal_ref [4]
          # Journal_ref Journal_type Journal_value
          # <fct> <fct> <dbl>
          #1 1111 Adj 90
          #2 2222 Adj 12000
          #3 3333 Rev 500
          #4 4444 Adj 2500





          share|improve this answer




















          • Many thanks for this
            – ScottCR1
            Nov 11 at 11:55














          up vote
          0
          down vote













          A dplyr way of doing it is as follows.



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref) %>%
          mutate(Adj = any(Journal_type == "Adj"),
          i = ifelse(Adj, last(which(Journal_type == "Adj")), last(which(Journal_type == "Rev")))) %>%
          filter(row_number() == i) %>%
          select(-Adj, -i)
          ## A tibble: 4 x 3
          ## Groups: Journal_ref [4]
          # Journal_ref Journal_type Journal_value
          # <fct> <fct> <dbl>
          #1 1111 Adj 90
          #2 2222 Adj 12000
          #3 3333 Rev 500
          #4 4444 Adj 2500





          share|improve this answer




















          • Many thanks for this
            – ScottCR1
            Nov 11 at 11:55












          up vote
          0
          down vote










          up vote
          0
          down vote









          A dplyr way of doing it is as follows.



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref) %>%
          mutate(Adj = any(Journal_type == "Adj"),
          i = ifelse(Adj, last(which(Journal_type == "Adj")), last(which(Journal_type == "Rev")))) %>%
          filter(row_number() == i) %>%
          select(-Adj, -i)
          ## A tibble: 4 x 3
          ## Groups: Journal_ref [4]
          # Journal_ref Journal_type Journal_value
          # <fct> <fct> <dbl>
          #1 1111 Adj 90
          #2 2222 Adj 12000
          #3 3333 Rev 500
          #4 4444 Adj 2500





          share|improve this answer












          A dplyr way of doing it is as follows.



          library(dplyr)

          Dataset %>%
          group_by(Journal_ref) %>%
          mutate(Adj = any(Journal_type == "Adj"),
          i = ifelse(Adj, last(which(Journal_type == "Adj")), last(which(Journal_type == "Rev")))) %>%
          filter(row_number() == i) %>%
          select(-Adj, -i)
          ## A tibble: 4 x 3
          ## Groups: Journal_ref [4]
          # Journal_ref Journal_type Journal_value
          # <fct> <fct> <dbl>
          #1 1111 Adj 90
          #2 2222 Adj 12000
          #3 3333 Rev 500
          #4 4444 Adj 2500






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 11 at 11:31









          Rui Barradas

          15.1k31730




          15.1k31730











          • Many thanks for this
            – ScottCR1
            Nov 11 at 11:55
















          • Many thanks for this
            – ScottCR1
            Nov 11 at 11:55















          Many thanks for this
          – ScottCR1
          Nov 11 at 11:55




          Many thanks for this
          – ScottCR1
          Nov 11 at 11:55

















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53248147%2fconditional-filtering-using-grepl-and-relative-row-position-in-group%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          這個網誌中的熱門文章

          How to read a connectionString WITH PROVIDER in .NET Core?

          Node.js Script on GitHub Pages or Amazon S3

          Museum of Modern and Contemporary Art of Trento and Rovereto