if else where cell is contained within the column name string










0















Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.



Data can be found here



library(tidyverse)
library(foreach)

df <- read_csv('test_data.csv')


enter image description here



So here I am trying to find the variable for fire_year that is contained within the wide var_ fields. So for instance, here if fire_year = 1998 then I would want to capture the value in the column named var_1998. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:



df_slim <- foreach(df=iter(df, by='row'), .combine=rbind, 
.packages = c('dplyr', "tidyverse")) %do%
df_out <- df %>%
gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
separate(key,
into = c("tmp1", 'zyear'),
sep = "_") %>%
mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
na.omit() %>%
dplyr::select(fpa_id, fire_year, var)
return(df_out)



enter image description here



I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!



Thanks!










share|improve this question




























    0















    Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.



    Data can be found here



    library(tidyverse)
    library(foreach)

    df <- read_csv('test_data.csv')


    enter image description here



    So here I am trying to find the variable for fire_year that is contained within the wide var_ fields. So for instance, here if fire_year = 1998 then I would want to capture the value in the column named var_1998. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:



    df_slim <- foreach(df=iter(df, by='row'), .combine=rbind, 
    .packages = c('dplyr', "tidyverse")) %do%
    df_out <- df %>%
    gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
    separate(key,
    into = c("tmp1", 'zyear'),
    sep = "_") %>%
    mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
    na.omit() %>%
    dplyr::select(fpa_id, fire_year, var)
    return(df_out)



    enter image description here



    I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!



    Thanks!










    share|improve this question


























      0












      0








      0








      Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.



      Data can be found here



      library(tidyverse)
      library(foreach)

      df <- read_csv('test_data.csv')


      enter image description here



      So here I am trying to find the variable for fire_year that is contained within the wide var_ fields. So for instance, here if fire_year = 1998 then I would want to capture the value in the column named var_1998. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:



      df_slim <- foreach(df=iter(df, by='row'), .combine=rbind, 
      .packages = c('dplyr', "tidyverse")) %do%
      df_out <- df %>%
      gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
      separate(key,
      into = c("tmp1", 'zyear'),
      sep = "_") %>%
      mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
      na.omit() %>%
      dplyr::select(fpa_id, fire_year, var)
      return(df_out)



      enter image description here



      I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!



      Thanks!










      share|improve this question
















      Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.



      Data can be found here



      library(tidyverse)
      library(foreach)

      df <- read_csv('test_data.csv')


      enter image description here



      So here I am trying to find the variable for fire_year that is contained within the wide var_ fields. So for instance, here if fire_year = 1998 then I would want to capture the value in the column named var_1998. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:



      df_slim <- foreach(df=iter(df, by='row'), .combine=rbind, 
      .packages = c('dplyr', "tidyverse")) %do%
      df_out <- df %>%
      gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
      separate(key,
      into = c("tmp1", 'zyear'),
      sep = "_") %>%
      mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
      na.omit() %>%
      dplyr::select(fpa_id, fire_year, var)
      return(df_out)



      enter image description here



      I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!



      Thanks!







      r






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 13 '18 at 22:25







      nate-m

















      asked Nov 13 '18 at 22:01









      nate-mnate-m

      8116




      8116






















          1 Answer
          1






          active

          oldest

          votes


















          1














          I am not 100% sure what you need, but here is my take on it (using data.table)



          library(data.table)

          dt <- data.table(test_data)

          setkey(dt, "fire_year")
          for(i in unique(dt[["fire_year"]]))
          dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]



          And then subset the cols you need



          dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
          dt_slim

          fpa_id fire_year var
          1: FS-1418827 2004 0
          2: FS-1418835 2004 9
          3: FS-1418845 2004 0
          4: FS-1418847 2004 0
          5: FS-1418849 2004 0
          6: FS-1418851 2004 0
          7: FS-1418859 2004 0
          8: FS-1418826 2005 0
          9: FS-1418854 2005 0
          10: FS-1418856 2005 114


          Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...






          share|improve this answer

























          • Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

            – nate-m
            Nov 14 '18 at 15:19












          • Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

            – mmn
            Nov 14 '18 at 16:11










          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53290163%2fif-else-where-cell-is-contained-within-the-column-name-string%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          I am not 100% sure what you need, but here is my take on it (using data.table)



          library(data.table)

          dt <- data.table(test_data)

          setkey(dt, "fire_year")
          for(i in unique(dt[["fire_year"]]))
          dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]



          And then subset the cols you need



          dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
          dt_slim

          fpa_id fire_year var
          1: FS-1418827 2004 0
          2: FS-1418835 2004 9
          3: FS-1418845 2004 0
          4: FS-1418847 2004 0
          5: FS-1418849 2004 0
          6: FS-1418851 2004 0
          7: FS-1418859 2004 0
          8: FS-1418826 2005 0
          9: FS-1418854 2005 0
          10: FS-1418856 2005 114


          Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...






          share|improve this answer

























          • Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

            – nate-m
            Nov 14 '18 at 15:19












          • Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

            – mmn
            Nov 14 '18 at 16:11















          1














          I am not 100% sure what you need, but here is my take on it (using data.table)



          library(data.table)

          dt <- data.table(test_data)

          setkey(dt, "fire_year")
          for(i in unique(dt[["fire_year"]]))
          dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]



          And then subset the cols you need



          dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
          dt_slim

          fpa_id fire_year var
          1: FS-1418827 2004 0
          2: FS-1418835 2004 9
          3: FS-1418845 2004 0
          4: FS-1418847 2004 0
          5: FS-1418849 2004 0
          6: FS-1418851 2004 0
          7: FS-1418859 2004 0
          8: FS-1418826 2005 0
          9: FS-1418854 2005 0
          10: FS-1418856 2005 114


          Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...






          share|improve this answer

























          • Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

            – nate-m
            Nov 14 '18 at 15:19












          • Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

            – mmn
            Nov 14 '18 at 16:11













          1












          1








          1







          I am not 100% sure what you need, but here is my take on it (using data.table)



          library(data.table)

          dt <- data.table(test_data)

          setkey(dt, "fire_year")
          for(i in unique(dt[["fire_year"]]))
          dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]



          And then subset the cols you need



          dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
          dt_slim

          fpa_id fire_year var
          1: FS-1418827 2004 0
          2: FS-1418835 2004 9
          3: FS-1418845 2004 0
          4: FS-1418847 2004 0
          5: FS-1418849 2004 0
          6: FS-1418851 2004 0
          7: FS-1418859 2004 0
          8: FS-1418826 2005 0
          9: FS-1418854 2005 0
          10: FS-1418856 2005 114


          Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...






          share|improve this answer















          I am not 100% sure what you need, but here is my take on it (using data.table)



          library(data.table)

          dt <- data.table(test_data)

          setkey(dt, "fire_year")
          for(i in unique(dt[["fire_year"]]))
          dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]



          And then subset the cols you need



          dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
          dt_slim

          fpa_id fire_year var
          1: FS-1418827 2004 0
          2: FS-1418835 2004 9
          3: FS-1418845 2004 0
          4: FS-1418847 2004 0
          5: FS-1418849 2004 0
          6: FS-1418851 2004 0
          7: FS-1418859 2004 0
          8: FS-1418826 2005 0
          9: FS-1418854 2005 0
          10: FS-1418856 2005 114


          Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 14 '18 at 21:11

























          answered Nov 14 '18 at 0:26









          mmnmmn

          11116




          11116












          • Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

            – nate-m
            Nov 14 '18 at 15:19












          • Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

            – mmn
            Nov 14 '18 at 16:11

















          • Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

            – nate-m
            Nov 14 '18 at 15:19












          • Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

            – mmn
            Nov 14 '18 at 16:11
















          Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

          – nate-m
          Nov 14 '18 at 15:19






          Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

          – nate-m
          Nov 14 '18 at 15:19














          Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

          – mmn
          Nov 14 '18 at 16:11





          Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

          – mmn
          Nov 14 '18 at 16:11

















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53290163%2fif-else-where-cell-is-contained-within-the-column-name-string%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          這個網誌中的熱門文章

          Barbados

          How to read a connectionString WITH PROVIDER in .NET Core?

          Node.js Script on GitHub Pages or Amazon S3