if else where cell is contained within the column name string

Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.

Data can be found here

library(tidyverse)
library(foreach)

df <- read_csv('test_data.csv')

enter image description here

So here I am trying to find the variable for fire_year that is contained within the wide var_ fields. So for instance, here if fire_year = 1998 then I would want to capture the value in the column named var_1998. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:

df_slim <- foreach(df=iter(df, by='row'), .combine=rbind, 
 .packages = c('dplyr', "tidyverse")) %do% 
 df_out <- df %>%
 gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
 separate(key,
 into = c("tmp1", 'zyear'),
 sep = "_") %>%
 mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
 na.omit() %>%
 dplyr::select(fpa_id, fire_year, var)
 return(df_out)

enter image description here

I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!

Thanks!

edited Nov 13 '18 at 22:25

asked Nov 13 '18 at 22:01

nate-m

8116

add a comment |

Data can be found here

library(tidyverse)
library(foreach)

df <- read_csv('test_data.csv')

enter image description here

df_slim <- foreach(df=iter(df, by='row'), .combine=rbind, 
 .packages = c('dplyr', "tidyverse")) %do% 
 df_out <- df %>%
 gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
 separate(key,
 into = c("tmp1", 'zyear'),
 sep = "_") %>%
 mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
 na.omit() %>%
 dplyr::select(fpa_id, fire_year, var)
 return(df_out)

enter image description here

Thanks!

edited Nov 13 '18 at 22:25

asked Nov 13 '18 at 22:01

nate-m

8116

add a comment |

Data can be found here

library(tidyverse)
library(foreach)

df <- read_csv('test_data.csv')

enter image description here

df_slim <- foreach(df=iter(df, by='row'), .combine=rbind, 
 .packages = c('dplyr', "tidyverse")) %do% 
 df_out <- df %>%
 gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
 separate(key,
 into = c("tmp1", 'zyear'),
 sep = "_") %>%
 mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
 na.omit() %>%
 dplyr::select(fpa_id, fire_year, var)
 return(df_out)

enter image description here

Thanks!

edited Nov 13 '18 at 22:25

asked Nov 13 '18 at 22:01

nate-m

8116

Data can be found here

library(tidyverse)
library(foreach)

df <- read_csv('test_data.csv')

enter image description here

df_slim <- foreach(df=iter(df, by='row'), .combine=rbind, 
 .packages = c('dplyr', "tidyverse")) %do% 
 df_out <- df %>%
 gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
 separate(key,
 into = c("tmp1", 'zyear'),
 sep = "_") %>%
 mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
 na.omit() %>%
 dplyr::select(fpa_id, fire_year, var)
 return(df_out)

enter image description here

Thanks!

edited Nov 13 '18 at 22:25

asked Nov 13 '18 at 22:01

nate-m

8116

edited Nov 13 '18 at 22:25

asked Nov 13 '18 at 22:01

nate-m

8116

edited Nov 13 '18 at 22:25

asked Nov 13 '18 at 22:01

nate-m

8116

asked Nov 13 '18 at 22:01

nate-m

8116

asked Nov 13 '18 at 22:01

nate-m

8116

add a comment |

1 Answer
1

active

oldest

votes

I am not 100% sure what you need, but here is my take on it (using data.table)

library(data.table)

 dt <- data.table(test_data)

 setkey(dt, "fire_year")
 for(i in unique(dt[["fire_year"]]))
 dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]

And then subset the cols you need

dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim

 fpa_id fire_year var
 1: FS-1418827 2004 0
 2: FS-1418835 2004 9
 3: FS-1418845 2004 0
 4: FS-1418847 2004 0
 5: FS-1418849 2004 0
 6: FS-1418851 2004 0
 7: FS-1418859 2004 0
 8: FS-1418826 2005 0
 9: FS-1418854 2005 0
10: FS-1418856 2005 114

Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...

edited Nov 14 '18 at 21:11

answered Nov 14 '18 at 0:26

mmn

11116

Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

– nate-m
Nov 14 '18 at 15:19

Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

– mmn
Nov 14 '18 at 16:11

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53290163%2fif-else-where-cell-is-contained-within-the-column-name-string%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

I am not 100% sure what you need, but here is my take on it (using data.table)

library(data.table)

 dt <- data.table(test_data)

 setkey(dt, "fire_year")
 for(i in unique(dt[["fire_year"]]))
 dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]

And then subset the cols you need

dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim

 fpa_id fire_year var
 1: FS-1418827 2004 0
 2: FS-1418835 2004 9
 3: FS-1418845 2004 0
 4: FS-1418847 2004 0
 5: FS-1418849 2004 0
 6: FS-1418851 2004 0
 7: FS-1418859 2004 0
 8: FS-1418826 2005 0
 9: FS-1418854 2005 0
10: FS-1418856 2005 114

Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...

edited Nov 14 '18 at 21:11

answered Nov 14 '18 at 0:26

mmn

11116

Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

– nate-m
Nov 14 '18 at 15:19

Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

– mmn
Nov 14 '18 at 16:11

add a comment |

I am not 100% sure what you need, but here is my take on it (using data.table)

library(data.table)

 dt <- data.table(test_data)

 setkey(dt, "fire_year")
 for(i in unique(dt[["fire_year"]]))
 dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]

And then subset the cols you need

dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim

 fpa_id fire_year var
 1: FS-1418827 2004 0
 2: FS-1418835 2004 9
 3: FS-1418845 2004 0
 4: FS-1418847 2004 0
 5: FS-1418849 2004 0
 6: FS-1418851 2004 0
 7: FS-1418859 2004 0
 8: FS-1418826 2005 0
 9: FS-1418854 2005 0
10: FS-1418856 2005 114

Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...

edited Nov 14 '18 at 21:11

answered Nov 14 '18 at 0:26

mmn

11116

Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

– nate-m
Nov 14 '18 at 15:19

Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

– mmn
Nov 14 '18 at 16:11

add a comment |

I am not 100% sure what you need, but here is my take on it (using data.table)

library(data.table)

 dt <- data.table(test_data)

 setkey(dt, "fire_year")
 for(i in unique(dt[["fire_year"]]))
 dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]

And then subset the cols you need

dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim

 fpa_id fire_year var
 1: FS-1418827 2004 0
 2: FS-1418835 2004 9
 3: FS-1418845 2004 0
 4: FS-1418847 2004 0
 5: FS-1418849 2004 0
 6: FS-1418851 2004 0
 7: FS-1418859 2004 0
 8: FS-1418826 2005 0
 9: FS-1418854 2005 0
10: FS-1418856 2005 114

Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...

edited Nov 14 '18 at 21:11

answered Nov 14 '18 at 0:26

mmn

11116

I am not 100% sure what you need, but here is my take on it (using data.table)

library(data.table)

 dt <- data.table(test_data)

 setkey(dt, "fire_year")
 for(i in unique(dt[["fire_year"]]))
 dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]

And then subset the cols you need

dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim

 fpa_id fire_year var
 1: FS-1418827 2004 0
 2: FS-1418835 2004 9
 3: FS-1418845 2004 0
 4: FS-1418847 2004 0
 5: FS-1418849 2004 0
 6: FS-1418851 2004 0
 7: FS-1418859 2004 0
 8: FS-1418826 2005 0
 9: FS-1418854 2005 0
10: FS-1418856 2005 114

Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...

edited Nov 14 '18 at 21:11

answered Nov 14 '18 at 0:26

mmn

11116

edited Nov 14 '18 at 21:11

answered Nov 14 '18 at 0:26

mmn

11116

answered Nov 14 '18 at 0:26

mmn

11116

answered Nov 14 '18 at 0:26

mmn

11116

Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

– nate-m
Nov 14 '18 at 15:19

Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

– mmn
Nov 14 '18 at 16:11

add a comment |

Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

– nate-m
Nov 14 '18 at 15:19

Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

– mmn
Nov 14 '18 at 16:11

Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help

– nate-m
Nov 14 '18 at 15:19

Happy to help! I would be interested in a neat base and dplyr solution and how they scale.

– mmn
Nov 14 '18 at 16:11

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Odtnhj