if else where cell is contained within the column name string
Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather
is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.
Data can be found here
library(tidyverse)
library(foreach)
df <- read_csv('test_data.csv')
So here I am trying to find the variable for fire_year
that is contained within the wide var_
fields. So for instance, here if fire_year
= 1998 then I would want to capture the value in the column named var_1998
. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:
df_slim <- foreach(df=iter(df, by='row'), .combine=rbind,
.packages = c('dplyr', "tidyverse")) %do%
df_out <- df %>%
gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
separate(key,
into = c("tmp1", 'zyear'),
sep = "_") %>%
mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
na.omit() %>%
dplyr::select(fpa_id, fire_year, var)
return(df_out)
I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!
Thanks!
r
add a comment |
Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather
is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.
Data can be found here
library(tidyverse)
library(foreach)
df <- read_csv('test_data.csv')
So here I am trying to find the variable for fire_year
that is contained within the wide var_
fields. So for instance, here if fire_year
= 1998 then I would want to capture the value in the column named var_1998
. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:
df_slim <- foreach(df=iter(df, by='row'), .combine=rbind,
.packages = c('dplyr', "tidyverse")) %do%
df_out <- df %>%
gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
separate(key,
into = c("tmp1", 'zyear'),
sep = "_") %>%
mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
na.omit() %>%
dplyr::select(fpa_id, fire_year, var)
return(df_out)
I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!
Thanks!
r
add a comment |
Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather
is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.
Data can be found here
library(tidyverse)
library(foreach)
df <- read_csv('test_data.csv')
So here I am trying to find the variable for fire_year
that is contained within the wide var_
fields. So for instance, here if fire_year
= 1998 then I would want to capture the value in the column named var_1998
. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:
df_slim <- foreach(df=iter(df, by='row'), .combine=rbind,
.packages = c('dplyr', "tidyverse")) %do%
df_out <- df %>%
gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
separate(key,
into = c("tmp1", 'zyear'),
sep = "_") %>%
mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
na.omit() %>%
dplyr::select(fpa_id, fire_year, var)
return(df_out)
I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!
Thanks!
r
Been having a difficult time trying to code this problem out. So effectively I am trying to select the cell in a wide data frame where the values in one column are contained with the column name's string. I generally use the tidyverse in my workflows and cannot get anything in that verse to work. Tried applying, for looping through the rows. Got some things to work but they are VERY slow. The attached data frame snippet is just the first 10 rows of a 1.8 million row data frame. So using tidy::gather
is out of the question here. Any thoughts on how I could go about accomplishing this would be of great use because it comes up a lot more often than I would have expected.
Data can be found here
library(tidyverse)
library(foreach)
df <- read_csv('test_data.csv')
So here I am trying to find the variable for fire_year
that is contained within the wide var_
fields. So for instance, here if fire_year
= 1998 then I would want to capture the value in the column named var_1998
. This is the closest I have got to getting a solution (and it works!) but it takes forever on the full data frame:
df_slim <- foreach(df=iter(df, by='row'), .combine=rbind,
.packages = c('dplyr', "tidyverse")) %do%
df_out <- df %>%
gather(key = key, value = out_var, -fpa_id, -fire_year) %>%
separate(key,
into = c("tmp1", 'zyear'),
sep = "_") %>%
mutate(var = ifelse(fire_year == zyear, out_var, NA)) %>%
na.omit() %>%
dplyr::select(fpa_id, fire_year, var)
return(df_out)
I cannot figure fast, efficient way to accomplish for the life of me! At this point I have calculated that it will take 160 hours to complete this for loop on the 1.7 million row data frame! If someone could point me in the right direction I would be eternally grateful!
Thanks!
r
r
edited Nov 13 '18 at 22:25
nate-m
asked Nov 13 '18 at 22:01
nate-mnate-m
8116
8116
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
I am not 100% sure what you need, but here is my take on it (using data.table)
library(data.table)
dt <- data.table(test_data)
setkey(dt, "fire_year")
for(i in unique(dt[["fire_year"]]))
dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]
And then subset the cols you need
dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim
fpa_id fire_year var
1: FS-1418827 2004 0
2: FS-1418835 2004 9
3: FS-1418845 2004 0
4: FS-1418847 2004 0
5: FS-1418849 2004 0
6: FS-1418851 2004 0
7: FS-1418859 2004 0
8: FS-1418826 2005 0
9: FS-1418854 2005 0
10: FS-1418856 2005 114
Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...
Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help
– nate-m
Nov 14 '18 at 15:19
Happy to help! I would be interested in a neat base and dplyr solution and how they scale.
– mmn
Nov 14 '18 at 16:11
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53290163%2fif-else-where-cell-is-contained-within-the-column-name-string%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I am not 100% sure what you need, but here is my take on it (using data.table)
library(data.table)
dt <- data.table(test_data)
setkey(dt, "fire_year")
for(i in unique(dt[["fire_year"]]))
dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]
And then subset the cols you need
dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim
fpa_id fire_year var
1: FS-1418827 2004 0
2: FS-1418835 2004 9
3: FS-1418845 2004 0
4: FS-1418847 2004 0
5: FS-1418849 2004 0
6: FS-1418851 2004 0
7: FS-1418859 2004 0
8: FS-1418826 2005 0
9: FS-1418854 2005 0
10: FS-1418856 2005 114
Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...
Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help
– nate-m
Nov 14 '18 at 15:19
Happy to help! I would be interested in a neat base and dplyr solution and how they scale.
– mmn
Nov 14 '18 at 16:11
add a comment |
I am not 100% sure what you need, but here is my take on it (using data.table)
library(data.table)
dt <- data.table(test_data)
setkey(dt, "fire_year")
for(i in unique(dt[["fire_year"]]))
dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]
And then subset the cols you need
dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim
fpa_id fire_year var
1: FS-1418827 2004 0
2: FS-1418835 2004 9
3: FS-1418845 2004 0
4: FS-1418847 2004 0
5: FS-1418849 2004 0
6: FS-1418851 2004 0
7: FS-1418859 2004 0
8: FS-1418826 2005 0
9: FS-1418854 2005 0
10: FS-1418856 2005 114
Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...
Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help
– nate-m
Nov 14 '18 at 15:19
Happy to help! I would be interested in a neat base and dplyr solution and how they scale.
– mmn
Nov 14 '18 at 16:11
add a comment |
I am not 100% sure what you need, but here is my take on it (using data.table)
library(data.table)
dt <- data.table(test_data)
setkey(dt, "fire_year")
for(i in unique(dt[["fire_year"]]))
dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]
And then subset the cols you need
dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim
fpa_id fire_year var
1: FS-1418827 2004 0
2: FS-1418835 2004 9
3: FS-1418845 2004 0
4: FS-1418847 2004 0
5: FS-1418849 2004 0
6: FS-1418851 2004 0
7: FS-1418859 2004 0
8: FS-1418826 2005 0
9: FS-1418854 2005 0
10: FS-1418856 2005 114
Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...
I am not 100% sure what you need, but here is my take on it (using data.table)
library(data.table)
dt <- data.table(test_data)
setkey(dt, "fire_year")
for(i in unique(dt[["fire_year"]]))
dt[fire_year == i, var:= get(paste("var", i, sep = "_"))]
And then subset the cols you need
dt_slim <- dt[,.SD, .SDcols = c("fpa_id", "fire_year", "var")]
dt_slim
fpa_id fire_year var
1: FS-1418827 2004 0
2: FS-1418835 2004 9
3: FS-1418845 2004 0
4: FS-1418847 2004 0
5: FS-1418849 2004 0
6: FS-1418851 2004 0
7: FS-1418859 2004 0
8: FS-1418826 2005 0
9: FS-1418854 2005 0
10: FS-1418856 2005 114
Not tested on your 1.8 million rows. Should be relativly fast. Interested on the timings though...
edited Nov 14 '18 at 21:11
answered Nov 14 '18 at 0:26
mmnmmn
11116
11116
Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help
– nate-m
Nov 14 '18 at 15:19
Happy to help! I would be interested in a neat base and dplyr solution and how they scale.
– mmn
Nov 14 '18 at 16:11
add a comment |
Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help
– nate-m
Nov 14 '18 at 15:19
Happy to help! I would be interested in a neat base and dplyr solution and how they scale.
– mmn
Nov 14 '18 at 16:11
Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help
– nate-m
Nov 14 '18 at 15:19
Wow that is perfect. I haven't used data.table much and this is inspiring me to drive into that package more. The method I illustrated above took 160 hours, this seconds over the 1.8 million records! Very impressive! Thanks for your help
– nate-m
Nov 14 '18 at 15:19
Happy to help! I would be interested in a neat base and dplyr solution and how they scale.
– mmn
Nov 14 '18 at 16:11
Happy to help! I would be interested in a neat base and dplyr solution and how they scale.
– mmn
Nov 14 '18 at 16:11
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53290163%2fif-else-where-cell-is-contained-within-the-column-name-string%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown