Improve speed of drawdown.duration implementation

I have working code that calculates a running drawdown.duration, defined as the number of months between the current month and the previous peak. However, I implemented it as a for loop and it runs quite slowly.

Is there a more efficient/faster way to implement this in R?

The code takes a data.frame (specifically a tibble since I have been working with dplyr) named returnsWithValues.

> structure(list(date = structure(c(789, 820, 850, 881, 911, 942
), class = "Date"), value = c(0.94031052, 0.930751624153046, 
0.926756311376762, 0.874209664097166, 0.843026010916249, 2.1), 
 peak = c(1, 1, 1, 1, 1, 2.1), drawdown = c(-0.05968948, -0.0692483758469535, 
 -0.0732436886232377, -0.125790335902834, -0.156973989083751, 
 0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L))
# A tibble: 6 x 4
  date       value  peak drawdown
  <date>     <dbl> <dbl>    <dbl>
1 1972-02-29 0.940   1    -0.0597
2 1972-03-31 0.931   1    -0.0692
3 1972-04-30 0.927   1    -0.0732
4 1972-05-31 0.874   1    -0.126
5 1972-06-30 0.843   1    -0.157
6 1972-07-31 2.1     2.1   0

I have implemented drawdown.duration using a for loop:

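# Start with a numeric (not logical) NA column so the assignments below keep one type
returnsWithValues <- returnsWithValues %>% mutate(drawdown.duration = NA_real_)

# add drawdown.duration col
for (row in seq_len(nrow(returnsWithValues))) {
  if (returnsWithValues[row, "value"] == returnsWithValues[row, "peak"]) {
    # at a peak: the duration resets to 0
    returnsWithValues[row, "drawdown.duration"] = 0
  } else if (row == 1) {
    # first row already below its peak
    returnsWithValues[row, "drawdown.duration"] = 1
  } else {
    # still in drawdown: one more month than the previous row
    returnsWithValues[row, "drawdown.duration"] = returnsWithValues[row - 1, "drawdown.duration"] + 1
  }
}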

Which gives the correct answer as:

> returnsWithValues
# A tibble: 6 x 5
  date       value  peak drawdown drawdown.duration
  <date>     <dbl> <dbl>    <dbl>             <dbl>
1 1972-02-29 0.940   1    -0.0597                 1
2 1972-03-31 0.931   1    -0.0692                 2
3 1972-04-30 0.927   1    -0.0732                 3
4 1972-05-31 0.874   1    -0.126                  4
5 1972-06-30 0.843   1    -0.157                  5
6 1972-07-31 2.1     2.1   0                      0

asked Nov 15 '18 at 2:35

cpage

22212

r performance dplyr

2 Answers

I think this will do it, as long as each peak value is unique and not repeated in another group later on:

returnsWithValues %>%
 group_by(peak) %>%
 mutate(drawdown.duration = cumsum(value != peak))

If you do have repeated peak values, you might need a way to group just within consecutive peak values, e.g.

returnsWithValues %>%
  # Start counting the number of groups at 1, and every time
  # peak changes compared to the previous row, add 1
  mutate(peak_group = cumsum(c(1, peak[-1] != head(peak, -1)))) %>%
  group_by(peak_group) %>%
  mutate(drawdown.duration = cumsum(value != peak)) %>%
  ungroup()
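
The same counting can probably also be done without any grouping. The following is only a minimal base-R sketch, assuming nothing beyond the `value` and `peak` columns from the question; the helper name `drawdown_duration` is made up for illustration and is not part of dplyr or any package. It tracks the row index of the most recent row where `value` equals `peak` and subtracts it from the current row index, so the count restarts at every such row, the way the original loop does.

```r
# Hypothetical helper (illustrative name, not from any package)
drawdown_duration <- function(value, peak) {
  at_peak <- value == peak
  idx <- seq_along(value)
  # Row index of the most recent at-peak row; 0 before the first one
  last_peak <- cummax(ifelse(at_peak, idx, 0L))
  # Rows elapsed since that row (0 on the at-peak row itself)
  idx - last_peak
}

# Sample data copied from the question
value <- c(0.94031052, 0.930751624153046, 0.926756311376762,
           0.874209664097166, 0.843026010916249, 2.1)
peak  <- c(1, 1, 1, 1, 1, 2.1)

result <- drawdown_duration(value, peak)
print(result)
```

Inside a pipeline this could be called as `mutate(drawdown.duration = drawdown_duration(value, peak))`, again assuming the column names from the question.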

edited Nov 15 '18 at 3:03

answered Nov 15 '18 at 2:56

Marius

32.6k97378

This removes the for loop, as you asked, and uses logical indexing instead.

indices <- function(returnsWithValues) {
  # TRUE where the value sits at its peak, FALSE where it is in drawdown
  indices_logical <- returnsWithValues[["value"]] == returnsWithValues[["peak"]]
  indices_to_zero <- which(indices_logical)    # rows at a peak
  indices_drawdown <- which(!indices_logical)  # rows in drawdown
  returnsWithValues[indices_to_zero, "drawdown.duration"] <- 0
  # Number the drawdown rows 1, 2, 3, ... (basically what you compute, if I understand correctly).
  # Note: the count does not restart after a peak, so this matches the loop only for a single drawdown run.
  returnsWithValues[indices_drawdown, "drawdown.duration"] <- seq_along(indices_drawdown)
  returnsWithValues
}

Here is your for loop wrapped in a function.

for_loop <- function(returnsWithValues) {
  # add drawdown.duration col
  for (row in seq_len(nrow(returnsWithValues))) {
    if (returnsWithValues[row, "value"] == returnsWithValues[row, "peak"]) {
      returnsWithValues[row, "drawdown.duration"] = 0
    } else if (row == 1) {
      returnsWithValues[row, "drawdown.duration"] = 1
    } else {
      returnsWithValues[row, "drawdown.duration"] = returnsWithValues[row - 1, "drawdown.duration"] + 1
    }
  }
  returnsWithValues
}

Here is a benchmark compared to your for loop.

microbenchmark::microbenchmark(
 "for loop" = flp<-for_loop(returnsWithValues),
 indices = ind<-indices(returnsWithValues),
 times = 10
)

Unit: microseconds
     expr      min       lq     mean    median       uq      max neval
 for loop 8671.228 8699.555 8857.198 8826.8185 8967.631 9196.708    10
  indices   92.781   99.349  106.328  102.8385  115.360  122.749    10
all.equal(ind,flp)
[1] TRUE
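
The sample data contain only one drawdown run, so `all.equal` on it cannot show what happens once the series reaches a new peak and then falls again. The sketch below uses made-up numbers (the `value` and `peak` vectors are illustrative, not from the question) to compare a plain-vector version of the question's loop with the sequential numbering used in `indices`.

```r
# Illustrative data: a drawdown, a new peak, then a second drawdown
value <- c(0.9, 0.8, 1.2, 1.1, 1.0)
peak  <- c(1.0, 1.0, 1.2, 1.2, 1.2)

# Reference: the question's loop, rewritten on plain vectors
loop_duration <- numeric(length(value))
for (i in seq_along(value)) {
  if (value[i] == peak[i]) {
    loop_duration[i] <- 0
  } else if (i == 1) {
    loop_duration[i] <- 1
  } else {
    loop_duration[i] <- loop_duration[i - 1] + 1
  }
}

# Sequential numbering of every drawdown row, as in indices()
in_drawdown <- value != peak
seq_duration <- numeric(length(value))
seq_duration[in_drawdown] <- seq_len(sum(in_drawdown))

print(loop_duration)  # 1 2 0 1 2
print(seq_duration)   # 1 2 0 3 4
```

On data like this the two results differ after the new peak, so the indexing approach would need a per-run reset before it can stand in for the loop in general.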

answered Nov 16 '18 at 18:29

Csd

31819
