Find the maximum in a certain time frame in a non-continuous time series
I have a dataframe with a time series that looks like this:
df<-structure(list(date = structure(c(-6905, -6891, -6853, -6588,
-6588, -6586, -6523, -6515, -5856, -5753), class = "Date"), flow = c(2.22,
2.56, 3.3, 1.38, 4, 1.4, 1.32, 1.26, 6, 35.69)), .Names = c("date",
"flow"), row.names = c(NA, 10L), class = "data.frame")
I want to remove all the lines that are not the maximum within 2 days forward or backward of its date. So in the case above, lines 4 and 6 will be removed. I couldn't find similar answered questions.
I wrote this code that doesn't work and it is ugly, long and doesn't take care of the edges of the dataframe:
idx <- c()
for (j in 3:(length(df$date)-2))
as.Date(df$date[j])-as.Date(df$date[j-2])<3)
if (df$flow[j]!=max(df$flow[(j-2):(j+2)]))
idx <- c(idx,j)
else if (as.Date(df$date[j+1])-as.Date(df$date[j])<3
Notice that the dates in the dataframe are not consecutive.
r dataframe time-series max
|
show 2 more comments
I have a dataframe with a time series that looks like this:
df<-structure(list(date = structure(c(-6905, -6891, -6853, -6588,
-6588, -6586, -6523, -6515, -5856, -5753), class = "Date"), flow = c(2.22,
2.56, 3.3, 1.38, 4, 1.4, 1.32, 1.26, 6, 35.69)), .Names = c("date",
"flow"), row.names = c(NA, 10L), class = "data.frame")
I want to remove all the lines that are not the maximum within 2 days forward or backward of its date. So in the case above, lines 4 and 6 will be removed. I couldn't find similar answered questions.
I wrote this code that doesn't work and it is ugly, long and doesn't take care of the edges of the dataframe:
idx <- c()
for (j in 3:(length(df$date)-2))
as.Date(df$date[j])-as.Date(df$date[j-2])<3)
if (df$flow[j]!=max(df$flow[(j-2):(j+2)]))
idx <- c(idx,j)
else if (as.Date(df$date[j+1])-as.Date(df$date[j])<3
Notice that the dates in the dataframe are not consecutive.
r dataframe time-series max
1
Are you sure that row 7 would be removed?
– RLave
Nov 14 '18 at 8:53
@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.
– asher
Nov 14 '18 at 9:08
are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7
– DJV
Nov 14 '18 at 9:13
Also in your example you have a date twice, is it correct?
– RLave
Nov 14 '18 at 9:21
1
@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.
– asher
Nov 14 '18 at 9:26
|
show 2 more comments
I have a dataframe with a time series that looks like this:
df<-structure(list(date = structure(c(-6905, -6891, -6853, -6588,
-6588, -6586, -6523, -6515, -5856, -5753), class = "Date"), flow = c(2.22,
2.56, 3.3, 1.38, 4, 1.4, 1.32, 1.26, 6, 35.69)), .Names = c("date",
"flow"), row.names = c(NA, 10L), class = "data.frame")
I want to remove all the lines that are not the maximum within 2 days forward or backward of its date. So in the case above, lines 4 and 6 will be removed. I couldn't find similar answered questions.
I wrote this code that doesn't work and it is ugly, long and doesn't take care of the edges of the dataframe:
idx <- c()
for (j in 3:(length(df$date)-2))
as.Date(df$date[j])-as.Date(df$date[j-2])<3)
if (df$flow[j]!=max(df$flow[(j-2):(j+2)]))
idx <- c(idx,j)
else if (as.Date(df$date[j+1])-as.Date(df$date[j])<3
Notice that the dates in the dataframe are not consecutive.
r dataframe time-series max
I have a dataframe with a time series that looks like this:
df<-structure(list(date = structure(c(-6905, -6891, -6853, -6588,
-6588, -6586, -6523, -6515, -5856, -5753), class = "Date"), flow = c(2.22,
2.56, 3.3, 1.38, 4, 1.4, 1.32, 1.26, 6, 35.69)), .Names = c("date",
"flow"), row.names = c(NA, 10L), class = "data.frame")
I want to remove all the lines that are not the maximum within 2 days forward or backward of its date. So in the case above, lines 4 and 6 will be removed. I couldn't find similar answered questions.
I wrote this code that doesn't work and it is ugly, long and doesn't take care of the edges of the dataframe:
idx <- c()
for (j in 3:(length(df$date)-2))
as.Date(df$date[j])-as.Date(df$date[j-2])<3)
if (df$flow[j]!=max(df$flow[(j-2):(j+2)]))
idx <- c(idx,j)
else if (as.Date(df$date[j+1])-as.Date(df$date[j])<3
Notice that the dates in the dataframe are not consecutive.
r dataframe time-series max
r dataframe time-series max
edited Nov 14 '18 at 9:31
asher
asked Nov 14 '18 at 8:37
asherasher
506
506
1
Are you sure that row 7 would be removed?
– RLave
Nov 14 '18 at 8:53
@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.
– asher
Nov 14 '18 at 9:08
are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7
– DJV
Nov 14 '18 at 9:13
Also in your example you have a date twice, is it correct?
– RLave
Nov 14 '18 at 9:21
1
@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.
– asher
Nov 14 '18 at 9:26
|
show 2 more comments
1
Are you sure that row 7 would be removed?
– RLave
Nov 14 '18 at 8:53
@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.
– asher
Nov 14 '18 at 9:08
are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7
– DJV
Nov 14 '18 at 9:13
Also in your example you have a date twice, is it correct?
– RLave
Nov 14 '18 at 9:21
1
@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.
– asher
Nov 14 '18 at 9:26
1
1
Are you sure that row 7 would be removed?
– RLave
Nov 14 '18 at 8:53
Are you sure that row 7 would be removed?
– RLave
Nov 14 '18 at 8:53
@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.
– asher
Nov 14 '18 at 9:08
@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.
– asher
Nov 14 '18 at 9:08
are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7
– DJV
Nov 14 '18 at 9:13
are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7
– DJV
Nov 14 '18 at 9:13
Also in your example you have a date twice, is it correct?
– RLave
Nov 14 '18 at 9:21
Also in your example you have a date twice, is it correct?
– RLave
Nov 14 '18 at 9:21
1
1
@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.
– asher
Nov 14 '18 at 9:26
@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.
– asher
Nov 14 '18 at 9:26
|
show 2 more comments
3 Answers
3
active
oldest
votes
Using the zoo
library.
library(zoo)
# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)
# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)
# pad with NAs to make the time series regular
dtf.zoo <- merge(
dtf.zoo,
zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)
# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))
# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]
# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69
which(!(dt$flow %in% dt2$flow))
# 4 6
hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.
– Darren Tsai
Nov 14 '18 at 10:56
@DarrenTsai: Indeed, thank you. I forgot to setpartial=TRUE
inrollapply()
, which caused an index mismatch.
– AkselA
Nov 14 '18 at 11:28
@AkselA: I added this condition:if (length(rem>0))df.zoo <- df.zoo[-rem]
to escape cases that don't have events that are less than 3 days apart
– asher
Nov 14 '18 at 13:26
@asher: That will work. Removingwhich()
and replacing-rem
with!rem
should also work.
– AkselA
Nov 14 '18 at 13:41
@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.
– asher
Nov 15 '18 at 11:40
add a comment |
You can also use the tidyverse
approch:
require(tidyverse)
df %>%
#Arrange by date
arrange(date) %>%
#Picking the max for each da
group_by(date) %>%
top_n(1, flow) %>%
ungroup() %>%
#Adding missing dates with NAs
complete(date = seq.Date(min(date), max(date), by="day")) %>%
#Remove Two up/down
mutate(
remove = case_when(
flow < rowMeans(data.frame(lag(flow, 1),
lag(flow, 2)), na.rm = TRUE) ~ "remove",
flow < rowMeans(data.frame(lead(flow, 1),
lead(flow, 2)), na.rm = TRUE) ~ "remove",
TRUE ~ "keep")) %>%
na.omit() %>%
filter(remove == "keep") %>%
select(-remove)
# A tibble: 8 x 2
date flow
<date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7
add a comment |
I use lapply()
to check the range : [date - 2 days , date + 2 days] of each date.
rm.list <- lapply(df$date, function(x)
ind <- which(abs(df$date - x) <= 2)
flow <- df$flow[ind]
if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
else NULL
)
rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]
# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53295976%2ffind-the-maximum-in-a-certain-time-frame-in-a-non-continuous-time-series%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using the zoo
library.
library(zoo)
# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)
# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)
# pad with NAs to make the time series regular
dtf.zoo <- merge(
dtf.zoo,
zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)
# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))
# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]
# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69
which(!(dt$flow %in% dt2$flow))
# 4 6
hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.
– Darren Tsai
Nov 14 '18 at 10:56
@DarrenTsai: Indeed, thank you. I forgot to setpartial=TRUE
inrollapply()
, which caused an index mismatch.
– AkselA
Nov 14 '18 at 11:28
@AkselA: I added this condition:if (length(rem>0))df.zoo <- df.zoo[-rem]
to escape cases that don't have events that are less than 3 days apart
– asher
Nov 14 '18 at 13:26
@asher: That will work. Removingwhich()
and replacing-rem
with!rem
should also work.
– AkselA
Nov 14 '18 at 13:41
@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.
– asher
Nov 15 '18 at 11:40
add a comment |
Using the zoo
library.
library(zoo)
# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)
# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)
# pad with NAs to make the time series regular
dtf.zoo <- merge(
dtf.zoo,
zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)
# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))
# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]
# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69
which(!(dt$flow %in% dt2$flow))
# 4 6
hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.
– Darren Tsai
Nov 14 '18 at 10:56
@DarrenTsai: Indeed, thank you. I forgot to setpartial=TRUE
inrollapply()
, which caused an index mismatch.
– AkselA
Nov 14 '18 at 11:28
@AkselA: I added this condition:if (length(rem>0))df.zoo <- df.zoo[-rem]
to escape cases that don't have events that are less than 3 days apart
– asher
Nov 14 '18 at 13:26
@asher: That will work. Removingwhich()
and replacing-rem
with!rem
should also work.
– AkselA
Nov 14 '18 at 13:41
@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.
– asher
Nov 15 '18 at 11:40
add a comment |
Using the zoo
library.
library(zoo)
# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)
# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)
# pad with NAs to make the time series regular
dtf.zoo <- merge(
dtf.zoo,
zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)
# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))
# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]
# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69
which(!(dt$flow %in% dt2$flow))
# 4 6
Using the zoo
library.
library(zoo)
# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)
# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)
# pad with NAs to make the time series regular
dtf.zoo <- merge(
dtf.zoo,
zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)
# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))
# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]
# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69
which(!(dt$flow %in% dt2$flow))
# 4 6
edited Nov 14 '18 at 11:24
answered Nov 14 '18 at 9:55
AkselAAkselA
4,49121325
4,49121325
hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.
– Darren Tsai
Nov 14 '18 at 10:56
@DarrenTsai: Indeed, thank you. I forgot to setpartial=TRUE
inrollapply()
, which caused an index mismatch.
– AkselA
Nov 14 '18 at 11:28
@AkselA: I added this condition:if (length(rem>0))df.zoo <- df.zoo[-rem]
to escape cases that don't have events that are less than 3 days apart
– asher
Nov 14 '18 at 13:26
@asher: That will work. Removingwhich()
and replacing-rem
with!rem
should also work.
– AkselA
Nov 14 '18 at 13:41
@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.
– asher
Nov 15 '18 at 11:40
add a comment |
hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.
– Darren Tsai
Nov 14 '18 at 10:56
@DarrenTsai: Indeed, thank you. I forgot to setpartial=TRUE
inrollapply()
, which caused an index mismatch.
– AkselA
Nov 14 '18 at 11:28
@AkselA: I added this condition:if (length(rem>0))df.zoo <- df.zoo[-rem]
to escape cases that don't have events that are less than 3 days apart
– asher
Nov 14 '18 at 13:26
@asher: That will work. Removingwhich()
and replacing-rem
with!rem
should also work.
– AkselA
Nov 14 '18 at 13:41
@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.
– asher
Nov 15 '18 at 11:40
hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.
– Darren Tsai
Nov 14 '18 at 10:56
hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.
– Darren Tsai
Nov 14 '18 at 10:56
@DarrenTsai: Indeed, thank you. I forgot to set
partial=TRUE
in rollapply()
, which caused an index mismatch.– AkselA
Nov 14 '18 at 11:28
@DarrenTsai: Indeed, thank you. I forgot to set
partial=TRUE
in rollapply()
, which caused an index mismatch.– AkselA
Nov 14 '18 at 11:28
@AkselA: I added this condition:
if (length(rem>0))df.zoo <- df.zoo[-rem]
to escape cases that don't have events that are less than 3 days apart– asher
Nov 14 '18 at 13:26
@AkselA: I added this condition:
if (length(rem>0))df.zoo <- df.zoo[-rem]
to escape cases that don't have events that are less than 3 days apart– asher
Nov 14 '18 at 13:26
@asher: That will work. Removing
which()
and replacing -rem
with !rem
should also work.– AkselA
Nov 14 '18 at 13:41
@asher: That will work. Removing
which()
and replacing -rem
with !rem
should also work.– AkselA
Nov 14 '18 at 13:41
@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.
– asher
Nov 15 '18 at 11:40
@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.
– asher
Nov 15 '18 at 11:40
add a comment |
You can also use the tidyverse
approch:
require(tidyverse)
df %>%
#Arrange by date
arrange(date) %>%
#Picking the max for each da
group_by(date) %>%
top_n(1, flow) %>%
ungroup() %>%
#Adding missing dates with NAs
complete(date = seq.Date(min(date), max(date), by="day")) %>%
#Remove Two up/down
mutate(
remove = case_when(
flow < rowMeans(data.frame(lag(flow, 1),
lag(flow, 2)), na.rm = TRUE) ~ "remove",
flow < rowMeans(data.frame(lead(flow, 1),
lead(flow, 2)), na.rm = TRUE) ~ "remove",
TRUE ~ "keep")) %>%
na.omit() %>%
filter(remove == "keep") %>%
select(-remove)
# A tibble: 8 x 2
date flow
<date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7
add a comment |
You can also use the tidyverse
approch:
require(tidyverse)
df %>%
#Arrange by date
arrange(date) %>%
#Picking the max for each da
group_by(date) %>%
top_n(1, flow) %>%
ungroup() %>%
#Adding missing dates with NAs
complete(date = seq.Date(min(date), max(date), by="day")) %>%
#Remove Two up/down
mutate(
remove = case_when(
flow < rowMeans(data.frame(lag(flow, 1),
lag(flow, 2)), na.rm = TRUE) ~ "remove",
flow < rowMeans(data.frame(lead(flow, 1),
lead(flow, 2)), na.rm = TRUE) ~ "remove",
TRUE ~ "keep")) %>%
na.omit() %>%
filter(remove == "keep") %>%
select(-remove)
# A tibble: 8 x 2
date flow
<date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7
add a comment |
You can also use the tidyverse
approch:
require(tidyverse)
df %>%
#Arrange by date
arrange(date) %>%
#Picking the max for each da
group_by(date) %>%
top_n(1, flow) %>%
ungroup() %>%
#Adding missing dates with NAs
complete(date = seq.Date(min(date), max(date), by="day")) %>%
#Remove Two up/down
mutate(
remove = case_when(
flow < rowMeans(data.frame(lag(flow, 1),
lag(flow, 2)), na.rm = TRUE) ~ "remove",
flow < rowMeans(data.frame(lead(flow, 1),
lead(flow, 2)), na.rm = TRUE) ~ "remove",
TRUE ~ "keep")) %>%
na.omit() %>%
filter(remove == "keep") %>%
select(-remove)
# A tibble: 8 x 2
date flow
<date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7
You can also use the tidyverse
approch:
require(tidyverse)
df %>%
#Arrange by date
arrange(date) %>%
#Picking the max for each da
group_by(date) %>%
top_n(1, flow) %>%
ungroup() %>%
#Adding missing dates with NAs
complete(date = seq.Date(min(date), max(date), by="day")) %>%
#Remove Two up/down
mutate(
remove = case_when(
flow < rowMeans(data.frame(lag(flow, 1),
lag(flow, 2)), na.rm = TRUE) ~ "remove",
flow < rowMeans(data.frame(lead(flow, 1),
lead(flow, 2)), na.rm = TRUE) ~ "remove",
TRUE ~ "keep")) %>%
na.omit() %>%
filter(remove == "keep") %>%
select(-remove)
# A tibble: 8 x 2
date flow
<date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7
edited Nov 14 '18 at 10:41
answered Nov 14 '18 at 9:16
DJVDJV
1,3761317
1,3761317
add a comment |
add a comment |
I use lapply()
to check the range : [date - 2 days , date + 2 days] of each date.
rm.list <- lapply(df$date, function(x)
ind <- which(abs(df$date - x) <= 2)
flow <- df$flow[ind]
if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
else NULL
)
rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]
# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69
add a comment |
I use lapply()
to check the range : [date - 2 days , date + 2 days] of each date.
rm.list <- lapply(df$date, function(x)
ind <- which(abs(df$date - x) <= 2)
flow <- df$flow[ind]
if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
else NULL
)
rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]
# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69
add a comment |
I use lapply()
to check the range : [date - 2 days , date + 2 days] of each date.
rm.list <- lapply(df$date, function(x)
ind <- which(abs(df$date - x) <= 2)
flow <- df$flow[ind]
if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
else NULL
)
rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]
# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69
I use lapply()
to check the range : [date - 2 days , date + 2 days] of each date.
rm.list <- lapply(df$date, function(x)
ind <- which(abs(df$date - x) <= 2)
flow <- df$flow[ind]
if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
else NULL
)
rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]
# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69
edited Nov 14 '18 at 10:49
answered Nov 14 '18 at 10:37
Darren TsaiDarren Tsai
2,3102428
2,3102428
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53295976%2ffind-the-maximum-in-a-certain-time-frame-in-a-non-continuous-time-series%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Are you sure that row 7 would be removed?
– RLave
Nov 14 '18 at 8:53
@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.
– asher
Nov 14 '18 at 9:08
are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7
– DJV
Nov 14 '18 at 9:13
Also in your example you have a date twice, is it correct?
– RLave
Nov 14 '18 at 9:21
1
@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.
– asher
Nov 14 '18 at 9:26