How to take the mean from April of one year to July of the next year in R?

Month Year Rainfall
4 2010
5 2010
6 2010
7 2010
8 2010
9 2010
10 2010
11 2010
12 2010
1 2011
2 2011
3 2011
4 2011
5 2011
6 2011
7 2011

I want to get the average from month 4 (April) of 2010 to month 7 (July) of 2011, then the average from month 4 of 2011 to month 7 of 2012, and so on.

I have tried the code below, but it only works for the first window. Can anyone help me with the second one?

## The code
subdataLGSP <-
  subset(df2.ppt.mon, (Year %in% 2010:2016) & (month %in% 4:12)) # April-December of each start year
Subdatanext <-
  subset(df2.ppt.mon, (Year %in% 2011:2016) & (month %in% 1:7)) # January-July of the following year

subdataprnext <-
  rbind(subdataLGSP, Subdatanext)

df2prnext <-
  aggregate(subdataprnext$RAIN, by = list(month = subdataprnext$month, Year = subdataprnext$Year), mean)

library(data.table)
setDT(df2prnext)
n <- 16 # every 16 rows
datPRApOct <-
  df2prnext[, mean(x), by = (seq(nrow(df2prnext)) - 1) %/% n] # This is what we want for seasonal precipitation

edited Nov 15 '18 at 17:25 by abhiieor
asked Nov 15 '18 at 17:16 by Sonisa Sharma

Welcome to SO! Just to clarify: by "7", do you mean July?

– Jrakru56
Nov 15 '18 at 17:26

Yes, month 7 means July. Thank you.

– Sonisa Sharma
Nov 15 '18 at 17:27

If one of the answers addresses your question, please accept it; doing so not only provides a little perk to the answerer with some points, but also provides some closure for readers with similar questions. Though you can only accept one answer, you have the option to up-vote as many as you think are helpful. (If there are still issues, you will likely need to edit your question with further details.)

– r2evans
Nov 15 '18 at 19:53


2 Answers

Something like this would work:

One line creates the grouping; the rest is standard R:

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

All together we have:

library(dplyr)

df <- structure(list(Month = c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
 1L, 2L, 3L, 4L, 5L, 6L, 7L), Year = c(2010L, 2010L, 2010L, 2010L, 
 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 
 2011L, 2011L, 2011L), Rainfall = c(3L, 4L, 5L, 3L, 4L, 5L, 6L, 
 7L, 8L, 4L, 3L, 4L, 5L, 6L, 5L, 4L)), row.names = c(NA, -16L), class = c("data.table", 
 "data.frame"))

df
#> Month Year Rainfall
#> 1 4 2010 3
#> 2 5 2010 4
#> 3 6 2010 5
#> 4 7 2010 3
#> 5 8 2010 4
#> 6 9 2010 5
#> 7 10 2010 6
#> 8 11 2010 7
#> 9 12 2010 8
#> 10 1 2011 4
#> 11 2 2011 3
#> 12 3 2011 4
#> 13 4 2011 5
#> 14 5 2011 6
#> 15 6 2011 5
#> 16 7 2011 4

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

df
#> Month Year Rainfall gp
#> 1 4 2010 3 0
#> 2 5 2010 4 0
#> 3 6 2010 5 0
#> 4 7 2010 3 0
#> 5 8 2010 4 0
#> 6 9 2010 5 0
#> 7 10 2010 6 0
#> 8 11 2010 7 0
#> 9 12 2010 8 0
#> 10 1 2011 4 0
#> 11 2 2011 3 0
#> 12 3 2011 4 1
#> 13 4 2011 5 1
#> 14 5 2011 6 1
#> 15 6 2011 5 1
#> 16 7 2011 4 1

df %>% group_by(gp) %>% summarise(mean(Rainfall))
#> # A tibble: 2 x 2
#> gp `mean(Rainfall)`
#> <dbl> <dbl>
#> 1 0 4.73
#> 2 1 4.8
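A note on the grouping line. Because `x` starts at 1, `x %/% 12` is 0 only for rows 1 to 11, so row 12 (March 2011) lands in group 1, as the `gp` column above shows. If the intent is blocks of 12 rows, a zero-based index avoids that. The minimal sketch below uses base R only (`tapply` instead of dplyr) and the same sample values. It still produces non-overlapping blocks, not the overlapping April-to-July windows the question asks for.

```r
# Same sample data as above, rebuilt as a plain data.frame
df <- data.frame(
  Month    = c(4:12, 1:7),
  Year     = c(rep(2010L, 9), rep(2011L, 7)),
  Rainfall = c(3, 4, 5, 3, 4, 5, 6, 7, 8, 4, 3, 4, 5, 6, 5, 4)
)

# Zero-based row index: rows 1-12 -> group 0, rows 13-24 -> group 1, ...
df$gp <- (seq_len(nrow(df)) - 1) %/% 12

# Mean rainfall per block of 12 rows
block_means <- tapply(df$Rainfall, df$gp, mean)
block_means
```

With these values, group 0 (rows 1-12) averages 56/12, about 4.67, and group 1 (rows 13-16) averages 5.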

There are arguably better ways to handle this windowing problem, for example with the lubridate package or by converting to a ts object.
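You don't need lubridate or a `ts` object to express the April-to-July window. In base R, you can use a running month count, `Year * 12 + Month`, which turns each window into a plain numeric range. The following is a minimal sketch under some assumptions:

- The data is monthly, with columns named `Month`, `Year` and `Rainfall`.
- It spans at least two distinct years.
- The helper name `window_means` is hypothetical.

```r
# Hypothetical helper: mean of Rainfall from start_month of year y
# through end_month of year y + 1, for every start year y in the data
window_means <- function(d, start_month = 4, end_month = 7) {
  idx <- d$Year * 12 + d$Month               # running month count
  years <- seq(min(d$Year), max(d$Year) - 1) # assumes >= 2 distinct years
  means <- sapply(years, function(y) {
    in_window <- idx >= y * 12 + start_month &
                 idx <= (y + 1) * 12 + end_month
    mean(d$Rainfall[in_window])
  })
  data.frame(StartYear = years, MeanRainfall = means)
}

df <- data.frame(
  Month    = c(4:12, 1:7),
  Year     = c(rep(2010L, 9), rep(2011L, 7)),
  Rainfall = c(3, 4, 5, 3, 4, 5, 6, 7, 8, 4, 3, 4, 5, 6, 5, 4)
)
window_means(df)
```

The 16 sample rows form a single window (start year 2010) with mean 76/16 = 4.75. Because the windows are defined by month ranges rather than row positions, consecutive windows can overlap.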

edited Nov 15 '18 at 17:56
answered Nov 15 '18 at 17:49 by Jrakru56

Thank you so much for your code; it helped me a lot. But I am trying to get the mean of the first 16 rows, and the next mean should run from the 13th row to the 32nd row.

– Sonisa Sharma
Nov 15 '18 at 19:50
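The follow-up asks for the mean of the first 16 rows, then a second window starting at row 13. April to July belongs to two consecutive windows, so non-overlapping groups cannot produce this, but overlapping index ranges can. This is a sketch under a few assumptions:

- The rows are sorted, start in April and have no missing months.
- Each window is 16 consecutive rows, and each new window starts 12 rows after the previous one.
- The helper name `overlapping_means` is hypothetical.
- The last 12 example values are made up.

```r
# Hypothetical helper: means over overlapping windows of `width` rows,
# with a new window starting every `step` rows
overlapping_means <- function(values, width = 16, step = 12) {
  if (length(values) < width) return(numeric(0))
  starts <- seq(1, length(values) - width + 1, by = step)
  sapply(starts, function(s) mean(values[s:(s + width - 1)]))
}

# 28 rows = April 2010 through July 2012 (last 12 values are made up)
rain <- c(3, 4, 5, 3, 4, 5, 6, 7, 8, 4, 3, 4, 5, 6, 5, 4,
          2, 6, 7, 3, 5, 4, 6, 8, 3, 2, 5, 4)
overlapping_means(rain)
```

Here the windows are rows 1-16 (mean 4.75) and rows 13-28 (mean 4.6875). This relies entirely on row positions, so any gap in the monthly series shifts every later window.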


Using my own fabricated data (below), here's a solution:

sapply(seq(min(x$Year), max(x$Year)-1), function(yr)
  mean(subset(x, (Year == yr & Month >= 4) | (Year == yr+1 & Month <= 7))$Rainfall)
)
# [1] 0.5421714 0.4412616 0.4867803

(for 2010, 2011, and 2012, respectively).

This does not check that every month (including 4 and 7) is present in each range; that's a different discussion.

For explanation:

seq(min(x$Year), max(x$Year)-1): iterate by year from the first to the second-to-last (assuming contiguous years);

(Year == yr & Month >= 4): include all data that is in this year and at or after month 4, or ...

| (Year == yr+1 & Month <= 7): next year and month at/before 7.

From there, simply take mean( subset(...)$Rainfall ).

The mid-step looks like this (with my data):

sapply(seq(min(x$Year), max(x$Year)-1), function(yr)
  subset(x, (Year == yr & Month >= 4) | (Year == yr+1 & Month <= 7)),
  simplify=FALSE)
# [[1]]
# Month Year Rainfall
# 4 4 2010 0.1680519
# 5 5 2010 0.9438393
# 6 6 2010 0.9434750
# 7 7 2010 0.1291590
# 8 8 2010 0.8334488
# 9 9 2010 0.4680185
# 10 10 2010 0.5499837
# 11 11 2010 0.5526741
# 12 12 2010 0.2388948
# 13 1 2011 0.7605133
# 14 2 2011 0.1808201
# 15 3 2011 0.4052822
# 16 4 2011 0.8535485
# 17 5 2011 0.9763985
# 18 6 2011 0.2258255
# 19 7 2011 0.4448092
# [[2]]
# Month Year Rainfall
# 16 4 2011 0.85354845
# 17 5 2011 0.97639849
# 18 6 2011 0.22582546
# 19 7 2011 0.44480923
# 20 8 2011 0.07497942
# 21 9 2011 0.66189876
# 22 10 2011 0.38754954
# 23 11 2011 0.83688918
# 24 12 2011 0.15050144
# 25 1 2012 0.34727225
# 26 2 2012 0.48877323
# 27 3 2012 0.14924686
# 28 4 2012 0.35706259
# 29 5 2012 0.96264405
# 30 6 2012 0.13237200
# 31 7 2012 0.01041453
# [[3]]
# Month Year Rainfall
# 28 4 2012 0.35706259
# 29 5 2012 0.96264405
# 30 6 2012 0.13237200
# 31 7 2012 0.01041453
# 32 8 2012 0.16464224
# 33 9 2012 0.81019214
# 34 10 2012 0.86886104
# 35 11 2012 0.51428176
# 36 12 2012 0.62719629
# 37 1 2013 0.84442900
# 38 2 2013 0.28487057
# 39 3 2013 0.66722565
# 40 4 2013 0.15046975
# 41 5 2013 0.98172786
# 42 6 2013 0.29701074
# 43 7 2013 0.11508408

Data:

set.seed(2)
years <- 4
x <- data.frame(
 Month = rep(1:12, times=years),
 Year = rep(2009 + seq_len(years), each=12),
 Rainfall = runif(12*years)
)
head(x)
# Month Year Rainfall
# 1 1 2010 0.1848823
# 2 2 2010 0.7023740
# 3 3 2010 0.5733263
# 4 4 2010 0.1680519
# 5 5 2010 0.9438393
# 6 6 2010 0.9434750
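The answer notes that it does not check whether each range contains all months. To flag incomplete windows rather than average them, one option is to count the rows each window selects and return `NA` when there are not 16. This minimal sketch follows the same `subset()` logic. It assumes one row per month and the columns `Month`, `Year` and `Rainfall`. The helper name `checked_window_means` is hypothetical.

```r
# Hypothetical helper: April-to-July window means, NA if any month is missing
checked_window_means <- function(d, n_months = 16) {
  years <- seq(min(d$Year), max(d$Year) - 1)
  sapply(years, function(yr) {
    w <- subset(d, (Year == yr & Month >= 4) | (Year == yr + 1 & Month <= 7))
    if (nrow(w) != n_months) NA_real_ else mean(w$Rainfall)
  })
}

# Three years of monthly data (Rainfall = month number), minus September 2011
monthly <- data.frame(
  Month    = rep(1:12, times = 3),
  Year     = rep(2010:2012, each = 12),
  Rainfall = rep(1:12, times = 3)
)
monthly_gap <- monthly[!(monthly$Year == 2011 & monthly$Month == 9), ]
checked_window_means(monthly_gap)
```

The 2010 window is complete, so it returns 100/16 = 6.25. The 2011 window lacks September 2011, so it returns `NA`.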

edited Nov 15 '18 at 21:10
answered Nov 15 '18 at 17:55 by r2evans

It worked perfectly, thank you.

– Sonisa Sharma
Nov 15 '18 at 19:50

Month Year Rainfall
4 2010 484.6
5 2010 630.32
6 2010 35.31
7 2010 637.64
8 2010 238.57
9 2010 1129.35
10 2010 376.78
11 2010 282.78
12 2010 324.58
1 2011 338.6
2 2011 859.37
3 2011 66.24
4 2011 38.36

We should get 418 as the average value, but I am getting 369.12.

– Sonisa Sharma
Nov 15 '18 at 20:20

That's an incomplete dataset; it does not span from 4/2010 to 7/2011. However, when I use that data, I get 418.6538. For future discussions: data like that works poorly in comments, so please edit your question and put it there in an easily consumed format, such as the output from dput(x), dput(head(x,n=?)) (top ? rows if large), data.frame(...), or read.table(text="...", ...).

– r2evans
Nov 15 '18 at 21:13
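The `read.table(text=...)` format suggested in that comment can be illustrated with the data from the earlier comment, pasted in directly. Applying the answer's April-to-July filter for start year 2010 selects its 13 rows, which average to the 418.6538 reported.

```r
rain <- read.table(text = "
Month Year Rainfall
4  2010  484.6
5  2010  630.32
6  2010  35.31
7  2010  637.64
8  2010  238.57
9  2010  1129.35
10 2010  376.78
11 2010  282.78
12 2010  324.58
1  2011  338.6
2  2011  859.37
3  2011  66.24
4  2011  38.36
", header = TRUE)

# Same April-to-July filter as the answer, for the 2010 start year
w <- subset(rain, (Year == 2010 & Month >= 4) | (Year == 2011 & Month <= 7))
nrow(w)           # 13: May-July 2011 are missing
mean(w$Rainfall)  # about 418.65
dput(rain)        # a copy-pasteable version to put in the question
```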

If you mean that the missing months should count as 0, then ... you need to provide a usable example in your question that states that and includes some missingness in the data.

– r2evans
Nov 15 '18 at 21:14
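That comment raises the possibility that missing months should count as 0. One way to do that is to merge the data onto a complete Month-by-Year grid and replace the resulting `NA`s. This sketch uses the values from the earlier comment. It assumes zero-filling is really what the analysis wants. That is a modelling choice, not a default, and it lowers the mean.

```r
rain <- data.frame(
  Month    = c(4:12, 1:4),
  Year     = c(rep(2010L, 9), rep(2011L, 4)),
  Rainfall = c(484.6, 630.32, 35.31, 637.64, 238.57, 1129.35,
               376.78, 282.78, 324.58, 338.6, 859.37, 66.24, 38.36)
)

# Complete grid of every month in the years present
grid <- expand.grid(Month = 1:12, Year = seq(min(rain$Year), max(rain$Year)))
full <- merge(grid, rain, by = c("Month", "Year"), all.x = TRUE)
full$Rainfall[is.na(full$Rainfall)] <- 0  # treat missing months as 0

w <- subset(full, (Year == 2010 & Month >= 4) | (Year == 2011 & Month <= 7))
nrow(w)           # 16 once the gaps are filled
mean(w$Rainfall)  # 5442.5 / 16, about 340.16
```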

Is there a way that I can attach the data?

– Sonisa Sharma
Nov 15 '18 at 21:33

|
show 4 more comments

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53324746%2fhow-to-take-mean-from-month-april-of-previous-year-to-july-in-r%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Something like this would work:

One line to create the grouping and the rest is standard R stuff

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

All together we have:

library(dplyr)

df <- structure(list(Month = c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
 1L, 2L, 3L, 4L, 5L, 6L, 7L), Year = c(2010L, 2010L, 2010L, 2010L, 
 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 
 2011L, 2011L, 2011L), Rainfall = c(3L, 4L, 5L, 3L, 4L, 5L, 6L, 
 7L, 8L, 4L, 3L, 4L, 5L, 6L, 5L, 4L)), row.names = c(NA, -16L), class = c("data.table", 
 "data.frame"))

df
#> Month Year Rainfall
#> 1 4 2010 3
#> 2 5 2010 4
#> 3 6 2010 5
#> 4 7 2010 3
#> 5 8 2010 4
#> 6 9 2010 5
#> 7 10 2010 6
#> 8 11 2010 7
#> 9 12 2010 8
#> 10 1 2011 4
#> 11 2 2011 3
#> 12 3 2011 4
#> 13 4 2011 5
#> 14 5 2011 6
#> 15 6 2011 5
#> 16 7 2011 4

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

df
#> Month Year Rainfall gp
#> 1 4 2010 3 0
#> 2 5 2010 4 0
#> 3 6 2010 5 0
#> 4 7 2010 3 0
#> 5 8 2010 4 0
#> 6 9 2010 5 0
#> 7 10 2010 6 0
#> 8 11 2010 7 0
#> 9 12 2010 8 0
#> 10 1 2011 4 0
#> 11 2 2011 3 0
#> 12 3 2011 4 1
#> 13 4 2011 5 1
#> 14 5 2011 6 1
#> 15 6 2011 5 1
#> 16 7 2011 4 1

df %>% group_by(gp) %>% summarise(mean(Rainfall))
#> # A tibble: 2 x 2
#> gp `mean(Rainfall)`
#> <dbl> <dbl>
#> 1 0 4.73
#> 2 1 4.8

There are arguably better ways to deal with this windowing problem using lubridate package or by converting to a ts object.

edited Nov 15 '18 at 17:56

answered Nov 15 '18 at 17:49

Jrakru56

609212

Thank you so much for your code. Your code helped me a lot. But I am trying to get the mean of all the 16th row and the next set of the mean will be from 13th row to 32th row.

– Sonisa Sharma
Nov 15 '18 at 19:50

add a comment |

Something like this would work:

One line to create the grouping and the rest is standard R stuff

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

All together we have:

library(dplyr)

df <- structure(list(Month = c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
 1L, 2L, 3L, 4L, 5L, 6L, 7L), Year = c(2010L, 2010L, 2010L, 2010L, 
 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 
 2011L, 2011L, 2011L), Rainfall = c(3L, 4L, 5L, 3L, 4L, 5L, 6L, 
 7L, 8L, 4L, 3L, 4L, 5L, 6L, 5L, 4L)), row.names = c(NA, -16L), class = c("data.table", 
 "data.frame"))

df
#> Month Year Rainfall
#> 1 4 2010 3
#> 2 5 2010 4
#> 3 6 2010 5
#> 4 7 2010 3
#> 5 8 2010 4
#> 6 9 2010 5
#> 7 10 2010 6
#> 8 11 2010 7
#> 9 12 2010 8
#> 10 1 2011 4
#> 11 2 2011 3
#> 12 3 2011 4
#> 13 4 2011 5
#> 14 5 2011 6
#> 15 6 2011 5
#> 16 7 2011 4

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

df
#> Month Year Rainfall gp
#> 1 4 2010 3 0
#> 2 5 2010 4 0
#> 3 6 2010 5 0
#> 4 7 2010 3 0
#> 5 8 2010 4 0
#> 6 9 2010 5 0
#> 7 10 2010 6 0
#> 8 11 2010 7 0
#> 9 12 2010 8 0
#> 10 1 2011 4 0
#> 11 2 2011 3 0
#> 12 3 2011 4 1
#> 13 4 2011 5 1
#> 14 5 2011 6 1
#> 15 6 2011 5 1
#> 16 7 2011 4 1

df %>% group_by(gp) %>% summarise(mean(Rainfall))
#> # A tibble: 2 x 2
#> gp `mean(Rainfall)`
#> <dbl> <dbl>
#> 1 0 4.73
#> 2 1 4.8

There are arguably better ways to deal with this windowing problem using lubridate package or by converting to a ts object.

edited Nov 15 '18 at 17:56

answered Nov 15 '18 at 17:49

Jrakru56

609212

Thank you so much for your code. Your code helped me a lot. But I am trying to get the mean of all the 16th row and the next set of the mean will be from 13th row to 32th row.

– Sonisa Sharma
Nov 15 '18 at 19:50

add a comment |

Something like this would work:

One line to create the grouping and the rest is standard R stuff

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

All together we have:

library(dplyr)

df <- structure(list(Month = c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
 1L, 2L, 3L, 4L, 5L, 6L, 7L), Year = c(2010L, 2010L, 2010L, 2010L, 
 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 
 2011L, 2011L, 2011L), Rainfall = c(3L, 4L, 5L, 3L, 4L, 5L, 6L, 
 7L, 8L, 4L, 3L, 4L, 5L, 6L, 5L, 4L)), row.names = c(NA, -16L), class = c("data.table", 
 "data.frame"))

df
#> Month Year Rainfall
#> 1 4 2010 3
#> 2 5 2010 4
#> 3 6 2010 5
#> 4 7 2010 3
#> 5 8 2010 4
#> 6 9 2010 5
#> 7 10 2010 6
#> 8 11 2010 7
#> 9 12 2010 8
#> 10 1 2011 4
#> 11 2 2011 3
#> 12 3 2011 4
#> 13 4 2011 5
#> 14 5 2011 6
#> 15 6 2011 5
#> 16 7 2011 4

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

df
#> Month Year Rainfall gp
#> 1 4 2010 3 0
#> 2 5 2010 4 0
#> 3 6 2010 5 0
#> 4 7 2010 3 0
#> 5 8 2010 4 0
#> 6 9 2010 5 0
#> 7 10 2010 6 0
#> 8 11 2010 7 0
#> 9 12 2010 8 0
#> 10 1 2011 4 0
#> 11 2 2011 3 0
#> 12 3 2011 4 1
#> 13 4 2011 5 1
#> 14 5 2011 6 1
#> 15 6 2011 5 1
#> 16 7 2011 4 1

df %>% group_by(gp) %>% summarise(mean(Rainfall))
#> # A tibble: 2 x 2
#> gp `mean(Rainfall)`
#> <dbl> <dbl>
#> 1 0 4.73
#> 2 1 4.8

There are arguably better ways to deal with this windowing problem using lubridate package or by converting to a ts object.

edited Nov 15 '18 at 17:56

answered Nov 15 '18 at 17:49

Jrakru56

609212

Something like this would work:

One line to create the grouping and the rest is standard R stuff

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

All together we have:

library(dplyr)

df <- structure(list(Month = c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
 1L, 2L, 3L, 4L, 5L, 6L, 7L), Year = c(2010L, 2010L, 2010L, 2010L, 
 2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 
 2011L, 2011L, 2011L), Rainfall = c(3L, 4L, 5L, 3L, 4L, 5L, 6L, 
 7L, 8L, 4L, 3L, 4L, 5L, 6L, 5L, 4L)), row.names = c(NA, -16L), class = c("data.table", 
 "data.frame"))

df
#> Month Year Rainfall
#> 1 4 2010 3
#> 2 5 2010 4
#> 3 6 2010 5
#> 4 7 2010 3
#> 5 8 2010 4
#> 6 9 2010 5
#> 7 10 2010 6
#> 8 11 2010 7
#> 9 12 2010 8
#> 10 1 2011 4
#> 11 2 2011 3
#> 12 3 2011 4
#> 13 4 2011 5
#> 14 5 2011 6
#> 15 6 2011 5
#> 16 7 2011 4

df$gp<- sapply(1:nrow(df), function(x) x%/%12)

df
#> Month Year Rainfall gp
#> 1 4 2010 3 0
#> 2 5 2010 4 0
#> 3 6 2010 5 0
#> 4 7 2010 3 0
#> 5 8 2010 4 0
#> 6 9 2010 5 0
#> 7 10 2010 6 0
#> 8 11 2010 7 0
#> 9 12 2010 8 0
#> 10 1 2011 4 0
#> 11 2 2011 3 0
#> 12 3 2011 4 1
#> 13 4 2011 5 1
#> 14 5 2011 6 1
#> 15 6 2011 5 1
#> 16 7 2011 4 1

df %>% group_by(gp) %>% summarise(mean(Rainfall))
#> # A tibble: 2 x 2
#> gp `mean(Rainfall)`
#> <dbl> <dbl>
#> 1 0 4.73
#> 2 1 4.8

There are arguably better ways to deal with this windowing problem using lubridate package or by converting to a ts object.

edited Nov 15 '18 at 17:56

answered Nov 15 '18 at 17:49

Jrakru56

609212

edited Nov 15 '18 at 17:56

answered Nov 15 '18 at 17:49

Jrakru56

609212

answered Nov 15 '18 at 17:49

Jrakru56

609212

answered Nov 15 '18 at 17:49

Jrakru56

609212

Thank you so much for your code. Your code helped me a lot. But I am trying to get the mean of all the 16th row and the next set of the mean will be from 13th row to 32th row.

– Sonisa Sharma
Nov 15 '18 at 19:50

add a comment |

Thank you so much for your code. Your code helped me a lot. But I am trying to get the mean of all the 16th row and the next set of the mean will be from 13th row to 32th row.

– Sonisa Sharma
Nov 15 '18 at 19:50

Thank you so much for your code. Your code helped me a lot. But I am trying to get the mean of all the 16th row and the next set of the mean will be from 13th row to 32th row.

– Sonisa Sharma
Nov 15 '18 at 19:50

add a comment |

Using my own fabricated data (below), here's a solution:

sapply(years, function(yr) (Year == yr+1 & Month <= 7))$Rainfall)
)
# [1] 0.5421714 0.4412616 0.4867803

(for 2010, 2011, and 2012, respectively).

This does not strictly check to ensure we have all months (including 4 and 7) in each range, that's a different discussion.

For explanation:

seq(min(x$Year), max(x$Year)-1): iterate by year from the first to the second-to-last (assuming contiguous years);

(Year == yr & Month >= 4): include all data that is in this year and at or after month 4, or ...

| (Year == yr+1 & Month <= 7): next year and month at/before 7.

from there, simply sum( subset(...)$Rainfall )

The mid-step looks like this (with my data):

sapply(seq(min(x$Year), max(x$Year)-1), function(yr) 
 subset(x, (Year == yr & Month >= 4) , simplify=F)
# [[1]]
# Month Year Rainfall
# 4 4 2010 0.1680519
# 5 5 2010 0.9438393
# 6 6 2010 0.9434750
# 7 7 2010 0.1291590
# 8 8 2010 0.8334488
# 9 9 2010 0.4680185
# 10 10 2010 0.5499837
# 11 11 2010 0.5526741
# 12 12 2010 0.2388948
# 13 1 2011 0.7605133
# 14 2 2011 0.1808201
# 15 3 2011 0.4052822
# 16 4 2011 0.8535485
# 17 5 2011 0.9763985
# 18 6 2011 0.2258255
# 19 7 2011 0.4448092
# [[2]]
# Month Year Rainfall
# 16 4 2011 0.85354845
# 17 5 2011 0.97639849
# 18 6 2011 0.22582546
# 19 7 2011 0.44480923
# 20 8 2011 0.07497942
# 21 9 2011 0.66189876
# 22 10 2011 0.38754954
# 23 11 2011 0.83688918
# 24 12 2011 0.15050144
# 25 1 2012 0.34727225
# 26 2 2012 0.48877323
# 27 3 2012 0.14924686
# 28 4 2012 0.35706259
# 29 5 2012 0.96264405
# 30 6 2012 0.13237200
# 31 7 2012 0.01041453
# [[3]]
# Month Year Rainfall
# 28 4 2012 0.35706259
# 29 5 2012 0.96264405
# 30 6 2012 0.13237200
# 31 7 2012 0.01041453
# 32 8 2012 0.16464224
# 33 9 2012 0.81019214
# 34 10 2012 0.86886104
# 35 11 2012 0.51428176
# 36 12 2012 0.62719629
# 37 1 2013 0.84442900
# 38 2 2013 0.28487057
# 39 3 2013 0.66722565
# 40 4 2013 0.15046975
# 41 5 2013 0.98172786
# 42 6 2013 0.29701074
# 43 7 2013 0.11508408

Data:

set.seed(2)
years <- 4
x <- data.frame(
 Month = rep(1:12, times=years),
 Year = rep(2009 + seq_len(years), each=12),
 Rainfall = runif(12*years)
)
head(x)
# Month Year Rainfall
# 1 1 2010 0.1848823
# 2 2 2010 0.7023740
# 3 3 2010 0.5733263
# 4 4 2010 0.1680519
# 5 5 2010 0.9438393
# 6 6 2010 0.9434750

edited Nov 15 '18 at 21:10

answered Nov 15 '18 at 17:55

r2evans

27.9k33159

It perfectly worked thank you.

– Sonisa Sharma
Nov 15 '18 at 19:50

Month Year Rainfall 4 2010 484.6 5 2010 630.32 6 2010 35.31 7 2010 637.64 8 2010 238.57 9 2010 1129.35 10 2010 376.78 11 2010 282.78 12 2010 324.58 1 2011 338.6 2 2011 859.37 3 2011 66.24 4 2011 38.36 We should get 418 as average value but I am getting 369.12

– Sonisa Sharma
Nov 15 '18 at 20:20

That's an incomplete dataset, it does not span from 4/2010 to 7/2011. However, when I use that data, I get 418.6538. For future discussions, data like that does poorly in comments, please edit your question and put it there in an easily consumed format, such as the output from dput(x), dput(head(x,n=?)) (top ? rows if large), data.frame(...), or read.table(text="...", ...).

– r2evans
Nov 15 '18 at 21:13

If you mean that the missing months should count as 0, then ... you need to provide a usable example in your question that states that and includes some missingness in the data.

– r2evans
Nov 15 '18 at 21:14

Is there a way that I can attach the data?

– Sonisa Sharma
Nov 15 '18 at 21:33

|
show 4 more comments

Using my own fabricated data (below), here's a solution:

sapply(years, function(yr) (Year == yr+1 & Month <= 7))$Rainfall)
)
# [1] 0.5421714 0.4412616 0.4867803

(for 2010, 2011, and 2012, respectively).

This does not strictly check to ensure we have all months (including 4 and 7) in each range, that's a different discussion.

For explanation:

seq(min(x$Year), max(x$Year)-1): iterate by year from the first to the second-to-last (assuming contiguous years);

(Year == yr & Month >= 4): include all data that is in this year and at or after month 4, or ...

| (Year == yr+1 & Month <= 7): next year and month at/before 7.

from there, simply sum( subset(...)$Rainfall )

The mid-step looks like this (with my data):

sapply(seq(min(x$Year), max(x$Year)-1), function(yr) 
 subset(x, (Year == yr & Month >= 4) , simplify=F)
# [[1]]
# Month Year Rainfall
# 4 4 2010 0.1680519
# 5 5 2010 0.9438393
# 6 6 2010 0.9434750
# 7 7 2010 0.1291590
# 8 8 2010 0.8334488
# 9 9 2010 0.4680185
# 10 10 2010 0.5499837
# 11 11 2010 0.5526741
# 12 12 2010 0.2388948
# 13 1 2011 0.7605133
# 14 2 2011 0.1808201
# 15 3 2011 0.4052822
# 16 4 2011 0.8535485
# 17 5 2011 0.9763985
# 18 6 2011 0.2258255
# 19 7 2011 0.4448092
# [[2]]
# Month Year Rainfall
# 16 4 2011 0.85354845
# 17 5 2011 0.97639849
# 18 6 2011 0.22582546
# 19 7 2011 0.44480923
# 20 8 2011 0.07497942
# 21 9 2011 0.66189876
# 22 10 2011 0.38754954
# 23 11 2011 0.83688918
# 24 12 2011 0.15050144
# 25 1 2012 0.34727225
# 26 2 2012 0.48877323
# 27 3 2012 0.14924686
# 28 4 2012 0.35706259
# 29 5 2012 0.96264405
# 30 6 2012 0.13237200
# 31 7 2012 0.01041453
# [[3]]
# Month Year Rainfall
# 28 4 2012 0.35706259
# 29 5 2012 0.96264405
# 30 6 2012 0.13237200
# 31 7 2012 0.01041453
# 32 8 2012 0.16464224
# 33 9 2012 0.81019214
# 34 10 2012 0.86886104
# 35 11 2012 0.51428176
# 36 12 2012 0.62719629
# 37 1 2013 0.84442900
# 38 2 2013 0.28487057
# 39 3 2013 0.66722565
# 40 4 2013 0.15046975
# 41 5 2013 0.98172786
# 42 6 2013 0.29701074
# 43 7 2013 0.11508408

Data:

set.seed(2)
years <- 4
x <- data.frame(
 Month = rep(1:12, times=years),
 Year = rep(2009 + seq_len(years), each=12),
 Rainfall = runif(12*years)
)
head(x)
# Month Year Rainfall
# 1 1 2010 0.1848823
# 2 2 2010 0.7023740
# 3 3 2010 0.5733263
# 4 4 2010 0.1680519
# 5 5 2010 0.9438393
# 6 6 2010 0.9434750

edited Nov 15 '18 at 21:10

answered Nov 15 '18 at 17:55

r2evans

27.9k33159

It perfectly worked thank you.

– Sonisa Sharma
Nov 15 '18 at 19:50

Month Year Rainfall 4 2010 484.6 5 2010 630.32 6 2010 35.31 7 2010 637.64 8 2010 238.57 9 2010 1129.35 10 2010 376.78 11 2010 282.78 12 2010 324.58 1 2011 338.6 2 2011 859.37 3 2011 66.24 4 2011 38.36 We should get 418 as average value but I am getting 369.12

– Sonisa Sharma
Nov 15 '18 at 20:20

That's an incomplete dataset, it does not span from 4/2010 to 7/2011. However, when I use that data, I get 418.6538. For future discussions, data like that does poorly in comments, please edit your question and put it there in an easily consumed format, such as the output from dput(x), dput(head(x,n=?)) (top ? rows if large), data.frame(...), or read.table(text="...", ...).

– r2evans
Nov 15 '18 at 21:13

If you mean that the missing months should count as 0, then ... you need to provide a usable example in your question that states that and includes some missingness in the data.

– r2evans
Nov 15 '18 at 21:14

Is there a way that I can attach the data?

– Sonisa Sharma
Nov 15 '18 at 21:33

|
show 4 more comments

Using my own fabricated data (below), here's a solution:

sapply(years, function(yr) (Year == yr+1 & Month <= 7))$Rainfall)
)
# [1] 0.5421714 0.4412616 0.4867803

(for 2010, 2011, and 2012, respectively).

This does not strictly check to ensure we have all months (including 4 and 7) in each range, that's a different discussion.

For explanation:

seq(min(x$Year), max(x$Year)-1): iterate by year from the first to the second-to-last (assuming contiguous years);

(Year == yr & Month >= 4): include all data that is in this year and at or after month 4, or ...

| (Year == yr+1 & Month <= 7): next year and month at/before 7.

from there, simply sum( subset(...)$Rainfall )

The mid-step looks like this (with my data):

sapply(seq(min(x$Year), max(x$Year)-1), function(yr) 
 subset(x, (Year == yr & Month >= 4) , simplify=F)
# [[1]]
# Month Year Rainfall
# 4 4 2010 0.1680519
# 5 5 2010 0.9438393
# 6 6 2010 0.9434750
# 7 7 2010 0.1291590
# 8 8 2010 0.8334488
# 9 9 2010 0.4680185
# 10 10 2010 0.5499837
# 11 11 2010 0.5526741
# 12 12 2010 0.2388948
# 13 1 2011 0.7605133
# 14 2 2011 0.1808201
# 15 3 2011 0.4052822
# 16 4 2011 0.8535485
# 17 5 2011 0.9763985
# 18 6 2011 0.2258255
# 19 7 2011 0.4448092
# [[2]]
# Month Year Rainfall
# 16 4 2011 0.85354845
# 17 5 2011 0.97639849
# 18 6 2011 0.22582546
# 19 7 2011 0.44480923
# 20 8 2011 0.07497942
# 21 9 2011 0.66189876
# 22 10 2011 0.38754954
# 23 11 2011 0.83688918
# 24 12 2011 0.15050144
# 25 1 2012 0.34727225
# 26 2 2012 0.48877323
# 27 3 2012 0.14924686
# 28 4 2012 0.35706259
# 29 5 2012 0.96264405
# 30 6 2012 0.13237200
# 31 7 2012 0.01041453
# [[3]]
# Month Year Rainfall
# 28 4 2012 0.35706259
# 29 5 2012 0.96264405
# 30 6 2012 0.13237200
# 31 7 2012 0.01041453
# 32 8 2012 0.16464224
# 33 9 2012 0.81019214
# 34 10 2012 0.86886104
# 35 11 2012 0.51428176
# 36 12 2012 0.62719629
# 37 1 2013 0.84442900
# 38 2 2013 0.28487057
# 39 3 2013 0.66722565
# 40 4 2013 0.15046975
# 41 5 2013 0.98172786
# 42 6 2013 0.29701074
# 43 7 2013 0.11508408

Data:

set.seed(2)
years <- 4
x <- data.frame(
 Month = rep(1:12, times=years),
 Year = rep(2009 + seq_len(years), each=12),
 Rainfall = runif(12*years)
)
head(x)
# Month Year Rainfall
# 1 1 2010 0.1848823
# 2 2 2010 0.7023740
# 3 3 2010 0.5733263
# 4 4 2010 0.1680519
# 5 5 2010 0.9438393
# 6 6 2010 0.9434750

edited Nov 15 '18 at 21:10

answered Nov 15 '18 at 17:55

r2evans

27.9k33159


It worked perfectly, thank you.

– Sonisa Sharma
Nov 15 '18 at 19:50

Month Year Rainfall 4 2010 484.6 5 2010 630.32 6 2010 35.31 7 2010 637.64 8 2010 238.57 9 2010 1129.35 10 2010 376.78 11 2010 282.78 12 2010 324.58 1 2011 338.6 2 2011 859.37 3 2011 66.24 4 2011 38.36 We should get 418 as the average value, but I am getting 369.12.

– Sonisa Sharma
Nov 15 '18 at 20:20

That's an incomplete dataset; it does not span from 4/2010 to 7/2011. However, when I use that data, I get 418.6538. For future discussions: data like that does poorly in comments, so please edit your question and put it there in an easily consumed format, such as the output from dput(x), dput(head(x,n=?)) (top ? rows if large), data.frame(...), or read.table(text="...", ...).

– r2evans
Nov 15 '18 at 21:13

If you mean that the missing months should count as 0, then ... you need to provide a usable example in your question that states that and includes some missingness in the data.

– r2evans
Nov 15 '18 at 21:14

Is there a way that I can attach the data?

– Sonisa Sharma
Nov 15 '18 at 21:33
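Following the suggestion to share data with `read.table(text = ...)`, here is a sketch of how the 13 values posted in the comments could be read and averaged. It only reproduces the arithmetic behind the 418.6538 figure. It assumes the missing May to July 2011 rows are simply absent, and it does not explain the 369.12 figure.

```r
# Data from the comment, entered as plain text (April 2010 to April 2011 only)
rain <- read.table(text = "
Month Year Rainfall
4  2010  484.60
5  2010  630.32
6  2010   35.31
7  2010  637.64
8  2010  238.57
9  2010 1129.35
10 2010  376.78
11 2010  282.78
12 2010  324.58
1  2011  338.60
2  2011  859.37
3  2011   66.24
4  2011   38.36
", header = TRUE)

# April 2010 to July 2011 selection; May-July 2011 are absent, so 13 rows are averaged
sel <- with(rain, (Year == 2010 & Month >= 4) | (Year == 2011 & Month <= 7))
mean(rain$Rainfall[sel])
# [1] 418.6538

# If the missing May-July 2011 should instead count as 0, divide by all 16 months
sum(rain$Rainfall[sel]) / 16
```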


