-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Why is mean not allowed with "na.rm=TRUE" and weights other than 0, 1, or NA #1218
Comments
The reason is that, when using non-binary weights, the values are multiplied with the weights before they are summarized. If you use na.rm=TRUE the weights should be adjusted to get a proper mean. I think your solution is good ( I am not sure why you do
If "fun=mean" would be allowed you would get
instead of the weighted mean:
That is, the weights would no longer add up to the same total; so if you had a lot of NAs your mean would be much too low. That is, of course, also true for the sum. You can compute the mean by dividing the output by the sum of the weights, but I do not think that is what you want. Your solution should be fine. Given that that slows things down, I could perhaps implement a solution that adjusts the weights when using a non-binary weights matrix to compute the weighted mean. |
Thanks for the quick reply! It would be great if it could be included due to the speed cost of a custom function. I tried installing terra locally with just the error message commented out on line 29 of focal.R and it appeared to just work! At least with this toy example...
I did that because with out it I was getting really small numbers because it appears that the custom function is passed the values after they are multiplied by the weights |
I believe some of the confusion here is related to that when the function "mean" is passed to library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.38
r <- rast(matrix(data=c(1,1,1,1,0,1,1,1,1), nrow=3, ncol=3))
r[2]<- NA
v<- as.character(values(r))
v[is.na(v)]<- "NA"
plot(r, colNA="grey")
text(r, labels=v) m<- focalMat(r, d=c(3,1), type = "Gauss")
m
#> [,1] [,2] [,3]
#> [1,] 0.1069973 0.1131098 0.1069973
#> [2,] 0.1131098 0.1195715 0.1131098
#> [3,] 0.1069973 0.1131098 0.1069973
mean_fun1 <- function(i){
weighted.mean(i/m, m, na.rm = TRUE)
}
mean_fun2 <- function(i){
sum(i * m, na.rm=TRUE) / sum( m[ !is.na(i) ] )
}
mean_fun3 <- function(i, weights, na.rm){
weighted.mean(i, w = weights, na.rm=na.rm)
}
y1<- focal(r, m, fun=mean_fun1)
#Expected value of central cell
weighted.mean(values(r, mat=FALSE), w = as.vector(t(m)), na.rm = TRUE)
#> [1] 0.8651789
## Or manually with sum
sum((values(r, mat=FALSE) * as.vector(t(m))), na.rm = TRUE)/sum(as.vector(t(m))[!is.na(values(r, mat=FALSE))])
#> [1] 0.8651789
y1<- focal(r, m, fun=mean_fun1)
values(y1)[5]
#> [1] 0.8651789
plot(y1)
text(y1, digits=2) y2<- focal(r, w=3, fun=mean_fun2)
values(y2)[5]
#> [1] 0.8651789
plot(y2)
text(y2, digits=2) y3<- focal(r, w=3, fun=mean_fun3, weights = as.vector(m), na.rm=TRUE)
values(y3)[5] #Correct
#> [1] 0.8651789
plot(y3)
text(y3, digits=2) Created on 2023-07-05 with reprex v2.0.2 |
@see24 for example here is the potential problem that can occur with the other 2 versions of the function if you're not careful. library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.38
r <- rast(matrix(data=c(1,1,1,1,0,1,1,1,1), nrow=3, ncol=3))
r[2]<- NA
v<- as.character(values(r))
v[is.na(v)]<- "NA"
plot(r, colNA="grey")
text(r, labels=v) m<- focalMat(r, d=c(3,1), type = "Gauss")
m2<- matrix(seq(0.1,0.9,by=0.1), nrow=3, ncol=3)
mean_fun1 <- function(i){
weighted.mean(i/m, m, na.rm = TRUE)
}
mean_fun2 <- function(i){
sum(i * m, na.rm=TRUE) / sum( m[ !is.na(i) ] )
} #mean fun 2 uses m which is only defined in local environment
mean_fun3 <- function(i, weights, na.rm){
weighted.mean(i, w = weights, na.rm=na.rm)
}
# This will multiply values by m2 to get i. Then the weighted mean of i/m is calcualed using m as the weights
y1<- focal(r, m2, fun=mean_fun1)
# This will multiply values by m2 to get i and then calculate the weighted mean of i using weights m
y2<- focal(r, m2, fun=mean_fun2)
# This extract values in the window and then assign them to i (focal weights matrix is only 1's). Then the weighted mean of i is calculated using weights m2
y3<- focal(r, w=3, fun=mean_fun3, weights=m2, na.rm=TRUE)
plot(y1)
text(y1, digits=2) plot(y2)
text(y2, digits=2) plot(y3)
text(y3, digits=2) Created on 2023-07-05 with reprex v2.0.2 |
@ailich yes ideally the weights that are passed as the w argument to focal could be available in the custom function, then you would be able to use them correctly without having to supply a dummy window and additional weights argument. In general it would be great if there was a bit more detail in the documentation about how the custom function should work. Specifically, it could explain that the values passed to the function are the raster values multiplied by the window weights. Using the "mean" fun does give the same answer if you remove the error message as in my PR. remotes::install_github("see24/terra")
library(terra)
#> terra 1.7.40
r <- rast(matrix(data=c(1,1,1,1,0,1,1,1,1), nrow=3, ncol=3))
r[2]<- NA
v<- as.character(values(r))
v[is.na(v)]<- "NA"
plot(r, colNA="grey")
text(r, labels=v) m<- focalMat(r, d=c(3,1), type = "Gauss")
y1<- focal(r, m, fun="mean", na.rm = TRUE)
plot(y1)
text(y1, digits=2) Created on 2023-07-06 with reprex v2.0.2 |
@see24, cool! I commented out that line that purposefully throws an error as in your PR and I am also seeing that the "mean" function coded as is already accounts for the presence of NA's when summing the weights which is what I believe @rhijmans was concerned about when he said "the weights would no longer add up to the same total." I also agree that the weights could be described a bit more clearly in the Details section than they are at present. Perhaps we could contribute via a pull request to help with that once this is resolved. library(terra)
#> terra 1.7.41
mean_fun3 <- function(i, weights, na.rm){
weighted.mean(i, w = weights, na.rm=na.rm)
}
r <- rast(matrix(data=c(1,1,1,1,0,1,1,1,1), nrow=3, ncol=3))
r[2]<- NA
v<- as.character(values(r))
v[is.na(v)]<- "NA"
plot(r, colNA="grey")
text(r, labels=v) m<- focalMat(r, d=c(3,1), type = "Gauss")
y3<- focal(r, w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm=TRUE)
y4<- focal(r, m, fun=mean, na.rm=TRUE)
all.equal(values(y3, mat=FALSE),values(y4, mat=FALSE)) #Results are identical
#> [1] TRUE
i<- values(r, mat=FALSE)
w<- as.vector(t(m))
sum(i * w, na.rm=TRUE) / sum(w[ !is.na(i) ]) # Sum of weights accounts for presence of NA's
#> [1] 0.8651789
sum(i * w, na.rm=TRUE) / sum(w) # Sum of weights does not account for presence of NA's
#> [1] 0.7673187
y4[5] # Using mean already accounts for presence of NA's when calculating the weighted mean
#> focal_mean
#> 1 0.8651789 Created on 2023-07-06 with reprex v2.0.2 |
error with na.policy = "omit" and multilayer SpatRaster bc it was reading all layer values in the checkNA step
I discovered a bug when trying to use @ailich's function as a stop gap for users with an old version of terra. I made a new PR with a proposed fix. This example should reproduce it: library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.38
mat <- matrix(c(rep(1,5000), rep(2, 5000)), nrow = 100, ncol = 100)
exps <- terra::rast(mat, crs = "EPSG:5070")
exps[55:60, 20:80] <- NA
r <- terra::segregate(exps)
plot(r, colNA="grey")
text(r)
m<- focalMat(r, d=c(3,1), type = "Gauss")
mean_fun3 <- function(i, weights, na.rm){
weighted.mean(i, w = weights, na.rm=na.rm)
}
y3<- focal(r, w=3, fun=mean_fun3, weights = as.vector(m), na.rm = TRUE, na.policy = "omit") It looks like there was just one spot where it was reading in values for all layers instead of just one. |
Yeah, @see24 @rhijmans, there is definitely a bit of weirdness happening here. library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.39
mat <- matrix(c(rep(1,5000), rep(2, 5000)), nrow = 100, ncol = 100)
exps <- terra::rast(mat, crs = "EPSG:5070")
exps[55:60, 20:80] <- NA
r <- terra::segregate(exps)
plot(r, colNA="red") m<- focalMat(r, d=c(3,1), type = "Gauss")
mean_fun1 <- function(i){
weighted.mean(i/m, m, na.rm = TRUE)
}
mean_fun2 <- function(i){
sum(i * m, na.rm=TRUE) / sum( m[ !is.na(i) ] )
} #mean fun 2 uses m which is only defined in local environment
mean_fun3 <- function(i, weights, na.rm){
weighted.mean(i, w = weights, na.rm=na.rm)
}
# mean_fun1
focal(r, w=m, fun=mean_fun1, na.rm = TRUE, na.policy = "omit")
#> Error in txtfun != "sum": comparison (!=) is possible only for atomic and list types
focal(r[[1]], w=m, fun=mean_fun1, na.rm = TRUE, na.policy = "omit") #only first layer
#> Error in txtfun != "sum": comparison (!=) is possible only for atomic and list types
focal(r, w=m, fun=mean_fun1, na.rm = TRUE, na.policy = "all") #na.policy=all
#> Error in txtfun != "sum": comparison (!=) is possible only for atomic and list types
#mean_fun2
focal(r, w=3, fun=mean_fun2, na.rm = TRUE, na.policy = "omit")
#> Error: [focal] test failed
focal(r[[1]], w=3, fun=mean_fun2, na.rm = TRUE, na.policy = "omit") #only first layer
#> Error: [focal] test failed
focal(r, w=3, fun=mean_fun2, na.rm = TRUE, na.policy = "all") #na.policy=all
#> Error: [focal] test failed
#mean_fun3
focal(r, w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm = TRUE, na.policy = "omit")
#> Error: [writeValues] too many values for writing: 40000 > 20000
focal(r[[1]], w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm = TRUE, na.policy = "omit") #only first layer
#> class : SpatRaster
#> dimensions : 100, 100, 1 (nrow, ncol, nlyr)
#> resolution : 1, 1 (x, y)
#> extent : 0, 100, 0, 100 (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070)
#> source(s) : memory
#> name : 1
#> min value : 0
#> max value : 1
focal(r, w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm = TRUE, na.policy = "all") #na.policy=all
#> class : SpatRaster
#> dimensions : 100, 100, 2 (nrow, ncol, nlyr)
#> resolution : 1, 1 (x, y)
#> extent : 0, 100, 0, 100 (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070)
#> source(s) : memory
#> names : 1, 2
#> min values : 0, 0
#> max values : 1, 1
focal(r, w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm = TRUE, na.policy = "only") #na.policy=only
#> Error: [writeValues] too many values for writing: 40000 > 20000 Created on 2023-07-17 with reprex v2.0.2 |
Yes the PR I made addresses the writeValues error but not the txtfun one. Since the test of |
Many thanks! |
In my example lets say I want to calculate the amount of the land area around a given cell that has a certain landcover class. I have set water to NA because it is not applicable to my calculation of the proportion of the land area. I would like to use a Gaussian filter such that further away cells get a lower weight.
If I use focal “mean” it works as expected but everything near the water is
also NA which I don’t want so I need to set
na.rm = TRUE
However, this leads to an error which recommends that I use the sum function
Using “sum” and
na.rm = TRUE
is equivalent to the NA area being 0 which is notwhat I need. So I use a custom function which I believe is doing the same
thing “mean” with
na.rm = TRUE
would do if it was allowed.Assuming I understand the implications at the edge of the raster from setting
na.rm = TRUE
and that I have a reason for wanting to exclude NA cells from the calculation of the mean why is it not allowed?Note: I have seen various questions on StackOverflow about using mean with a circle but in those cases the recomendation was to change to a 0/1 raster which doesn’t work for this case with a Gaussian filter
Created on 2023-06-30 with reprex v2.0.2
The text was updated successfully, but these errors were encountered: