Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is mean not allowed with "na.rm=TRUE" and weights other than 0, 1, or NA #1218

Closed
see24 opened this issue Jun 30, 2023 · 11 comments
Closed

Comments

@see24
Copy link
Contributor

see24 commented Jun 30, 2023

In my example lets say I want to calculate the amount of the land area around a given cell that has a certain landcover class. I have set water to NA because it is not applicable to my calculation of the proportion of the land area. I would like to use a Gaussian filter such that further away cells get a lower weight.

library(terra)
#> terra 1.7.39

# make land cover class raster
r <- rast(nrows=36, ncols=36, xmin=0)
r[] <- 0
r[3:4, 5:8] <- 1
r[8:10, 2:6] <- 1
r[7:12, 22:36] <- 1
r[15:16, 18:29] <- 1

# Set an area to NA eg a lake
r[5:6, 4:7] <- NA

plot(r, colNA = "grey")

# focal window matrix
m <- focalMat(r, 3, type = "Gauss")

If I use focal “mean” it works as expected but everything near the water is
also NA which I don’t want so I need to set na.rm = TRUE

f_na <- focal(r, m, "mean", na.policy = "omit")

plot(f_na, colNA = "grey")

However, this leads to an error which recommends that I use the sum function

f_narm <- focal(r, m, "mean", na.rm = TRUE)
#> Error: [focal] with "na.rm=TRUE" and weights other than 0, 1, or NA, only fun="sum" is allowed

f_narm <- focal(r, m, "sum", na.rm = TRUE, na.policy = "omit")
plot(f_narm, colNA = "grey", type = "classes")

Using “sum” and na.rm = TRUE is equivalent to the NA area being 0 which is not
what I need. So I use a custom function which I believe is doing the same
thing “mean” with na.rm = TRUE would do if it was allowed.

mean_fun <- function(i){
  weighted.mean(i/m, m, na.rm = TRUE)
} 

f_narm2 <- focal(r, m, mean_fun, na.policy = "omit")
plot(f_narm2, colNA = "grey", type = "classes")

Assuming I understand the implications at the edge of the raster from setting na.rm = TRUE and that I have a reason for wanting to exclude NA cells from the calculation of the mean why is it not allowed?

Note: I have seen various questions on StackOverflow about using mean with a circle but in those cases the recomendation was to change to a 0/1 raster which doesn’t work for this case with a Gaussian filter

Created on 2023-06-30 with reprex v2.0.2

@rhijmans
Copy link
Member

The reason is that, when using non-binary weights, the values are multiplied with the weights before they are summarized. If you use na.rm=TRUE the weights should be adjusted to get a proper mean. I think your solution is good ( I am not sure why you do i/m

weighted.mean(i, m, na.rm = TRUE)

If "fun=mean" would be allowed you would get

mean(i * m, na.rm=TRUE)

instead of the weighted mean:

sum(i * m, na.rm=TRUE) / sum( m[ !is.na(i) ] )

That is, the weights would no longer add up to the same total; so if you had a lot of NAs your mean would be much too low. That is, of course, also true for the sum. You can compute the mean by dividing the output by the sum of the weights, but I do not think that is what you want.

Your solution should be fine. Given that that slows things down, I could perhaps implement a solution that adjusts the weights when using a non-binary weights matrix to compute the weighted mean.

@see24
Copy link
Contributor Author

see24 commented Jun 30, 2023

Thanks for the quick reply! It would be great if it could be included due to the speed cost of a custom function. I tried installing terra locally with just the error message commented out on line 29 of focal.R and it appeared to just work! At least with this toy example...

I am not sure why you do i/m

I did that because with out it I was getting really small numbers because it appears that the custom function is passed the values after they are multiplied by the weights

@ailich
Copy link
Contributor

ailich commented Jul 5, 2023

I believe some of the confusion here is related to that when the function "mean" is passed to focal the behavior is atypical (see #347). In general, the values are multiplied by the weights, and then the function is applied. If mean is used, the weighted mean is calculated. @see24, your custom function appears to provide correct results but it applies the weights twice and you can accidentally apply different weights for each step. I've included 3 versions of the mean_fun below. Your version is mean_fun1, Robert's is mean_fun2, and my version mean_fun3. Mine and Robert's produce the same result as yours but take a weights matrix of just 1's (w=3 makes a 3x3 matrix of ones). The weights are instead applied within the function itself. The reason you were getting small values using Robert's function is that you were supplying m as the weights matrix to focal when using mean_fun2 so the weights were being applied in two different steps. The difference in my version, mean_fun3 is that it explicitly takes the weights and na.rm arguments rather than relying on variable in your local environment that is called m. @rhijmans, overall, I don't see why weights other than 0, 1, or NA can't be used if na.rm=TRUE as I'd expect the procedure to be, extract values, multiply by weights to get new values, remove NA's in these modified values, and lastly perform function on the remaining values.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.38

r <- rast(matrix(data=c(1,1,1,1,0,1,1,1,1), nrow=3, ncol=3))
r[2]<- NA

v<- as.character(values(r))
v[is.na(v)]<- "NA"

plot(r, colNA="grey")
text(r, labels=v)

m<- focalMat(r, d=c(3,1), type = "Gauss")
m
#>           [,1]      [,2]      [,3]
#> [1,] 0.1069973 0.1131098 0.1069973
#> [2,] 0.1131098 0.1195715 0.1131098
#> [3,] 0.1069973 0.1131098 0.1069973

mean_fun1 <- function(i){
  weighted.mean(i/m, m, na.rm = TRUE)
}

mean_fun2 <- function(i){
  sum(i * m, na.rm=TRUE) / sum( m[ !is.na(i) ] )
}

mean_fun3 <- function(i, weights, na.rm){
  weighted.mean(i, w = weights, na.rm=na.rm)
} 

y1<- focal(r, m, fun=mean_fun1)

#Expected value of central cell
weighted.mean(values(r, mat=FALSE), w = as.vector(t(m)), na.rm = TRUE)
#> [1] 0.8651789
## Or manually with sum
sum((values(r, mat=FALSE) * as.vector(t(m))), na.rm = TRUE)/sum(as.vector(t(m))[!is.na(values(r, mat=FALSE))])
#> [1] 0.8651789


y1<- focal(r, m, fun=mean_fun1)
values(y1)[5]
#> [1] 0.8651789
plot(y1)
text(y1, digits=2)

y2<- focal(r, w=3, fun=mean_fun2)
values(y2)[5]
#> [1] 0.8651789
plot(y2)
text(y2, digits=2)

y3<- focal(r, w=3, fun=mean_fun3, weights = as.vector(m), na.rm=TRUE)
values(y3)[5] #Correct
#> [1] 0.8651789
plot(y3)
text(y3, digits=2)

Created on 2023-07-05 with reprex v2.0.2

@ailich
Copy link
Contributor

ailich commented Jul 5, 2023

@see24 for example here is the potential problem that can occur with the other 2 versions of the function if you're not careful.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.38

r <- rast(matrix(data=c(1,1,1,1,0,1,1,1,1), nrow=3, ncol=3))
r[2]<- NA

v<- as.character(values(r))
v[is.na(v)]<- "NA"

plot(r, colNA="grey")
text(r, labels=v)

m<- focalMat(r, d=c(3,1), type = "Gauss")

m2<- matrix(seq(0.1,0.9,by=0.1), nrow=3, ncol=3)

mean_fun1 <- function(i){
  weighted.mean(i/m, m, na.rm = TRUE)
}

mean_fun2 <- function(i){
  sum(i * m, na.rm=TRUE) / sum( m[ !is.na(i) ] )
} #mean fun 2 uses m which is only defined in local environment

mean_fun3 <- function(i, weights, na.rm){
  weighted.mean(i, w = weights, na.rm=na.rm)
} 

# This will multiply values by m2 to get i. Then the weighted mean of i/m is calcualed using m as the weights
y1<- focal(r, m2, fun=mean_fun1) 

# This will multiply values by m2 to get i and then calculate the weighted mean of i using weights m
y2<- focal(r, m2, fun=mean_fun2) 

# This extract values in the window and then assign them to i (focal weights matrix is only 1's). Then the weighted mean of i is calculated using weights m2
y3<- focal(r, w=3, fun=mean_fun3, weights=m2, na.rm=TRUE)

plot(y1)
text(y1, digits=2)

plot(y2)
text(y2, digits=2)

plot(y3)
text(y3, digits=2)

Created on 2023-07-05 with reprex v2.0.2

@see24
Copy link
Contributor Author

see24 commented Jul 6, 2023

@ailich yes ideally the weights that are passed as the w argument to focal could be available in the custom function, then you would be able to use them correctly without having to supply a dummy window and additional weights argument. In general it would be great if there was a bit more detail in the documentation about how the custom function should work. Specifically, it could explain that the values passed to the function are the raster values multiplied by the window weights.

Using the "mean" fun does give the same answer if you remove the error message as in my PR.

remotes::install_github("see24/terra")

library(terra)
#> terra 1.7.40

r <- rast(matrix(data=c(1,1,1,1,0,1,1,1,1), nrow=3, ncol=3))
r[2]<- NA

v<- as.character(values(r))
v[is.na(v)]<- "NA"

plot(r, colNA="grey")
text(r, labels=v)

m<- focalMat(r, d=c(3,1), type = "Gauss")

y1<- focal(r, m, fun="mean", na.rm = TRUE)

plot(y1)
text(y1, digits=2)

Created on 2023-07-06 with reprex v2.0.2

@ailich
Copy link
Contributor

ailich commented Jul 6, 2023

@see24, cool! I commented out that line that purposefully throws an error as in your PR and I am also seeing that the "mean" function coded as is already accounts for the presence of NA's when summing the weights which is what I believe @rhijmans was concerned about when he said "the weights would no longer add up to the same total." I also agree that the weights could be described a bit more clearly in the Details section than they are at present. Perhaps we could contribute via a pull request to help with that once this is resolved.

library(terra)
#> terra 1.7.41

mean_fun3 <- function(i, weights, na.rm){
  weighted.mean(i, w = weights, na.rm=na.rm)
} 

r <- rast(matrix(data=c(1,1,1,1,0,1,1,1,1), nrow=3, ncol=3))
r[2]<- NA

v<- as.character(values(r))
v[is.na(v)]<- "NA"

plot(r, colNA="grey")
text(r, labels=v)

m<- focalMat(r, d=c(3,1), type = "Gauss")

y3<- focal(r, w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm=TRUE)

y4<- focal(r, m, fun=mean, na.rm=TRUE)

all.equal(values(y3, mat=FALSE),values(y4, mat=FALSE)) #Results are identical
#> [1] TRUE


i<- values(r, mat=FALSE)
w<- as.vector(t(m))

sum(i * w, na.rm=TRUE) / sum(w[ !is.na(i) ]) # Sum of weights accounts for presence of NA's
#> [1] 0.8651789
sum(i * w, na.rm=TRUE) / sum(w) # Sum of weights does not account for presence of NA's
#> [1] 0.7673187

y4[5] # Using mean already accounts for presence of NA's when calculating the weighted mean
#>   focal_mean
#> 1  0.8651789

Created on 2023-07-06 with reprex v2.0.2

see24 added a commit to see24/terra that referenced this issue Jul 17, 2023
error with na.policy = "omit" and multilayer SpatRaster bc it was reading all layer values in the checkNA step
@see24
Copy link
Contributor Author

see24 commented Jul 17, 2023

I discovered a bug when trying to use @ailich's function as a stop gap for users with an old version of terra. I made a new PR with a proposed fix.

This example should reproduce it:

library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.38

mat <- matrix(c(rep(1,5000), rep(2, 5000)), nrow = 100, ncol = 100)
exps <- terra::rast(mat, crs = "EPSG:5070")
exps[55:60, 20:80] <- NA
r <- terra::segregate(exps)


plot(r, colNA="grey")
text(r)


m<- focalMat(r, d=c(3,1), type = "Gauss")

mean_fun3 <- function(i, weights, na.rm){
  weighted.mean(i, w = weights, na.rm=na.rm)
} 

y3<- focal(r, w=3, fun=mean_fun3, weights = as.vector(m), na.rm = TRUE, na.policy = "omit")

It looks like there was just one spot where it was reading in values for all layers instead of just one.

@ailich
Copy link
Contributor

ailich commented Jul 17, 2023

Yeah, @see24 @rhijmans, there is definitely a bit of weirdness happening here. mean_fun1 and mean_fun2 will produce errors for all scenarios, but mean_fun3 only produces an error with multilayer SpatRasters if na.policy=omit or na.policy=only.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.39

mat <- matrix(c(rep(1,5000), rep(2, 5000)), nrow = 100, ncol = 100)
exps <- terra::rast(mat, crs = "EPSG:5070")
exps[55:60, 20:80] <- NA
r <- terra::segregate(exps)

plot(r, colNA="red")

m<- focalMat(r, d=c(3,1), type = "Gauss")

mean_fun1 <- function(i){
  weighted.mean(i/m, m, na.rm = TRUE)
}

mean_fun2 <- function(i){
  sum(i * m, na.rm=TRUE) / sum( m[ !is.na(i) ] )
} #mean fun 2 uses m which is only defined in local environment

mean_fun3 <- function(i, weights, na.rm){
  weighted.mean(i, w = weights, na.rm=na.rm)
} 


# mean_fun1
focal(r, w=m, fun=mean_fun1, na.rm = TRUE, na.policy = "omit")
#> Error in txtfun != "sum": comparison (!=) is possible only for atomic and list types
focal(r[[1]], w=m, fun=mean_fun1, na.rm = TRUE, na.policy = "omit") #only first layer
#> Error in txtfun != "sum": comparison (!=) is possible only for atomic and list types
focal(r, w=m, fun=mean_fun1, na.rm = TRUE, na.policy = "all") #na.policy=all
#> Error in txtfun != "sum": comparison (!=) is possible only for atomic and list types

#mean_fun2
focal(r, w=3, fun=mean_fun2, na.rm = TRUE, na.policy = "omit")
#> Error: [focal] test failed
focal(r[[1]], w=3, fun=mean_fun2, na.rm = TRUE, na.policy = "omit") #only first layer
#> Error: [focal] test failed
focal(r, w=3, fun=mean_fun2, na.rm = TRUE, na.policy = "all") #na.policy=all
#> Error: [focal] test failed

#mean_fun3
focal(r, w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm = TRUE, na.policy = "omit")
#> Error: [writeValues] too many values for writing: 40000 > 20000
focal(r[[1]], w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm = TRUE, na.policy = "omit") #only first layer
#> class       : SpatRaster 
#> dimensions  : 100, 100, 1  (nrow, ncol, nlyr)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 100, 0, 100  (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070) 
#> source(s)   : memory
#> name        : 1 
#> min value   : 0 
#> max value   : 1
focal(r, w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm = TRUE, na.policy = "all") #na.policy=all
#> class       : SpatRaster 
#> dimensions  : 100, 100, 2  (nrow, ncol, nlyr)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 100, 0, 100  (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070) 
#> source(s)   : memory
#> names       : 1, 2 
#> min values  : 0, 0 
#> max values  : 1, 1
focal(r, w=3, fun=mean_fun3, weights= as.vector(t(m)), na.rm = TRUE, na.policy = "only") #na.policy=only
#> Error: [writeValues] too many values for writing: 40000 > 20000

Created on 2023-07-17 with reprex v2.0.2

@see24
Copy link
Contributor Author

see24 commented Jul 17, 2023

Yes the PR I made addresses the writeValues error but not the txtfun one. Since the test of txtfun != "sum" is only needed to produce the error message that we previously commented out it can probably just be removed. The focal test failed error is because the function test run doesn't work. I believe this is because there is an m object created in the focal function that overwrites the global one, so just that they are poorly written functions not really a terra issue.

rhijmans added a commit that referenced this issue Aug 24, 2023
@rhijmans
Copy link
Member

Many thanks!

@ailich
Copy link
Contributor

ailich commented Aug 24, 2023

@rhijmans @see24, I've also put in a pull request to clarify the documentation of w in focal.

#1263

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants