Performance Comparison: 'extract' Function in 'raster' vs. 'terra' for Large Dataframes and Rasters #1584

Open
HassanMasoomi opened this issue Aug 13, 2024 · 0 comments


When extracting values at point locations from a large raster using a large data frame of coordinates, the extract function from the raster package is significantly faster than the one from terra. In the example below, which uses a raster of around 12 GB on disk and a data frame of approximately 50 million locations, raster is about twice as fast as terra. The gap widens considerably as the raster grows: for a 100 GB raster, the difference could exceed tenfold.


# Load necessary libraries
library(terra)
library(raster)
library(microbenchmark)

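# Function to generate a large artificial raster
# (terra:: prefixes are used because raster is attached after terra and
# masks functions with the same name, such as values and writeRaster)
create_large_raster <- function(filename, nrow, ncol) {
  # Create a raster with specified dimensions
  r <- terra::rast(nrows = nrow, ncols = ncol, crs = "EPSG:4326")
  # Fill raster with random values
  # (50000 x 50000 cells stored as doubles need roughly 20 GB of RAM)
  terra::values(r) <- runif(terra::ncell(r))
  # Write raster to file
  terra::writeRaster(r, filename, overwrite = TRUE)
}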

# Function to generate a large artificial dataframe
create_large_dataframe <- function(n) {
  # Create a dataframe with random latitude and longitude
  lat <- runif(n, min = -90, max = 90)
  lon <- runif(n, min = -180, max = 180)
  df <- data.frame(lon = lon, lat = lat)
  return(df)
}

# Parameters for the large raster and dataframe
raster_filename <- "large_raster.tif"
raster_nrow <- 50000  # Example dimensions for the raster
raster_ncol <- 50000
num_locations <- 50000000  # 50 million locations

# Create the artificial raster and dataframe
create_large_raster(raster_filename, raster_nrow, raster_ncol)
large_dataframe <- create_large_dataframe(num_locations)

# Load raster and perform extraction using both libraries
# (calls are namespace-qualified so that each wrapper uses its own
# package's extract regardless of the order in which packages are attached)
extract_terra <- function(raster_file, loc_table) {
  r <- terra::rast(raster_file)
  extracted_values <- terra::extract(r, loc_table)
  return(extracted_values)
}

extract_raster <- function(raster_file, loc_table) {
  r <- raster::raster(raster_file)
  extracted_values <- raster::extract(r, loc_table)
  return(extracted_values)
}

# Benchmarking the performance
results <- microbenchmark(
  terra = extract_terra(raster_filename, large_dataframe),
  raster = extract_raster(raster_filename, large_dataframe),
  times = 2  # Reduce times for demonstration purposes
)

# Print benchmarking results
print(results)

