Hi, opening remote slides (for example on S3) sounds like a cool feature. In an old discussion (openslide/openslide.github.io#11 (comment)), it was stated that using S3 somehow limits reading slides to sequential access. However, S3 appears to support the Range header, which allows random access to a file. Does anyone have experience with randomly reading regions of slides stored on S3? What about performance?
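For reference, the kind of ranged read the question refers to can be sketched as below. This is only a minimal illustration using the Python standard library: the URL is a hypothetical placeholder, the helper name `ranged_request` is made up for this example, and the request is built but never sent.

```python
from urllib.request import Request


def ranged_request(url, offset, length):
    """Build (but do not send) a GET request for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # The end position of an HTTP byte range is inclusive.
    last_byte = offset + length - 1
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Hypothetical object URL, used purely for illustration.
req = ranged_request("https://example-bucket.s3.amazonaws.com/slide.svs", 1024, 4096)
print(req.get_header("Range"))
```

A server that honours the header answers with `206 Partial Content` and only the requested bytes, which is what makes non-sequential reads of a large file possible.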
Replies: 1 comment
Hi @mdrio

This is fully supported in tiffslide and is not limited to sequential access, unlike the FUSE mounting mentioned in the OpenSlide thread.

Performance-wise, for random sampling it's best to sample along tile boundaries, in multiples of the file's internal tile size, to avoid loading a lot of data just to throw it away.

In theory, throughput from S3 on EC2 should scale up to 100 Gbit/s. Whenever time allows, I'm working on setting up benchmarks for this explicitly. They might become available in the near future.

You might want to check out www.github.com/bayer-group/pado if you need a data loader for pathology image datasets that works with cloud native image locations too.

Cheers,
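The tile-boundary advice can be illustrated with a small sketch. It assumes the internal tile width and height are already known (for example, read from the slide's metadata); the function names and the 256-pixel tile size are hypothetical choices for this example and are not part of tiffslide's API.

```python
def align_region_to_tiles(x, y, width, height, tile_w, tile_h):
    """Expand a region to the smallest enclosing tile-aligned region.

    Returns (x0, y0, aligned_width, aligned_height) in the same pixel units.
    """
    # Snap the top-left corner down to the nearest tile boundary.
    x0 = (x // tile_w) * tile_w
    y0 = (y // tile_h) * tile_h
    # Snap the bottom-right corner up to the nearest tile boundary
    # (ceiling division written with negated floor division).
    x1 = -(-(x + width) // tile_w) * tile_w
    y1 = -(-(y + height) // tile_h) * tile_h
    return x0, y0, x1 - x0, y1 - y0


def tiles_touched(x, y, width, height, tile_w, tile_h):
    """Count how many internal tiles must be fetched to serve a region."""
    _, _, w, h = align_region_to_tiles(x, y, width, height, tile_w, tile_h)
    return (w // tile_w) * (h // tile_h)


# Assumed tile size of 256 x 256 pixels, for illustration only.
TILE = 256
unaligned = tiles_touched(300, 300, 256, 256, TILE, TILE)
aligned = tiles_touched(256, 256, 256, 256, TILE, TILE)
print(unaligned, aligned)
```

With these assumed numbers, a 256 x 256 patch placed off the grid touches four tiles, while the same-sized patch placed on a tile boundary touches one, so the unaligned read fetches roughly four times the data for the same output.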