Reduce zstd memory footprint
v1.12.1 reduces the global encoding buffer space allocated by the zstd library from 8 MB per CPU to 128 KB per CPU.
It also fixes a deadlock that could occur when Add*() and Send() were called concurrently.
This release requires v1.8.3 or later of github.com/klauspost/compress.
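
To put the per-CPU buffer figures in context, the following is a rough sketch of the total footprint before and after this release. It assumes the allocation scales linearly with the CPU count reported by runtime.NumCPU and that MB and KB are binary units; the constant and function names are hypothetical and are not part of the library.

```go
package main

import (
	"fmt"
	"runtime"
)

// Per-CPU encoding buffer sizes taken from the release note.
// Assumption: MB and KB are binary units (1 MB = 1<<20 bytes).
const (
	oldPerCPU = 8 << 20   // 8 MB per CPU before v1.12.1
	newPerCPU = 128 << 10 // 128 KB per CPU as of v1.12.1
)

// bufferFootprint is a hypothetical helper: it returns the total buffer
// space in bytes, assuming the allocation scales linearly with CPU count.
func bufferFootprint(perCPU, cpus int) int {
	return perCPU * cpus
}

func main() {
	cpus := runtime.NumCPU()
	before := bufferFootprint(oldPerCPU, cpus)
	after := bufferFootprint(newPerCPU, cpus)
	fmt.Printf("CPUs: %d\n", cpus)
	fmt.Printf("before: %d KB\n", before>>10)
	fmt.Printf("after:  %d KB\n", after>>10)
	fmt.Printf("reduction factor: %dx\n", oldPerCPU/newPerCPU)
}
```

Under these assumptions, a 16-CPU host would go from 128 MB of encoding buffers to 2 MB, a 64x reduction.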