
nsqd: add compression/decompression for messages #1149

Conversation

andyxning
Member

@andyxning andyxning commented Mar 20, 2019

Fix #1148

This PR adds a compression/decompression pipeline to the nsqd server.

Update:

/cc @mreiferson @ploxiln

@andyxning andyxning changed the title add compression/decompression pipeline [WIP] add compression/decompression pipeline Mar 20, 2019
@andyxning
Member Author

andyxning commented Mar 20, 2019

WIP: waiting on @ploxiln's redesign of the nsqd storage backend.

@andyxning andyxning force-pushed the add_compression/decompression_pipeline branch 4 times, most recently from 667c030 to c649166 Compare March 21, 2019 06:37
chanMsg.Timestamp = msg.Timestamp
chanMsg.deferred = msg.deferred
}
for _, channel := range chans {
Member Author

This is mainly to avoid a race condition with the topic's PutMessage method, which was detected by go test -race.

}

msg.Body = adjustedBody
_, err = msg.WriteTo(buf)
Member

I'm still confused about why the changes here are necessary; doesn't the existing compression functionality already deliver compressed messages to consumers?

Member Author

Because snappy and deflate compression are implemented by wrapping the socket: nsqd writes the original message to the wrapped socket, and the compressed message is what actually gets written to the wire. Reading works the same way.

Member

Isn't that what we want to happen?

As we had discussed on #1148 (comment), I feel like the way to accomplish what you want is for your producers and consumers to compress data.

It probably makes sense to get on the same page about the feature and whether or not it belongs in NSQ core before we discuss code and implementation.

Member Author

@andyxning andyxning Mar 29, 2019

Yes, @mreiferson, you're right. I have also considered this pattern of compressing messages in the producer and decompressing them in the consumer. However, with that approach we would need to:
1. Pass the compression algorithm information to the consumer so it can determine how to decompress. To accomplish this, we would need to change the wire format currently used while keeping it backward compatible. IMHO, that is difficult.
2. Not all compression algorithms are supported by the Go/Python standard libraries, for example snappy. To decompress a message correctly, the consumer must know which compression algorithm the message uses; otherwise the message cannot be consumed correctly and will be requeued, or worse, dropped once max_tries is reached. (I know we can just print an error log and exit the consumer to indicate that something is wrong, but what if users set max_tries to 1 and the message is dropped the next time it is consumed?)

Member

How you compress the messages is entirely up to you. In most cases you control both the producer and consumer — is there a need for more than one compression type to be used?

If multiple compression formats do need to be supported, you could also build some identifier into the message structure to indicate to consumers how to decompress.

Member

If you have to transition from uncompressed messages to compressed messages, one way to do it is to start a new topic for the compressed messages ... so if you have "events" you might create another topic "events_gzip" and during the transition consume from both. Messages from "events_gzip" are gzipped, in this hypothetical case. But pick whatever is supported by both producer and consumer.

This scheme is nice because nsqd does not have to compress for storage and then decompress for delivery.

Member Author

If multiple compression formats do need to be supported, you could also build some identifier into the message structure to indicate to consumers how to decompress.

I know this. But we do not control both the producer and the consumer in every case, nor does every producer need to add compression-type information to a message.

If you have to transition from uncompressed messages to compressed messages, one way to do it is to start a new topic for the compressed messages ... so if you have "events" you might create another topic "events_gzip" and during the transition consume from both. Messages from "events_gzip" are gzipped, in this hypothetical case. But pick whatever is supported by both producer and consumer.

Yes, I agree with this. But what I want is to do this transparently to end users (producers and consumers).

func DeflateDecompress(compressedMsg []byte) ([]byte, error) {
	fr := flate.NewReader(bytes.NewReader(compressedMsg))
	defer fr.Close()
	body, err := ioutil.ReadAll(fr)
	if err != nil {
		return nil, err
	}
	return body, nil
}
Member Author

mark

@andyxning andyxning force-pushed the add_compression/decompression_pipeline branch from 2bbd7c2 to a19f720 Compare March 28, 2019 04:24
fr := flate.NewReader(br)
defer fr.Close()
body, err := ioutil.ReadAll(fr)
if err != nil && !(err == io.ErrUnexpectedEOF && br.Len() == 0) {
	return nil, err
}
Member Author

new way


Great! 👍

@andyxning
Member Author

Let's wait for #1170 to be merged, then it will be easy to add this feature. :)

@mreiferson
Member

mreiferson commented Jun 14, 2020

@andyxning sorry I've been away for so long!

Context: the intended use case of the existing compression features of nsqd can best be described as "proxy compression", similar to using something like nginx and enabling gzip compression.

I noticed that you had a design doc linked in the original issue description with the following motivation:

NSQ currently supports snappy compression and decompression for TCP-protocol based producer and consumer. But it has some shortcomings.

  1. the memory and disk capacity nsqd uses cannot be decreased, since all messages are stored exactly as nsqd receives them, i.e., the original messages.
  2. the in-network bandwidth cannot be decreased, since messages are not compressed before they are sent.
  3. it is only available to TCP-protocol-based producers and consumers.

I think we should discuss these individually:

  1. This seems like a reasonable request, at least for storage on disk; let's consider this as part of nsqd: refactor storage engine #1170

  2. When you publish messages via TCP protocol and enable compression on the connection, messages are compressed over the wire to nsqd. A reasonable addition could be to add Snappy and Deflate support specifically to HTTP publish methods.

  3. Seems like a dupe of (2), but there is no HTTP consumer, so it doesn't matter on the read side.

For this PR specifically, I would consider landing (2), i.e. just the changes to HTTP publish methods to support Content-Encoding.

What do you think @andyxning and @ploxiln?

@mreiferson mreiferson changed the title [WIP] add compression/decompression pipeline nsqd: add compression/decompression for messages Jun 14, 2020
@jehiah
Member

jehiah commented Jun 14, 2020

👍 for adding Content-Encoding support on HTTP publish methods

@ploxiln
Member

ploxiln commented Jun 14, 2020

👍 from me as well for HTTP Content-Encoding gzip/deflate (since those are already standard).

That probably won't satisfy @andyxning ... but I still think the way to go, when you really need good compression, is for the producer/consumer to do it. That approach avoids nsqd doing extra compress and decompress work for each message, and lets you squeeze the most efficiency out of the system if you need it.

@mreiferson
Member

Closing this, opened #1313 to track compression over HTTP pub.

@mreiferson mreiferson closed this Dec 29, 2020
Successfully merging this pull request may close these issues.

nsqd: enable compression/decompression pipeline from producer to consumer
5 participants