
nsqd: add compression/decompression for messages #1149

Conversation

andyxning
Member

@andyxning andyxning commented Mar 20, 2019

Fix #1148

This PR adds a compression/decompression pipeline to the nsqd server.

Update:

/cc @mreiferson @ploxiln

@andyxning andyxning changed the title add compression/decompression pipeline [WIP] add compression/decompression pipeline Mar 20, 2019
@andyxning
Member Author

andyxning commented Mar 20, 2019

WIP: waiting on @ploxiln's redesign of the nsqd storage backend.

@andyxning andyxning force-pushed the add_compression/decompression_pipeline branch 4 times, most recently from 667c030 to c649166 Compare March 21, 2019 06:37
chanMsg.Timestamp = msg.Timestamp
chanMsg.deferred = msg.deferred
}
for _, channel := range chans {
Member Author

This is mainly to avoid a race condition with the topic's PutMessage method, which was detected by go test -race.

}

msg.Body = adjustedBody
_, err = msg.WriteTo(buf)
Member

I'm still confused about why the changes here are necessary; doesn't the existing compression functionality already deliver compressed messages to consumers?

Member Author

Because snappy and deflate compression are implemented by wrapping the socket: nsqd writes the original message to the wrapped socket, and the compressed message is what actually gets written to the wire. Reading works the same way.

Member

Isn't that what we want to happen?

As we had discussed on #1148 (comment), I feel like the way to accomplish what you want is for your producers and consumers to compress data.

It probably makes sense to get on the same page about the feature and whether or not it belongs in NSQ core before we discuss code and implementation.

Member Author

@andyxning andyxning Mar 29, 2019

Yes, @mreiferson, you're right. I have also considered this pattern of compressing messages in the producer and decompressing them in the consumer. However, with that approach we would need to:
1. Pass the compression algorithm information to the consumer so it can determine how to decompress. To accomplish this, we would need to change the wire format currently used while keeping it backward compatible. IMHO, that is difficult.
2. Not all compression algorithms are supported by the Go/Python standard libraries, for example snappy. To decompress a message correctly, the consumer must know which compression algorithm the message uses; otherwise the message cannot be consumed correctly and will be requeued, or worse, dropped once max_tries is reached. (I know we can just print an error log and exit the consumer to indicate that something is wrong, but what if users set max_tries to 1 and the message is dropped the next time it is consumed?)

Member

How you compress the messages is entirely up to you. In most cases you control both the producer and consumer — is there a need for more than one compression type to be used?

If multiple compression formats do need to be supported, you could also build some identifier into the message structure to indicate to consumers how to decompress.

Member

If you have to transition from uncompressed messages to compressed messages, one way to do it is to start a new topic for the compressed messages ... so if you have "events" you might create another topic "events_gzip" and during the transition consume from both. Messages from "events_gzip" are gzipped, in this hypothetical case. But pick whatever is supported by both producer and consumer.

This scheme is nice because nsqd does not have to compress for storage and then decompress for delivery.

Member Author

If multiple compression formats do need to be supported, you could also build some identifier into the message structure to indicate to consumers how to decompress.

I know this. But we do not control both the producer and the consumer in every case, nor does every producer need to add compression-type information to a message.

If you have to transition from uncompressed messages to compressed messages, one way to do it is to start a new topic for the compressed messages ... so if you have "events" you might create another topic "events_gzip" and during the transition consume from both. Messages from "events_gzip" are gzipped, in this hypothetical case. But pick whatever is supported by both producer and consumer.

Yes, I agree with this. But what I want is to do this transparently to end users (producers and consumers).

func DeflateDecompress(compressedMsg []byte) ([]byte, error) {
	fr := flate.NewReader(bytes.NewReader(compressedMsg))
	defer fr.Close()
	body, err := ioutil.ReadAll(fr)
	if err != nil {
		return nil, err
	}
	return body, nil
}
Member Author

mark

@andyxning andyxning force-pushed the add_compression/decompression_pipeline branch from 2bbd7c2 to a19f720 Compare March 28, 2019 04:24
fr := flate.NewReader(br)
defer fr.Close()
body, err := ioutil.ReadAll(fr)
if err != nil && !(err == io.ErrUnexpectedEOF && br.Len() == 0) {
	return nil, err
}
Member Author

new way


Great! 👍

@andyxning
Member Author

Let's wait for #1170 to be merged, then it will be easy to add this feature. :)

@mreiferson
Member

mreiferson commented Jun 14, 2020

@andyxning sorry I've been away for so long!

Context: the intended use case of the existing compression features of nsqd can best be described as "proxy compression", similar to using something like nginx and enabling gzip compression.

I noticed that you had a design doc linked in the original issue description with the following motivation:

NSQ currently supports snappy compression and decompression for TCP-protocol based producer and consumer. But it has some shortcomings.

  1. the memory and disk capacity nsqd uses cannot be decreased, since all messages are stored exactly as nsqd receives them, i.e., the original messages.
  2. the in-network bandwidth cannot be decreased, since messages are not compressed before they are sent.
  3. it is only available to TCP-protocol-based producers and consumers.

I think we should discuss these individually:

  1. This seems like a reasonable request, at least for storage on disk; let's consider this as part of nsqd: refactor storage engine #1170

  2. When you publish messages via TCP protocol and enable compression on the connection, messages are compressed over the wire to nsqd. A reasonable addition could be to add Snappy and Deflate support specifically to HTTP publish methods.

  3. Seems like a dupe of (2), but there is no HTTP consumer, so it doesn't matter on the read side.

For this PR specifically, I would consider landing (2), i.e. just the changes to HTTP publish methods to support Content-Encoding.

What do you think @andyxning and @ploxiln?

@mreiferson mreiferson changed the title [WIP] add compression/decompression pipeline nsqd: add compression/decompression for messages Jun 14, 2020
@jehiah
Member

jehiah commented Jun 14, 2020

👍 for adding Content-Encoding support on HTTP publish methods

@ploxiln
Member

ploxiln commented Jun 14, 2020

👍 from me as well for HTTP Content-Encoding gzip/deflate (since those are already standard).

That probably won't satisfy @andyxning ... but I still think the way to go, when you really need good compression, is for the producer/consumer to do it. That approach avoids nsqd doing extra compress and decompress work for each message, and lets you squeeze the most efficiency out of the system if you need it.

@mreiferson
Member

Closing this, opened #1313 to track compression over HTTP pub.

@mreiferson mreiferson closed this Dec 29, 2020
Successfully merging this pull request may close these issues.

nsqd: enable compression/decompression pipeline from producer to consumer
5 participants