From 017688cd0940b72c094a02a8c36a077f7cb24c33 Mon Sep 17 00:00:00 2001
From: Renning Bruns
Date: Fri, 14 Jul 2023 08:51:06 -0700
Subject: [PATCH] fix: Redis scan batch size increase (#794)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Which problem is this PR solving?

This week we bumped our number of Refinery pods from 250 to 300. After that change, we noticed a lot more timeouts from Redis:

```
time="2023-07-13T17:49:28Z" level=error msg="redis scan encountered an error" err="redis scan timeout" keys_returned=92 timeoutSec=5s
time="2023-07-13T18:39:41Z" level=error msg="get members failed during watch" error="context deadline exceeded" name="http://host:8081/" oldPeers="[list_of_the_300_hosts:8081]" timeout=5s
```

After digging into the code, we noticed that peers are stored in Redis and retrieved in batches of 10. With 300 nodes, each node makes 30 requests to Redis to fetch all the peers, and does so twice, for 60 requests total. This happens every 5 seconds, so our Redis instance is handling 18,000 requests every 5 seconds.

## Short description of the changes

A little napkin math: with 300 nodes and an average payload size of 26 bytes (that's what we see reported by `oldPeers` in our logs, anyway), Redis holds about 7.8 KB of data. That can easily be retrieved in a single request and doesn't need to be broken into batches of 10, so this PR proposes bumping the batch size to 1000.

For 1000 nodes, the total payload would be about 26 KB, and the larger batch size would yield only 2000 requests to Redis every 5 seconds (about 72 MB of bandwidth over those 5 seconds). Leaving the batch size at 10 would require 200,000 requests to Redis over 5 seconds for 1000 nodes.

Getting a little extreme: for 1M nodes, the total payload would be about 26 MB and each node would make 2000 requests, so 2B requests to Redis every 5 seconds, for a total of 72 GB of data transferred (1/1000 of the 26 MB total payload, 2000 times). This might kill the Redis, but the data is still "pretty small".

Fixes #793

---------

Co-authored-by: Kent Quirk
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
---
 internal/redimem/redimem.go | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/internal/redimem/redimem.go b/internal/redimem/redimem.go
index 3dd90a6ce6..3b0db427b4 100644
--- a/internal/redimem/redimem.go
+++ b/internal/redimem/redimem.go
@@ -35,6 +35,9 @@ const (
 
 	// redisScanTimeout indicates how long to attempt to scan for peers.
 	redisScanTimeout = 5 * time.Second
+
+	// redisScanBatchSize indicates how many keys to retrieve from Redis at a time.
+	redisScanBatchSize = "1000"
 )
 
 // RedisMembership implements the Membership interface using Redis as the backend
@@ -152,7 +155,7 @@ func (rm *RedisMembership) getMembersOnce(ctx context.Context) ([]string, error)
 		return nil, err
 	}
 	defer conn.Close()
-	keysChan, errChan := rm.scan(conn, keyPrefix, "10", redisScanTimeout)
+	keysChan, errChan := rm.scan(conn, keyPrefix, redisScanBatchSize, redisScanTimeout)
 	memberList := make([]string, 0)
 	for key := range keysChan {
 		name := strings.Split(key, "•")[2]
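
For reviewers unfamiliar with how the `COUNT` hint changes the number of round trips, here is a minimal, self-contained sketch of a Redis `SCAN` cursor loop in Go. It is an illustration of the technique only, not the redimem implementation: the redigo client, the `scanKeys` helper name, and the `refinery-peers-*` pattern are assumptions made for the example.

```go
// Illustrative sketch only — not the redimem implementation. It assumes the
// redigo client (github.com/gomodule/redigo/redis); the helper name and the
// key pattern are made up for the example.
package main

import (
	"fmt"
	"log"

	"github.com/gomodule/redigo/redis"
)

// scanKeys walks every key matching pattern, asking Redis for roughly
// batchSize keys per SCAN call. COUNT is only a hint, but a larger value
// means far fewer round trips over the same keyspace.
func scanKeys(conn redis.Conn, pattern, batchSize string) ([]string, error) {
	var keys []string
	cursor := "0"
	for {
		values, err := redis.Values(conn.Do("SCAN", cursor, "MATCH", pattern, "COUNT", batchSize))
		if err != nil {
			return nil, err
		}
		// SCAN replies with [new-cursor, [keys...]].
		if len(values) != 2 {
			return nil, fmt.Errorf("unexpected SCAN reply of length %d", len(values))
		}
		cursor, err = redis.String(values[0], nil)
		if err != nil {
			return nil, err
		}
		batch, err := redis.Strings(values[1], nil)
		if err != nil {
			return nil, err
		}
		keys = append(keys, batch...)
		// A cursor of "0" means the iteration is complete.
		if cursor == "0" {
			return keys, nil
		}
	}
}

func main() {
	conn, err := redis.Dial("tcp", "localhost:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// With COUNT=10, a 300-key keyspace takes roughly 30 round trips per
	// scan; with COUNT=1000 it typically completes in a single call.
	keys, err := scanKeys(conn, "refinery-peers-*", "1000")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("found %d peers\n", len(keys))
}
```

Since `COUNT` bounds how much work Redis does per call, the cost of a large batch scales with the keyspace; for a membership set measured in kilobytes, a single large batch is cheap and the savings come almost entirely from eliminating sequential round trips.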