
Cache storage policy application results (Search, Replicate) #2892

Merged
cthulhu-rider merged 2 commits into master from optimize/search-policy on Jul 18, 2024

Conversation


@cthulhu-rider cthulhu-rider commented Jul 11, 2024

Code and unit tests are ready, integration tests pass. I also want to take a memory profile in devenv.


I created 5 containers and ran 30 goroutines, each sending 1000 search queries (each time to a random container).

v0.42.1:
[memory profile screenshot, 2024-07-11 19-25-14]

current branch:
[memory profile screenshot, 2024-07-11 19-35-30]


Go script
func TestDevenv(t *testing.T) {
	strCnr := []string{
		"3SDKVLon6bk6ALm8PCEjaGHUeg8cP5d6cvSJdpuMJ3qy",
		"5gGHUYnAXfB7y5XSFJb4SBr2g3NvEnkAELDAV9rxFdWL",
		"65NX4euCiJDDkduNeSc8puo7oKa3ddZC2dHxNMSS3UBA",
		"9EPJGmDDDv9iNqP1rAy7pyYuaJHRy2XSyMZt81DL35cD",
		"ArUd3NXPUhaDLNZuEKYHjbc4GS7u4nViDTKbyLiDtENW",
	}
	const endpoint = "s01.neofs.devenv:8080"
	const workers = 30
	ctx := context.Background()
	signer := test.RandomSignerRFC6979(t)

	cnrs := make([]cid.ID, len(strCnr))
	for i := range strCnr {
		require.NoError(t, cnrs[i].DecodeString(strCnr[i]))
	}

	c, err := client.New(client.PrmInit{})
	require.NoError(t, err)
	var prm client.PrmDial
	prm.SetServerURI(endpoint)
	require.NoError(t, c.Dial(prm))
	t.Cleanup(func() { c.Close() })

	var wg sync.WaitGroup
	st := time.Now()
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()

			for i := 0; i < 1000; i++ {
				if i%100 == 0 {
					fmt.Println(i)
				}
				rdr, err := c.ObjectSearchInit(ctx, cnrs[rand.Int()%len(cnrs)], signer, client.PrmObjectSearch{})
				if err != nil {
					log.Println("init search:", err)
					continue // nothing to close: rdr is nil when init fails
				}
				err = rdr.Iterate(func(id oid.ID) bool { return false })
				if err != nil {
					log.Println("read search response:", err)
					rdr.Close()
					continue
				}
				rdr.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Println(time.Since(st))
}

Potential improvements

Besides the obviously required extension of the same approach to other components/RPCs, which is TBD in next PRs, I have the following thoughts:

  1. the past data is less in demand than the current one, so less space can be allocated for it
  2. alternatives to the LRU strategy
  3. dynamic volume estimation taking into account the number of requested containers and storage nodes in the network
  4. storing indices instead of node info structures (netmap: Provide placement methods returning indices instead of copied node descriptors neofs-sdk-go#541)
  5. invalidate cached results on container removal. Normally, they will be automatically evicted according to the LRU principle, but it is still better to force eviction
  6. in the current proposal, the cache is invalidated on epoch change. However, the network map may not change between epochs, or a change may not affect the policies of some containers (a sketch illustrating this and point 5 follows below)
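
A minimal sketch of how points 5 and 6 could look, assuming an LRU cache keyed by (epoch, container). The package name, type names, and the use of hashicorp/golang-lru/v2 are assumptions for illustration only, not the code added by this PR:

package policycache

import (
	lru "github.com/hashicorp/golang-lru/v2"

	cid "github.com/nspcc-dev/neofs-sdk-go/container/id"
	"github.com/nspcc-dev/neofs-sdk-go/netmap"
)

// key binds a cached result to both the container and the epoch it was
// computed for: once a new epoch starts, old entries are simply never hit
// again and age out of the LRU on their own (point 6).
type key struct {
	epoch uint64
	cnr   cid.ID
}

// ContainerNodesCache keeps recently computed lists of container nodes
// (an assumed name, for illustration).
type ContainerNodesCache struct {
	lru *lru.Cache[key, [][]netmap.NodeInfo]
}

func New(size int) (*ContainerNodesCache, error) {
	l, err := lru.New[key, [][]netmap.NodeInfo](size)
	if err != nil {
		return nil, err
	}
	return &ContainerNodesCache{lru: l}, nil
}

// Get returns a previously stored result for the given epoch and container.
func (c *ContainerNodesCache) Get(epoch uint64, cnr cid.ID) ([][]netmap.NodeInfo, bool) {
	return c.lru.Get(key{epoch: epoch, cnr: cnr})
}

// Put stores the result of applying the container's storage policy to the
// network map of the given epoch.
func (c *ContainerNodesCache) Put(epoch uint64, cnr cid.ID, nodes [][]netmap.NodeInfo) {
	c.lru.Add(key{epoch: epoch, cnr: cnr}, nodes)
}

// ForgetContainer forces eviction of all entries for a removed container
// (point 5) instead of waiting for LRU displacement.
func (c *ContainerNodesCache) ForgetContainer(cnr cid.ID) {
	for _, k := range c.lru.Keys() {
		if k.cnr == cnr {
			c.lru.Remove(k)
		}
	}
}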


codecov bot commented Jul 11, 2024

Codecov Report

Attention: Patch coverage is 96.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 23.73%. Comparing base (4f90cc2) to head (10d05a4).
Report is 3 commits behind head on master.

Files                      Patch %   Lines
cmd/neofs-node/policy.go   96.00%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2892      +/-   ##
==========================================
+ Coverage   23.67%   23.73%   +0.05%     
==========================================
  Files         775      775              
  Lines       44908    44933      +25     
==========================================
+ Hits        10633    10664      +31     
+ Misses      33420    33417       -3     
+ Partials      855      852       -3     

☔ View full report in Codecov by Sentry.

@cthulhu-rider cthulhu-rider force-pushed the optimize/search-policy branch 3 times, most recently from be46f4c to 13a115d on July 11, 2024 16:57
@cthulhu-rider cthulhu-rider marked this pull request as ready for review July 11, 2024 17:38

carpawell commented Jul 16, 2024

Also, the last commit says this was attempted unsuccessfully before; could you please provide more info about that (in the commit message)?

@cthulhu-rider (Contributor, Author) replied:

"Also, the last commit says this was attempted unsuccessfully before; could you please provide more info about that (in the commit message)?"

done

Commit message:

The result of applying a container's (C) storage policy to the network map
(N) does not change for fixed C and N. Previously, the `Search` and
`Replicate` object server handlers always calculated the list of
container nodes from scratch. This resulted in excessive node resource
consumption under a dense flow of requests for a small number of
containers per epoch. The obvious solution is to cache the latest
results.

A similar attempt had already been made earlier in
9269ed3, but it turned out to be
incorrect and did not change anything. As can be seen from the code, the
cache was checked only if the pointer to the received network map matched
the pointer to the last processed one. The latter was never set, so the
cache was never actually used.

This adds a caching component for up to 1000 recently requested lists of
container nodes. By increasing the amount of memory retained, the
component will mitigate load spikes on a small number of containers. The
volume limit of 1000 was chosen heuristically as a first approximation.

Tests in the development environment showed a pretty good improvement,
but results from real load tests are yet to be obtained. For this reason,
similar optimizations for other layers and queries will be done later.

Refs #2692.

Signed-off-by: Leonard Lyubich <[email protected]>
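
For illustration only, a hedged sketch of the lookup-or-compute flow described in the commit message; the ContainerNodesCache type from the earlier sketch and the injected compute callback are assumptions, not the actual contents of cmd/neofs-node/policy.go:

// containerNodes returns the sorted lists of container nodes for the given
// epoch, reusing a previously computed result when possible.
func containerNodes(
	cache *ContainerNodesCache,
	epoch uint64,
	cnr cid.ID,
	compute func(cid.ID) ([][]netmap.NodeInfo, error), // e.g. applies the stored placement policy to the current network map
) ([][]netmap.NodeInfo, error) {
	if nodes, ok := cache.Get(epoch, cnr); ok {
		return nodes, nil // cache hit: the policy is not re-applied
	}
	nodes, err := compute(cnr)
	if err != nil {
		return nil, err
	}
	cache.Put(epoch, cnr, nodes)
	return nodes, nil
}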
@cthulhu-rider cthulhu-rider merged commit a497252 into master Jul 18, 2024
20 of 22 checks passed
@cthulhu-rider cthulhu-rider deleted the optimize/search-policy branch July 18, 2024 09:07