
Stop Celo L1 at a specific block #2322

Closed
wants to merge 21 commits into from

Conversation

@alecps alecps commented Jul 31, 2024

Closes https://github.com/celo-org/celo-blockchain-planning/issues/419

Changes:

  • `--l2migrationblock` flag added to the geth cmd. This specifies the first block of the L2 network, i.e. the block after the last block of the L1 network. It does not affect the forkid or hardfork config.

  • Three checks are added to stop block production, insertion, and communication:

  1. The mainLoop in miner/worker.go is updated to check whether the next block is the l2-migration-block every time it starts generating a new block. When the l2-migration-block is reached, the worker stops itself. This halts block production.
  2. writeHeadBlock in core/blockchain.go is updated to check whether the block it is writing is >= the l2-migration-block before it does anything else, and also whether the block it just wrote is >= l2-migration-block - 1 at the end of the function. If either of these checks passes, StopInsert() is called. This prevents any more blocks from being inserted and stops all processes related to block insertion. Normally the check at the end of the function passes first, but if a node is restarted with the same l2-migration-block configured after already reaching and stopping on the l2-migration-block, the check at the beginning of writeHeadBlock passes first and logs an error. InsertChain will return an error if block insertion is stopped during its execution.
  3. Commit in consensus/istanbul/backend is updated to check, right before it returns, whether the block that was just committed is the block before the l2-migration-block. If the check passes, then StopAnnouncing() and Close() are called. This stops blocks from being shared between nodes via the istanbul announce protocol and frees up the associated resources.

Other changes:

  • Adds e2e test support for networks where some nodes are full nodes (not validating)
  • Updates logging to display latest blockNumber when blockchain is stopped

Tested with e2e tests

  • go test -v ./e2e_test -run TestStopNetworkAtL2BlockSimple -count 1000 -timeout 1h
  • go test -v ./e2e_test -run TestStopNetworkAtL2Block

Tested on Alfajores:

  • Use the following launch.json config

```json
{
    "name": "Launch geth alfajores",
    "type": "go",
    "request": "launch",
    "mode": "auto",
    "program": "./cmd/geth",
    "args": [
        "--alfajores",
        "--datadir", "<DATADIR>",
        "--http",
        "--http.api", "eth,net,web3,debug,txpool",
        "--syncmode", "full",
        "--verbosity", "3",
        "--light.serve", "90",
        "--light.maxpeers", "1000",
        "--maxpeers", "1100",
        "--l2migrationblock", "<MIGRATION_BLOCK>"
    ]
}
```

  • Or run with the CLI command

```shell
./build/bin/geth --alfajores --datadir <DATADIR> --http --http.api eth,net,web3,debug,txpool --syncmode full --verbosity 3 --light.serve 90 --light.maxpeers 1000 --maxpeers 1100 --port 30304 --l2migrationblock <MIGRATION_BLOCK>
```

Check on syncing with

```shell
./build/bin/geth attach <DATADIR>/geth.ipc --exec "eth.syncing"
```

and

```shell
./build/bin/geth attach <DATADIR>/geth.ipc --exec "eth.blockNumber"
```


github-actions bot commented Jul 31, 2024

Coverage from tests in ./e2e_test/... for ./consensus/istanbul/... at commit d76885b

coverage: 55.1% of statements across all listed packages
coverage:  68.2% of statements in consensus/istanbul
coverage:  63.2% of statements in consensus/istanbul/announce
coverage:  57.2% of statements in consensus/istanbul/backend
coverage:   0.0% of statements in consensus/istanbul/backend/backendtest
coverage:  24.3% of statements in consensus/istanbul/backend/internal/replica
coverage:  66.3% of statements in consensus/istanbul/core
coverage:  50.0% of statements in consensus/istanbul/db
coverage:   0.0% of statements in consensus/istanbul/proxy
coverage:  64.2% of statements in consensus/istanbul/uptime
coverage:  52.4% of statements in consensus/istanbul/validator
coverage:  79.2% of statements in consensus/istanbul/validator/random

@alecps alecps requested a review from ezdac August 8, 2024 18:34
@alecps alecps marked this pull request as ready for review August 8, 2024 18:35
@mcortesi (Contributor) left a comment

All in all, I would change a few things here.

First, it is important to differentiate between two asks:

  1. Feature: the node should not produce (or accept) blocks after the L2 block
  2. Feature: the node should STOP after the L2 block

To me, initially the feature would be (1).

As an implementation plan, that means only 2 places to change:

  1. Not produce => modify the worker's block creation to STOP once it reaches a certain number
  2. Not accept => modify "Blockchain.Insert()" and the other insert methods on the blockchain object to FAIL after the L2 block is reached.

What it doesn't mean is triggering the whole shutdown mechanism.

```go
@@ -811,6 +811,10 @@ func (bc *BlockChain) ExportN(w io.Writer, first uint64, last uint64) error {
//
// Note, this function assumes that the `mu` mutex is held!
func (bc *BlockChain) writeHeadBlock(block *types.Block) {
	if bc.Config().IsL2Migration(new(big.Int).Sub(block.Number(), big.NewInt(1))) {
```
Contributor commented:

I feel there are better places to do this check, and to return an error (vs. just log.Crit, which I'm not sure doesn't have the side effect of killing the process).

I remember there are 2 entry points to add blocks here:

  1. As part of the sync process: Blockchain.Insert()
  2. As part of consensus: validators use another method that bypasses the first pass of Blockchain, but it is a public method.

@alecps (Contributor Author) commented:

I removed the log.Crit here, still looking into whether there's a better place to put the check

@alecps (Contributor Author) commented Aug 14, 2024:

There seem to be many code paths that can call writeHeadBlock, including:

  • InsertPreprocessedBlock
  • reorg (shouldn't be relevant for Celo as I understand it)
  • InsertChain
  • InsertChainWithoutSealVerification
  • insertSideChain

My reasoning behind calling bc.StopInsert() in writeHeadBlock is just that it seems to be called every time we add a new head block, regardless of the code path, and so it feels safer than adding the check everywhere else.

@alecps (Contributor Author) commented:

I've added a test to check that InsertChain behaves as expected and returns an error when inserting large chains containing the migration block.

Since StopInsert() can be called multiple times, I'm not worried about calling it in a private method, and this approach avoids changes in all the functions that call writeHeadBlock. As I'm a bit new to the code, it feels safer to put the checks here so that it's easy to reason about when StopInsert() gets called, and so that it gets called as early as possible after the last L1 block is added.

There may be something I'm missing so please let me know if you still prefer a different spot for the checks!

@alecps alecps requested review from mcortesi and karlb August 12, 2024 18:52
@celo-org celo-org deleted a comment from github-actions bot Aug 19, 2024
@celo-org celo-org deleted a comment from github-actions bot Aug 19, 2024
@celo-org celo-org deleted a comment from github-actions bot Aug 19, 2024
@celo-org celo-org deleted a comment from github-actions bot Aug 19, 2024

github-actions bot commented Aug 19, 2024

5881 passed, 7 failed, 45 skipped

Test failures:
  • TestGethClient: gethclient (Failed)
  • TestGethClient/TestGetProof: gethclient (Failed)
  • TestGethClient/TestGCStats: gethclient (Failed)
  • TestGethClient/TestMemStats: gethclient (Failed)
  • TestGethClient/TestGetNodeInfo: gethclient (Failed)
  • TestGethClient/TestSetHead: gethclient (Failed)
  • TestGethClient/TestSubscribePendingTxHashes: gethclient (Failed)
(goroutine stack traces from the failing tests omitted)
This test report was produced by the test-summary action.  Made with ❤️ in Cambridge.

Comment on lines +848 to +854:

```go
nextBlockNum := new(big.Int).Add(block.Number(), big.NewInt(1))
if bc.Config().IsL2Migration(nextBlockNum) {
	log.Info("The next block is the L2 migration block, stopping block insertion", "currentBlock", block.NumberU64(), "hash", block.Hash(), "nextBlock", nextBlockNum.Uint64())
	bc.StopInsert()
	return
}
```
Contributor commented:

I don't think we need this; the check up front should be sufficient. Stopping insertion isn't really important unless there is an attempt to insert the next block.

@alecps (Contributor Author) commented Sep 16, 2024:

When I remove this, the long test actually starts failing. I was under the impression that bc.StopInsert() was essential to call, based on what I saw during testing. The function flips an atomic value that is checked at the beginning of numerous functions related to block insertion, including those in HeaderChain like writeHeaders, which you asked about above.
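The mechanism being described, a single atomic flag set by `StopInsert()` and polled by every insertion loop, can be sketched as follows. `insertGuard` and this `writeHeaders` are simplified stand-ins for the real `BlockChain`/`HeaderChain` code, not the actual implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// insertGuard sketches the StopInsert/insertStopped pattern: one atomic flag,
// safe to set any number of times, checked at the top of every
// insertion-related loop (names here are illustrative).
type insertGuard struct {
	procInterrupt int32
}

func (g *insertGuard) StopInsert()         { atomic.StoreInt32(&g.procInterrupt, 1) }
func (g *insertGuard) insertStopped() bool { return atomic.LoadInt32(&g.procInterrupt) == 1 }

// writeHeaders mimics the writeHeaders-style loop: it bails out before
// touching head references once the flag is set, even if work was queued.
func (g *insertGuard) writeHeaders(headers []int) (written int) {
	for range headers {
		if g.insertStopped() {
			return written
		}
		written++
	}
	return written
}

func main() {
	g := &insertGuard{}
	fmt.Println(g.writeHeaders([]int{1, 2, 3})) // 3: nothing stopped yet
	g.StopInsert()
	g.StopInsert() // idempotent: calling it twice is harmless
	fmt.Println(g.writeHeaders([]int{4, 5, 6})) // 0: all further writes refused
}
```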

```go
@@ -811,6 +811,14 @@ func (bc *BlockChain) ExportN(w io.Writer, first uint64, last uint64) error {
//
// Note, this function assumes that the `mu` mutex is held!
func (bc *BlockChain) writeHeadBlock(block *types.Block) {
```
Contributor commented:

There's also core.HeaderChain.writeHeaders where rawdb.WriteCanonicalHash is called. I'm just wondering if this could cause any issues. For example we may sync a header later than the stop block.

@alecps (Contributor Author) commented Sep 16, 2024:

My understanding is that by calling bc.StopInsert() we prevent the HeaderChain code from continuing to update the head header. writeHeaders checks something called procInterrupt(), which wraps a call to insertStopped(), which is injected when NewHeaderChain() is called. When we call StopInsert(), it makes procInterrupt() return true in writeHeaders before it updates the head header references etc.

```go
// this check will pass first and log an error.
if bc.Config().IsL2Migration(block.Number()) {
	log.Error("Attempt to insert block number >= l2MigrationBlock, stopping block insertion", "block", block.NumberU64(), "hash", block.Hash())
	bc.StopInsert()
```
Contributor commented:

I don't think we should need to call StopInsert here since this check will effectively prevent any additional blocks being added.

@alecps (Contributor Author) commented Sep 16, 2024:

I think StopInsert() is actually preventing some threads from inserting blocks, as removing it below makes the test fail. Specifically, the writeHeaders function you asked about above checks something called procInterrupt(), which wraps a call to insertStopped(), which is injected when NewHeaderChain() is called. When we call StopInsert(), it makes procInterrupt() return true in writeHeaders before it updates the head header references etc.
This check was in place to fix an error that shows up via command line testing when a node is restarted with the same l2-migration-block number configured after already reaching it. I'll test whether removing this call to StopInsert re-introduces that error.

Contributor commented:

I wasn't sure why we need changes to writeHeadBlock at all, so I put a panic in for the IsL2Migration case to see when the check is relevant. The panic was triggered by the block fetcher.

It might be more elegant to discard the blocks when the block fetcher receives them, rather than preventing the write here. Suggested commit: 90dbc6c

I still don't have the full picture, so let me know if I'm missing something.

Contributor commented:

@karlb that looks pretty neat to me, and the tests seem to work, so I'd be in favour of this approach.

@piersy (Contributor) commented Sep 17, 2024:

Ok I've pushed a couple of updates to karlb/stop and removed basically everything except the extra flag, the rejection in the validate function and the e2e test. Tests seem to be working, so I think that would be the way to go now. - https://github.com/celo-org/celo-blockchain/pull/2330/files

```go
if sb.chain.Config().IsL2Migration(nextBlockNum) {
	sb.logger.Info("The next block is the L2 migration block, stopping announce protocol and closing istanbul backend", "currentBlock", block.NumberU64(), "hash", block.Hash(), "nextBlock", nextBlockNum)
	sb.StopAnnouncing()
	sb.Close()
```
Contributor commented:

I don't think this is safe to call from here, since Close is also called from other threads. It might be enough to add a lock inside Close so that only one thread can be executing in there at any time. Needs a bit more thought though.
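One standard way to get the property suggested here, Close callable from any thread but its body run at most once, is a sync.Once guard. A hedged sketch, with `closer` standing in for the istanbul backend (the real Backend.Close is not structured like this):

```go
package main

import (
	"fmt"
	"sync"
)

// closer sketches a once-guarded Close: concurrent callers (the shutdown path
// vs. the migration-block path) cannot run the teardown body twice.
type closer struct {
	once   sync.Once
	closed bool
}

func (c *closer) Close() error {
	c.once.Do(func() {
		c.closed = true // the real teardown would release resources here
	})
	return nil
}

func main() {
	c := &closer{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ { // several goroutines racing to Close
		wg.Add(1)
		go func() { defer wg.Done(); c.Close() }()
	}
	wg.Wait()
	fmt.Println(c.closed) // true: teardown ran, and only once
}
```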

Contributor commented:

Do we even need to call Close though?

@alecps (Contributor Author) commented:

It looks like calling Close() may be a bit overkill, actually. StopAnnouncing() stops the announce protocol, and it looks like Close() basically just prevents it from being easily restarted with StartAnnouncing(). I removed it and the e2e test still passed, so I'll remove it from the PR.

```go
if err != nil {
	log.Error("Error while calling engine.StopValidating", "err", err)
if istanbul, ok := w.engine.(*istanbulBackend.Backend); ok {
	if istanbul.IsValidating() {
```
Contributor commented:

Why the added condition here?

@alecps (Contributor Author) commented Sep 16, 2024:

So the miner.Stop() function gets called when the user (or the e2e test) actually shuts down the node, in which case istanbul.StopValidating() throws an error here since it's already been called (in the code below). Let me know if that makes sense!

Contributor commented:

Makes sense, but this is introducing a race condition, since you could still have 2 goroutines calling stop. But it might be good enough for our purposes if the time between the 2 calls to stop is sufficient.

@karlb karlb mentioned this pull request Sep 17, 2024
@piersy (Contributor) commented Sep 19, 2024:

Closed in favour of #2330

@piersy piersy closed this Sep 19, 2024