Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stuck lookup reports #5833

Open
dapplion opened this issue May 24, 2024 · 2 comments
Open

Stuck lookup reports #5833

dapplion opened this issue May 24, 2024 · 2 comments

Comments

@dapplion
Copy link
Collaborator

dapplion commented May 24, 2024

Collection of investigations of Notify the devs, a sync lookup is stuck events:

ho-h-bn-toga - 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997

May 24 01:57:15.980 WARN Notify the devs, a sync lookup is stuck, ancestor_lookup: SingleBlockLookup { 
    id: 56, 
    block_request_state: BlockRequestState { 
        state: SingleLookupRequestState { 
            state: AwaitingDownload, failed_processing: 0, failed_downloading: 0 
            } 
        }, 
    blob_request_state: BlobRequestState { 
        state: SingleLookupRequestState { 
            state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
        } 
    }, 
    peers: 7, 

Lookup 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997

^ this block has 1 blob

  • We had a child lookup with cached block (id = 55), and zero peers.
  • When parent is processed, child continues
  • The child block is sent for processing
  • The blob request fails because it has no peers, and the lookup is dropped
  • When the block processing result returns, it triggers and unknown lookup error
  • Immediately then we get "unknown block hash message" for same root, and create a new lookup with (id = 56)
May 24 01:41:01.108 DEBG Unknown parent for gossip block         root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997
May 24 01:41:01.108 DEBG Searching for components of a block with unknown parent, id: 55, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, peer_ids: [], service: lookup_sync, service: sync
..
May 24 01:41:01.518 DEBG Received unknown block hash message     peer: PeerId("16Uiu2HAmKdxFEg3UKJRQ1BUGQaFjU2TBpE7qcz3YibWqaqX1T6RH"), block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, service: sync
May 24 01:41:01.518 DEBG Searching for block components          id: 56, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, peer_ids: [PeerId("16Uiu2HAmKdxFEg3UKJRQ1BUGQaFjU2TBpE7qcz3YibWqaqX1T6RH")], service: lookup_sync, service: sync
..
May 24 01:41:01.900 DEBG Beacon block awaiting blobs             block_slot: 1717705, block_root: 0x9bf61d9f31473fc8a9a91891b9601f9c6a04450ebe55bbde0a083c60a0b36997, service: beacon
May 24 01:41:01.900 DEBG Dropping lookup on request error        error: UnknownLookup, source: processing_result, id: 55, service: lookup_sync, service: sync

Stuck cause:

The new lookup 56 is created while the block is in the processing cache, block_lookup_request returns Pending. However, the block is processing from a previous RPC lookup. The event that is returned from processing gets ignored because it's for a previous ID.

Fix

TBD

@dapplion
Copy link
Collaborator Author

dapplion commented May 24, 2024

ho-h-bn-toga - 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf

May 24 04:00:45.978 WARN Notify the devs, a sync lookup is stuck, lookup: SingleBlockLookup { 
    id: 403, 
    block_request_state: BlockRequestState { 
        state: SingleLookupRequestState { 
            state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
        }
    },
    blob_request_state: BlobRequestState { 
        state: SingleLookupRequestState {
            state: AwaitingDownload, failed_processing: 0, failed_downloading: 0
        }
    }, 
    peers: 1,
    block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf,
    awaiting_parent: None

This block has 2 blobs

  • We received and verified a gossip block (insert into da_checker)
  • We received blobs from gossip
  • Then, receive an attestation that triggers a lookup for the block. The request does not progress as block_lookup_request returns pending because block is in da_checker.
  • When blob processing completes, block is imported but no event is sent to sync causing the lookup to be stuck. GossipBlockProcessResult is only sent at the end of block gossip import, but blob gossip can also remove the block from the da_checker.
May 24 03:44:37.566 DEBG Successfully verified gossip block      commitments: [0xa9da…aada, 0xb3d2…6dd3], root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, slot: 1718323, graffiti: int-hol-vouch-test-03, service: beacon
May 24 03:44:37.713 DEBG Beacon block awaiting blobs             block_slot: 1718323, block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, service: beacon
May 24 03:44:37.927 DEBG Successfully verified gossip blob       commitment: 0xa9da…aada, index: 0, root: 0x4449…0baf, slot: 1718323
May 24 03:44:37.931 DEBG Successfully verified gossip blob       commitment: 0xb3d2…6dd3, index: 1, root: 0x4449…0baf, slot: 1718323
May 24 03:44:37.934 DEBG Received unknown block hash message     peer: PeerId("16Uiu2HAm1Ng1VETpeb6iu2A31PcrnSV1oh6GyzVLaLDeWp9R6h9J"), block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, service: sync
May 24 03:44:37.934 DEBG Searching for block components          id: 403, block_root: 0x4449a0eb35516c34a10fe21ded0467131674b4ea5dad86465305ba32e3ca0baf, peer_ids: [PeerId("16Uiu2HAm1Ng1VETpeb6iu2A31PcrnSV1oh6GyzVLaLDeWp9R6h9J")], service: lookup_sync, service: sync
May 24 03:44:38.111 DEBG Writing blobs to store                  count: 2, block_root: 0x4449…0baf, service: beacon
May 24 03:44:38.112 INFO Gossipsub blob processed, imported fully available block, block_root: 0x4449…0baf

Stuck cause

GossipBlockProcessResult should be sent both from gossip block import and gossip blob import

Fix

TBD


Same status and root cause for this other incidents:

  • ho-h-bn-toga - 0x53c24ca2fdc60a30e6f567f2ef82b2622df0e34497dc282df1c55061f867e9eb
  • ho-h-bn-toga - 0x50686dd264241032e5d0512e7990df980ef1587fc38ec70176c3d72897c5a892
  • ho-h-bn-toga - 0x90a943df3c898961bd7780e75b38e2942289032f4801b5f6840c588b4be88355
  • ho-h-bn-toga - 0x28b2050145adac935f106401a0ba10ec09c84f760b1bd27213b2e7e74705911b

@michaelsproul
Copy link
Member

A specific subcase documented here:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants