[Bug]: Network history is extremely slow on mainnet for archival nodes #10987

daniel1302 · 2024-03-25T11:02:03Z

Problem encountered

Assume the following setup:

An archival node.
ZFS with ZSTD compression.
Network history takes 1.7 TB of disk space (further compression is not possible because the segments are already compressed).

The core catches up quickly, but the data node does not. See the graph below (difference between core and data node): the gap shrinks very slowly.
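To make "shrinks very slowly" concrete, here is a rough sketch of how the remaining catch-up time could be estimated from the lag. The function name, its parameters, and the numbers in the usage note are illustrative assumptions, not measurements from this node.

```python
from typing import Optional


def catch_up_seconds(lag_blocks: float,
                     processing_rate: float,
                     chain_rate: float) -> Optional[float]:
    """Estimate the seconds until the data node's lag reaches zero.

    lag_blocks:      how many blocks the data node is behind the core
    processing_rate: blocks per second the data node manages to process
    chain_rate:      blocks per second the chain produces

    Returns None when the data node is not faster than the chain,
    in which case the lag never closes.
    """
    net_rate = processing_rate - chain_rate
    if net_rate <= 0:
        return None
    return lag_blocks / net_rate
```

For example, with a hypothetical lag of 1000 blocks, `catch_up_seconds(1000, 3.0, 1.0)` gives 500 seconds, while `catch_up_seconds(1000, 1.0, 1.0)` gives `None`: if the data node only just keeps pace with the chain, the gap does not close at all.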

Network history snapshot creation takes a very long time. Summing the network history copy time across all tables shows that the node spends 100% of its time copying data from PostgreSQL to the network history, so it has little time left to catch up. See the graph for api0.vega.community and api2 (which has less data in both the DB and the network history).
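The "100% of time" figure above comes from summing per-table copy times. A minimal sketch of that calculation follows. The table names and durations in the usage note are hypothetical placeholders, and `interval_seconds` is assumed to be the wall-clock window over which the copy times were collected.

```python
from typing import Mapping


def copy_time_share(copy_seconds_by_table: Mapping[str, float],
                    interval_seconds: float) -> float:
    """Return the fraction of the observation window spent copying.

    copy_seconds_by_table: seconds spent copying each table from
                           PostgreSQL into the network history
    interval_seconds:      length of the observation window in seconds

    A result close to 1.0 means the node spent essentially the whole
    window on network history copies.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return sum(copy_seconds_by_table.values()) / interval_seconds
```

For instance, `copy_time_share({"table_a": 30.0, "table_b": 60.0}, 90.0)` returns `1.0`: the two copies together fill the entire 90-second window, which is the situation described for the archival node.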

Who is affected?

Everyone running an archival node with full network history.

How to mitigate

Move to faster disks (not always possible, especially in the cloud, and sometimes very expensive).
Disable publishing of network history segments (this creates a gap in the network history and makes it useless on that node).

Observed behaviour

The data node is not able to catch up because copying data from the database to network history takes too much time.

Expected behaviour

The data node should catch up more quickly.

Steps to reproduce

N/A

Software version

v0.7.10

Failing test

No response

Jenkins run

No response

Configuration used

No response

Relevant log output

No response

daniel1302 added the bug label Mar 25, 2024

daniel1302 assigned gordsport Mar 25, 2024

vega-issues added this to Core Kanban Mar 25, 2024

JonRay15 added this to the ⏭️ TBC milestone Jul 9, 2024
