Limit CLUSTER_CANT_FAILOVER_DATA_AGE log to 10 times period
If a replica enters the data_age too old state, it cannot trigger a
failover and currently cannot recover automatically, so we print a log
every CLUSTER_CANT_FAILOVER_RELOG_PERIOD, which is every second. If the
primary does not recover and no manual failover is performed, this log
will flood the log file.

In this case, limit its frequency to 10 times the period, which is
10 seconds in our code. Even in this data_age too old state, the
repeated logs still indicate the progress of the failover.

See also valkey-io#780 for more details about it.

Signed-off-by: Binbin <[email protected]>
enjoy-binbin committed Oct 18, 2024
1 parent a62d1f1 commit 2fb5558
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions src/cluster_legacy.c
@@ -4439,6 +4439,12 @@ void clusterLogCantFailover(int reason) {
         time(NULL) - lastlog_time < CLUSTER_CANT_FAILOVER_RELOG_PERIOD)
         return;
 
+    /* If data age is too old, this log may be printed repeatedly since it
+     * can not be automatically recovered. In this case, limit its frequency. */
+    if (reason == server.cluster->cant_failover_reason && reason == CLUSTER_CANT_FAILOVER_DATA_AGE &&
+        time(NULL) - lastlog_time < 10 * CLUSTER_CANT_FAILOVER_RELOG_PERIOD)
+        return;
+
     server.cluster->cant_failover_reason = reason;
 
     switch (reason) {
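
To illustrate the effect of the extra check, here is a minimal standalone sketch of the two-tier rate limit. It is not the Valkey code: `relog_state`, `should_log`, `count_logs`, and the `RELOG_PERIOD` and `REASON_*` constants are hypothetical stand-ins, the clock is passed in as a parameter instead of calling `time(NULL)`, and the rest of `clusterLogCantFailover` (message selection and any other early returns) is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the constants used in src/cluster_legacy.c. */
#define RELOG_PERIOD 1 /* seconds; the commit message states a one-second period */
#define REASON_NONE 0
#define REASON_DATA_AGE 1
#define REASON_OTHER 2

/* Assumed state: the last reason logged and when it was logged. */
typedef struct {
    int last_reason;
    time_t last_log_time;
} relog_state;

/* Returns true if a message for `reason` should be emitted at time `now`,
 * and records it as the most recent log when it is. */
bool should_log(relog_state *st, int reason, time_t now) {
    bool same_reason = (reason == st->last_reason);

    /* Base limit: the same reason is logged at most once per period. */
    if (same_reason && now - st->last_log_time < RELOG_PERIOD) return false;

    /* Stricter limit for the data-age reason: once per 10 periods. */
    if (same_reason && reason == REASON_DATA_AGE &&
        now - st->last_log_time < 10 * RELOG_PERIOD)
        return false;

    st->last_reason = reason;
    st->last_log_time = now;
    return true;
}

/* Counts the messages emitted when `reason` is reported once per second
 * for `seconds` consecutive seconds, starting at t = 0. */
int count_logs(int reason, int seconds) {
    assert(seconds >= 0);
    relog_state st = {REASON_NONE, 0};
    int emitted = 0;
    for (int t = 0; t < seconds; t++) {
        if (should_log(&st, reason, (time_t)t)) emitted++;
    }
    return emitted;
}
```

Under these assumptions, `count_logs(REASON_DATA_AGE, 30)` returns 3 (messages at t = 0, 10, and 20), while `count_logs(REASON_OTHER, 30)` returns 30, one message per second.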