If the path referenced at CCF/src/host/ledger.h (line 321 in f1bd349) gets hit, the malformed/corrupt ledger file is not ignored when a node starts from a later snapshot but still has this older uncommitted ledger file in its ledger directory.
2024-11-05T05:12:11.384963Z 100 [fail ] ../src/host/ledger.h:312 | Malformed incomplete ledger file /mnt/storage/ledger/ledger_19 at seqno 32 (expecting entry of size 54978, remaining 49144)
2024-11-05T05:12:11.415505Z 100 [debug] ../src/host/ledger.h:1107 | Recovering file from main ledger directory: ledger_19
More generally, if a node starts in join mode with uncommitted ledger files in its ledger directory that are further behind than the committed snapshot files in its snapshot directory, the uncommitted ledger files should be ignored and should not interfere with node startup. The situation I eventually ended up in, after multiple scale up/down and recovery attempts, was the following:
2024-11-04T14:40:27.733657Z -0.012 0 [info ][gov] ode/gov/handlers/recovery.h:170 | 1/1 recovery shares successfully submitted
End of recovery procedure initiated - initiating recovery
2024-11-04T14:40:27.741599Z -0.020 0 [info ][gov] /gov/gov_endpoint_registry.h:58 | RequestCompletedEvent: POST /recovery/members/{memberId}:recover 200 0ms 1 attempt(s)
2024-11-04T14:40:28.702008Z -0.004 0 [info ] ../src/node/node_state.h:2167 | Initiating end of recovery (primary)
2024-11-04T14:40:28.705587Z -0.008 0 [info ] ../src/node/snapshot_serdes.h:111 | Deserialising snapshot (size: 457616, public only: false)
2024-11-04T14:40:28.705679Z -0.008 0 [info ] ../src/node/snapshot_serdes.h:123 | Snapshot successfully deserialised at seqno 117
2024-11-04T14:40:28.705692Z 100 [fail ] ../src/host/ledger.h:489 | Cannot find entries: 118 - 31 in ledger file ledger_19
2024-11-04T14:40:28.705702Z 100 [debug] ../src/host/ledger.h:1435 | Ledger commit: 150/150
2024-11-04T14:40:28.761173Z 100 [fail ] ../src/host/main.cpp:779 | Exception in ccf::run: std::exception
2024-11-04T14:40:28.761947Z -0.064 0 [fail ] ../src/ds/messaging.h:170 | Exception while processing message <::consensus::ledger_no_entry_range:1107064419> of size 17
libc++abi: terminating due to uncaught exception of type std::exception: std::exception
My understanding of what happened: the node started up with the ledger_19 file still present alongside a committed snapshot at seqno 117, and the presence of ledger_19 caused cchost to crash.
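To make the requested behaviour concrete, here is a minimal sketch of the check I have in mind. It is not CCF code: both function names are hypothetical, and it assumes that an uncommitted ledger file is named `ledger_<start seqno>` (as `ledger_19` in the logs suggests) and that comparing the start seqno against the snapshot seqno is a good enough test for "behind the snapshot". Maintainers may well want a different criterion.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of CCF): extract the start seqno from a file
// name of the assumed form "ledger_<start>". Any other name, for example one
// carrying a range or a ".committed" suffix, yields std::nullopt.
std::optional<size_t> uncommitted_start_seqno(const std::string& name)
{
  const std::string prefix = "ledger_";
  if (name.compare(0, prefix.size(), prefix) != 0)
  {
    return std::nullopt;
  }
  const std::string rest = name.substr(prefix.size());
  if (rest.empty())
  {
    return std::nullopt;
  }
  size_t value = 0;
  for (char c : rest)
  {
    if (c < '0' || c > '9')
    {
      return std::nullopt; // not a plain "ledger_<number>" name
    }
    value = value * 10 + static_cast<size_t>(c - '0');
  }
  return value;
}

// Hypothetical policy: when starting from a snapshot at snapshot_seqno,
// return the uncommitted files that start at or before that seqno, so the
// caller can skip them instead of trying to recover them.
std::vector<std::string> uncommitted_files_to_ignore(
  const std::vector<std::string>& file_names, size_t snapshot_seqno)
{
  std::vector<std::string> ignored;
  for (const auto& name : file_names)
  {
    const auto start = uncommitted_start_seqno(name);
    if (start.has_value() && *start <= snapshot_seqno)
    {
      ignored.push_back(name);
    }
  }
  return ignored;
}
```

With the values from the logs above (snapshot at seqno 117, `ledger_19` in the ledger directory), this sketch would mark `ledger_19` as a file to ignore, while a file starting after seqno 117 would be left alone.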
Referenced code: CCF/src/host/ledger.h, line 321 in f1bd349