Fix metadata digest mismatch after failed SYSTEM RESTART REPLICA #98469

alexey-milovidov wants to merge 1 commit into master
Conversation
When `SYSTEM RESTART REPLICA` fails after exhausting all retries (e.g. due to fault injection causing `MEMORY_LIMIT_EXCEEDED`) or after query cancellation, the table is left permanently detached from the in-memory `tables` map. The `RestartReplicaGuard` (which suppresses digest checks during the restart) is destroyed when the function exits, but the table is never reattached. Subsequent DDL operations that call `checkDigestValid` see a mismatch between `local_digest` (computed from the `tables` map, which excludes the missing table) and `tables_metadata_digest` (which still includes the table's hash), causing a "Digest does not match" `LOGICAL_ERROR` exception.

Fix by adjusting `tables_metadata_digest` in the error path to subtract the hash of the lost table, keeping the in-memory digest consistent with the `tables` map. The table's metadata file still exists on disk and will be restored on server restart or `DatabaseReplicated` recovery.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=acf6aca664cc5cf8f7b0f75e4c7094ef3b6b17bf&name_0=MasterCI&name_1=BuzzHouse%20%28amd_ubsan%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
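The fix relies on the digest being additive over tables, so one table's contribution can be removed by subtraction. The following is a hypothetical toy sketch of that scheme, not the actual ClickHouse implementation; `ReplicatedDbSketch`, `tableHash`, and `adjustDigestOnTableLost` are illustrative names, and `std::hash` stands in for the real SipHash-based metadata hash.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

/// Toy model of a replicated database's in-memory state: the digest is a sum
/// of per-table metadata hashes, kept in sync with the tables map.
struct ReplicatedDbSketch
{
    std::map<std::string, std::string> tables;  /// name -> CREATE statement
    uint64_t tables_metadata_digest = 0;

    static uint64_t tableHash(const std::string & name, const std::string & metadata)
    {
        /// Stand-in for the real SipHash-based metadata hash.
        return static_cast<uint64_t>(std::hash<std::string>{}(name + '\0' + metadata));
    }

    void attach(const std::string & name, const std::string & metadata)
    {
        tables[name] = metadata;
        tables_metadata_digest += tableHash(name, metadata);
    }

    /// Analogous to the PR's adjustDigestOnTableLostFromRestart(): subtract the
    /// lost table's hash so the digest matches the tables map again.
    void adjustDigestOnTableLost(const std::string & name, const std::string & metadata)
    {
        tables.erase(name);
        tables_metadata_digest -= tableHash(name, metadata);
    }

    /// What a checkDigestValid-style assertion would recompute from the map.
    uint64_t localDigest() const
    {
        uint64_t d = 0;
        for (const auto & [n, m] : tables)
            d += tableHash(n, m);
        return d;
    }
};
```

Because addition over `uint64_t` wraps modulo 2^64, subtracting exactly the lost table's hash restores the invariant `localDigest() == tables_metadata_digest` regardless of which other tables remain.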
Pull request overview
This PR addresses a failure mode in SYSTEM RESTART REPLICA where, if the restart process exits on error/cancellation after detaching the table, DatabaseReplicated can later hit a debug/sanitizer “Digest does not match” assertion due to the table being missing from the in-memory tables map while the digest still reflects the pre-detach state.
Changes:
- Wrap the post-detach table re-creation retry loop in `doRestartReplica()` with an outer `try/catch` and, on failure, call into `DatabaseReplicated` to adjust the in-memory metadata digest.
- Add a `DatabaseReplicated::adjustDigestOnTableLostFromRestart()` API and implement it to subtract the table's metadata hash from `tables_metadata_digest`.
- Add a warning log when this adjustment occurs.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Interpreters/InterpreterSystemQuery.cpp | Adds failure-path handling for SYSTEM RESTART REPLICA to adjust DatabaseReplicated's digest when restart ultimately fails. |
| src/Databases/DatabaseReplicated.h | Declares adjustDigestOnTableLostFromRestart() for use by the restart path. |
| src/Databases/DatabaseReplicated.cpp | Implements digest adjustment (subtract metadata hash) and logs a warning when invoked. |
```cpp
/// The table is left permanently detached from the in-memory tables map.
/// Adjust the metadata digest so it stays consistent with the tables map,
/// preventing false "Digest does not match" LOGICAL_ERROR exceptions.
/// The table's metadata file still exists on disk, so it will be restored
/// on server restart or after DatabaseReplicated recovery.
if (auto * replicated_db = typeid_cast<DatabaseReplicated *>(database.get()))
{
    try
    {
        replicated_db->adjustDigestOnTableLostFromRestart(replica_table_id.table_name);
    }
    catch (...)
    {
        tryLogCurrentException(log, "Failed to adjust digest after failed SYSTEM RESTART REPLICA");
    }
}
```
Adjusting DatabaseReplicated::tables_metadata_digest here makes it diverge from its documented meaning (sum of local on-disk metadata files). Since the metadata file still exists, subsequent DatabaseReplicated DDL commits may base /replicas/<replica>/digest updates on this adjusted value, potentially masking the inconsistency with ZooKeeper metadata or triggering recovery decisions based on an artificial digest. Consider instead rolling back the detach on failure (reattach the original table instance, or reload/ATTACH from metadata) so both the tables map and digest stay consistent, or add an explicit "restart failed" state that suppresses digest assertions without rewriting the digest value.
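The rollback alternative the reviewer describes can be modeled in isolation. This is a hypothetical toy sketch (not ClickHouse code; `Tables`, `digestOf`, and `restartWithRollback` are illustrative names): on failure, the previously detached table instance is re-attached, so the digest keeps its documented meaning without any adjustment.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

using Tables = std::map<std::string, std::string>;  /// name -> metadata

/// Recompute the additive digest from the tables map.
inline uint64_t digestOf(const Tables & tables)
{
    uint64_t d = 0;
    for (const auto & [n, m] : tables)
        d += static_cast<uint64_t>(std::hash<std::string>{}(n + '\0' + m));
    return d;
}

/// Detach the table, try to re-create it, and re-attach the original instance
/// if re-creation fails; the original error is rethrown to the caller.
inline void restartWithRollback(Tables & tables, const std::string & name,
                                const std::function<std::string()> & recreate)
{
    auto node = tables.extract(name);   /// detach
    try
    {
        tables[name] = recreate();      /// re-create (may throw)
    }
    catch (...)
    {
        if (!node.empty())
            tables.insert(std::move(node));  /// rollback: re-attach as-is
        throw;
    }
}
```

In this model the tables map and the digest stay consistent through a failed restart by construction, at the cost of keeping the possibly stale table instance attached.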
Suggested change:

```diff
-/// The table is left permanently detached from the in-memory tables map.
-/// Adjust the metadata digest so it stays consistent with the tables map,
-/// preventing false "Digest does not match" LOGICAL_ERROR exceptions.
-/// The table's metadata file still exists on disk, so it will be restored
-/// on server restart or after DatabaseReplicated recovery.
-if (auto * replicated_db = typeid_cast<DatabaseReplicated *>(database.get()))
-{
-    try
-    {
-        replicated_db->adjustDigestOnTableLostFromRestart(replica_table_id.table_name);
-    }
-    catch (...)
-    {
-        tryLogCurrentException(log, "Failed to adjust digest after failed SYSTEM RESTART REPLICA");
-    }
-}
+/// Failed to restart replica; table remains detached in the in-memory tables map.
+/// Do not adjust DatabaseReplicated metadata digest here, because on-disk metadata
+/// has not changed and the digest must continue to reflect the sum of local metadata files.
+tryLogCurrentException(log, "Failed to restart replica, table will remain detached");
```
```cpp
/// If all retries fail (or the query is cancelled), we adjust the in-memory metadata digest
/// to reflect the table's absence, preventing false "Digest does not match" assertions.
StoragePtr new_table;
try
{
    size_t non_zk_retries = 0;
    constexpr size_t max_non_zk_retries = 10;
    while (true)
    {
        try
        {
            new_table = StorageFactory::instance().get(create,
                data_path,
                system_context,
                system_context->getGlobalContext(),
                columns,
                constraints,
                LoadingStrictnessLevel::ATTACH);

            break;
        }
        catch (const Coordination::Exception & e)
        {
            /// Only retry on transient ZooKeeper errors (connection loss, session expired, etc.)
            if (!Coordination::isHardwareError(e.code))
                throw;

            tryLogCurrentException(
                getLogger("InterpreterSystemQuery"),
                fmt::format("Failed to restart replica {}, will retry", replica_table_id.getNameForLogs()));

            /// Check if the query was cancelled (e.g. server is shutting down)
            if (auto process_list_element = getContext()->getProcessListElementSafe())
                process_list_element->checkTimeLimit();

            sleepForSeconds(1);
        }
        catch (...)
        {
            if (++non_zk_retries > max_non_zk_retries)
                throw;

            tryLogCurrentException(
                getLogger("InterpreterSystemQuery"),
                fmt::format("Failed to restart replica {} (attempt {}/{}), will retry",
                    replica_table_id.getNameForLogs(), non_zk_retries, max_non_zk_retries));

            if (auto process_list_element = getContext()->getProcessListElementSafe())
                process_list_element->checkTimeLimit();

            sleepForSeconds(1);
        }
    }
}
catch (...)
{
    /// The table is left permanently detached from the in-memory tables map.
    /// Adjust the metadata digest so it stays consistent with the tables map,
    /// preventing false "Digest does not match" LOGICAL_ERROR exceptions.
    /// The table's metadata file still exists on disk, so it will be restored
    /// on server restart or after DatabaseReplicated recovery.
    if (auto * replicated_db = typeid_cast<DatabaseReplicated *>(database.get()))
    {
        try
        {
            replicated_db->adjustDigestOnTableLostFromRestart(replica_table_id.table_name);
        }
        catch (...)
        {
            tryLogCurrentException(log, "Failed to adjust digest after failed SYSTEM RESTART REPLICA");
        }
    }
    throw;
}
```
This new failure-path behavior is hard to validate without a regression test. Please add a stateless/integration test that forces the post-detach re-create step to fail (e.g. via fault injection / memory limit) and then verifies we don’t hit the debug/sanitizer "Digest does not match" path and that the table is eventually recoverable (restart or DatabaseReplicated recovery).
Summary

When `SYSTEM RESTART REPLICA` fails after exhausting all retries (e.g. due to fault injection causing `MEMORY_LIMIT_EXCEEDED`) or after query cancellation, the table is left permanently detached from the in-memory `tables` map. The `RestartReplicaGuard` is destroyed when the function exits, but the table is never reattached. Subsequent DDL operations that call `checkDigestValid` see a mismatch between `local_digest` (from the `tables` map, excluding the missing table) and `tables_metadata_digest` (which still includes the table's hash), causing a "Digest does not match" `LOGICAL_ERROR` exception.

Fix by adjusting `tables_metadata_digest` in the error path to subtract the hash of the lost table, keeping the in-memory digest consistent with the `tables` map. The table's metadata file still exists on disk and will be restored on server restart or `DatabaseReplicated` recovery.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=acf6aca664cc5cf8f7b0f75e4c7094ef3b6b17bf&name_0=MasterCI&name_1=BuzzHouse%20%28amd_ubsan%29

Test plan

- Existing `SYSTEM RESTART REPLICA` behavior is unaffected

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...
🤖 Generated with Claude Code