Fix HiveCatalog connection stability with retry mechanism for TTransportException#98471

Open
otselnik wants to merge 4 commits into ClickHouse:master from otselnik:fix/hive-catalog-ttransport-exception


@otselnik otselnik commented Mar 2, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed HiveCatalog connection stability by adding an automatic retry mechanism and reconnection logic to handle TTransportException errors when communicating with Hive Metastore.

Documentation entry for user-facing changes

  • Documentation is not required (internal implementation improvement)

Technical Details

Motivation:
HiveCatalog connections to Hive Metastore could fail with TTransportException when the metastore service restarts or network issues occur. This caused queries to fail even though the issue was temporary and could be resolved by reconnecting.

Changes:

  • Added a reconnect() method that closes and reopens the connection to Hive Metastore
  • Implemented an executeWithRetry() template method that automatically retries an operation up to 3 times on TTransportException

Behavior:
When a TTransportException occurs during any Hive Metastore operation (getTables, existsTable, etc.), the system will:

  1. Attempt to reconnect to the metastore
  2. Retry the operation up to 3 times
  3. Throw NO_HIVEMETASTORE error if all retries fail

This makes DataLakeCatalog with Hive backend more resilient to temporary connection issues.
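
The behavior described above could be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `TransportError` and `NoHiveMetastoreError` are hypothetical stand-ins for Thrift's TTransportException and for the exception carrying the NO_HIVEMETASTORE error code, `CatalogClientSketch` is an invented class name, and `reconnect()` only counts calls instead of touching a real connection. It also assumes that "up to 3 times" means three attempts in total.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Thrift's TTransportException,
// so that the sketch compiles without the Thrift headers.
struct TransportError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for the exception thrown with the NO_HIVEMETASTORE code.
struct NoHiveMetastoreError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

class CatalogClientSketch
{
public:
    // Assumption: three attempts in total.
    static constexpr size_t max_attempts = 3;

    size_t reconnect_count = 0;

    // In real code this would close and reopen the metastore connection.
    // Here it only records that a reconnect was requested.
    void reconnect() { ++reconnect_count; }

    // Runs func; on a transport error, reconnects and tries again
    // while attempts remain. Other exceptions propagate unchanged.
    template <typename Func>
    auto executeWithRetry(Func && func) -> decltype(func())
    {
        std::string last_error;
        for (size_t attempt = 0; attempt < max_attempts; ++attempt)
        {
            try
            {
                return func();
            }
            catch (const TransportError & e)
            {
                last_error = e.what();
                // Reconnect only if another attempt will follow.
                if (attempt + 1 < max_attempts)
                    reconnect();
            }
        }
        throw NoHiveMetastoreError("Hive Metastore is unavailable after retries: " + last_error);
    }
};
```

A caller would wrap each metastore operation in a lambda and pass it to `executeWithRetry`, so that getTables, existsTable and similar calls share one retry path. Only transport errors trigger a retry in this sketch; any other exception reaches the caller on the first failure.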

CLAassistant commented Mar 2, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

aalexfvk commented Mar 2, 2026

Could we use the following class here (perhaps by extending it)?

class HiveMetastoreClient

It looks like it already provides retry logic inside.

#include <Storages/ObjectStorage/DataLakes/Iceberg/SchemaProcessor.h>
#include <Common/ProxyConfigurationResolverProvider.h>
#include <Databases/DataLake/Common.h>
# include <optional>
This indentation looks unnecessary when the entire module is wrapped in #if like this.

Author

otselnik commented Mar 2, 2026

@aalexfvk

Storages/Hive/HiveCommon.h

It uses a client pool under the hood. It seems like overkill for the HiveCatalog. I reused the retry logic from the hive client in this pull request.

3 participants