[WIP] Improve performance of querying `system.jdbc.tables` for Hive, Iceberg, and Delta #24110

The parameter that sets the maximum number of threads fetching tables ("hive.metadata.parallelism") follows the naming convention used in the BigQuery connector ("bigquery.metadata.parallelism"). Parallelization has been introduced in HiveMetadata rather than in specific metastore implementations, primarily to avoid reintroducing a cache storing tables for all schemas, which was removed in trinodb@cb4d168. This approach attempts to parallelize table retrieval for all metastore types, even though not all of them support concurrent access. Currently, FileHiveMetastore is the only implementation that does not support multithreaded access, so parallelization is ineffective there. Question: Should we consider setting the default value of "hive.metadata.parallelism" to 1 when using the "file" metastore?
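
To make the discussion concrete, here is a minimal sketch of bounded-parallel table listing, not the code in this PR. `ParallelTableLister`, `listAllTables`, and the `listTables` callback are hypothetical names; the `parallelism` argument stands in for the value of "hive.metadata.parallelism".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Trino.
final class ParallelTableLister
{
    private ParallelTableLister() {}

    // Lists tables for every schema using at most `parallelism` threads.
    // `listTables` stands in for a per-schema metastore call (an assumption).
    // Results are returned as "schema.table", in the order of `schemas`.
    static List<String> listAllTables(
            List<String> schemas,
            int parallelism,
            Function<String, List<String>> listTables)
    {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, parallelism));
        try {
            // Submit one task per schema; the pool size bounds concurrency.
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String schema : schemas) {
                Callable<List<String>> task = () -> listTables.apply(schema);
                futures.add(executor.submit(task));
            }
            // Collect results in submission order so output is deterministic.
            List<String> result = new ArrayList<>();
            for (int i = 0; i < schemas.size(); i++) {
                for (String table : futures.get(i).get()) {
                    result.add(schemas.get(i) + "." + table);
                }
            }
            return result;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
        finally {
            executor.shutdownNow();
        }
    }
}
```

With `parallelism` set to 1 the pool degenerates to sequential listing, which is the behavior the question above proposes as the default for the "file" metastore.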

Before DeltaLakeMetadata::getRelationTypes was introduced, ConnectorMetadata::getRelationTypes was used to retrieve relation types for Delta Lake. That original implementation classified every relation as RelationType.TABLE, except those with the extended relation type TRINO_VIEW, which were classified as RelationType.VIEW. The resolveRelationType method was added in this commit to preserve that behavior. Question: Is this resolution necessary? Could we instead use the existing mapping between ExtendedRelationType and RelationType that's already encapsulated in RelationType?
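
The classification rule described above can be sketched as follows. The two enums are simplified, hypothetical stand-ins for RelationType and ExtendedRelationType (the real types may carry more values), and the method body is an assumption based on the description, not a copy of the PR's resolveRelationType.

```java
// Simplified stand-in for RelationType (assumption: only the two values
// mentioned in the description are modeled here).
enum SketchRelationType
{
    TABLE,
    VIEW
}

// Simplified stand-in for ExtendedRelationType. OTHER_VIEW is a hypothetical
// example of an extended type that is not a Trino view.
enum SketchExtendedRelationType
{
    TABLE,
    OTHER_VIEW,
    TRINO_VIEW
}

final class RelationTypeResolver
{
    private RelationTypeResolver() {}

    // Mirrors the described legacy behavior: only TRINO_VIEW is reported
    // as a view; everything else is reported as a table.
    static SketchRelationType resolveRelationType(SketchExtendedRelationType extended)
    {
        if (extended == SketchExtendedRelationType.TRINO_VIEW) {
            return SketchRelationType.VIEW;
        }
        return SketchRelationType.TABLE;
    }
}
```

A generic mapping between the two types would presumably classify other view kinds as views too, so replacing this method with it could change what Delta Lake reports for non-Trino views.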

Parallelization has been implemented at the TrinoCatalog level, rather than in IcebergMetadata, because some catalogs (e.g., Nessie) seem to support optimized table retrieval across all schemas. Currently, parallelization has been added for Glue and Hive catalogs, but it can easily be extended to other catalogs as well.

Commits on Nov 12, 2024

Commits on Nov 13, 2024
