refactor(bigquery-jdbc): optimize thread management and unify discovery logic in `DatabaseMetaData` methods by keshavdandeva · Pull Request #13560 · googleapis/google-cloud-java

keshavdandeva · 2026-06-26T16:08:57Z

b/520407325
b/520406763

This PR refactors and optimizes the database metadata retrieval methods in the BigQuery JDBC driver. It resolves thread management inefficiencies, eliminates duplicate API blocks, ensures consistent project discovery support, and introduces consistent error propagation across all asynchronous metadata methods.

Key Changes

1. Catalog-Based Routing in `getSchemas`

Synchronous Single-Catalog Path: If a specific catalog (project) is requested, getSchemas now executes completely synchronously on the calling thread, eliminating background executor and queue overhead.
Parallel Multi-Catalog Path: If no catalog is specified, the query executor runs parallel scans across all accessible projects (primary, additional, and discovered) in the background.
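
The routing described above could look roughly like the sketch below. It is illustrative only: `SchemaRoutingSketch`, `listSchemas`, the `lister` callback, and the pool size of 4 are hypothetical and do not come from the driver, which writes into a result-set queue rather than returning a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of catalog-based routing; not the driver's actual code.
final class SchemaRoutingSketch {

  static List<String> listSchemas(
      String catalog, List<String> accessibleProjects, Function<String, List<String>> lister)
      throws Exception {
    if (catalog != null) {
      // Single-catalog path: run on the calling thread, no executor involved.
      return new ArrayList<>(lister.apply(catalog));
    }
    // Multi-catalog path: one background task per accessible project.
    ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an assumption
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (String project : accessibleProjects) {
        Callable<List<String>> task = () -> lister.apply(project);
        futures.add(pool.submit(task));
      }
      List<String> all = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        all.addAll(future.get()); // results are collected in submission order
      }
      return all;
    } finally {
      pool.shutdownNow(); // interrupts any task that is still running
    }
  }
}
```

The point of the split is that a targeted call pays no thread hand-off cost, while an untargeted call still overlaps the per-project round trips.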

2. Unified Dataset Discovery & Deduplication

Introduced a single, shared helper method fetchMatchingDatasets to serve as the sole entry point for listing and filtering datasets.
Deduplicated dataset-listing logic across 7 metadata methods (getSchemas, getTables, getColumns, getProcedures, getProcedureColumns, getFunctions, and getFunctionColumns), ensuring consistent support for project discovery and SQL wildcard matching across all of them.
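
As a rough illustration of the SQL wildcard matching mentioned above, the sketch below converts a JDBC-style pattern (`%` for any run of characters, `_` for a single character) into a regex and filters dataset IDs with it. The class and method names are hypothetical, and treating backslash as the escape character is an assumption; the PR does not show how the driver performs this conversion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of JDBC pattern matching; not the driver's actual code.
final class SchemaPatternSketch {

  static Pattern toRegex(String sqlPattern) {
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < sqlPattern.length(); i++) {
      char c = sqlPattern.charAt(i);
      if (c == '\\' && i + 1 < sqlPattern.length()) {
        // Assumed escape character: the next character is taken literally.
        regex.append(Pattern.quote(String.valueOf(sqlPattern.charAt(++i))));
      } else if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString());
  }

  static List<String> filter(List<String> datasetIds, String schemaPattern) {
    if (schemaPattern == null) {
      return new ArrayList<>(datasetIds); // a null pattern means "no filtering"
    }
    Pattern regex = toRegex(schemaPattern);
    List<String> matches = new ArrayList<>();
    for (String id : datasetIds) {
      if (regex.matcher(id).matches()) {
        matches.add(id);
      }
    }
    return matches;
  }
}
```

Routing every metadata method through one filter like this is what keeps wildcard behavior identical across them.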

3. Robust Background Error Propagation & Deduplication

Refactored catch blocks across all 7 asynchronous metadata methods to consistently capture unexpected background thread exceptions and write them to the result set queue. This prevents silent empty-result failures, ensuring that any backend or network errors during metadata scans are properly propagated to the client as a SQLException.
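
One way to picture this kind of propagation is a background producer that places the exception itself on the queue, so the consumer can rethrow it as a SQLException instead of seeing an empty result. The sketch below is a simplified model under that assumption; `QueueErrorSketch`, the `END` sentinel, and the queue capacity are hypothetical and not taken from the driver.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of queue-based error propagation; not the driver's actual code.
final class QueueErrorSketch {
  private static final Object END = new Object(); // end-of-stream marker

  static BlockingQueue<Object> startScan(Callable<List<String>> scan) {
    BlockingQueue<Object> queue = new LinkedBlockingQueue<>(16);
    Thread worker = new Thread(() -> {
      try {
        for (String row : scan.call()) {
          queue.put(row);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } catch (Exception e) {
        putQuietly(queue, e); // surface the failure instead of ending silently
      } finally {
        // Simplification: if the worker was interrupted, END may not be delivered.
        putQuietly(queue, END);
      }
    });
    worker.setDaemon(true);
    worker.start();
    return queue;
  }

  private static void putQuietly(BlockingQueue<Object> queue, Object item) {
    try {
      queue.put(item);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  static List<String> drain(BlockingQueue<Object> queue) throws SQLException {
    List<String> rows = new ArrayList<>();
    try {
      while (true) {
        Object item = queue.take();
        if (item == END) {
          return rows;
        }
        if (item instanceof Throwable) {
          throw new SQLException("Metadata scan failed", (Throwable) item);
        }
        rows.add((String) item);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new SQLException("Interrupted while reading metadata", e);
    }
  }
}
```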


gemini-code-assist

Code Review

This pull request refactors BigQueryDatabaseMetaData.java to clean up unused imports, correct catalog null-checks, consolidate dataset fetching logic into a helper method, and optimize getSchemas with a synchronous single-catalog path and a parallel multi-catalog path. The review feedback highlights several critical improvements: ensuring submitted futures are cancelled in the finally block of getSchemas to prevent resource leaks, wrapping project scans in a try-catch block to make sequential dataset fetching robust against single-project failures, restoring the thread's interrupted status when catching InterruptedException, and replacing an unnecessary Collections.synchronizedList with a plain ArrayList.
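
Three of the points in this review (cancelling futures in a finally block, restoring the interrupted status, and using a plain ArrayList when only one thread touches the list) can be sketched together as follows. `FutureCleanupSketch` and `awaitAll` are hypothetical names for illustration; the PR's actual changes are not shown on this page.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch of the cleanup pattern the review asks for.
final class FutureCleanupSketch {

  static <T> List<T> awaitAll(List<Future<T>> futures) throws SQLException {
    // Only the calling thread writes here, so no synchronized wrapper is needed.
    List<T> results = new ArrayList<>();
    try {
      for (Future<T> future : futures) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the flag for callers up the stack
      throw new SQLException("Interrupted while waiting for metadata tasks", e);
    } catch (ExecutionException e) {
      throw new SQLException("Metadata task failed", e.getCause());
    } finally {
      // cancel() does nothing for completed futures and stops the rest.
      for (Future<T> future : futures) {
        future.cancel(true);
      }
    }
  }
}
```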

keshavdandeva · 2026-06-26T17:15:26Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors metadata fetching in BigQueryDatabaseMetaData by consolidating dataset fetching into a helper method, introducing a synchronous path for single-catalog queries in getSchemas, and improving error handling. It also adds a null check in BigQueryJsonResultSet when cancelling tasks. The review comments point out three critical issues: a potential deadlock in the synchronous path of getSchemas when the schema count exceeds the bounded queue capacity, a compilation error due to a missing parameter in a BigQueryJsonResultSet.of call, and swallowed InterruptedException along with silent failures in the new fetchMatchingDatasets helper.
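
The deadlock risk flagged here follows from how bounded queues behave: if the calling thread fills the queue synchronously and nothing consumes it until the method returns, a blocking `put()` can never complete once capacity is reached. The sketch below models this with a non-blocking `offer()` so it terminates; the class name and capacities are made up for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical model of a synchronous producer writing into a bounded queue.
final class BoundedQueueSketch {

  /** Returns how many rows are accepted before a blocking put() would stall. */
  static int rowsAcceptedBeforeBlocking(int capacity, int rowCount) {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
    int accepted = 0;
    for (int i = 0; i < rowCount; i++) {
      // offer() returns false when the queue is full; put() would wait forever
      // here because no consumer runs until this method returns.
      if (!queue.offer("schema_" + i)) {
        break;
      }
      accepted++;
    }
    return accepted;
  }
}
```

With a capacity of 4 and 10 rows, only 4 rows are accepted. Possible remedies include buffering rows in a list or sizing the queue to the row count; the thread does not say which fix the PR adopted.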

logachev · 2026-06-29T23:17:27Z

+  private List<Dataset> fetchMatchingDatasets(
+      String catalog, String schemaPattern, Pattern schemaRegex) throws SQLException {
+    List<String> projects =
+        (catalog != null) ? Collections.singletonList(catalog) : getAccessibleCatalogNames();


catalog="" is a special case. I think in BQ it should return an empty list, since in BQ every dataset belongs to some project.

keshavdandeva · 2026-06-30

Yeah, this is already handled in all metadata methods. We check if (catalog != null && catalog.isEmpty()) and return an empty result set, so a call with catalog "" never reaches this helper.

On a side note, while confirming this, I found that the determineEffectiveCatalogAndSchema method, which supports the isFilterTablesOnDefaultDataset connection property, is not well optimized and does not work quite as it should. I will refactor it in a separate PR.
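
For readers following the empty-catalog exchange, the guard and the project selection from the diff can be combined into one small sketch. In the PR the guard lives in the metadata methods and the selection in fetchMatchingDatasets; merging them into a single hypothetical `projectsToScan` method is a simplification for illustration.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch combining the empty-catalog guard with project selection.
final class CatalogGuardSketch {

  static List<String> projectsToScan(String catalog, List<String> accessibleProjects) {
    if (catalog != null && catalog.isEmpty()) {
      // catalog="" means "objects without a catalog"; every BigQuery dataset
      // belongs to a project, so nothing can match.
      return Collections.emptyList();
    }
    // A named catalog targets one project; null means "all accessible projects".
    return (catalog != null) ? Collections.singletonList(catalog) : accessibleProjects;
  }
}
```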

logachev · 2026-06-29T23:56:20Z

+        Thread.currentThread().interrupt();
+        break;
+      } catch (Exception e) {
+        if (catalog != null) {


Why the special case here? I don't see anything about it in the spec. Two scenarios come to mind, both with a service account that has some access to projectA but no BQ access to list its datasets:

call getSchemas(null, null) -> returns empty result, no exception

call getSchemas(projectA, null) -> throws exception

So I think we should unify the behavior.
The spec says only "SQLException - if a database access error occurs", so I think we can throw an exception unless we were able to fetch all the data successfully.

keshavdandeva · 2026-06-30

Yeah, so the reason I did it like this was for the case when projectDiscovery is enabled. Basically:

Targeted queries (catalog != null): Fail-fast and throw a SQLException on failure.

Discovery queries (catalog == null): Resilient/best-effort. We skip individual project failures (e.g. permission/billing issues on sandbox/test projects) with a warning log, and return results from other accessible projects. This prevents a single inactive project from breaking the entire metadata call when using user credentials (ADC).

To unify this, which approach would you prefer we take?

Fully Unified (Fail-Fast): Always throw SQLException if any project fails during the scan, regardless of whether it was a targeted or discovery query. (as you suggested above)

Conditional Hybrid (Project Discovery Only): Only use the best-effort model (log a warning and skip) if EnableProjectDiscovery=true. If discovery is disabled, any failure on the primary catalog or on manually configured AdditionalProjects throws a SQLException.
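
The two failure policies under discussion can be expressed as a single loop with a flag, which may make the trade-off easier to compare. This is a sketch under assumptions: `ScanPolicySketch`, `DatasetLister`, and the `bestEffort` parameter are hypothetical, and the thread had not settled on either option at this point.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch of fail-fast versus best-effort project scanning.
final class ScanPolicySketch {
  private static final Logger LOG = Logger.getLogger(ScanPolicySketch.class.getName());

  interface DatasetLister {
    List<String> list(String project) throws Exception;
  }

  static List<String> scan(List<String> projects, DatasetLister lister, boolean bestEffort)
      throws SQLException {
    List<String> datasets = new ArrayList<>();
    for (String project : projects) {
      try {
        datasets.addAll(lister.list(project));
      } catch (Exception e) {
        if (!bestEffort) {
          // Fail-fast: any project failure aborts the whole call.
          throw new SQLException("Failed to list datasets in project " + project, e);
        }
        // Best-effort: log and continue with the remaining projects.
        LOG.warning("Skipping project " + project + ": " + e.getMessage());
      }
    }
    return datasets;
  }
}
```

Under the fully unified option, `bestEffort` would always be false. Under the conditional hybrid, it would presumably be derived from EnableProjectDiscovery.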

keshavdandeva added 2 commits June 26, 2026 16:07

refactor(bigquery-jdbc): optimize thread management and unify discovery logic in `DatabaseMetaData` methods

4a55013

Merge branch 'main' into jdbc/refactor-metadata-methods

9682263

gemini-code-assist Bot reviewed Jun 26, 2026

keshavdandeva added 4 commits June 26, 2026 16:22

chore: address pr feedback

3da2740

chore: move error handling to helper method

50e4ec5

fix NPE for getSchemas sync call

9ced828

lint

8b9d901

gemini-code-assist Bot reviewed Jun 26, 2026

keshavdandeva added 2 commits June 26, 2026 17:39

throw error when catalog is not null

0aacf50

fix failing IT

ed113d6

keshavdandeva marked this pull request as ready for review June 26, 2026 17:47

keshavdandeva requested review from a team as code owners June 26, 2026 17:47

keshavdandeva requested review from Neenu1995 and logachev June 26, 2026 17:47

keshavdandeva added 2 commits June 26, 2026 18:46

fix failing tests

aba306f

fix error propagation

ec24a06

logachev reviewed Jun 30, 2026

keshavdandeva wants to merge 10 commits into main from jdbc/refactor-metadata-methods

2 participants
