refactor(bigquery-jdbc): optimize thread management and unify discovery logic in DatabaseMetaData methods#13560
refactor(bigquery-jdbc): optimize thread management and unify discovery logic in DatabaseMetaData methods#13560keshavdandeva wants to merge 10 commits into
DatabaseMetaData methods#13560Conversation
…ry logic in `DatabaseMetaData` methods
There was a problem hiding this comment.
Code Review
This pull request refactors BigQueryDatabaseMetaData.java to clean up unused imports, correct catalog null-checks, consolidate dataset fetching logic into a helper method, and optimize getSchemas with a synchronous single-catalog path and a parallel multi-catalog path. The review feedback highlights several critical improvements: ensuring submitted futures are cancelled in the finally block of getSchemas to prevent resource leaks, wrapping project scans in a try-catch block to make sequential dataset fetching robust against single-project failures, restoring the thread's interrupted status when catching InterruptedException, and replacing an unnecessary Collections.synchronizedList with a plain ArrayList.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors metadata fetching in BigQueryDatabaseMetaData by consolidating dataset fetching into a helper method, introducing a synchronous path for single-catalog queries in getSchemas, and improving error handling. It also adds a null check in BigQueryJsonResultSet when cancelling tasks. The review comments point out three critical issues: a potential deadlock in the synchronous path of getSchemas when the schema count exceeds the bounded queue capacity, a compilation error due to a missing parameter in a BigQueryJsonResultSet.of call, and swallowed InterruptedException along with silent failures in the new fetchMatchingDatasets helper.
| private List<Dataset> fetchMatchingDatasets( | ||
| String catalog, String schemaPattern, Pattern schemaRegex) throws SQLException { | ||
| List<String> projects = | ||
| (catalog != null) ? Collections.singletonList(catalog) : getAccessibleCatalogNames(); |
There was a problem hiding this comment.
catalog="" is a special case, I think in BQ it should return an empty list since in BQ all datasets belong to some project.
There was a problem hiding this comment.
Yeah, this is handled in all metadata methods already. We check if ((catalog != null && catalog.isEmpty()) and return empty resultset. So, any catalog with "" would not reach here
There was a problem hiding this comment.
On a side note, while confirming this, I found determineEffectiveCatalogAndSchema method that is used to help the connection property isFilterTablesOnDefaultDataset is not exactly optimized/working as it should. Will refactor it in separate PR
| Thread.currentThread().interrupt(); | ||
| break; | ||
| } catch (Exception e) { | ||
| if (catalog != null) { |
There was a problem hiding this comment.
Why special case for this one? I don't see anything about it in spec, so 2 scenarios in my mind:
- Service account has some access to projectA, but no BQ access to list datasets..
- call getSchemas(null, null) -> returns empty result, no exception
- call getSchemas(projectA, null) -> throws exception..
So I think we should unify the behavior.
Spec is saying only "SQLException - if a database access error occurs").. So I think we can throw exception unless we were able to successfully fetch all data.
There was a problem hiding this comment.
Yeah, so the reason I did it like this was for the case when projectDiscovery is enabled. Basically:
- Targeted queries (
catalog != null): Fail-fast and throw aSQLExceptionon failure. - Discovery queries (
catalog == null): Resilient/best-effort. We skip individual project failures (e.g. permission/billing issues on sandbox/test projects) with a warning log, and return results from other accessible projects. This prevents a single inactive project from breaking the entire metadata call when using user credentials (ADC).
To unify this, which approach would you prefer we take?
- Fully Unified (Fail-Fast): Always throw
SQLExceptionif any project fails during the scan, regardless of whether it was a targeted or discovery query. (as you suggested above) - Conditional Hybrid (Project Discovery Only): Only use the best-effort model (log warning and skip) if
EnableProjectDiscovery=trueIf discovery is disabled, any failure on the primary catalog or manually configuredAdditionalProjectswill throw aSQLException
b/520407325
b/520406763
This PR refactors and optimizes the database metadata retrieval methods in the BigQuery JDBC driver. It resolves thread management inefficiencies, eliminates duplicate API blocks, ensures consistent project discovery support, and introduces consistent error propagation across all asynchronous metadata methods.
Key Changes
1. Catalog-Based Routing in
getSchemasgetSchemasnow executes completely synchronously on the calling thread, eliminating background executor and queue overhead.2. Unified Dataset Discovery & Deduplication
fetchMatchingDatasetsto serve as the sole entry point for listing and filtering datasets.getSchemas,getTables,getColumns,getProcedures,getProcedureColumns,getFunctions, andgetFunctionColumns), ensuring consistent support for project discovery and SQL wildcard matching across all of them.3. Robust Background Error Propagation & Deduplication
SQLException.