fix(table): use the snapshot's schema version for time travel reads #379
Time travel previously only switched which snapshot was scanned while the table, scan pruning, read evolution target, and the DataFusion provider all kept using the latest schema, so historical reads lost the historical shape (columns dropped later were invisible, type updates were applied retroactively).

Mirror Java AbstractFileStoreTable.copy/tryTimeTravel: the new async Table::copy_with_time_travel merges options and, when they select a snapshot with a different schema id, replaces the table schema with that snapshot's schema (options stay the merged ones, matching Java TableSchema.copy semantics via the new copy_with_replaced_options). Resolution failures fall back silently like Java; invalid selectors still fail at scan planning.

Snapshot resolution is extracted from TableScan::resolve_snapshot into table::time_travel::travel_to_snapshot (Java TimeTravelUtil) and shared. DataFusion entry points are wired up: the SQLContext time-travel rewrite, the catalog provider's dynamic options path (SET 'paimon.scan.*'), and PaimonRelationPlanner (bridged through block_on_with_runtime since the planner hook is synchronous).

Writing through a time-travelled table copy is rejected explicitly; Java avoids this structurally by using copyWithoutTimeTravel on write paths, which the shared provider here cannot do.
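The schema switch described in this commit message can be illustrated with a small, synchronous model. This is only a sketch under stated assumptions: the `Table` and `TableSchema` structs below are toy stand-ins, not the crate's real types; snapshots and schemas live in in-memory maps instead of being read from storage; and `scan.version` is treated as a plain snapshot id, whereas the real selector handling is richer and async.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the real table schema.
#[derive(Clone, Debug, PartialEq)]
struct TableSchema {
    id: u64,
    fields: Vec<String>,
    options: HashMap<String, String>,
}

impl TableSchema {
    /// Keeps id and fields; the options are replaced, not merged.
    fn copy_with_replaced_options(&self, options: HashMap<String, String>) -> Self {
        TableSchema { id: self.id, fields: self.fields.clone(), options }
    }
}

/// Hypothetical, simplified stand-in for the real table.
#[derive(Clone, Debug)]
struct Table {
    schema: TableSchema,
    /// snapshot id -> id of the schema the snapshot was committed with
    snapshots: HashMap<u64, u64>,
    /// schema id -> persisted schema
    schemas: HashMap<u64, TableSchema>,
    time_traveled: bool,
}

impl Table {
    /// Toy version of the time-travel copy (assumption: no IO, no async).
    fn copy_with_time_travel(&self, extra: HashMap<String, String>) -> Table {
        // Step 1: merge the dynamic options over the table options.
        let mut merged = self.schema.options.clone();
        merged.extend(extra);

        let mut copy = self.clone();
        copy.schema = self.schema.copy_with_replaced_options(merged.clone());

        // Step 2: resolve the selector; any failure falls back silently.
        let resolved = merged
            .get("scan.version")
            .and_then(|v| v.parse::<u64>().ok())
            .and_then(|snapshot_id| self.snapshots.get(&snapshot_id))
            .and_then(|schema_id| self.schemas.get(schema_id));

        // Step 3: switch schemas only when the snapshot used a different one.
        if let Some(snapshot_schema) = resolved {
            if snapshot_schema.id != self.schema.id {
                copy.schema = snapshot_schema.copy_with_replaced_options(merged);
                copy.time_traveled = true;
            }
        }
        copy
    }
}
```

In this model, selecting a snapshot written with schema 0 on a table whose latest schema is 1 yields a copy that exposes the old fields while still carrying the merged options, and an unparsable selector simply returns the latest schema.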
- Reject UPDATE/DELETE/MERGE INTO/INSERT OVERWRITE/TRUNCATE while a session-level time-travel selector is active. These statements fetch the target table without dynamic options and would silently operate on the latest state while reads in the same session resolve through the snapshot schema (and INSERT through the provider is already rejected), which was confusingly inconsistent.
- Cache the snapshot resolved by copy_with_time_travel on the table copy and reuse it in TableScan::resolve_snapshot, so each scan of a time-travelled table no longer re-reads tag/snapshot files to resolve the same selector. The cache is invalidated whenever options change through copy_with_options.
- Drop the redundant selector pre-check in copy_with_time_travel; travel_to_snapshot already returns Ok(None) without IO when no selector is configured, and the if-let keeps the silent-fallback semantics for resolution errors.
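A session-level guard of the kind described in the first bullet could look roughly like the following. The `StatementKind` enum and `check_statement` function are hypothetical names introduced for illustration; the real implementation works on parsed SQL statements and the session's dynamic options, whose exact shape is not shown in this conversation.

```rust
use std::collections::HashMap;

/// Hypothetical statement classification (assumption, not the real planner type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum StatementKind {
    Select,
    Insert,
    Update,
    Delete,
    MergeInto,
    InsertOverwrite,
    Truncate,
}

/// Session option keys that pin reads to a historical snapshot.
const SELECTOR_KEYS: [&str; 2] = ["paimon.scan.version", "paimon.scan.timestamp-millis"];

/// Rejects statements that would act on the latest state while a selector is active.
fn check_statement(kind: StatementKind, session: &HashMap<String, String>) -> Result<(), String> {
    let active = SELECTOR_KEYS.iter().find(|k| session.contains_key(**k));
    let targets_latest = matches!(
        kind,
        StatementKind::Update
            | StatementKind::Delete
            | StatementKind::MergeInto
            | StatementKind::InsertOverwrite
            | StatementKind::Truncate
    );
    match active {
        Some(key) if targets_latest => Err(format!(
            "{:?} is not allowed while session option '{}' is set",
            kind, key
        )),
        // SELECT passes; INSERT is left to the provider-level write guard.
        _ => Ok(()),
    }
}
```

Keeping the check at the statement level means the error surfaces before any table is fetched, which matches the stated goal of avoiding a silent mismatch between reads and writes in one session.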

    // Java avoids this structurally (write paths use copyWithoutTimeTravel);
    // here the same table copy can serve both reads and writes, so reject
    // explicitly. Commit-only flows (new_commit) stay untouched.
    if self.table.is_time_traveled() {

This guard is too narrow. is_time_traveled() is only true when the resolved snapshot uses a different schema id, so writes are still allowed when scan.version / scan.timestamp-millis resolves to a snapshot with the same schema id.
That leaves ordinary INSERT able to write the latest table while the session is reading a historical snapshot. A minimal case is: create two snapshots without schema evolution, SET 'paimon.scan.version' = '1', then INSERT INTO ... succeeds.
Please reject writes whenever the table options contain a time-travel selector, not only when the table switched to a historical schema.
Fixed in 05bdc13. new_write now rejects whenever the table options contain a time-travel selector (CoreOptions::try_time_travel_selector() returning Some or a conflicting-selector error), independent of whether the schema was switched — matching the SQL-layer guard semantics. Covered by test_copy_with_time_travel_same_schema_still_rejects_write (your exact scenario: two snapshots without schema evolution) and a datafusion case asserting SET 'paimon.scan.version'='2' + INSERT fails.
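A guard keyed off the options rather than the schema switch might be sketched as follows. The name `try_time_travel_selector` comes from the reply above, but its signature, the `Selector` enum, and `check_writable` are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical selector type; the real return type is not shown in this thread.
#[derive(Debug, PartialEq)]
enum Selector {
    Version(String),
    TimestampMillis(i64),
}

/// Assumed shape: Ok(None) without a selector, Err on conflicting or invalid selectors.
fn try_time_travel_selector(
    options: &HashMap<String, String>,
) -> Result<Option<Selector>, String> {
    match (options.get("scan.version"), options.get("scan.timestamp-millis")) {
        (Some(_), Some(_)) => Err("conflicting time-travel selectors".to_string()),
        (Some(v), None) => Ok(Some(Selector::Version(v.clone()))),
        (None, Some(t)) => t
            .parse::<i64>()
            .map(|ms| Some(Selector::TimestampMillis(ms)))
            .map_err(|e| format!("invalid scan.timestamp-millis: {}", e)),
        (None, None) => Ok(None),
    }
}

/// Write guard: any selector (or a selector error) blocks the write,
/// regardless of whether the schema was switched.
fn check_writable(options: &HashMap<String, String>) -> Result<(), String> {
    match try_time_travel_selector(options) {
        Ok(None) => Ok(()),
        Ok(Some(selector)) => Err(format!(
            "cannot write through a time-travelled table ({:?})",
            selector
        )),
        Err(e) => Err(format!("cannot write: {}", e)),
    }
}
```

Because the check never consults the schema id, the same-schema scenario from the review (two snapshots, no schema evolution) is rejected just like a cross-schema one.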

    schema: self.schema.copy_with_options(extra),
    schema_manager: self.schema_manager.clone(),
    rest_env: self.rest_env.clone(),
    time_traveled: self.time_traveled,

copy_with_options keeps time_traveled and the current schema, but clears travel_snapshot. If this is called on a table already returned by copy_with_time_travel, the result can keep the old snapshot schema while resolving a different snapshot later during scan.
That can produce a mismatched table state: historical schema from one snapshot, scan snapshot from another selector/latest. Please either make this transition impossible, reset back to the base/latest schema, or re-resolve time travel when options change.
Fixed in 05bdc13 by making the mismatched state fail loudly (your first option). copy_with_options now only invalidates the resolved snapshot when the merged options actually change the selector (scan.version / scan.timestamp-millis); unrelated option merges keep the snapshot/schema pair intact. If the selector is changed on a time-travelled copy, TableScan::resolve_snapshot returns an error directing to copy_with_time_travel, which re-resolves both the snapshot and the schema (re-traveling an already-travelled copy is supported and tested). Re-resolving inside copy_with_options itself wasn't viable since it is synchronous and resolution needs IO. Covered by test_changing_selector_after_travel_fails_scan.
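The invalidation rule in this reply can be modelled in a few lines. This is a sketch, not the actual code: `TableCopy` is a hypothetical struct holding only the state relevant here, the cached snapshot is reduced to an id, and `some.unrelated.option` in the usage note is a made-up key.

```rust
use std::collections::HashMap;

const SELECTOR_KEYS: [&str; 2] = ["scan.version", "scan.timestamp-millis"];

/// Hypothetical reduced table state (assumption for illustration).
#[derive(Clone, Debug)]
struct TableCopy {
    options: HashMap<String, String>,
    /// True when the copy carries a historical schema.
    time_traveled: bool,
    /// Snapshot id cached by the time-travel copy, if still valid.
    travel_snapshot: Option<u64>,
}

/// Extracts only the selector-related option values for comparison.
fn selector_of(options: &HashMap<String, String>) -> Vec<Option<&String>> {
    SELECTOR_KEYS.iter().map(|k| options.get(*k)).collect()
}

impl TableCopy {
    /// Synchronous option merge: drops the cached snapshot only if the selector changed.
    fn copy_with_options(&self, extra: HashMap<String, String>) -> TableCopy {
        let mut merged = self.options.clone();
        merged.extend(extra);
        let selector_changed = selector_of(&merged) != selector_of(&self.options);
        TableCopy {
            travel_snapshot: if selector_changed { None } else { self.travel_snapshot },
            time_traveled: self.time_traveled,
            options: merged,
        }
    }

    /// Fails loudly when a historical schema is no longer paired with its snapshot.
    fn resolve_snapshot(&self) -> Result<Option<u64>, String> {
        match (self.time_traveled, self.travel_snapshot) {
            (_, Some(id)) => Ok(Some(id)),
            (true, None) => Err(
                "selector changed after time travel; call copy_with_time_travel again".to_string(),
            ),
            // Placeholder: the real scan would resolve from the options or the latest snapshot.
            (false, None) => Ok(None),
        }
    }
}
```

Merging `some.unrelated.option` into a travelled copy leaves the cached snapshot in place, while changing `scan.version` clears it and makes the next `resolve_snapshot` call return an error instead of pairing a stale schema with a different snapshot.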
…state

Review feedback:

- The write guard keyed off is_time_traveled(), which is only set when the resolved snapshot has a different schema id. A selector resolving to a same-schema snapshot still pins reads, so new_write now rejects whenever the table options contain a time-travel selector (or conflicting ones), matching the SQL-layer guard semantics.
- copy_with_options on a time-travelled copy kept the historical schema while clearing the resolved snapshot, so a later scan could resolve a different snapshot and evolve its files to the stale schema. The snapshot is now only invalidated when the merged options change the selector, and scanning a time-travelled copy whose snapshot was invalidated fails with a clear error instead; copy_with_time_travel re-resolves both.
Purpose

Time travel (`scan.version` / `scan.timestamp-millis` / SQL `VERSION AS OF` / `TIMESTAMP AS OF`) previously only switched which snapshot was scanned, while the table schema, scan pruning, the read evolution target, and the DataFusion provider all kept using the latest schema; `Snapshot.schemaId` was never consumed on the read path. Reading an old snapshot therefore lost its historical shape: columns dropped later were invisible, and type updates were applied retroactively to historical data.

Java switches the table to the snapshot's schema in `AbstractFileStoreTable.copy(dynamicOptions)` → `tryTimeTravel` → `schemaManager.schema(snapshot.schemaId()).copy(mergedOptions)`. This PR mirrors that behavior.

Brief change log

- `Table::copy_with_time_travel(extra)`, mirroring Java `copy(dynamicOptions)`: merge options, resolve the time-travel selector, and when the resolved snapshot has a different schema id, replace the table schema with the snapshot's schema while keeping the merged options. Resolution failures fall back silently (Java `tryTimeTravel` catch-all); invalid selectors still fail at scan planning, so existing error behavior is unchanged. The existing `Table::copy_with_options` keeps its non-traveling semantics (Java `copyWithoutTimeTravel`).
- `TableSchema::copy_with_replaced_options`, matching Java `TableSchema.copy(Map)` (options are replaced, not merged; id/fields/keys/comment/timeMillis preserved).
- Extract snapshot resolution from `TableScan::resolve_snapshot` into `table::time_travel::travel_to_snapshot` (Java `TimeTravelUtil`) and share it.
- Wire up the DataFusion entry points: the `SQLContext` time-travel rewrite path, the catalog provider's dynamic-options path (`SET 'paimon.scan.version'`), and `PaimonRelationPlanner` (bridged via the existing `block_on_with_runtime`, since the planner hook is synchronous).
- Reject `new_write()` on a time-travelled table copy. Java has no runtime guard but avoids the situation structurally (write paths always use `copyWithoutTimeTravel`), whereas the shared DataFusion provider here can serve both reads and INSERT, so an explicit error is safer than silently writing data shaped like the old schema. This is isolated and easy to drop if undesired.
- Reject UPDATE/DELETE/MERGE INTO/INSERT OVERWRITE/TRUNCATE while a session-level time-travel selector (`SET 'paimon.scan.version'` / `'paimon.scan.timestamp-millis'`) is active, instead of silently ignoring the selector and writing to the latest state while reads in the same session resolve through the snapshot schema.
- The snapshot resolved by `copy_with_time_travel` is cached on the table copy and reused by `TableScan::resolve_snapshot`, so each scan of a time-travelled table doesn't re-read tag/snapshot files to resolve the same selector (invalidated whenever options change through `copy_with_options`).

Because the schema switch happens at the `Table` level, scan stats pruning, the per-file evolution target, and the provider's Arrow schema stay consistent automatically; the existing field-id based stats devolution is unaffected (files in an old snapshot always have `schema_id <=` the snapshot's schema id).

Tests

- `table::time_travel` (multi-schema fixture built by persisting `schema-0`/`schema-1` and committing one snapshot per version): schema switch by version/tag/timestamp, no-selector no-op, silent fallback for invalid/conflicting selectors, merged-options replacement semantics, write rejection, and an end-to-end read asserting the old snapshot returns only the old columns.
- `time_travel_schema_tests` in paimon-datafusion: `VERSION AS OF` (SQLContext path and the relation-planner path on a raw `SessionContext`), `TIMESTAMP AS OF`, `SET 'paimon.scan.version'` + SELECT/INSERT, and that selecting a later-added column at an old snapshot fails at planning.

API and Format

New public APIs: `Table::copy_with_time_travel`, `Table::is_time_traveled`, `TableSchema::copy_with_replaced_options`. No storage format change.

Scope notes (deliberate, follow-ups welcome):

- Only two selectors are covered (`scan.version`, `scan.timestamp-millis`); Java additionally supports `scan.snapshot-id` / `scan.tag-name` / `scan.watermark` / `scan.timestamp`.
- `FileSystemCatalog::get_table` does not auto-travel selectors persisted in table options (Java `FileStoreTableFactory.create` does); only dynamic entry points are covered here.

Documentation

None required.