
fix(table): use the snapshot's schema version for time travel reads #379

Merged: JingsongLi merged 3 commits into apache:main from TheR1sing3un:pr/time-travel-snapshot-schema on Jun 13, 2026


TheR1sing3un (Member) commented on Jun 11, 2026

Purpose

Time travel (scan.version / scan.timestamp-millis / SQL VERSION AS OF / TIMESTAMP AS OF) previously only switched which snapshot was scanned, while the table schema, scan pruning, the read evolution target, and the DataFusion provider all kept using the latest schema — Snapshot.schemaId was never consumed on the read path. Reading an old snapshot therefore lost its historical shape: columns dropped later were invisible, and type updates were applied retroactively to historical data.

Java switches the table to the snapshot's schema via AbstractFileStoreTable.copy(dynamicOptions) → tryTimeTravel → schemaManager.schema(snapshot.schemaId()).copy(mergedOptions). This PR mirrors that behavior.
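A minimal sketch of that flow, for orientation only. It assumes heavily simplified, hypothetical stand-ins for `Table`, `TableSchema`, and `Snapshot` (a map of persisted schemas instead of a schema manager, and only numeric `scan.version` selectors), and it is synchronous, whereas the PR's `Table::copy_with_time_travel` is async because resolution reads tag/snapshot files.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; the real paimon-rust types carry far more state.
#[derive(Clone, Debug, PartialEq)]
pub struct TableSchema {
    pub id: i64,
    pub fields: Vec<String>,
    pub options: HashMap<String, String>,
}

#[derive(Clone, Debug)]
pub struct Snapshot {
    pub id: i64,
    pub schema_id: i64,
}

pub struct Table {
    pub schema: TableSchema,
    // Persisted schemas keyed by schema id (stand-in for the schema manager).
    pub schemas: HashMap<i64, TableSchema>,
    pub snapshots: Vec<Snapshot>,
}

impl Table {
    /// Sketch of copy(dynamicOptions) -> tryTimeTravel: merge the options,
    /// resolve the selector, and if the snapshot was written under another
    /// schema id, switch to that schema while keeping the merged options.
    /// Any resolution failure silently keeps the latest schema.
    pub fn copy_with_time_travel(&self, extra: &HashMap<String, String>) -> Table {
        let mut options = self.schema.options.clone();
        options.extend(extra.iter().map(|(k, v)| (k.clone(), v.clone())));
        let mut schema = TableSchema { options: options.clone(), ..self.schema.clone() };
        if let Some(snapshot) = self.travel_to_snapshot(&options) {
            if snapshot.schema_id != schema.id {
                if let Some(old) = self.schemas.get(&snapshot.schema_id) {
                    // Historical fields, merged (current) options.
                    schema = TableSchema { options, ..old.clone() };
                }
            }
        }
        Table { schema, schemas: self.schemas.clone(), snapshots: self.snapshots.clone() }
    }

    // Only numeric snapshot ids via `scan.version` are handled; tags and
    // `scan.timestamp-millis` are omitted. Failures map to None (fallback).
    fn travel_to_snapshot(&self, options: &HashMap<String, String>) -> Option<Snapshot> {
        let id: i64 = options.get("scan.version")?.parse().ok()?;
        self.snapshots.iter().find(|s| s.id == id).cloned()
    }
}
```

The point it illustrates: the fields come from the snapshot's schema id while the options are the merged ones, and an unresolvable selector leaves the latest schema in place instead of erroring at copy time.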

Brief change log

  • Add async Table::copy_with_time_travel(extra), mirroring Java copy(dynamicOptions): merge options, resolve the time-travel selector, and when the resolved snapshot has a different schema id, replace the table schema with the snapshot's schema while keeping the merged options. Resolution failures fall back silently (Java tryTimeTravel catch-all); invalid selectors still fail at scan planning, so existing error behavior is unchanged. The existing Table::copy_with_options keeps its non-traveling semantics (Java copyWithoutTimeTravel).
  • Add TableSchema::copy_with_replaced_options, matching Java TableSchema.copy(Map) (options are replaced, not merged; id/fields/keys/comment/timeMillis preserved).
  • Extract snapshot resolution from TableScan::resolve_snapshot into table::time_travel::travel_to_snapshot (Java TimeTravelUtil) and share it.
  • Wire up the DataFusion entry points: the SQLContext time-travel rewrite path, the catalog provider's dynamic-options path (SET 'paimon.scan.version'), and PaimonRelationPlanner (bridged via the existing block_on_with_runtime, since the planner hook is synchronous).
  • Reject new_write() on a time-travelled table copy. Java has no runtime guard but avoids the situation structurally — write paths always use copyWithoutTimeTravel — whereas the shared DataFusion provider here can serve both reads and INSERT, so an explicit error is safer than silently writing data shaped like the old schema. This is isolated and easy to drop if undesired.
  • For the same consistency reason, the SQL layer rejects UPDATE / DELETE / MERGE INTO / INSERT OVERWRITE / TRUNCATE while a session-level time-travel selector (SET 'paimon.scan.version' / 'paimon.scan.timestamp-millis') is active, instead of silently ignoring the selector and writing to the latest state while reads in the same session resolve through the snapshot schema.
  • The snapshot resolved by copy_with_time_travel is cached on the table copy and reused by TableScan::resolve_snapshot, so each scan of a time-travelled table doesn't re-read tag/snapshot files to resolve the same selector (invalidated whenever options change through copy_with_options).
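The difference between the existing merge semantics and the new replace semantics can be sketched as below. The struct is a hypothetical subset of `TableSchema` with illustrative field names. Replacing (rather than merging) presumably matters because the merged options already carry the latest table options plus the dynamic ones, so options persisted with the historical schema should not resurface.

```rust
use std::collections::HashMap;

// Hypothetical subset of TableSchema; field names are illustrative.
#[derive(Clone, Debug, PartialEq)]
pub struct TableSchema {
    pub id: i64,
    pub fields: Vec<String>,
    pub primary_keys: Vec<String>,
    pub comment: Option<String>,
    pub time_millis: i64,
    pub options: HashMap<String, String>,
}

impl TableSchema {
    /// Merge semantics: extra options are layered over the existing ones.
    pub fn copy_with_options(&self, extra: &HashMap<String, String>) -> TableSchema {
        let mut options = self.options.clone();
        for (k, v) in extra {
            options.insert(k.clone(), v.clone());
        }
        TableSchema { options, ..self.clone() }
    }

    /// Replace semantics (like Java TableSchema.copy(Map)): the old options
    /// are discarded; id, fields, keys, comment and time are preserved.
    pub fn copy_with_replaced_options(&self, options: HashMap<String, String>) -> TableSchema {
        TableSchema { options, ..self.clone() }
    }
}
```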

Because the schema switch happens at the Table level, scan stats pruning, the per-file evolution target, and the provider's Arrow schema stay consistent automatically; the existing field-id based stats devolution is unaffected (files in an old snapshot always have schema_id <= the snapshot's schema id).
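Why the read target matters can be seen with a generic field-id projection. This is not the paimon-rust code; `Field` and `index_mapping` are hypothetical. Each target column is matched to a data-file column by field id, and a missing id means the file predates that column.

```rust
// Hypothetical minimal field model: columns are identified by a stable field
// id, so renames and drops do not break the mapping.
#[derive(Clone, Debug)]
pub struct Field {
    pub id: i32,
    pub name: String,
}

/// For each target field, the position of the same field id in the data file,
/// or None when the file has no such column (it would be read as null).
pub fn index_mapping(target: &[Field], file: &[Field]) -> Vec<Option<usize>> {
    target
        .iter()
        .map(|t| file.iter().position(|f| f.id == t.id))
        .collect()
}
```

With the latest schema as target, a column dropped later (`b` in the test) is invisible and a later-added column reads as null for old files; with the snapshot's own schema as target, every column maps back.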

Tests

  • New unit tests in table::time_travel (multi-schema fixture built by persisting schema-0/schema-1 and committing one snapshot per version): schema switch by version/tag/timestamp, no-selector no-op, silent fallback for invalid/conflicting selectors, merged-options replacement semantics, write rejection, and an end-to-end read asserting the old snapshot returns only the old columns.
  • New time_travel_schema_tests in paimon-datafusion: VERSION AS OF (SQLContext path and the relation-planner path on a raw SessionContext), TIMESTAMP AS OF, SET 'paimon.scan.version' + SELECT/INSERT, and that selecting a later-added column at an old snapshot fails at planning.
  • Existing time-travel tests (conflicting/invalid selector behavior) pass unchanged.

API and Format

New public APIs: Table::copy_with_time_travel, Table::is_time_traveled, TableSchema::copy_with_replaced_options. No storage format change.

Scope notes (deliberate, follow-ups welcome):

  • Selector coverage stays the existing Rust subset (scan.version, scan.timestamp-millis); Java additionally supports scan.snapshot-id / scan.tag-name / scan.watermark / scan.timestamp.
  • FileSystemCatalog::get_table does not auto-travel selectors persisted in table options (Java FileStoreTableFactory.create does); only dynamic entry points are covered here.
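For the two supported selectors, the parsing step might look like the sketch below. The enum, function body, and error messages are hypothetical; the thread mentions `CoreOptions::try_time_travel_selector()` returning `Some` or a conflicting-selector error, which this models loosely.

```rust
use std::collections::HashMap;

// Hypothetical model of the two selectors the Rust port supports.
#[derive(Debug, PartialEq)]
pub enum TimeTravelSelector {
    Version(String), // snapshot id or tag name
    TimestampMillis(i64),
}

pub fn try_time_travel_selector(
    options: &HashMap<String, String>,
) -> Result<Option<TimeTravelSelector>, String> {
    match (options.get("scan.version"), options.get("scan.timestamp-millis")) {
        // Both selectors set at once is treated as a conflict.
        (Some(_), Some(_)) => Err("scan.version and scan.timestamp-millis conflict".to_string()),
        (Some(v), None) => Ok(Some(TimeTravelSelector::Version(v.clone()))),
        (None, Some(t)) => t
            .parse::<i64>()
            .map(|ms| Some(TimeTravelSelector::TimestampMillis(ms)))
            .map_err(|e| format!("invalid scan.timestamp-millis {t:?}: {e}")),
        // No selector: not a time-travel read.
        (None, None) => Ok(None),
    }
}
```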

Documentation

None required.

Time travel previously only switched which snapshot was scanned while the
table, scan pruning, read evolution target, and the DataFusion provider
all kept using the latest schema, so historical reads lost the historical
shape (columns dropped later were invisible, type updates were applied
retroactively).

Mirror Java AbstractFileStoreTable.copy/tryTimeTravel: the new async
Table::copy_with_time_travel merges options and, when they select a
snapshot with a different schema id, replaces the table schema with that
snapshot's schema (options stay the merged ones, matching Java
TableSchema.copy semantics via the new copy_with_replaced_options).
Resolution failures fall back silently like Java; invalid selectors still
fail at scan planning.

Snapshot resolution is extracted from TableScan::resolve_snapshot into
table::time_travel::travel_to_snapshot (Java TimeTravelUtil) and shared.

DataFusion entry points are wired up: the SQLContext time-travel rewrite,
the catalog provider's dynamic options path (SET 'paimon.scan.*'), and
PaimonRelationPlanner (bridged through block_on_with_runtime since the
planner hook is synchronous).

Writing through a time-travelled table copy is rejected explicitly; Java
avoids this structurally by using copyWithoutTimeTravel on write paths,
which the shared provider here cannot do.
- Reject UPDATE/DELETE/MERGE INTO/INSERT OVERWRITE/TRUNCATE while a
  session-level time-travel selector is active. These statements fetch
  the target table without dynamic options and would silently operate on
  the latest state while reads in the same session resolve through the
  snapshot schema (and INSERT through the provider is already rejected),
  which was confusingly inconsistent.
- Cache the snapshot resolved by copy_with_time_travel on the table copy
  and reuse it in TableScan::resolve_snapshot, so each scan of a
  time-travelled table no longer re-reads tag/snapshot files to resolve
  the same selector. The cache is invalidated whenever options change
  through copy_with_options.
- Drop the redundant selector pre-check in copy_with_time_travel;
  travel_to_snapshot already returns Ok(None) without IO when no
  selector is configured, and the if-let keeps the silent-fallback
  semantics for resolution errors.
```rust
// Java avoids this structurally (write paths use copyWithoutTimeTravel);
// here the same table copy can serve both reads and writes, so reject
// explicitly. Commit-only flows (new_commit) stay untouched.
if self.table.is_time_traveled() {
    // (guard body not shown in the review excerpt)
}
```

Review comment (Contributor):

This guard is too narrow. is_time_traveled() is only true when the resolved snapshot uses a different schema id, so writes are still allowed when scan.version / scan.timestamp-millis resolves to a snapshot with the same schema id.

That leaves ordinary INSERT able to write the latest table while the session is reading a historical snapshot. A minimal case is: create two snapshots without schema evolution, SET 'paimon.scan.version' = '1', then INSERT INTO ... succeeds.

Please reject writes whenever the table options contain a time-travel selector, not only when the table switched to a historical schema.

Reply (Member, Author):

Fixed in 05bdc13. new_write now rejects whenever the table options contain a time-travel selector (CoreOptions::try_time_travel_selector() returning Some or a conflicting-selector error), independent of whether the schema was switched — matching the SQL-layer guard semantics. Covered by test_copy_with_time_travel_same_schema_still_rejects_write (your exact scenario: two snapshots without schema evolution) and a datafusion case asserting SET 'paimon.scan.version'='2' + INSERT fails.
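The difference between the original guard and the fix can be sketched with a hypothetical `TableCopy` whose `time_traveled` flag plays the role of `is_time_traveled()`. Both guard functions are illustrative, not the paimon-rust code. The reviewer's scenario (same schema id, selector set) passes the schema-only guard but fails the selector-based one.

```rust
use std::collections::HashMap;

// Hypothetical table copy; `time_traveled` is only true when the resolved
// snapshot's schema id differs from the latest schema id.
pub struct TableCopy {
    pub options: HashMap<String, String>,
    pub time_traveled: bool,
}

fn has_time_travel_selector(options: &HashMap<String, String>) -> bool {
    options.contains_key("scan.version") || options.contains_key("scan.timestamp-millis")
}

impl TableCopy {
    /// Original, too-narrow guard: keyed off the schema switch only.
    pub fn new_write_schema_only_guard(&self) -> Result<(), String> {
        if self.time_traveled {
            return Err("cannot write through a time-travelled table copy".to_string());
        }
        Ok(())
    }

    /// Fixed guard: any selector pins reads, so any selector blocks writes.
    pub fn new_write(&self) -> Result<(), String> {
        if has_time_travel_selector(&self.options) {
            return Err("cannot write while a time-travel selector is set".to_string());
        }
        Ok(())
    }
}
```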

```rust
// Excerpt from Table::copy_with_options (surrounding fields elided)
schema: self.schema.copy_with_options(extra),
schema_manager: self.schema_manager.clone(),
rest_env: self.rest_env.clone(),
time_traveled: self.time_traveled,
```

Review comment (Contributor):

copy_with_options keeps time_traveled and the current schema, but clears travel_snapshot. If this is called on a table already returned by copy_with_time_travel, the result can keep the old snapshot schema while resolving a different snapshot later during scan.

That can produce a mismatched table state: historical schema from one snapshot, scan snapshot from another selector/latest. Please either make this transition impossible, reset back to the base/latest schema, or re-resolve time travel when options change.

Reply (Member, Author):

Fixed in 05bdc13 by making the mismatched state fail loudly (your first option). copy_with_options now only invalidates the resolved snapshot when the merged options actually change the selector (scan.version / scan.timestamp-millis); unrelated option merges keep the snapshot/schema pair intact. If the selector is changed on a time-travelled copy, TableScan::resolve_snapshot returns an error directing to copy_with_time_travel, which re-resolves both the snapshot and the schema (re-traveling an already-travelled copy is supported and tested). Re-resolving inside copy_with_options itself wasn't viable since it is synchronous and resolution needs IO. Covered by test_changing_selector_after_travel_fails_scan.
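The invalidation rule in this fix can be sketched as follows. `Table`, `travel_snapshot`, and the error text are hypothetical simplifications: the cached snapshot survives unrelated option merges, is dropped when a selector value changes, and a time-travelled copy without a cached snapshot fails loudly instead of pairing its historical schema with a different snapshot.

```rust
use std::collections::HashMap;

const SELECTOR_KEYS: [&str; 2] = ["scan.version", "scan.timestamp-millis"];

pub struct Table {
    pub options: HashMap<String, String>,
    pub time_traveled: bool,
    /// Snapshot id resolved by copy_with_time_travel (hypothetical cache).
    pub travel_snapshot: Option<i64>,
}

impl Table {
    pub fn copy_with_options(&self, extra: &HashMap<String, String>) -> Table {
        let mut options = self.options.clone();
        options.extend(extra.iter().map(|(k, v)| (k.clone(), v.clone())));
        // Only a change in a selector value invalidates the cached snapshot.
        let selector_changed = SELECTOR_KEYS
            .iter()
            .any(|k| self.options.get(*k) != options.get(*k));
        Table {
            options,
            time_traveled: self.time_traveled,
            travel_snapshot: if selector_changed { None } else { self.travel_snapshot },
        }
    }

    pub fn resolve_snapshot(&self) -> Result<Option<i64>, String> {
        match (self.travel_snapshot, self.time_traveled) {
            (Some(id), _) => Ok(Some(id)),
            (None, true) => Err(
                "time-travel selector changed; call copy_with_time_travel to re-resolve".to_string(),
            ),
            // A real implementation would resolve from options or latest here.
            (None, false) => Ok(None),
        }
    }
}
```

The option key `some.unrelated-option` in the test is made up; any key outside the selector set behaves the same.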

…state

Review feedback:
- The write guard keyed off is_time_traveled(), which is only set when the
  resolved snapshot has a different schema id. A selector resolving to a
  same-schema snapshot still pins reads, so new_write now rejects whenever
  the table options contain a time-travel selector (or conflicting ones),
  matching the SQL-layer guard semantics.
- copy_with_options on a time-travelled copy kept the historical schema
  while clearing the resolved snapshot, so a later scan could resolve a
  different snapshot and evolve its files to the stale schema. The snapshot
  is now only invalidated when the merged options change the selector, and
  scanning a time-travelled copy whose snapshot was invalidated fails with
  a clear error instead; copy_with_time_travel re-resolves both.

JingsongLi (Contributor) left a comment:

+1

JingsongLi merged commit 7911ecc into apache:main on Jun 13, 2026
7 of 8 checks passed

3 participants