[CALCITE-7620] Result of FILTER clause in window functions is incorrect #5040
xuzifu666 wants to merge 3 commits into
 * partition by ORDER BY keys. The output order is therefore not defined by
 * a simple collation in the general case, so we conservatively report no
 * collations. */
public static @Nullable List<RelCollation> window(RelMetadataQuery mq, RelNode input,
`RelMdCollation.window` was changed because the original collation derivation for windows was too optimistic: it led the optimizer to believe that the window output retained the input order, and therefore to delete the top-level `Sort` by mistake.
The original implementation had the following problem:
- Previously, `RelMdCollation.window` directly returned `mq.collations(input)`, meaning "the window operator preserves the order of the input rows as is." However, `EnumerableWindow` actually first groups the rows by the `PARTITION BY` key using `SortedMultiMap`, and then sorts them within each group by the window `ORDER BY` key. The input order is therefore not preserved; the global output order is `PARTITION BY` keys + `ORDER BY` keys, not simply the input order.

This caused the top-level `Sort` to be incorrectly optimized away:
- When `order by empno` is written in the SQL and the window also happens to be sorted by `empno`, the optimizer mistakenly assumes that the window output is globally ordered and deletes the top-level `EnumerableSort`. The resulting output is grouped by `deptno`, not sorted by `empno`.
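
To make the ordering problem concrete, here is a minimal standalone sketch of "group by partition key, then sort within each group". It does not use Calcite: the `Emp` class, the sample rows, and the `TreeMap` standing in for `SortedMultiMap` are all assumptions made for illustration. It only shows that an input already sorted by `empno` can come out grouped by `deptno`, which is why a top-level sort on `empno` cannot be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowOrderSketch {
  /** Hypothetical row type for illustration; not a Calcite class. */
  static final class Emp {
    final int empno;
    final int deptno;

    Emp(int empno, int deptno) {
      this.empno = empno;
      this.deptno = deptno;
    }
  }

  /** Sample rows (assumed data), already sorted by empno. */
  static List<Emp> sampleInput() {
    return Arrays.asList(
        new Emp(7369, 20), new Emp(7499, 30), new Emp(7566, 20), new Emp(7782, 10));
  }

  /** Groups rows by deptno, sorts each group by empno, then concatenates the groups. */
  static List<Emp> partitionThenSort(List<Emp> input) {
    // TreeMap is a stand-in for the partitioning structure; keys iterate in deptno order.
    Map<Integer, List<Emp>> partitions = new TreeMap<>();
    for (Emp e : input) {
      partitions.computeIfAbsent(e.deptno, k -> new ArrayList<>()).add(e);
    }
    List<Emp> out = new ArrayList<>();
    for (List<Emp> group : partitions.values()) {
      group.sort(Comparator.comparingInt((Emp e) -> e.empno));
      out.addAll(group);
    }
    return out;
  }

  /** Returns true if the rows are in non-decreasing empno order. */
  static boolean isSortedByEmpno(List<Emp> rows) {
    for (int i = 1; i < rows.size(); i++) {
      if (rows.get(i - 1).empno > rows.get(i).empno) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Emp> input = sampleInput();
    List<Emp> output = partitionThenSort(input);
    System.out.println("input sorted by empno:  " + isSortedByEmpno(input));
    System.out.println("output sorted by empno: " + isSortedByEmpno(output));
  }
}
```

With these sample rows the output starts with the single `deptno = 10` row (`empno = 7782`), so it is no longer in `empno` order even though the input was.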
this.hints = calc.getHints();
this.cluster = calc.getCluster();
this.traits = calc.getTraitSet();
this.traits = calc.getTraitSet()
`CalcRelSplitter.java` was changed because, when `ProjectToWindowRule` splits a `Calc`/`Project` containing window functions, it passes the original node's trait set (including the contaminated collation) to the new nodes created by the split. This causes the optimizer to incorrectly remove the top-level `Sort` before the window is expanded.
!ok

# Test 3: Multiple FILTER with OVER on different aggregates
select empno, deptno,
Another case:
select ename, job, hiredate,
avg(sal) over (order by hiredate rows 3 preceding) as avg_sal,
avg(sal) filter (where job = 'MANAGER') over (order by hiredate rows 3 preceding)
as avg_mgr_sal
from emp
order by hiredate;
OK, this test has been added. The AVG_MGR_SAL column is the one affected by this FILTER change, and the results are consistent with the PostgreSQL output at https://onecompiler.com/postgresql/44smkpfxb
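
For anyone checking AVG_MGR_SAL by hand, the semantics being tested can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Calcite's implementation: the `Row` class and the sample salaries are hypothetical, the rows are assumed to already be in `hiredate` order with no ties, and `rows 3 preceding` is taken to mean a frame from three rows back through the current row. The filter is applied to the rows inside the frame, and the result is NULL when no row in the frame matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilteredWindowAvgSketch {
  /** Hypothetical row type for illustration; not a Calcite class. */
  static final class Row {
    final String job;
    final double sal;

    Row(String job, double sal) {
      this.job = job;
      this.sal = sal;
    }
  }

  /** Sample rows (assumed data), listed in window ORDER BY order. */
  static List<Row> sampleRows() {
    return Arrays.asList(
        new Row("CLERK", 800), new Row("MANAGER", 2975), new Row("SALESMAN", 1600),
        new Row("MANAGER", 2850), new Row("MANAGER", 2450), new Row("CLERK", 1100));
  }

  /**
   * Sketch of AVG(sal) FILTER (WHERE job = 'MANAGER') OVER (ORDER BY ... ROWS n PRECEDING).
   * The input must already be sorted by the window ORDER BY key.
   */
  static List<Double> avgManagerSal(List<Row> ordered, int preceding) {
    List<Double> result = new ArrayList<>();
    for (int i = 0; i < ordered.size(); i++) {
      double sum = 0;
      int count = 0;
      // Frame: from `preceding` rows before the current row through the current row.
      for (int j = Math.max(0, i - preceding); j <= i; j++) {
        Row r = ordered.get(j);
        if ("MANAGER".equals(r.job)) { // the FILTER predicate
          sum += r.sal;
          count++;
        }
      }
      if (count == 0) {
        result.add(null); // no matching row in the frame: AVG is NULL
      } else {
        result.add(sum / count);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(avgManagerSal(sampleRows(), 3));
  }
}
```

The unfiltered `avg_sal` column in the same query uses the same frame without the predicate, so comparing the two columns row by row is a quick way to confirm that FILTER affects only which rows contribute, not the frame bounds.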



jira: https://issues.apache.org/jira/browse/CALCITE-7620