Bug 750668

Summary: Windowed functions don't work on a complex query,
Product: Red Hat Enterprise Linux 6 Reporter: Yaniv Lavi <ylavi>
Component: postgresqlAssignee: Tom Lane <tgl>
Status: CLOSED NOTABUG QA Contact: qe-baseos-daemons
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.1CC: hhorak
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-01-12 06:31:09 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
The example query. none

Description Yaniv Lavi 2011-11-01 22:59:17 UTC
Description of problem:
When you try to run a complex query with a windowed function it returns an error:

ERROR: column "v3_0_vm_interface_daily_history_view.receive_rate_percent" must appear in the GROUP BY clause or be used in an aggregate function
SQL state: 42803

It ignore the partition information.

Version-Release number of selected component (if applicable):
8.4.7

How reproducible:
Always (recreate on a rhevm history database environment)

Steps to Reproduce:
1. Run the attached query.
2. Remove the aggregate function and run again.
  
Actual results:
First failed, Second worked

Expected results:
Both should work

Additional info:
See attached query

Comment 1 Yaniv Lavi 2011-11-01 23:00:11 UTC
Created attachment 531210 [details]
The example query.

Comment 3 RHEL Program Management 2011-11-01 23:28:47 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 4 Tom Lane 2011-11-02 01:01:36 UTC
AFAICT, the bug is in the query not in Postgres.  AVG() with an OVER clause is not an aggregate function, it is a window function.  In this example, since there is a GROUP BY, the window function is supposed to act on the grouped rows, so it must have an argument expression that is well-defined for a grouped row.  And receive_rate_percent isn't well-defined, because it is not one of the grouping columns.

It is a bit unhelpful that Postgres doesn't show an error cursor position :-(.  I'll see about improving that.  But I don't believe the query is valid.

Comment 5 Yaniv Lavi 2011-11-02 15:36:31 UTC
From my experience with other databases, windowed functions are performed after the select is created in memory. this means that it doesn't matter what is the select, only the partition. 

The fact that postgres is not able to perform this select is a major disadvantage. I don't understand why postgres is not able to run aggregate functions and windowed functions at the same time. grouping should ignore windowed functions.

Comment 6 Tom Lane 2011-11-02 16:10:37 UTC
(In reply to comment #5)
> grouping should ignore windowed functions.

I'm sorry, but you're wrong, and I challenge you to find another DBMS that will execute this query as written.  The SQL standard is perfectly clear that a window function operates on the group rows when there is a GROUP BY.  In SQL:2008, section 4.14.9 Windowed tables, third paragraph:

If a <table expression> is grouped and also has a window, then there is a syntactic transformation that segregates the grouping into a <derived table>, so that the window partitions consist of rows of the <derived table> rather than groups of rows.

Comment 7 Yaniv Lavi 2011-11-02 16:20:34 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > grouping should ignore windowed functions.
> 
> I'm sorry, but you're wrong, and I challenge you to find another DBMS that will
> execute this query as written.  The SQL standard is perfectly clear that a
> window function operates on the group rows when there is a GROUP BY.  In
> SQL:2008, section 4.14.9 Windowed tables, third paragraph:
> 
> If a <table expression> is grouped and also has a window, then there is a
> syntactic transformation that segregates the grouping into a <derived table>,
> so that the window partitions consist of rows of the <derived table> rather
> than groups of rows.

But this still means the query will run on the rows according to the partition you define on the derived table. postgres fails to do this.

Comment 8 Tom Lane 2011-11-02 16:52:05 UTC
The point is that the query is attempting to refer to a column that doesn't exist in the derived grouped table.  The grouped table contains the grouping columns, the results of plain aggregate functions over each group, and nothing else.  Window functions run *after* plain aggregate functions, and are logically a separate level of query operations.  (The syntax the SQL committee chose for them makes this remarkably obscure :-().

Perhaps it would be more profitable to discuss how to write this query correctly.  I'm not sure what it is you are actually trying to get the average of?  The Rx_rate and Tx_rate columns are averages over each vm_interface_id/interface_name/history_datetime group, but what are you trying to accomplish with window_col?

Comment 9 Yaniv Lavi 2011-11-02 17:26:39 UTC
I want the average of receive_rate_percent by the id without minding the datetime.

Comment 10 Tom Lane 2011-11-02 17:55:19 UTC
(In reply to comment #9)
> I want the average of receive_rate_percent by the id without minding the
> datetime.

Hm.  Not sure if it's possible to do that in the same query that produces this other slicing of the data.

You could get an approximation to it with

avg(avg(receive_rate_percent)) OVER (PARTITION BY v3_0_vm_interface_daily_history_view.vm_interface_id)

which would produce the average of the per-datetime averages.  (Here, the inner avg() is a plain aggregate and the outer one is a window function.)  However, if the per-datetime groups contain significantly different numbers of observations (original rows), you'd be weighting the observations in the smaller groups more heavily than those in the larger groups.  Not knowing the table design, I don't know if this is a problem or not.

If that doesn't work for you then I think you need to do a bunch of sub-selects, that is

(SELECT avg(receive_rate_percent) FROM tables WHERE vm_interface_id matches the outer value) AS window_col

Comment 11 Tom Lane 2012-01-12 06:31:09 UTC
Closing -- I see no bug here.