Bug 780431 (SOA-2864)

Summary: Wrong count(*) result when joining data from two models
Product: [JBoss] JBoss Enterprise SOA Platform 5 Reporter: Claudio Venturini <claudio.venturini1>
Component: EDSAssignee: Van Halbert <vhalbert>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.1.0.ER8CC: vijay.nagar
Target Milestone: ---   
Target Release: 5.1.0.CR1   
Hardware: Unspecified   
OS: Unspecified   
URL: http://jira.jboss.org/jira/browse/SOA-2864
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Teiid 7.3 deployed on Jboss AS 5.1.0 GA running on Ubuntu Server 10.04 LTS, MySQL 5.1.51 running on the same machine as JBoss, SQL Server 2000 Standard SP4 running on Windows Server 2003
Last Closed: 2011-02-09 21:42:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Claudio Venturini 2011-02-01 15:58:15 UTC
project_key: SOA

count(\*) gets a wrong result when is performed in a query which joins two tables coming from two different models.

The first (named "sqlsrv") is a source model for a table stored in SQL Server 2000. The second (named "mysql") is a source model for a table stored in MySQl 5.1.

The query performs an inner join on the column "product_id", which is shared by the two tables. All records match correctly, so there should be no difference in using an inner join instead of other join types.

The query is the following:

{noformat}
SELECT
    invoice,
    invoice_date,
    code,
    sum(quantity) AS s,
    avg(price) AS av,
    count(*) AS dim
FROM
    sqlsrv.test2.dbo.sales SL
    INNER JOIN mysql.test.product_code PC ON SL.product_id = PC.product_id
GROUP BY invoice_date, invoice, code
ORDER BY invoice_date ASC, invoice ASC, code ASC
{noformat}

The result of count(\*) is 1 for all records, while for some of them it should be 2, as there are duplicate records in the sales table, which are grouped by the GROUP BY clause.

Note that the problem exists only with the inner join. Left, right and outer joins work well, even if the set of records that they produce is the same as that produced by the inner join.

I noticed the problem in Teiid 7.2. Tonight I upgraded to 7.3, but the problem is still there.
If the data are all in the same DB, the query works as expected. It is not significant in which of the two DB each table resides. I think it is neither a problem of the SQL Server connector, nor of the MySQL connector.

The data are the following:

*product_code*:
||code||category||product_id|| 
|1|1|125|
|2|1|127|
|3|1|123|
|4|1|121|
|5|1|126|
|6|1|124|
|7|1|122|

*sales*:
||invoice||invoice_date||product_id||quantity||price|| 
|009831|2009-08-15 00:00:00|125|350|1.070261|
|009831|2009-08-15 00:00:00|124|960|1.070261|
|009843|2009-08-15 00:00:00|121|648|1.515264|
|009843|2009-08-15 00:00:00|126|145|2.763902|
|009843|2009-08-15 00:00:00|126|25|2.407148|
|009855|2009-08-15 00:00:00|122|768|1.122835|
|009855|2009-08-15 00:00:00|123|540|1.158511|
|009855|2009-08-15 00:00:00|125|480|1.070261|
|009857|2009-08-15 00:00:00|122|440|1.498365|
|009857|2009-08-15 00:00:00|126|115|2.585525|
|009866|2009-08-15 00:00:00|122|736|1.498365|
|009866|2009-08-15 00:00:00|123|558|1.391339|
|009866|2009-08-15 00:00:00|125|378|1.336887|
|009866|2009-08-15 00:00:00|127|510|1.605391|
|009866|2009-08-15 00:00:00|126|435|2.585525|
|009847|2009-08-15 00:00:00|126|55|2.763902|
|009847|2009-08-15 00:00:00|126|5|2.407148|
|009847|2009-08-15 00:00:00|121|240|1.872018|

The result of the above query is:

||invoice||invoice_date||code||s||av||dim||
|009831|2009-08-15 00:00:00|1|350.0|1.070261|1|
|009831|2009-08-15 00:00:00|6|960.0|1.070261|1|
|009843|2009-08-15 00:00:00|4|648.0|1.515264|1|
|009843|2009-08-15 00:00:00|5|170.0|2.585525|1|
|009847|2009-08-15 00:00:00|4|240.0|1.872018|1|
|009847|2009-08-15 00:00:00|5|60.0|2.585525|1|
|009855|2009-08-15 00:00:00|1|480.0|1.070261|1|
|009855|2009-08-15 00:00:00|3|540.0|1.158511|1|
|009855|2009-08-15 00:00:00|7|768.0|1.122835|1|
|009857|2009-08-15 00:00:00|5|115.0|2.585525|1|
|009857|2009-08-15 00:00:00|7|440.0|1.498365|1|
|009866|2009-08-15 00:00:00|1|378.0|1.336887|1|
|009866|2009-08-15 00:00:00|2|510.0|1.605391|1|
|009866|2009-08-15 00:00:00|3|558.0|1.391339|1|
|009866|2009-08-15 00:00:00|5|435.0|2.585525|1|
|009866|2009-08-15 00:00:00|7|736.0|1.498365|1|


As you can see, count(\*) (the "dim" column) always returns 1, even if it should return 2 for the product with id 126 in invoices 009843 and 009847.

If needed, I can provide you the two source models, and a dump of the two DBs.

Comment 1 Van Halbert 2011-02-01 15:58:15 UTC
Link: Added: This issue Cloned from TEIID-1458


Comment 2 Van Halbert 2011-02-01 15:58:58 UTC
Security: Added: Public


Comment 4 Paul Nittel 2011-02-09 21:42:11 UTC
Tested with ER9 and closed.

Comment 5 vijay nagar 2012-11-19 14:04:30 UTC
Still having this issue on the following setup:

TEIID= 7.7.0
JBOSS= 5.1.0.GA

The query is sourced from two sql server databases, the join and the nested query give different results. The nested query result matches with while running the inner join query.

INNER JOIN QUERY:WRONG COUNT
SELECT s.CUSIP FROM "NewMixedModel"."Tranche" T, 
"NewMixedModel"."BondStatic" S 
where T.CUSIP = S.CUSIP 

NESTED QUERY: CORRECT DATA BUT TOO SLOW
select count(CUSIP) FROM "NewMixedModel"."BondStatic" 
where cusip in (
select cusip from "NewMixedModel"."Tranche")


Also created a new issue that describes the same.
https://bugzilla.redhat.com/show_bug.cgi?id=878038