Bug 949088

Summary: Duplicates returned when using findXxxByCriteria with no paging and fetchSomeCollectionField(true)
Product: [Other] RHQ Project
Reporter: Jay Shaughnessy <jshaughn>
Component: Core Server
Assignee: Lukas Krejci <lkrejci>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Priority: unspecified
Version: 4.6
CC: hrupp
Target Release: RHQ 4.9
Hardware: All
OS: All
Doc Type: Bug Fix
Last Closed: 2014-03-26 08:30:58 UTC
Type: Bug

Description Jay Shaughnessy 2013-04-05 20:56:58 UTC
Duplicates returned when using findXxxByCriteria with no paging and fetchSomeCollectionField(true)

This is pretty subtle and unlikely, but possible.  It makes sense actually but is probably unexpected.

Basically, to fetch optional data we use (under the covers) a LEFT JOIN FETCH.  This works great, but when the field being fetched is a Collection it can return duplicates because of standard SQL join behavior, where the number of result rows can be the cross-product of the elements being joined.
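
The row multiplication described above can be reproduced without a database. The following is only a sketch: the in-memory arrays, the `leftJoin` helper, and the field names are hypothetical stand-ins for the joined tables, not RHQ code. It shows a parent with three children appearing three times among the root entities of the joined result.

```javascript
// Hypothetical in-memory data standing in for two joined tables.
const resourceTypes = [
  { id: 10001, name: "Platform" },
  { id: 10002, name: "Server" },
];
const operationDefinitions = [
  { id: 1, resourceTypeId: 10001, name: "start" },
  { id: 2, resourceTypeId: 10001, name: "stop" },
  { id: 3, resourceTypeId: 10001, name: "restart" },
];

// LEFT JOIN semantics: one output row per (parent, child) match;
// a parent with no children still yields one row with a null child.
function leftJoin(parents, children, childKey) {
  const rows = [];
  for (const parent of parents) {
    const matches = children.filter((c) => c[childKey] === parent.id);
    if (matches.length === 0) {
      rows.push({ parent, child: null });
    } else {
      for (const child of matches) {
        rows.push({ parent, child });
      }
    }
  }
  return rows;
}

const joinedRows = leftJoin(resourceTypes, operationDefinitions, "resourceTypeId");

// Without DISTINCT, the root entity appears once per joined row.
const rootIds = joinedRows.map((row) => row.parent.id);
// De-duplicating the roots gives the result a caller would expect.
const distinctRootIds = [...new Set(rootIds)];

console.log(rootIds);         // [ 10001, 10001, 10001, 10002 ]
console.log(distinctRootIds); // [ 10001, 10002 ]
```

Fetching a second collection in the same query multiplies the rows again, which is why criteria that fetch several collection fields are the most likely to show the problem.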

The behavior is implementation-specific for JPA providers. It seems that Hibernate, when paging is involved, tries to apply some DISTINCT logic, which seems to provide the desired results.

So the workaround is to apply some paging to your findXxxByCriteria call (paging is in place by default).  You can use a really large page size if you really want to pull it all back in one fetch (like Integer.MAX_VALUE).

This is basically here as a reference. I don't recommend we make any changes for this, but if we did, the change would probably be to use SELECT DISTINCT in the generated query when we have this sort of LEFT JOIN FETCH situation.

Comment 1 Jay Shaughnessy 2013-08-29 14:22:25 UTC
Note the workaround above has some performance impact given more recent changes by lkrejci around limit queries.

There are 5 GUI classes suffering from this problem under the package:

  org.rhq.enterprise.gui.coregui.client.inventory.resource.type


Lkrejci is working on the proper fix, applying DISTINCT as needed in the Criteria Query Generator.  If that for some reason has an issue then we still need to minimally fix these classes.

Comment 2 Lukas Krejci 2013-08-29 16:13:34 UTC
Repro steps:

1) Install RHQ server
2) Connect to it using CLI and execute the following:
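// Fetch one resource type together with several optional fields, with paging disabled
var crit = new ResourceTypeCriteria();
crit.setPageControl(new UnlimitedPageControl());
crit.addFilterId(10001);
crit.fetchChildResourceTypes(true);
crit.fetchOperationDefinitions(true);
crit.fetchSubCategory(true);
var rts = ResourceTypeManager.findResourceTypesByCriteria(crit);

// Read the non-public totalSize field of the returned PageList via reflection
var totalSizeField = java.lang.Class.forName("org.rhq.core.domain.util.PageList").getDeclaredField("totalSize");
totalSizeField.setAccessible(true);
var declaredTotalSize = totalSizeField.get(rts);
// Value reported through the totalSize bean property (getTotalSize())
var reportedTotalSize = rts.totalSize;

// The two values should agree when no duplicates are returned
assertEquals(declaredTotalSize, reportedTotalSize);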
3) The above script should run without exception

Comment 3 Lukas Krejci 2013-08-29 16:19:19 UTC
Actually, a better test script for the step 2) would be this:

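// Same criteria as before: one resource type, several fetched fields, no paging
var crit = new ResourceTypeCriteria();
crit.setPageControl(new UnlimitedPageControl());
crit.addFilterId(10001);
crit.fetchChildResourceTypes(true);
crit.fetchOperationDefinitions(true);
crit.fetchSubCategory(true);
// First run: fetch only the collection and record its total size
crit.setRestriction(Criteria.Restriction.COLLECTION_ONLY);
var collectionSize = ResourceTypeManager.findResourceTypesByCriteria(crit).getTotalSize();

// Second run: execute only the count query
crit.setRestriction(Criteria.Restriction.COUNT_ONLY);
var count = ResourceTypeManager.findResourceTypesByCriteria(crit).getTotalSize();

// With duplicates in the result, the collection size differs from the count
assertEquals(collectionSize, count);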

Comment 4 Lukas Krejci 2013-08-29 16:24:18 UTC
This was found to be quite severe, because the remote API calls suffer from this error, too, as can be seen from the above repro steps.

The fix, as outlined above by Jay (i.e. to add DISTINCT to the SELECT clause if the query contains a LEFT JOIN FETCH), doesn't impose any performance degradation and is actually very minimal in terms of LOC changed.

commit 9a1325a4d40f17bfa469eedfc54e57d903bf0008
Author: Lukas Krejci <lkrejci>
Date:   Thu Aug 29 17:20:48 2013 +0200

    [BZ 949088] - Use DISTINCT with JOIN FETCH to avoid duplicates in criteria queries
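
The commit itself is not reproduced in this report, so the following is only a sketch of the idea stated above, not the actual change to the Criteria Query Generator. The helper name `addDistinctForJoinFetch`, the string-based rewriting, and the sample JPQL (including the `operationDefinitions` field path) are assumptions made for illustration.

```javascript
// Hypothetical helper, NOT the RHQ query generator: adds DISTINCT to a
// JPQL SELECT clause when the query contains a LEFT JOIN FETCH.
function addDistinctForJoinFetch(jpql) {
  const hasJoinFetch = /\bLEFT\s+JOIN\s+FETCH\b/i.test(jpql);
  const alreadyDistinct = /^\s*SELECT\s+DISTINCT\b/i.test(jpql);
  if (!hasJoinFetch || alreadyDistinct) {
    return jpql; // nothing to do
  }
  // Insert DISTINCT directly after the leading SELECT keyword.
  return jpql.replace(/^(\s*SELECT)\s+/i, "$1 DISTINCT ");
}

// Sample queries; entity and field names are assumed for illustration.
const fetchingQuery =
  "SELECT rt FROM ResourceType rt LEFT JOIN FETCH rt.operationDefinitions WHERE rt.id = :id";
const plainQuery = "SELECT rt FROM ResourceType rt WHERE rt.id = :id";

console.log(addDistinctForJoinFetch(fetchingQuery));
console.log(addDistinctForJoinFetch(plainQuery));
```

In this sketch, queries without a fetch join are returned unchanged, so only the queries that can produce duplicated root entities are affected.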

Comment 5 Heiko W. Rupp 2014-03-26 08:30:58 UTC
Bulk closing now that 4.10 is out.

If you think an issue is not resolved, please open a new BZ and link to the existing one.