Bug 949088 - Duplicates returned when using findXxxByCriteria with no paging and fetchSomeCollectionField(true)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.6
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHQ 4.9
Assignee: Lukas Krejci
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-04-05 20:56 UTC by Jay Shaughnessy
Modified: 2014-03-26 08:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-03-26 08:30:58 UTC
Embargoed:



Description Jay Shaughnessy 2013-04-05 20:56:58 UTC
Duplicates returned when using findXxxByCriteria with no paging and fetchSomeCollectionField(true)

This is pretty subtle and unlikely, but possible.  It makes sense actually but is probably unexpected.

Basically, to fetch optional data we use (under the covers) a LEFT JOIN FETCH.  This works great, but when the field being fetched is a Collection it can return duplicates because of standard SQL join behavior: the number of result rows can be the cross-product of the elements being joined, so the same root entity shows up once per joined row.
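
To illustrate the row multiplication described above, here is a minimal in-memory sketch of a LEFT JOIN over two collection fields. It is a toy model only: `leftJoin`, the sample data, and the field names `children` and `operations` are made up for this example and are not RHQ or Hibernate APIs.

```javascript
// Toy model of LEFT JOIN semantics; all names are illustrative only.
function leftJoin(rows, collectionOf) {
  const out = [];
  for (const row of rows) {
    const items = collectionOf(row.root);
    if (items.length === 0) {
      // LEFT JOIN keeps the root row even when the collection is empty.
      out.push({ root: row.root, joined: [...row.joined, null] });
    } else {
      // One output row per element of the joined collection.
      for (const item of items) {
        out.push({ root: row.root, joined: [...row.joined, item] });
      }
    }
  }
  return out;
}

const resourceTypes = [
  { id: 10001, children: ["childA", "childB"], operations: ["start", "stop", "restart"] },
  { id: 10002, children: [], operations: ["ping"] },
];

let rows = resourceTypes.map((rt) => ({ root: rt, joined: [] }));
rows = leftJoin(rows, (rt) => rt.children);
rows = leftJoin(rows, (rt) => rt.operations);

// Without de-duplication the root entity appears once per joined row:
// 2 children x 3 operations = 6 rows for 10001, plus 1 row for 10002.
const rootIds = rows.map((r) => r.root.id);
console.log(rootIds.length); // 7
```

Two root entities turn into seven result rows here, which is the kind of inflated list a caller would see if the provider does not collapse the rows back to distinct roots.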

The behavior is implementation-specific across JPA providers. It seems that Hibernate, when paging is involved, applies some DISTINCT logic, which provides the desired results.

So the workaround is to apply some paging to your findXxxByCriteria call (paging is in place by default).  You can use a really large page size if you really want to pull it all back in one fetch (like Integer.MAX_VALUE).

This is basically here as a reference. I don't recommend we make any changes for this, but if we did, the change would probably be to use SELECT DISTINCT in the generated query whenever we have this sort of LEFT JOIN FETCH situation.
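
As a rough sketch of that idea (not the actual RHQ Criteria Query Generator), a query builder could add DISTINCT only when at least one fetch join is present. The helper `buildSelect`, the alias `rt`, and the field names `childResourceTypes` and `operationDefinitions` are assumptions made for illustration.

```javascript
// Hypothetical helper, NOT the real RHQ query generator: it only shows
// the rule "add DISTINCT when the query contains a LEFT JOIN FETCH".
function buildSelect(alias, entity, fetchJoins) {
  const distinct = fetchJoins.length > 0 ? "DISTINCT " : "";
  const joins = fetchJoins
    .map((field) => ` LEFT JOIN FETCH ${alias}.${field}`)
    .join("");
  return `SELECT ${distinct}${alias} FROM ${entity} ${alias}${joins}`;
}

// No fetch joins: the query is left untouched.
const plain = buildSelect("rt", "ResourceType", []);

// Fetch joins present: DISTINCT is added to the SELECT clause.
const fetched = buildSelect("rt", "ResourceType", [
  "childResourceTypes",   // assumed field name
  "operationDefinitions", // assumed field name
]);

console.log(plain);
console.log(fetched);
```

Keeping the DISTINCT conditional means queries without fetch joins are generated exactly as before.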

Comment 1 Jay Shaughnessy 2013-08-29 14:22:25 UTC
Note the workaround above has some performance impact given more recent changes by lkrejci around limit queries.

There are 5 GUI classes suffering from this problem under the package:

  org.rhq.enterprise.gui.coregui.client.inventory.resource.type


Lkrejci is working on the proper fix, applying DISTINCT as needed in the Criteria Query Generator.  If that for some reason has an issue then we still need to minimally fix these classes.

Comment 2 Lukas Krejci 2013-08-29 16:13:34 UTC
Repro steps:

1) Install RHQ server
2) Connect to it using CLI and execute the following:
// Query a single resource type with paging disabled and several fetches enabled
var crit = new ResourceTypeCriteria;
crit.setPageControl(new UnlimitedPageControl);
crit.addFilterId(10001);
crit.fetchChildResourceTypes(true);
crit.fetchOperationDefinitions(true);
crit.fetchSubCategory(true);
var rts = ResourceTypeManager.findResourceTypesByCriteria(crit);

// Read the private totalSize field of the returned PageList via reflection
var totalSizeField = java.lang.Class.forName("org.rhq.core.domain.util.PageList").getDeclaredField("totalSize");
totalSizeField.setAccessible(true);
var declaredTotalSize = totalSizeField.get(rts);
// Value reported through the public totalSize property
var reportedTotalSize = rts.totalSize;

assertEquals(declaredTotalSize, reportedTotalSize);
3) The above script should run without exception

Comment 3 Lukas Krejci 2013-08-29 16:19:19 UTC
Actually, a better test script for the step 2) would be this:

var crit = new ResourceTypeCriteria;
crit.setPageControl(new UnlimitedPageControl);
crit.addFilterId(10001);
crit.fetchChildResourceTypes(true);
crit.fetchOperationDefinitions(true);
crit.fetchSubCategory(true);
// Total size when only the collection is requested
crit.setRestriction(Criteria.Restriction.COLLECTION_ONLY);
var collectionSize = ResourceTypeManager.findResourceTypesByCriteria(crit).getTotalSize();

// Total size when only the count is requested
crit.setRestriction(Criteria.Restriction.COUNT_ONLY);
var count = ResourceTypeManager.findResourceTypesByCriteria(crit).getTotalSize();

// The two differ when the collection contains duplicates
assertEquals(collectionSize, count);

Comment 4 Lukas Krejci 2013-08-29 16:24:18 UTC
This was found to be quite severe, because the remote API calls suffer from this error too, as can be seen from the above repro steps.

The fix, as outlined above by Jay (i.e. to add DISTINCT to the SELECT clause if the query contains a LEFT JOIN FETCH), doesn't impose any performance degradation and is actually very minimal in terms of LOC changed.

commit 9a1325a4d40f17bfa469eedfc54e57d903bf0008
Author: Lukas Krejci <lkrejci>
Date:   Thu Aug 29 17:20:48 2013 +0200

    [BZ 949088] - Use DISTINCT with JOIN FETCH to avoid duplicates in criteria queries

Comment 5 Heiko W. Rupp 2014-03-26 08:30:58 UTC
Bulk closing now that 4.10 is out.

If you think an issue is not resolved, please open a new BZ and link to the existing one.

