Bug 111760

Summary: Category#getObjects() uses an in subquery filter
Product: [Retired] Red Hat Web Application Framework Reporter: Daniel Berrangé <berrange>
Component: otherAssignee: Vadim Nasardinov <vnasardinov>
Status: CLOSED RAWHIDE QA Contact: Jon Orris <jorris>
Severity: medium Docs Contact:
Priority: medium    
Version: nightly   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-02-11 22:41:03 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 102281, 113496    
Attachments:
Description Flags
implementation of Dan's suggestion
none
working patch none

Description Daniel Berrangé 2003-12-09 18:32:11 UTC
Description of problem:
From the categorization API requirements:

    * The use of inSubquery filters results SQL that is overly
      complex & thus difficult for the DB to optmize.

http://post-office.corp.redhat.com/archives/ccm-engineering-list/2003-August/msg00019.html

Removal of the need to use inSubquery filters for this was one of the
main performance aims of this work.

The new implementatino of Category#getObjects():

    public CategorizedCollection getObjects(String objectType, String
path) {
        if ( objectType == null ) {
            throw new NullPointerException("objectType is null");
        }
        final CategorizedCollection result = new CategorizedCollection
            (getSession().retrieve(objectType));
        Filter immediateChildObjects = result.addInSubqueryFilter
            (addPaths(path, ID),
             "com.arsdigita.categorization.immediateChildObjectIDs");
        immediateChildObjects.set(CATEGORY_ID, getID());
        return result;
    }



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Vadim Nasardinov 2004-01-28 22:33:40 UTC
Just to get everybody on the same page, here's the status quo.

One of the core requirements of the Categorization API RFC was to make
it possible to efficiently retrieve a category's child objects
restricted to a particular object type.  So, instead of saying, give
me all objects that are children of category X, we would say, give me
all Events that are children of X.

As discussed in
http://porter.boston.redhat.com/acs6/doc/categorization/categorization-api-RFC.html
there were two approaches for accomplishing this:

  (a) start with category X and work your way to Events; or
  (b) start with a collection of Events and work your way to category
      X.

As indicated by bug 104837 comment #3, the consensus was finally
reached to implement (b).

Given that we agreed to do (b), the inSubqueryFilter method was (and
remains to be) the only known way to implement getObjects(String
objectType).

As far as I understand, eliminating in-subquery clauses requires
changes to persistence.  I don't know whether Rafi's OQL work allows
in-subquery clauses to be rewritten as inner joins or not.

I'm going to have to defer to Archit's and Rafi's judgement about how
to proceed with this one.


Comment 2 Daniel Berrangé 2004-01-29 10:05:21 UTC
I don't think it does require changes to persistence. I've got code in
several applications that is doing this:

  DataCollection events = SessionManager.getSession.retreive(Event.TYPE);
  events.addEqualsFilter("parent.categories.id", cat.getID());
  events.addOrder("parent.categories.link.sort_key");

Which seems to work, although the SQL it generates is currently less
than optimal, since it joins to the cat_object_category_map table
twice - I guess that's what Rafi's OQL work is doing.


Comment 3 Daniel Berrangé 2004-01-29 10:05:51 UTC
s/OQL work is doing/OQL work is addressing/

Comment 4 Vadim Nasardinov 2004-01-29 12:43:32 UTC
Ok, I'll try that.  Thanks for straightening me out.


Comment 5 Vadim Nasardinov 2004-01-29 22:44:36 UTC
Created attachment 97352 [details]
implementation of Dan's suggestion

Getting CMSSuite in //cms/internal/dev/test/src/com/arsdigita/cms/ to run
proved a bit of a hassle.  Now that I got it up and running,
CategorizationSuite
in core and CategorizationTest in cms/internal all pass.  I'll commit this
patch tomorrow.


In the simple case of CategoryTest#getGetMapping(), the offending query changed

from

select
  g.group_id,
  ao.object_type,
  ao.display_name,
  ao.default_domain_class,
  p.primary_email,
  p.uri,
  g.name
from
  groups g
join
  acs_objects ao on ao.object_id = g.group_id
join
  parties p on p.party_id = g.group_id
where
  exists
    (select
       subquery_id
     from
       (select object_id as subquery_id
	from
	  (select
	      object_id
	    from
	      cat_object_category_map
	    where
	      category_id = ?) insub1) insub2
     where
       insub2.subquery_id = g.group_id)


to this:

select
  g.group_id,
  ao.object_type,
  ao.display_name,
  ao.default_domain_class,
  p.primary_email,
  p.uri,
  g.name
from
  groups g
join
  acs_objects ao on ao.object_id = g.group_id
join
  parties p on p.party_id = g.group_id
left join
  cat_object_category_map cocm on cocm.object_id = g.group_id
where
  (cocm.category_id = ?)


I haven't yet looked at the hairier query produced in CategorizationTest in 
cms/internal.

Comment 6 Vadim Nasardinov 2004-01-30 19:18:23 UTC
Created attachment 97373 [details]
working patch

The attached patch seems to work.  I'm going to hold off committing it until
Monday, unless you want it committed today.

Commit message:

	ALERT: Changed the implementation of Category#getObjects(String
	objectType,String path) to _not_ use an in-subquery filter.

	The semantics of "path" argument has changed.  Previously, you would do

	this to retrieve a collection of Articles:
	
	    category.getObjects(Article.TYPE, "parent");
	
	where "parent" is the path the bundle containing the article.
	
	After this change, you have to do this:
	
	    category.getObjects(Article.TYPE, "parent.categories.id");
	
	where "parent.categories.id" is the PDL path from Article through its
	content bundle to its category.


====================================================
Other comments:
There doesn't seem to be a way to test CategorizedObjectsList, because
it seems to only be used by OrderedCategorizedObjectsList.  The latter
isn't used anywhere.

IndexItemSelectionForm does seem to work.  (Create a couple of content
items, categorize them.  Go to "Categories" pane, select a category to
which the items were categorized, and click on "Change index item"".)

Unit tests for this feature happen to live in two separate suites.

All tests in
//core-platform/dev/test/src/com/arsdigita/categorization/CategorizationSuite.java

pass.

The following separate test also passes:
//cms/internal/dev/test/src/com/arsdigita/cms/CategorizationTest.java

The query for retrieving categorized articles changed from

|select 
|  ca.article_id,
|  ao.object_type,
|  ...
|  cp.launch_date
|from
|  cms_articles ca
|join
|  acs_objects ao on ao.object_id = ca.article_id
|join
|  vc_objects vo on vo.object_id = ca.article_id
|join
|  cms_items ci on ci.item_id = ca.article_id
|join
|  cms_pages cp on cp.item_id = ca.article_id
|where
| exists
|   (select
|      subquery_id
|    from
|      (select object_idas subquery_id
|      from
|	 (select object_id
|	  from cat_object_category_map
|	  where category_id = ?) insub1) insub2
|    where insub2.subquery_id = ci.parent_id)


to

|select
|  ca.article_id,
|  ao.object_type,
|  ..
|  cp.launch_date
|from
|  cms_articles ca
|join
|  acs_objects ao on ao.object_id = ca.article_id
|join
|  vc_objects vo on vo.object_id = ca.article_id
|join
|  cms_items ci on ci.item_id = ca.article_id
|join
|  cms_pages cp on cp.item_id = ca.article_id
|left join
|  cat_object_category_map cocm on cocm.object_id = ci.parent_id
|where
|  (cocm.category_id = ?)


Question: The "path" always ends in ".id".  Would it make sense to change the
current behavior of the patch such that instead of requiring
    category.getObjects(Article.TYPE, "parent.categories.id");

we would only require
    category.getObjects(Article.TYPE, "parent.categories");

and the getObjects method would append ".id" to "parent.categories" (or
any other path) automatically?	That would seem to be more in line with
the DRY principle.

Comment 7 Vadim Nasardinov 2004-02-03 17:01:40 UTC
Fixed on the trunk in change 40015.