Bug 109718

Summary: Hierarchy denormalization is broken for all live items
Product: [Retired] Red Hat Enterprise CMS
Reporter: Daniel Berrangé <berrange>
Component: other
Assignee: Vadim Nasardinov <vnasardinov>
Status: CLOSED RAWHIDE
Severity: high
Priority: medium
Version: nightly
CC: akananov, archit.shah, bche, clasohm, mikeb, richardl
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-04-06 14:59:10 UTC
Bug Blocks: 100952, 106481, 106597, 113496
Attachments:
  Log4j output for test case program
  Rows from cms_items table resulting from test case program
  Test case program to illustrate broken denormalization
  Replacement denormalization in plpgsql

Description Daniel Berrangé 2003-11-11 11:13:51 UTC
Description of problem:

P4 changes to troika (37880) and dev (37881) for bug 108811 have
merely reverted a previous patch to content items from p4 33634.
Thus all live items now once again have a broken hierarchy
denormalization. This problem is due to isPropertyModified(PARENT)
returning false, even when the property in question was modified prior
to the object being saved.

Note the following paragraph in the commit message from the original fix:

  "Only wierd thing left is that isPropertyModified(PARENT) is
   returning 'false' most of the time, even though the setParent
   method has been called on the item....."

Also note a similar comment in the code:

// XXX why doesn't isPropertyModified report that PARENT is modified?
/*
         if (get(ANCESTORS) == null ||
             isPropertyModified(PARENT)) {
             setAncestors(getParent());
         }
*/




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create an item 'foo'
2. Publish item 'foo'
3. Look at the ancestors field in cms_items:
  
Actual results:
dan_rickshaw=# select item_id, parent_id, name, version, language,
ancestors from cms_items where name = 'foo' or name = '/' order by
version, ancestors;
 item_id | parent_id | name | version | language |    ancestors    
---------+-----------+------+---------+----------+-----------------
      60 |           | /    | draft   |          | 60/
   12002 |        60 | foo  | draft   |          | 60/12002/
   12001 |     12002 | foo  | draft   | en       | 60/12002/12001/
   13003 |     13004 | foo  | live    | en       | 13003/
   13005 |           | /    | live    |          | 13005/
   13004 |     13005 | foo  | live    |          | 13005/13004/
(7 rows)

dan_rickshaw=# 

Note that the live item instance (item_id 13003) has an ancestors
property of '13003/' when it should in fact be '13005/13004/13003/'.

Expected results:

Here is an example of how it worked prior to p4 37881:

dan_rickshaw=# select item_id, parent_id, name, version, language,
ancestors from cms_items where name = 'eek' or name = '/' order by
version, ancestors;
 item_id | parent_id | name | version | language |     ancestors      
---------+-----------+------+---------+----------+--------------------
      60 |           | /    | draft   |          | 60/
   18002 |        60 | eek  | draft   |          | 60/18002/
   18001 |     18002 | eek  | draft   | en       | 60/18002/18001/
   13005 |           | /    | live    |          | 13005/
   19004 |     13005 | eek  | live    |          | 13005/19004/
   19003 |     19004 | eek  | live    | en       | 13005/19004/19003/
(7 rows)



Additional info:

NB, correct data in the ancestors property is critical for the
ContentItem#getPath() method to work correctly.

Comment 1 Richard Li 2003-11-11 12:13:13 UTC
mike, carsten: you have both tested the patch we applied to troika in
starwood and francetv production. any comments?

Comment 2 Daniel Berrangé 2003-11-11 12:24:40 UTC
To see if the problem is affecting your system, create & publish an
item and then compare the 'ancestors' property for the live & draft items.
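
For example, reusing the query from the description (the item name
'foo' is just a placeholder for whatever you created):

  select item_id, parent_id, name, version, language, ancestors
    from cms_items
   where name = 'foo' or name = '/'
   order by version, ancestors;

If the live instance's ancestors value contains only its own item_id
instead of the full path down from the live root folder, your system
is affected.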

Comment 3 Daniel Berrangé 2003-11-11 12:34:24 UTC
I have thought about the problem some more & /think/ (i.e. I haven't
written code to verify this hypothesis) that the underlying problem is
as follows:

 * There are two separate associations represented by the same
underlying DB mapping:

    -> 'parent' is the generic content item -> container association
that has been present since year dot.
    -> 'instances' is a new association from content bundle ->
language instances that uses the same DB table/field to represent the
data.

 * In draft items we are explicitly setting the parent on the language
instance, so isPropertyModified(PARENT) returns true & everything
works as normal.
 * In live items, the VersionCopier ends up setting the 'instances'
association on the content bundle instead, so
isPropertyModified(PARENT) on the instance will never pick up the change.

Not sure what the easiest way to solve this is - probably removing all
code that sets the 'instances' association & making it set the
'parent' association instead. Or maybe we stuff something in
beforeSave on the ContentBundle to detect when 'instances' is modified
and update the denormalization from there (but this kind of breaks the
O-O encapsulation, since denorm should be a private impl detail).



Comment 4 Mike Bonnet 2003-11-11 16:08:24 UTC
I can't find any code that's actually modifying the "instances"
association.  ContentBundle.copyProperty() ignores "instances", and
the add/removeInstances() methods call setParent() on the instance. 
The problem may lie elsewhere.

Comment 5 Carsten Clasohm 2003-11-12 10:39:19 UTC
The problem is not that "instances" is set instead of "parent". During
publishing, the article and not the bundle is the starting point, and
the VersionCopier copies the "parent" property of the article.

But then persistence saves first the bundle and then the article. I
didn't analyze what exactly happens there, but I guess that when the
bundle is saved, persistence clears the isModified flag for all
properties which are based on the parent_id column. So when
beforeSave() is called for the article afterwards, isPropertyModified
returns false.

For FTVI, I fixed this with changelist 37918, which extends the code
dealing with ContentBundle parents in ContentItem.copyProperty() a bit.

I have successfully tested this with publication, and with adding a
new language instance to a bundle using the standard UI.

Comment 6 Carsten Clasohm 2003-11-13 07:39:23 UTC
I have noticed that the ancestors field was still broken when one
copied an item from one folder to another.

So I have changed ContentItem.copyProperty() to not do anything and
return true when the parent property is copied outside of a publishing
operation, and beforeSave() to detect and fix cases where ancestors
only contains the item's ID and the parent is not null. Besides
copying between folders, this also happens when one adds a new
language instance to a bundle.

The changelist is 37951.


Comment 7 Daniel Berrangé 2003-11-13 09:26:59 UTC
I just can't help thinking we'd be better off if the denormalization
were maintained with a PL/SQL trigger, since then it wouldn't matter
what object updated the PARENT property, or from what end the update
happened. The PL/SQL would trivially be able to detect all changes to
the parent_id field in cms_items. 
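
To make the idea concrete, here is a minimal PL/pgSQL sketch of such a
trigger (illustrative only, with made-up names -- not the attached
replacement; it only recomputes the row being written and does not
cascade the change to existing descendants of a moved item):

  create or replace function cms_items_set_ancestors() returns trigger as '
  declare
      v_parent_ancestors cms_items.ancestors%TYPE;
  begin
      if new.parent_id is null then
          -- a root item is its own ancestry
          new.ancestors := new.item_id || ''/'';
      else
          -- prepend the parent''s ancestry to this item''s id
          select ancestors into v_parent_ancestors
            from cms_items
           where item_id = new.parent_id;
          new.ancestors := v_parent_ancestors || new.item_id || ''/'';
      end if;
      return new;
  end;
  ' language 'plpgsql';

  create trigger cms_items_ancestors_trg
      before insert or update on cms_items
      for each row execute procedure cms_items_set_ancestors();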

Comment 8 Jon Orris 2003-11-14 03:39:33 UTC
Per Archit's suggestion, I've rolled back Archit's change @37995.

A 'full' fix to this problem is still complicated & elusive, but
should be available for the Dec 5th deliverable.


Comment 9 Archit Shah 2003-12-11 15:47:42 UTC
The full fix is implemented on dev (38422, 38644) and 6.0.x (37880,
38410, 38414, 38641)

Comment 10 Daniel Berrangé 2004-01-15 12:49:05 UTC
This is still fubar.

The order in which the ancestors property is updated is subject to race
conditions which are corrupting the denormalization. Consider that you
have a hierarchy of 3 objects, say Folder, Bundle, and Article. This is
set up as follows:

    Folder root = content.getRootFolder();

    Folder fol = new Folder();
    fol.setParent(root);
    fol.setName("foo");
    fol.setLabel("Foo");
    fol.save();

    Article art = new Article();
    art.setLanguage("en");
    art.setName("bar");
    art.setTitle("Bar");

    ContentBundle bundle = new ContentBundle(art);
    bundle.setParent(fol);
    bundle.setContentSection(content);

Now, the root folder has ancestors '77/', so when the Folder foo is
created it gets its ancestors set to '77/46003/', which is correct.

Next we create the Article, which gets id 46004.
Then the Bundle is created, getting allocated id 46005. The
constructor of the bundle sets the parent of the primary instance
passed in.

This triggers an update of the 'ancestors' property in the Article.
Since the bundle has no parent yet, its 'ancestors' property hasn't
been initialized, and so the 'ancestors' for the article is set to
'null/46004' rather than '46005/46004/'. The HierarchyDenormalization
observer records the old value '<null>' and the new value 'null/46004'.

The after save event of the HierarchyDenormalization on Article is now
run, and since the old value was null, it is a no-op.

Ignoring that small error for now, let's continue. We now set the
parent of the Bundle to be the folder. This means that the bundle's
'ancestors' property is updated to '77/46003/46005/'. The
HierarchyDenormalization observer records the old value '<null>' and
the new value '77/46003/46005/'.

The after save event of the HierarchyDenormalization on Bundle is now
run, and since the old value was null, it is a no-op too.

So in the database now we have:

  Folder: /77/46003/
  Bundle: /77/46003/46005
  Article: null/46004/


Now, back to that error with 'null/46004'. The setAncestors method on
ContentItem essentially does the following (I've changed this code
slightly from what's actually there for better illustration - no
functional difference though):

    String parentAncestors = (String) parent.get(ANCESTORS);
    String newAncestors = parentAncestors + "/" + getID();

The get(ANCESTORS) method is returning null because the Bundle hasn't
had a parent set yet. I can refactor this so that if the ANCESTORS of
the parent is null, it recursively sets it:



   private String getAncestors() {
       // lazily compute this item's ancestors from its parent if the
       // denormalized value hasn't been initialized yet
       String ancestors = (String) get(ANCESTORS);
       if (ancestors == null) {
           setAncestors(getParent());
           ancestors = (String) get(ANCESTORS);
       }
       return ancestors;
   }

Now the setAncestors method can be changed to use this:

   String parentAncestors = parent.getAncestors();
   String newAncestors = parentAncestors + "/" + getID();


If I re-run the above example, we now get closer to the desired
behaviour. In the database we now have:

  Folder: /77/46003/
  Bundle: /77/46003/46005
  Article: /46005/46004/

But the denormalization for Article is still fubar.

Basically the problem is that the HierarchyDenormalization has an
implicit assumption that objects are all created and saved top-down,
i.e. create folder, save it, create bundle, set bundle parent, save
bundle, create item, set item parent, save item.

This is a bogus assumption, and hence the article denormalization is
getting screwed up.

This seems to be related to the way HierarchyDenormalization records
new & old values of the ancestors attribute:

       if (!m_isModified) {
          m_oldAttributeValue = (String) old_value;
          m_newAttributeValue = (String) new_value;
          m_isModified = true;
       } else {
          m_newAttributeValue = (String) new_value;
       }

In the above example, the bundle starts out with 'null' and on the first
change is set to '46005/'. The article is created based on this value.
Then when the content bundle has its parent set to the folder, this code
records the new value '77/46003/46005/', while the old value is left
unchanged as 'null'.

The parent of an item could change several times during a session,
correspondingly updating the ancestors property. The trouble is that
the data operation only updates the denormalization of children
created based on the initial value. Any items created based on one of
the intermediate values are left unchanged.

Frankly I don't think it is possible to reliably track and update this
denormalization using this current hybrid Java/data operation
approach. We can trivially remove all potential for these race
conditions by maintaining the denormalization in PL/SQL.

I'm attaching a bunch of files helpful in analysing and reproducing
the problem. 


Comment 11 Daniel Berrangé 2004-01-15 12:52:53 UTC
Created attachment 97018 [details]
Log4j output for test case program

Comment 12 Daniel Berrangé 2004-01-15 12:53:14 UTC
Created attachment 97019 [details]
Rows from cms_items table resulting from test case program

Comment 13 Daniel Berrangé 2004-01-15 12:53:44 UTC
Created attachment 97020 [details]
Test case program to illustrate broken denormalization

Comment 14 Daniel Berrangé 2004-01-15 15:25:10 UTC
Created attachment 97029 [details]
Replacement denormalization in plpgsql

Comment 15 Daniel Berrangé 2004-01-15 15:27:21 UTC
I've just attached a file providing a sample replacement
denormalization written in pl/pgsql instead of Java/SQL. I've
subjected it to the following tests so far with no problems:

  * Creating new item
  * Publishing item
  * Creating new language instance
  * Moving item
  * Creating deep subfolders
  * Moving top level folders
  * Deleting items


I called the column 'pl_ancestors' so it can be run in parallel with the
existing denormalization for testing.
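
For example, a query along these lines (assuming both columns are
populated for the rows you care about) will show any rows where the two
denormalizations disagree:

  select item_id, parent_id, version, ancestors, pl_ancestors
    from cms_items
   where coalesce(ancestors, '') <> coalesce(pl_ancestors, '');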

Comment 16 Vadim Nasardinov 2004-03-01 16:04:38 UTC
I'm taking this item off Archit's plate for a couple of days, as he is
up to his ass in alligators at the moment.


Comment 17 Vadim Nasardinov 2004-03-01 16:05:17 UTC
I think we need to gain a little more clarity on this ticket before
committing to any particular short- or long-term solution.

Let's take a few moments to look at the bigger picture, before zeroing
back in on the specific problem with the CMS_ITEMS.ANCESTORS column.

  Definition.  A denormalized column is one whose data can be
     recreated from other columns and tables in the same schema, even
     if the denormalized column were accidentally or intentionally
     cleared.

Broadly speaking, there are three possible ways to maintain a
denormalization:

 (a) maintain the denormalized column entirely in PL/SQL or PL/pgSQL;

 (b) maintain the denormalization entirely in Java;

 (c) some combination of (a) and (b)

Let's compare the pros and cons of (a) and (b), excluding (c) from
consideration for the moment, since it is likely to combine the best
features of (a) and (b), while excluding their worst ones.


  Pros of (a):
    * it gives us the highest degree of confidence in the correctness
      of denormalization.

      This is because we have a fairly good understanding of how
      triggers work.  The execution sequence is fairly deterministic.
      This is in contrast to persistence where the interplay between
      automatic flushing, before- and after-save methods can be quite
      complex.

    * it is already implemented in Dan's attachment 97029 [details].

  Cons of (a):
    * it bypasses persistence

      If you call DataObject#get("ancestors"), you are likely to get
      an out-of-date value, because persistence can't detect changes
      effected by database triggers.

      Possible fix: make persistence always refetch denormalized
                    columns from the database, instead of relying on
                    the cached in-memory value.

               Con: since refetching a value from the database
                    triggers a flush of all pending events, this may
                    cause new complications owing to the changed order
                    of persistent read/writes.

    * it requires forking.

      This has to be implemented in PL/SQL for Oracle, and PL/pgSQL
      for PostgreSQL, with all the usual headaches that such
      forking entails.

    * it is a fairly radical change pretty late in the development
      cycle.

      Possible fix: implement in the next development cycle after this
                    release.


  Pros of (b):

     * hopefully better interoperability with persistence.

       If the denormalization is maintained using the usual
       persistence API, then persistence is more likely to maintain
       the correct, coherent in-memory view of the data (modulo the
       apparent bug in isPropertyModified that needs to be tracked
       down).

     * Works with Oracle and PostgreSQL

  Cons of (b):

     * could be pretty hairy


Comment 18 Vadim Nasardinov 2004-03-01 16:05:55 UTC
I think Dan hit the nail on the head with his insight that

  > Basically the problem is that the HierarchyDenormalization has an
  > implicit assumption that objects are all created and saved
  > top-down.

Let's formalize and slightly generalize the notion of "top-down".

We've already defined what it means for a column to be "denormalized".
Our definition operates exclusively on the database level: we only
talk of tables and columns.  We can, if we want to, project this
definition onto the higher-level abstractions that we build on top of
the DB: (a) persistence events, and/or (b) data objects, and/or (c)
domain objects.

Let's think of this in terms of data objects for a moment.

                     +---------------+
            +--------+ Data Object A +--------+
            |        +---------------+        |
            |                                 |
            |                                 |
            |                                 |
            V                                 V
    +---------------+                 +---------------+
    | Data Object B |                 | Data Object C |
    +-------+-------+                 +-------+-------+
            |                                 |
            |                                 |
            |                                 |
            |        +---------------+        |
            +------->| Data Object D |<-------+
            |        +-------+-------+        |
            |                |                |
            |                |                |
            V                |                V
    +---------------+        |        +---------------+
    | Data Object E |        +------->| Data Object F |
    +---------------+                 +---------------+


Let's say we have data objects A, B, ..., F of the same type "Foo" that
has a denormalized property "baz".  We posit that for any instance X of
Foo, its value of "baz" can be derived from the remaining "normalized"
properties of X and from properties of the "immediate predecessors" of
X, where the "predecessor" relationship is defined by directed edges
in the above graph, so that, for example, A is the immediate
predecessor of B and C, and B is the predecessor of D and E.

For the "baz" property to be computable, the predecessor graph should
contain no cycles and its "source nodes" such as A (i.e. nodes with no
predecessors) should be able to compute the value of "baz" based
solely on their own normalized values, without having to query the
value of "baz" on any other data object of type Foo.

Our current "hierarchy denormalization" is a simple edge case of the
above, where every node can have only 0 or 1 predecessors, so that all
predecessor graphs have the simple-chain form:

  A ---> B ---> ... --> Q

As Dan points out, our HierarchyDenormalization class can only
maintain the denormalization correctly if nodes are created in order:
first A, then B, then C, etc.  In the more general case of an
arbitrary acyclic graph, we could trivially generalize
HierarchyDenormalization into AcyclicGraphDenormalization by requiring
that nodes are created in topological order [1,2].

To sum up, the notion of "topological order" generalizes the notion of
"top-down" that Dan was talking about.

References
  1. http://mathworld.wolfram.com/TopologicalSort.html
  2. http://www.nist.gov/dads/HTML/topologicalOrder.html


Comment 19 Vadim Nasardinov 2004-03-01 16:06:15 UTC
So, in view of the above, there are a number of possible solutions.

1. Triggers + PL/SQL.

2. Fix HierarchyDenormalization.

   Currently, each content item (CI) gets its own instance of
   HierarchyDenormalization (HD).  The HD thingy _only_ keeps track of
   its own CI.  From the above discussion, it should be clear that HD
   should also be keeping track of its adjacent "successor" nodes.  If
   its CI's "ancestor" property changes, HD must notify its successor
   HDs of the change, so they can update their view accordingly.

3. Make persistence treat denormalized columns separately.

   From our definition, it is clear that we need not bother keeping
   track of all the intermediate changes.  We could, in theory, update the
   denormalized column _after_ all the normalized columns have been
   saved.  This cannot be implemented correctly in the domain layer
   without special support from persistence.  Persistence would have
   to introduce a special phase, possibly initiated by a commit, that
   goes like so:

     (a) save all "normalized" properties.
     (b) lock all "normalized" properties against any further writes;
     (c) save all "denormalized" properties.
 
   We'd most likely have to support the notion of "denormalized"
   property on the PDL level with special syntax.

   The logic for computing denormalized properties can be passed to
   the persistence engine in either of two forms:

     i. register some sort of callback mechanism that persistence
        can call when it's time to compute denormalized properties;

        or, alternatively,

    ii. allow the logic to be encoded in PDL, either in purely
        declarative form, or in some sort of procedural
        Pythonesque/JavaScriptish language.  (Looking farther ahead
        down this road, we can anticipate wanting to support the
        general notion of "PDL" triggers written in some sort of
        PL/PDL, rather than merely supporting "denormalized"
        properties.)

This completes the astronautical overview of this ticket.  I'm taking
the rest of the day off to run an errand.


Comment 20 Vadim Nasardinov 2004-03-02 16:32:54 UTC
Xref: bug 117289.  Another example of why HD should be keeping track
of the successors of the node to which HD is attached.


Comment 21 Daniel Berrangé 2004-03-02 16:34:10 UTC
Another major mark against doing the denorm in Java is that we can't
write any SQL upgrade scripts which touch the parent/child mappings
since the denormalization won't be updated.

I did take a look at places in the code which call get("ancestors"),
and there is in fact only one that would be troubled by out-of-date
values, & that can be trivially re-written to use the value directly
in the DB since it's a custom data query in any case.

We could pursue options b) & c), but I think they are both high-risk &
horrifically over-engineered strategies, so it would be very hard to
estimate just how much work they would involve, or even whether we'd be
ultimately successful. In any case it would be practically impossible
to prove the correctness of the resulting Java implementation, since
there are so many code paths to consider.

Keep it simple. Use PL/SQL with its well-defined update rules & remove
use of 'get(ancestors)' from any Java code - only allow references to
'cms_items.ancestors' directly in SQL queries.

Comment 22 Vadim Nasardinov 2004-03-08 15:48:50 UTC
A naive port of the PL/pgSQL code to Oracle's PL/SQL (as in
//users/vadim/code/PLSQL/cms_items_ancestors.sql#1) leads to the
following error:

|2004-03-05 17:43:36,136 [000-1] ERROR rdbms.RDBMSEngine - update cms_items
|set parent_id = ?
|where cms_items.item_id = ?
|java.sql.SQLException: ORA-04091: table VADIMN.CMS_ITEMS is mutating, trigger/function may not see it
|ORA-06512: at "VADIMN.CMS_ITEMS_ANCESTORS_UP_TRG", line 14
|ORA-04088: error during execution of trigger 'VADIMN.CMS_ITEMS_ANCESTORS_UP_TRG'
|
|        at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:168)
|        at oracle.jdbc.ttc7.TTIoer.processError(TTIoer.java:208)
|        at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:543)
|        at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1405)
|        at oracle.jdbc.ttc7.TTC7Protocol.parseExecuteFetch(TTC7Protocol.java:822)
|        at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:1446)
|        at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1371)
|        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1900)
|        at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:363)
|        at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:407)
|        at com.redhat.persistence.engine.rdbms.RDBMSEngine.execute(RDBMSEngine.java:487)


To quote from
http://download-west.oracle.com/docs/cd/B10501_01/appdev.920/a96590/adg13trg.htm#704,

    A mutating table is a table that is currently being modified by an
    UPDATE, DELETE, or INSERT statement, or a table that might be
    updated by the effects of a DELETE CASCADE constraint.

    The session that issued the triggering statement cannot query or
    modify a mutating table. This restriction prevents a trigger from
    seeing an inconsistent set of data.

    This restriction applies to all triggers that use the FOR EACH ROW
    clause, and statement triggers that are fired as the result of a
    DELETE CASCADE. Views being modified in INSTEAD OF triggers are
    not considered mutating.

    When a trigger encounters a mutating table, a runtime error
    occurs, the effects of the trigger body and triggering statement
    are rolled back, and control is returned to the user or
    application.





<vadim>
    aram: so, there is no way to get it working without changing the data
    model?
<aram>
    cant think about anything simple
<aram>
    althoug the statement level trigger might do that, but it will be complex
    stuff
<vadim>
    what's a "statement level trigger"?
<aram>
    there are 2 types of trigger in oracle, row level - fired for every row,
    and statement level, which is executed once per sql statement [09:50-03/08]
<aram>
    but im not even sure if it might work lemme check if my memory serves me
    right here  [09:51-03/08]
<aram>
    do you need to read only access or read/write?
<vadim>
    ok, found it:
http://download-west.oracle.com/docs/cd/B10501_01/appdev.920/a96590/adg13trg.htm#526
<vadim>
    quoting: The FOR EACH ROW option determines whether the trigger is a row
    trigger or a statement trigger. If you specify FOR EACH ROW, then the
    trigger fires once for each row of the table that is affected by the
    triggering statement. The absence of the FOR EACH ROW option indicates
    that the trigger fires only once for each applicable statement, but not
    separately for each row affected by the statement. [09:52-03/08]
<vadim>
    aram: read access to what and at what time?  As far as the application is
    concerned, cms_items.ancestors is a read-only column. [09:53-03/08]
<aram>
    in the trigger
<vadim>
    well, need write access to the cms_items.ancestors column, and read-only
    access for the rest of the table [09:54-03/08]
<aram>
    just a sec, lemme check one thing [09:55-03/08]
<aram>
    vadim, k so it is possible to update table from statement level trigger [09:56-03/08]
<aram>
    so what we can do here log all ancestors_id in the row level trigger to
    some collection property in the package [09:57-03/08]
<aram>
    than in "after" update/insert/delete statement level trigger read this
    collection and to the change
[09:58-03/08]
<aram>
    although we should somehow prevent endless recursion
<vadim>
    aram: that sounds a bit flaky.  what's the isolation level of this
    package-scoped "collection property".  does each transaction get its own
    value?
<aram>
    package properties are transaction specific [09:59-03/08]
<aram>
    but of course this solution is too funky
<vadim>
    it sure seems like a hack [10:00-03/08]
* aram doesnt like this whole ancestors idea in cms_items, since having
  denormalization in the "normal" table is a bad taste
<vadim>
    I suppose you're right [10:01-03/08]
<aram>
    vadim: is it story about moving denormalization logic from java into db
    for cms_items?
<vadim>
    yes, that's what this is about.
<aram>
    vadim: thats good news ;-)
<danpb>
    vadim: was the port of the pg/plsql not straightforward then ?
<vadim>
    danpb: the straightforward port doesn't work.  Oracle doesn't let you
    update the table on which the trigger is fired.  You can only operate in
    the :new namespace, as in ":new.ancestors := 'foo'" [10:03-03/08]
<aram>
    vadim: we can actually maintain denorm data not from trigger but with
    separate call to pg/plsql
<vadim>
    you can't do "update foo set bar = ... where baz = ..."
<aram>
    even more funny i've seen on insert trigger which tries to delete row
    which was inserted in db2
<danpb>
    ah i see
<vadim>
    aram: what do you mean by "a separate statement"?
<aram>
    like have an pg/pl/sql api which maintains denormalization, and call it
    directly throug java
<aram>
    it still will save some traffic to db
<danpb>
    that's not much better from what we've got now
<danpb>
    because the difficulty is figure out when to call it
<vadim>
    that sounds pretty much like what we have now, I think.
<aram>
    in that case lets keep denormalization in the separate place
<danpb>
    i wanted to use triggers because that let us track 100% reliably when
    particular rows changed
<vadim>
    well, we can vertically partition the table, but that also seems like a
hack.
<aram>
    danpb: agree, my prevous solution is dirty hack
<danpb>
    hmm
<danpb>
    so what to do...
<aram>
    move denorm to separate table for gods sake!
<aram>
    is it too much effort?
<danpb>
    that's a pretty major change at this point in the dev cycle
<aram>
    hmmm
<vadim>
    I agree.  Under different circustances, I would change the data model.
    but doing it now is cutting it a little too close.
<aram>
    vadim, than nothing really left except hack with statement level trigger
<danpb>
    perhaps we should postpone this change to the denormalization till after
    rickshaw ?
<vadim>
    we sure could
<aram>
    this hack is nasty but not really complex
<aram>
    but this is HACK
<aram>
    <quote source="oracle docs"> If you need to update a mutating table, you
    could bypass these restrictions by using a temporary table, a PL/SQL
    table, or a package variable. For example, in place of a single AFTER row
    trigger that updates the original table, resulting in a mutating table
    error, you might use two triggers--an AFTER row trigger that updates a
    temporary table, and an AFTER statement trigger that updates the original
    table with the values from the temporary table. </quote>
<vadim>
    strictly speaking, we should the same thing for
    cat_categories.default_ancestors and site_nodes.url
<vadim>
    s/should/should do/
<vadim>
    aram: what's the URL for the above quote?
<aram>
    http://download-west.oracle.com/docs/cd/B10501_01/appdev.920/a96590/adg13trg.htm#704
<danpb>
    ah, that sounds interestnig
<aram>
    this is workable but nasty
<aram>
    i committed similar sin couple of times
<aram>
    although i've hated that
<aram>
    this hack still better than current situtation with ccm_items denorm
<aram>
    IMHO
<vadim>
    that doesn't look too bad
<danpb>
    seems reasonable to me
<aram>
    vadim: i can give you hand with writing triggers if we go with this hack
<vadim>
    aram: ok, I'll ping you if I run into problems.  thanks for your help.
<aram>
    np
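
For reference, here is a rough sketch of the two-trigger hack discussed
above (a row-level trigger remembering ids in a package variable, plus
an AFTER statement trigger doing the actual update).  All names are
illustrative only, and this is not the code that was actually committed:

  create or replace package cms_items_denorm is
      type id_tab is table of cms_items.item_id%TYPE index by binary_integer;
      g_ids   id_tab;
      g_count binary_integer := 0;
  end cms_items_denorm;
  /

  -- row-level trigger: cms_items is mutating here, so only record which
  -- rows changed
  create or replace trigger cms_items_anc_row_trg
  after insert or update of parent_id on cms_items
  for each row
  begin
      cms_items_denorm.g_count := cms_items_denorm.g_count + 1;
      cms_items_denorm.g_ids(cms_items_denorm.g_count) := :new.item_id;
  end;
  /

  -- statement-level trigger: the table is no longer mutating, so it can
  -- be updated; touching only the ancestors column avoids re-firing
  -- these "update of parent_id" triggers
  create or replace trigger cms_items_anc_stmt_trg
  after insert or update of parent_id on cms_items
  begin
      for i in 1 .. cms_items_denorm.g_count loop
          update cms_items c
             set c.ancestors = (select p.ancestors || c.item_id || '/'
                                  from cms_items p
                                 where p.item_id = c.parent_id)
           where c.item_id = cms_items_denorm.g_ids(i)
             and c.parent_id is not null;
      end loop;
      cms_items_denorm.g_count := 0;
  end;
  /

A real version would also have to rewrite the ancestors of a moved
item's existing descendants; this sketch only recomputes the rows that
were touched directly.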


Comment 23 Vadim Nasardinov 2004-03-09 22:19:22 UTC
As of change 41195 (applied to the trunk), the cms_items.ANCESTORS
denormalization is now maintained by triggers in the db.

Comment 24 Bryan Che 2004-03-10 21:11:06 UTC
The triggers have a pretty big impact on db performance during an
authoring workflow.  This is because they cause so many updates to
occur.  For example, I ran the cms/populate code and created 100
items, without applying publishing lifecycles to them.  Inserting
these 100 items caused over 3 million updates to cms_items, according
to oracle statspack:
                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s)
Hash Value
--------------- ------------ -------------- ------ -------- ---------
----------
     15,804,564    3,151,853            5.0   76.8   401.45    401.71
3143147938
Module: java.redhat.com (TNS V1
UPDATE cms_items     set ancestors = :b2     where item_id = :b1

Comment 25 Richard Li 2004-03-10 21:17:39 UTC
Aram, can you take a look?

Comment 26 Vadim Nasardinov 2004-03-10 21:27:15 UTC
I think I know what the problem is.  I'll give it another shot.  If
I can't fix it, I'll hand it over to Aram.

Comment 27 Vadim Nasardinov 2004-03-10 22:01:42 UTC
Should be fixed by change 41230.


Comment 28 Vadim Nasardinov 2004-03-16 16:24:26 UTC
Tom Lane says, in
  http://post-office.corp.redhat.com/archives/postgresql-project/2004-March/msg00003.html
  Message-ID: <25153.1079452732.pa.us>

that 

  update cms_items
  set ancestors = new.ancestors || substr(ancestors, char_length(old.ancestors)+1)
  where ancestors like old.ancestors || ''%'' and item_id <> new.item_id;

is going to suck performance-wise, because 

    "...there's no indexable condition in the WHERE clause (LIKE with a
    nonconstant pattern isn't indexable).  So you get a seqscan over
    cms_items"

Bummer.


Comment 29 Bryan Che 2004-03-16 21:35:28 UTC
change 41410 restores pg performance with respect to updates to cms_items.