+++ This bug was initially created as a clone of JBoss ON Bug #815869 +++

Description of problem:
I created a new dynagroup with the following custom expression: "resource.name.endsWith = .sh". The recursive flag is checked and the recalculation interval is 0. Now I can't delete this group. The JON server shows the following: "Failed to delete the selected group definitions".

Here is the stack trace tail:

Message: Failed to delete the selected group definitions
Severity : Error
Time : Thursday, April 19, 2012 2:31:11 PM Etc/GMT+3
Detail : java.lang.RuntimeException:[1334856671827] java.lang.RuntimeException:javax.transaction.RollbackException: [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state
-> javax.transaction.RollbackException:[com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] [com.arjuna.ats.internal.jta.transaction.arjunacore.commitwhenaborted] Can't commit because the transaction is in aborted state
-> javax.persistence.EntityExistsException:org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
-> org.hibernate.exception.ConstraintViolationException:Could not execute JDBC batch update
-> java.sql.BatchUpdateException:Batch entry 0 delete from RHQ_RESOURCE_GROUP where ID='10131' was aborted. Call getNextException to see the cause.[SQLException=Batch entry 0 delete from RHQ_RESOURCE_GROUP where ID='10131' was aborted. Call getNextException to see the cause.
-> ERROR: update or delete on table "rhq_resource_group" violates foreign key constraint "rhq_resource_group_cluster_resource_group_id_fkey" on table "rhq_resource_group" Detail: Key (id)=(10131) is still referenced from table "rhq_resource_group".(error-code=0,sql-state=23503)]

Somehow a cluster group was created. I had to manually delete the cluster group and the main group definition from rhq_resource_group.

Version-Release number of selected component (if applicable): 4.2
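The error above comes from a self-referencing foreign key: another row in rhq_resource_group still points at group 10131, so the group cannot be deleted until that dependent row is gone. A minimal sketch of the delete ordering follows, using a hypothetical in-memory model. `GroupTable` and its methods are illustrative assumptions, not RHQ code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a table with a self-referencing
// foreign key (child.clusterParentId -> parent.id). Not RHQ code.
final class GroupTable {
    // group id -> id of the group it hangs off as a cluster group (null if none)
    private final Map<Integer, Integer> clusterParentById = new HashMap<>();

    void insert(int id, Integer clusterParentId) {
        if (clusterParentId != null && !clusterParentById.containsKey(clusterParentId)) {
            throw new IllegalStateException("FK violation: no parent " + clusterParentId);
        }
        clusterParentById.put(id, clusterParentId);
    }

    // Mimics the database: refuse to delete a row that is still referenced.
    void delete(int id) {
        for (Map.Entry<Integer, Integer> e : clusterParentById.entrySet()) {
            if (Integer.valueOf(id).equals(e.getValue())) {
                throw new IllegalStateException(
                    "FK violation: " + id + " is still referenced by " + e.getKey());
            }
        }
        clusterParentById.remove(id);
    }

    // Deletes dependent cluster groups first, then the group itself.
    void deleteWithClusterGroups(int id) {
        List<Integer> children = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : clusterParentById.entrySet()) {
            if (Integer.valueOf(id).equals(e.getValue())) {
                children.add(e.getKey());
            }
        }
        for (int child : children) {
            deleteWithClusterGroups(child);
        }
        delete(id);
    }

    boolean contains(int id) {
        return clusterParentById.containsKey(id);
    }
}
```

This mirrors the manual workaround in the report: the cluster group goes first, then the group it references.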
triaged 4/30/2012 by loleary, ccrouch, mfoley
Hmm, I wonder if there is a general problem with recursive compat group removal after it's been navigated in the UI... I'll check on that.
It's definitely not a general problem. I have not been able to reproduce the issue, and an inspection of the code indicates that we seem to be doing the right thing. But, after just about giving up, I looked more closely at the case attachments and I see why the error is generated. I don't know yet how this can happen, but basically there is an autocluster group hanging off of a mixed group (the mixed group having been generated by the group def). That should not happen. When a group is recalculated we do have logic in place to ensure that if it goes from compat to mixed we clean up autoclusters. Perhaps if the definition is actually changed completely we have an issue. I'll keep looking...
OK, I recreated this but it was not easy. Perhaps there is an easier way but I couldn't find it. I'm surprised this could come up in general practice.

1) Create a group definition that generates a recursive compatible group. For example:
   resource.type.plugin = JBossAS5
   resource.type.name = Web Application (WAR)
   recursive? yes
2) Navigate to the new group, then drill down into a child node (aka an autocluster node). In our example: Web Application Context
3) In a second GUI session, navigate to the new group and ** do not yet navigate to the autocluster node **.
4) Back in session 1, return to the group definition and change it completely, this time to generate a mixed group. For example:
   resource.type.plugin = JBossAS5
   resource.name.contains = t
5) Back in session 2, which still shows the compat group tree, navigate to the Web Application Context autocluster node. This unwittingly creates the compatible backing group for the autocluster, and links it to the original, now mixed, resource group. And now we have a problem.

The thing that surprises me is that when we change the group definition we actually re-use the resource group from the previous definition. I guess we just consider it a recalculation, and that logic must preserve groups that have the same name after the recalculation. Since nothing is changing that would affect naming, like a change to a group-by (pivot), we end up keeping the same group for a completely different resource set. I don't recommend we change this behavior.

We already protect against changes from Compat to Mixed, which is why recreating this is difficult (at least in all the ways I could think of). It required that stale tree in a second window to force the issue. We could protect further, I guess, in one of two ways: validate that the autocluster root node is valid (not mixed) at create time, or potentially just clean up this sort of case under the covers at delete time. I'll take a look.
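The first of the two options above, checking at create time that the autocluster root is still a compatible group, might look roughly like this. It is only a sketch: `Category`, `Group`, `StaleGroupException`, and `AutoClusterGuard` are hypothetical names and do not reflect the actual server classes.

```java
// Hypothetical sketch; these types are illustrative, not RHQ's actual classes.
enum Category { COMPATIBLE, MIXED }

final class Group {
    final int id;
    final Category category;
    final Group clusterParent; // null for a top-level group

    Group(int id, Category category, Group clusterParent) {
        this.id = id;
        this.category = category;
        this.clusterParent = clusterParent;
    }
}

final class StaleGroupException extends RuntimeException {
    StaleGroupException(String message) {
        super(message);
    }
}

final class AutoClusterGuard {
    // Creates the backing group for an autocluster node only if the root
    // group is still compatible; a stale GUI tree may request this after
    // the root has already been recalculated into a mixed group.
    static Group createBackingGroup(Group root, int newId) {
        if (root.category != Category.COMPATIBLE) {
            throw new StaleGroupException(
                "Group " + root.id + " is no longer compatible; refresh the group tree");
        }
        return new Group(newId, Category.COMPATIBLE, root);
    }
}
```

Rejecting the request keeps the mixed group free of autocluster children, so the later delete never hits the foreign key. The delete-time cleanup option would instead remove any such children before removing the group.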
I suggest the easiest possible solution since, as far as I can tell, this should be super-rare.
This stale group tree scenario is not totally far-fetched. Although it is unlikely, the more I think about it, the more plausible it seems to get a stale group tree when the group changes type. Whether due to dynagroup recalculation or just a manual member edit, if a session is looking at a compat tree when the group changes to mixed, you can get the problem above. You can also generate an unhandled exception by trying to show a context menu on the root node. I've put in some handling for both cases. In both cases the GUI will refresh the view automatically, bringing up the mixed group tree, to indicate the state change and give the user a valid tree.

Note: This is not a general group tree state change solution. It covers only these scenarios. If and when a server-side change notification mechanism is put in place, then this is one more state change we should respond to.
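The handling described above amounts to checking, before acting on a node, whether the group's type still matches the tree being shown, and refreshing if it does not. A rough sketch under that assumption follows. The actual GUI code is not shown in this report, so `TreeKind`, `GroupTreeView`, and the lookup callback are all hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical sketch of stale-tree detection; not the actual RHQ GUI code.
enum TreeKind { COMPAT, MIXED }

final class GroupTreeView {
    private TreeKind shownKind;
    // Stands in for a server call that returns the group's current type.
    private final Supplier<TreeKind> currentKindLookup;
    private int refreshCount = 0;

    GroupTreeView(TreeKind initialKind, Supplier<TreeKind> currentKindLookup) {
        this.shownKind = initialKind;
        this.currentKindLookup = currentKindLookup;
    }

    // Returns true if the node action may proceed. Returns false if the
    // tree was stale, in which case the view is refreshed instead.
    boolean onNodeSelected() {
        TreeKind current = currentKindLookup.get();
        if (current != shownKind) {
            shownKind = current; // rebuild the tree for the new group type
            refreshCount++;
            return false;
        }
        return true;
    }

    TreeKind shownKind() {
        return shownKind;
    }

    int refreshCount() {
        return refreshCount;
    }
}
```

The check runs only when the user acts on a node, so it covers the scenarios described here but is not a general change-notification mechanism.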
master commit 2f9c4823ea11cfdc6f289d865661b06ed4063c6d

Protect against a couple of issues that result from a stale Compat Group tree in the GUI. If a compatible group changes to mixed *while a GUI session is actively navigating the compat group tree* then bad things can happen, including this BZ.

The only way to really protect against this would be a live (non-polling) server-side change event listener for the GUI. I'm not sure that's possible but it is something to investigate.
Bulk closing of items that are on_qa and in old RHQ releases that have been out for a long time, where the issue has not been re-opened since.