If you have a resource with an alert definition whose conditions use the AND (aka ALL) conjunction (for example "if metric > 40 AND if metric < 60"), and one of those conditions is true but the other isn't (that is, no alert has been triggered yet because the full condition set isn't true yet), you cannot uninventory that resource. The reason is that the alert_id column is null in the rhq_alert_condition_log table, but our bulk delete assumes a non-null alert. See the AlertConditionLog named query and the bulk uninventory code that uses it:

    @NamedQuery(name = AlertConditionLog.QUERY_DELETE_BY_RESOURCES, //
        query = "DELETE AlertConditionLog acl " //
            + " WHERE acl.alert.id IN ( SELECT alert.id " //
            + " FROM AlertDefinition ad " //
            + " JOIN ad.alerts alert " //
            + " WHERE ad.resource.id IN ( :resourceIds ) ))"),

Notice that acl.alert.id is null in the case in question, so it will never match the set returned by the subquery. The subquery only ever returns alert IDs, so if a condition log is not yet associated with any alert, this DELETE won't remove it. A condition log left behind in the DB then prevents removal of the resource itself due to foreign key constraints.
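The mismatch described above can be illustrated with a small in-memory sketch. It is not RHQ code: the class NullAlertDeleteSketch, its ConditionLog type and the deleteByAlertIds helper are hypothetical stand-ins that only mimic the "alert id IN (subquery)" filter against rows whose alert id may be null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for rows of rhq_alert_condition_log.
class NullAlertDeleteSketch {

    static final class ConditionLog {
        final int id;
        final Integer alertId; // null until an alert has actually fired

        ConditionLog(int id, Integer alertId) {
            this.id = id;
            this.alertId = alertId;
        }
    }

    // Mimics "DELETE ... WHERE acl.alert.id IN (<alert ids>)" and
    // returns the rows that the delete leaves behind.
    static List<ConditionLog> deleteByAlertIds(List<ConditionLog> logs, Set<Integer> alertIds) {
        List<ConditionLog> remaining = new ArrayList<ConditionLog>();
        for (ConditionLog log : logs) {
            // A null alert id can never be a member of the alert id set.
            boolean matches = log.alertId != null && alertIds.contains(log.alertId);
            if (!matches) {
                remaining.add(log);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<ConditionLog> logs = Arrays.asList(
            new ConditionLog(10001, null), // condition was true, no alert fired yet
            new ConditionLog(10002, 500)); // condition log attached to alert 500
        Set<Integer> alertIdsForResource = new HashSet<Integer>(Arrays.asList(500));

        List<ConditionLog> remaining = deleteByAlertIds(logs, alertIdsForResource);
        System.out.println("condition logs left behind: " + remaining.size());
    }
}
```

In this sketch the row with the null alert id survives the delete, which corresponds to the row that later trips the foreign key constraint.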
How to replicate: You need an alert definition with more than one condition that uses the ALL conjunction ("fire alert when ALL conditions are true"). For example, on a platform resource, create a condition you know will be true (e.g. "if free memory > 1"). Then create one or more additional conditions that you know will not be true (e.g. "if operation 'Manual discovery' executes with FAILURE" - and don't execute that operation :). Wait for the free memory metric to be collected and sent up by the agent. At that point, you should be able to look in the rhq_alert_condition_log table and see a condition log row that has null for alert_id (since this condition was true, but no alert has fired yet because the other condition is not). Now try to uninventory the platform resource, and expect to see a foreign key constraint violation once the async delete job kicks off and tries to remove the platform resource from the DB.
Now deletes condition logs via the alert def, not via alerts, because a condition log may not yet be associated with an alert. This also avoids joining with the potentially large alert table.
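A rough in-memory sketch of the idea behind this change, assuming each condition log points at a condition (the condition_id column) and each condition belongs to an alert definition on a resource. The class DeleteViaDefinitionSketch, its types and the conditionToResource map are hypothetical and do not reproduce the actual query in the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: selects condition logs through their condition's
// alert definition instead of through a (possibly missing) alert.
class DeleteViaDefinitionSketch {

    static final class ConditionLog {
        final int id;
        final int conditionId;
        final Integer alertId; // may be null; deliberately not used for matching

        ConditionLog(int id, int conditionId, Integer alertId) {
            this.id = id;
            this.conditionId = conditionId;
            this.alertId = alertId;
        }
    }

    // conditionToResource maps a condition id to the resource id of the
    // alert definition that owns the condition (assumed relationship).
    // Returns the rows that the delete leaves behind.
    static List<ConditionLog> deleteByResources(List<ConditionLog> logs,
            Map<Integer, Integer> conditionToResource, Set<Integer> resourceIds) {
        List<ConditionLog> remaining = new ArrayList<ConditionLog>();
        for (ConditionLog log : logs) {
            Integer resourceId = conditionToResource.get(log.conditionId);
            boolean matches = resourceId != null && resourceIds.contains(resourceId);
            if (!matches) {
                remaining.add(log);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> conditionToResource = new HashMap<Integer, Integer>();
        conditionToResource.put(10001, 1); // condition on the platform being removed
        conditionToResource.put(10002, 2); // condition on an unrelated resource

        List<ConditionLog> logs = Arrays.asList(
            new ConditionLog(10001, 10001, null), // no alert yet
            new ConditionLog(10002, 10001, 500),  // already tied to an alert
            new ConditionLog(10003, 10002, null)); // belongs to the other resource
        Set<Integer> resourceIds = new HashSet<Integer>(Arrays.asList(1));

        List<ConditionLog> remaining = deleteByResources(logs, conditionToResource, resourceIds);
        System.out.println("condition logs left behind: " + remaining.size());
    }
}
```

Because the match no longer looks at the alert id, logs with and without an alert are removed alike, and only logs belonging to other resources remain.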
Jay, did you already fix this? If so, what version?
Yes, already fixed. Must have forgotten to update the BZ:

commit bbad56e4cc4bd0e2fae2d7a28e22ed6724162315
Author: Jay Shaughnessy <jshaughn>
Date: Thu Sep 8 12:09:19 2011 -0400
Verified on build#449 (Version: 4.1.0-SNAPSHOT Build Number: 4d56f0b)

Created an alert with conditions 'if free memory > 1' and 'Manual discovery executes with FAILURE' using the ALL conjunction. The rhq_alert_condition_log table has a condition log row that has null for alert_id:

rhq4=# select * from rhq_alert_condition_log;
  id   |     ctime     | alert_id | condition_id |    value
-------+---------------+----------+--------------+-------------
 10001 | 1317375258767 |          |        10001 | 8.4496384E7
(1 row)

Navigated to 'Inventory menu->Platforms', selected the platform and clicked on the 'Uninventory' button. The platform was uninventoried successfully. The server log does not show a foreign key constraint violation; it shows:

2011-09-30 15:11:15,247 INFO [org.rhq.enterprise.server.scheduler.jobs.AsyncResourceDeleteJob] Async resource deletion - 833 successful, 0 failed, took [44669] ms

Marking as verified.
*** Bug 736452 has been marked as a duplicate of this bug. ***
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE