+++ This bug was initially created as a clone of Bug #881848 +++

Description of problem:
See bug 865166. When you apply the provided query, which fixes the inconsistent RHQ_RESOURCE_AVAIL table, the relevant resource's availability type remains in the unknown state. Invoking the 'avail -f' operation on the agent doesn't fix the state.

Version-Release number of selected component (if applicable):
3.1.2.ER2

How reproducible:
Always

Steps to Reproduce:
1. clean installation of JON; the agent is running and imported into inventory
2. remove the row relevant to the RHQ Agent from the RHQ_RESOURCE_AVAIL table (you can use step 4 from bug 877176), which brings the database to the inconsistent state described in bug 865166
3. apply the workaround from bug 865166 (INSERT INTO RHQ_RESOURCE_AVAIL... query)
4. refresh the page showing RHQ Agent availability -> availability unknown
5. invoke the agent's command 'avail -f'
6. refresh the page showing RHQ Agent availability -> availability unknown

Actual results:
RHQ Agent availability remains unknown

Expected results:
After avail -f, the RHQ Agent's availability is UP

Additional info:
This is an edge scenario, and the issue can be worked around simply by restarting the agent or by uninventorying and re-inventorying the resource. But is there any chance this could lead to some general issue in the availability scan?

The same state can be reached by following the steps in bug 877176 with one difference: upgrade to 3.1.2.ER2, not to JON 3.1.1.

--- Additional comment from John Mazzitelli on 2013-02-25 15:54:46 EST ---

this query (followed by executing the "Execute Availability Scan" operation with changes-only=false) will fix it too:

INSERT INTO RHQ_AVAILABILITY ( ID, RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE )
SELECT RHQ_AVAILABILITY_ID_SEQ.nextval, res.ID, 0, NULL, 2
FROM RHQ_RESOURCE res
WHERE NOT EXISTS ( SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID )

--- Additional comment from John Mazzitelli on 2013-02-25 16:07:28 EST ---

the previous query was for Oracle; this one is for postgres:

INSERT INTO RHQ_AVAILABILITY ( ID, RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE )
SELECT nextval('RHQ_AVAILABILITY_ID_SEQ'::text), res.ID, 0, NULL, 2
FROM RHQ_RESOURCE res
WHERE NOT EXISTS ( SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID )

we need to add a new dbupgrade step for this

--- Additional comment from John Mazzitelli on 2013-02-25 16:29:20 EST ---

assigning to jay

--- Additional comment from Jay Shaughnessy on 2013-02-25 16:39:39 EST ---

Post-upgrade, the query above must be executed to repair the situation. Otherwise the db-upgrade will take care of it (for versions including the forthcoming commit). The full avail report (or an agent restart) is required to move the avail from UNKNOWN to the actual avail state.

--- Additional comment from Jay Shaughnessy on 2013-02-25 20:14:29 EST ---

master commit 068f664483a2013fa84123cb6b6ba85b54ee7c5c
Jay Shaughnessy <jshaughn>
Mon Feb 25 16:36:41 2013 -0500

    Ensure after upgrade that all resources, including those in the ADQ, have
    at least an initial UNKNOWN Availability.

Test Notes: See above comments.

--- Additional comment from John Mazzitelli on 2013-02-26 09:49:23 EST ---

for the record: I tried this upgrade from JON 3.0.0 -> 3.1.2 and, rather than run the SQL query manually to correct the DB, I simply tried restarting the agent in the hope that it would send up a full avail report and self-correct the DB. However, this did not happen. The RHQ_AVAILABILITY table is still missing the rows even after the agent restart. The server also spit out these messages when it got the restarted agent's avail report:

09:38:20,048 INFO [AvailabilityManagerBean] Skipping mergeAvailabilityReport() for stale resource [Resource[id=10001, uuid=null, type=<null>, key=null, name=null, parent=<null>]]. These messages should go away after the next agent synchronization with the server.

In short, to fix this problem, you have to execute that SQL statement manually after you upgrade to 3.1.2. You only need to do this IF you committed resources AFTER the 3.1.2 upgrade where those resources were in the discovery queue pre-upgrade (that is, when 3.0.0 was running).
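For anyone verifying this, the detect-and-repair logic of the query above can be sketched against an in-memory SQLite database. This is only an illustration under stated assumptions: SQLite stands in for Oracle/postgres, the two tables are reduced to the columns the query touches, an autoincrementing ID replaces RHQ_AVAILABILITY_ID_SEQ, and the function names are made up for this sketch. It is not the dbupgrade step from the commit.

```python
import sqlite3

UNKNOWN = 2  # AVAILABILITY_TYPE value inserted by the repair query in this bug


def build_demo_db():
    """Create a cut-down stand-in schema (assumption: only the columns the query uses)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE RHQ_RESOURCE (ID INTEGER PRIMARY KEY);
        CREATE TABLE RHQ_AVAILABILITY (
            ID INTEGER PRIMARY KEY AUTOINCREMENT,  -- replaces RHQ_AVAILABILITY_ID_SEQ
            RESOURCE_ID INTEGER NOT NULL,
            START_TIME INTEGER NOT NULL,
            END_TIME INTEGER,
            AVAILABILITY_TYPE INTEGER
        );
        """
    )
    # Three resources; only 10002 has an availability row.
    # Type 1 is just a placeholder for "some non-UNKNOWN value".
    conn.executemany("INSERT INTO RHQ_RESOURCE (ID) VALUES (?)",
                     [(10001,), (10002,), (10003,)])
    conn.execute(
        "INSERT INTO RHQ_AVAILABILITY (RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE) "
        "VALUES (10002, 0, NULL, 1)"
    )
    conn.commit()
    return conn


def find_resources_without_availability(conn):
    """Return IDs of resources that have no RHQ_AVAILABILITY row at all."""
    rows = conn.execute(
        """
        SELECT res.ID FROM RHQ_RESOURCE res
        WHERE NOT EXISTS (SELECT 1 FROM RHQ_AVAILABILITY a WHERE a.RESOURCE_ID = res.ID)
        ORDER BY res.ID
        """
    ).fetchall()
    return [row[0] for row in rows]


def insert_initial_unknown_availability(conn):
    """Give every resource lacking availability an initial UNKNOWN row; return rows added."""
    before = len(find_resources_without_availability(conn))
    conn.execute(
        """
        INSERT INTO RHQ_AVAILABILITY (RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE)
        SELECT res.ID, 0, NULL, ?
        FROM RHQ_RESOURCE res
        WHERE NOT EXISTS (SELECT 1 FROM RHQ_AVAILABILITY a WHERE a.RESOURCE_ID = res.ID)
        """,
        (UNKNOWN,),
    )
    conn.commit()
    return before - len(find_resources_without_availability(conn))


if __name__ == "__main__":
    conn = build_demo_db()
    print("missing before:", find_resources_without_availability(conn))
    print("rows inserted:", insert_initial_unknown_availability(conn))
    print("missing after:", find_resources_without_availability(conn))
```

Because of the NOT EXISTS guard, running the repair a second time inserts nothing, so it should be safe to re-run. As noted above, the rows stay UNKNOWN until a full avail report arrives.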
Cloning this bug to make sure it gets QE'd as part of the upgrade testing for JON320. The key part is: "You only need to do this IF you committed resources AFTER the 3.1.2 upgrade where those resources were in the discovery queue pre-upgrade (that is, when 3.0.0 was running.)"

So steps to reproduce:
1) Install JON300.
2) Start an agent, but don't import its platform; leave it in the autodiscovery queue.
3) Upgrade to JON320.
4) Import the agent and platform from step 2 and check that their availabilities are green.
This solution does not appear to be complete and does not match the tested one. I think we may have duplicated effort somewhere. This had already been worked as https://bugzilla.redhat.com/show_bug.cgi?id=884338.
+1, I think this is a duplicate of Bug 884338 and should be closed.
Jay, Heiko, should this bug be closed?
Closing this as a duplicate of 884338. Even though this may have identified a new issue, this issue was addressed and also reported in the original issue (bug 884338). *** This bug has been marked as a duplicate of bug 884338 ***