Description of problem:
See bug 865166. After applying the query provided there, which fixes the inconsistent RHQ_RESOURCE_AVAIL table, the affected resource's availability type remains in the unknown state. Invoking the 'avail -f' command on the agent doesn't fix the state.

Version-Release number of selected component (if applicable):
3.1.2.ER2

How reproducible:
Always

Steps to Reproduce:
1. Start from a clean installation of JON; the agent is running and imported into inventory.
2. Remove the row for the RHQ Agent from the RHQ_RESOURCE_AVAIL table (you can use step 4 from bug 877176). This brings the database into the inconsistent state described in bug 865166.
3. Apply the workaround from bug 865166 (the INSERT INTO RHQ_RESOURCE_AVAIL... query).
4. Refresh the page showing RHQ Agent availability -> availability unknown.
5. Invoke the agent's command 'avail -f'.
6. Refresh the page showing RHQ Agent availability -> availability unknown.

Actual results:
RHQ Agent availability remains unknown.

Expected results:
After 'avail -f', the RHQ Agent's availability is UP.

Additional info:
This is an edge scenario, and the issue can be worked around simply by restarting the agent or by uninventorying and re-inventorying the resource. But is there any chance this could lead to a more general issue in the availability scan?

The same state can be reached by following the steps in bug 877176 with one difference: upgrade to 3.1.2.ER2, not to JON 3.1.1.
This query (followed by executing the "Execute Availability Scan" operation with changes-only=false) will fix it too:

    INSERT INTO RHQ_AVAILABILITY ( ID, RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE )
    SELECT RHQ_AVAILABILITY_ID_SEQ.nextval, res.ID, 0, NULL, 2
    FROM RHQ_RESOURCE res
    WHERE NOT EXISTS ( SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID )
The previous query was for Oracle; this one is for Postgres:

    INSERT INTO RHQ_AVAILABILITY ( ID, RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE )
    SELECT nextval('RHQ_AVAILABILITY_ID_SEQ'::text), res.ID, 0, NULL, 2
    FROM RHQ_RESOURCE res
    WHERE NOT EXISTS ( SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID )

We need to add a new dbupgrade step for this.
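The effect of the repair query can be sketched on a toy database. This is only an illustration, assuming a simplified stand-in schema in Python's built-in sqlite3 module: the real RHQ tables have more columns, and SQLite has no sequences, so an AUTOINCREMENT column stands in for RHQ_AVAILABILITY_ID_SEQ. The helper names (make_toy_db, repair_missing_availability) are hypothetical. The sketch shows that the INSERT ... WHERE NOT EXISTS adds a row only for resources with no availability history, and that running it a second time adds nothing.

```python
import sqlite3

# Same shape as the query in this thread, minus the sequence call
# (the toy table generates IDs itself). The value 2 is the
# AVAILABILITY_TYPE used by the thread's query.
REPAIR_SQL = """
INSERT INTO RHQ_AVAILABILITY (RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE)
SELECT res.ID, 0, NULL, 2
FROM RHQ_RESOURCE res
WHERE NOT EXISTS (
    SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID
)
"""


def make_toy_db():
    """Build an in-memory stand-in: two resources, one without availability rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE RHQ_RESOURCE (ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE RHQ_AVAILABILITY (
            ID INTEGER PRIMARY KEY AUTOINCREMENT,
            RESOURCE_ID INTEGER NOT NULL,
            START_TIME INTEGER NOT NULL,
            END_TIME INTEGER,
            AVAILABILITY_TYPE INTEGER
        );
        INSERT INTO RHQ_RESOURCE (ID, NAME) VALUES (10001, 'RHQ Agent');
        INSERT INTO RHQ_RESOURCE (ID, NAME) VALUES (10002, 'platform');
        -- Only resource 10002 has an availability row; 10001 is the broken one.
        INSERT INTO RHQ_AVAILABILITY (RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE)
            VALUES (10002, 0, NULL, 1);
    """)
    return conn


def count_avail(conn):
    return conn.execute("SELECT COUNT(*) FROM RHQ_AVAILABILITY").fetchone()[0]


def repair_missing_availability(conn):
    """Run the repair query and return how many rows it added."""
    before = count_avail(conn)
    conn.execute(REPAIR_SQL)
    conn.commit()
    return count_avail(conn) - before


if __name__ == "__main__":
    db = make_toy_db()
    print("rows added on first run:", repair_missing_availability(db))
    print("rows added on second run:", repair_missing_availability(db))
```

Because the NOT EXISTS guard skips any resource that already has an availability row, re-running the statement should be harmless, which is presumably what makes it usable as a dbupgrade step.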
assigning to jay
If the upgrade has already been performed, the query above must be executed manually to repair the situation. Otherwise, for versions that include the forthcoming commit, the db-upgrade will take care of it. In either case a full avail report (or an agent restart) is required to move the avail from UNKNOWN to the actual avail state.
master commit 068f664483a2013fa84123cb6b6ba85b54ee7c5c
Jay Shaughnessy <jshaughn>
Mon Feb 25 16:36:41 2013 -0500

    Ensure after upgrade that all resources, including those in the ADQ, have at least an initial UNKNOWN Availability.

Test Notes: See above comments.
For the record: I tried this upgrade from JON 3.0.0 -> 3.1.2 and, rather than running the SQL query manually to correct the DB, I simply restarted the agent in the hope that it would send up a full avail report and self-correct the DB. However, this did not happen. The RHQ_AVAILABILITY table is still missing the rows even after the agent restart. The server also logged these messages when it received the restarted agent's avail report:

    09:38:20,048 INFO [AvailabilityManagerBean] Skipping mergeAvailabilityReport() for stale resource [Resource[id=10001, uuid=null, type=<null>, key=null, name=null, parent=<null>]]. These messages should go away after the next agent synchronization with the server.

In short, to fix this problem, you have to execute that SQL statement manually after you upgrade to 3.1.2. You only need to do this IF you committed resources AFTER the 3.1.2 upgrade and those resources were in the discovery queue pre-upgrade (that is, while 3.0.0 was running).
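One way to tell whether the manual repair is needed is to list the resources that have no availability rows at all. The sketch below is an assumption-laden illustration, not something from the fix itself: it reuses the NOT EXISTS condition from the repair query as a plain SELECT and runs it against a simplified stand-in schema in Python's sqlite3 module. The function names and the sample rows are hypothetical.

```python
import sqlite3

# The repair query's NOT EXISTS condition, turned into a read-only check.
CHECK_SQL = """
SELECT res.ID
FROM RHQ_RESOURCE res
WHERE NOT EXISTS (
    SELECT 1 FROM RHQ_AVAILABILITY av WHERE av.RESOURCE_ID = res.ID
)
ORDER BY res.ID
"""


def resources_missing_availability(conn):
    """Return IDs of resources with no row in RHQ_AVAILABILITY."""
    return [row[0] for row in conn.execute(CHECK_SQL)]


def demo():
    # Toy stand-in schema; the real RHQ tables have more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE RHQ_RESOURCE (ID INTEGER PRIMARY KEY);
        CREATE TABLE RHQ_AVAILABILITY (
            ID INTEGER PRIMARY KEY AUTOINCREMENT,
            RESOURCE_ID INTEGER NOT NULL,
            START_TIME INTEGER NOT NULL,
            END_TIME INTEGER,
            AVAILABILITY_TYPE INTEGER
        );
        INSERT INTO RHQ_RESOURCE (ID) VALUES (10001);
        INSERT INTO RHQ_RESOURCE (ID) VALUES (10002);
        INSERT INTO RHQ_RESOURCE (ID) VALUES (10003);
        -- Only 10002 has availability history in this sample data.
        INSERT INTO RHQ_AVAILABILITY (RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE)
            VALUES (10002, 0, NULL, 1);
    """)
    return resources_missing_availability(conn)


if __name__ == "__main__":
    print("resources without availability rows:", demo())
```

An empty result would suggest the database is not in the state described here and the repair statement has nothing to insert.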