Bug 881848 - Resource availability type remains in state unknown after application of workaround from bug 865166
Keywords:
Status: ON_QA
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent, Core Server
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
: ---
Assignee: Jay Shaughnessy
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 916373
Reported: 2012-11-29 16:54 UTC by Filip Brychta
Modified: 2022-03-31 04:28 UTC (History)

Fixed In Version:
Clone Of:
: 916373 (view as bug list)
Environment:
Last Closed:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 865166 0 high CLOSED Resource with AvailabilityType of NULL is being returned to UI resulting in ResourceDatasource.copyValues throwing org_r... 2021-02-22 00:41:40 UTC

Internal Links: 865166

Description Filip Brychta 2012-11-29 16:54:50 UTC
Description of problem:
See bug 865166. After applying the query provided there, which fixes the inconsistent RHQ_RESOURCE_AVAIL table, the availability type of the affected resource remains in the unknown state. Invoking the 'avail -f' operation on the agent doesn't fix the state.

Version-Release number of selected component (if applicable):
3.1.2.ER2

How reproducible:
Always

Steps to Reproduce:
1. clean installation of JON, the agent is running and imported to inventory
2. remove the row for the RHQ Agent from the RHQ_RESOURCE_AVAIL table (you can use step 4 from bug 877176), which brings the database to the inconsistent state described in bug 865166
3. apply workaround from bug 865166 (INSERT INTO RHQ_RESOURCE_AVAIL... query)
4. refresh page showing RHQ Agent availability -> availability unknown 
5. invoke agent's command 'avail -f'
6. refresh page showing RHQ Agent availability -> availability unknown
  
Actual results:
RHQ Agent availability remains unknown

Expected results:
After avail -f RHQ Agent's availability is UP

Additional info:
This is an edge scenario, and the issue can easily be worked around by restarting the agent or by uninventorying and re-inventorying the resource. But is there any chance this could lead to a more general issue in the availability scan?

The same state can be reached by following the steps in bug 877176 with one difference: upgrade to 3.1.2.ER2, not to JON 3.1.1.

Comment 1 John Mazzitelli 2013-02-25 20:54:46 UTC
this query (followed by executing the "Execute Availability Scan" operation with changes-only=false) will fix it too:

INSERT INTO RHQ_AVAILABILITY ( ID, RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE )
   SELECT RHQ_AVAILABILITY_ID_SEQ.nextval, res.ID, 0, NULL, 2
   FROM RHQ_RESOURCE res
   WHERE NOT EXISTS ( SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID );

Comment 2 John Mazzitelli 2013-02-25 21:07:28 UTC
the previous query was for Oracle, this is for postgres:

INSERT INTO RHQ_AVAILABILITY ( ID, RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE )
   SELECT nextval('RHQ_AVAILABILITY_ID_SEQ'::text), res.ID, 0, NULL, 2
   FROM RHQ_RESOURCE res
   WHERE NOT EXISTS ( SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID );

we need to add a new dbupgrade step for this

Comment 3 John Mazzitelli 2013-02-25 21:29:20 UTC
assigning to jay

Comment 4 Jay Shaughnessy 2013-02-25 21:39:39 UTC
If the upgrade has already been performed, the query above must be executed manually to repair the situation.  Otherwise, the db-upgrade will take care of it (for versions that include the forthcoming commit).

The full avail report is required (or an agent restart) to move the avail from UNKNOWN to the actual avail state.

Comment 5 Jay Shaughnessy 2013-02-26 01:14:29 UTC
master commit 068f664483a2013fa84123cb6b6ba85b54ee7c5c
Jay Shaughnessy <jshaughn>
Mon Feb 25 16:36:41 2013 -0500

Ensure after upgrade that all resources, including those in the ADQ, have
at least an initial UNKNOWN Availability.


Test Notes:
See above comments.

Comment 6 John Mazzitelli 2013-02-26 14:49:23 UTC
for the record:

I tried this upgrade from JON 3.0.0 -> 3.1.2 and, rather than run the SQL query manually to correct the DB, I simply tried to restart the agent with the hope it would send up a full avail report and self-correct the DB.

However, this did not happen. The RHQ_AVAILABILITY table is still missing the rows even after the agent restart. The server also spit out these messages when it got the restarted agent's avail report:

09:38:20,048 INFO  [AvailabilityManagerBean] Skipping mergeAvailabilityReport() for stale resource [Resource[id=10001, uuid=null, type=<null>, key=null, name=null, parent=<null>]]. These messages should go away after the next agent synchronization with the server.

In short, to fix this problem, you have to execute that SQL statement manually after you upgrade to 3.1.2. You only need to do this IF you committed resources AFTER the 3.1.2 upgrade where those resources were in the discovery queue pre-upgrade (that is, when 3.0.0 was running.)
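
To judge whether the manual repair is needed at all, a read-only check along the following lines may help. It is only a sketch: it assumes the RHQ_RESOURCE and RHQ_AVAILABILITY tables and the ID / RESOURCE_ID columns exactly as they appear in the repair queries in comments 1 and 2, and the column alias resources_missing_avail is made up for illustration.

```sql
-- Sketch only: table and column names are taken from the repair queries
-- in comments 1 and 2; the alias resources_missing_avail is hypothetical.
-- Counts resources that have no row at all in RHQ_AVAILABILITY, i.e. the
-- same set of resources the repair INSERT would add rows for.
SELECT COUNT(*) AS resources_missing_avail
  FROM RHQ_RESOURCE res
 WHERE NOT EXISTS ( SELECT 1
                      FROM RHQ_AVAILABILITY av
                     WHERE av.RESOURCE_ID = res.ID );
```

A count of 0 would suggest that the repair INSERT has nothing to add. The statement uses the same NOT EXISTS condition as the repair queries and avoids sequence syntax, so it should not need separate Oracle and postgres variants.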

