Bug 916373

Summary: Resource availability type remains in state unknown after application of workaround from bug 865166
Product: [JBoss] JBoss Operations Network
Reporter: Charles Crouch <ccrouch>
Component: Agent, Core Server
Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED DUPLICATE
QA Contact: Mike Foley <mfoley>
Severity: high
Docs Contact:
Priority: high
Version: JON 3.1.2
CC: ccrouch, fbrychta, hbrock, hrupp, jshaughn, loleary, mazz, tsegismo
Target Milestone: ---   
Target Release: JON 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 881848
Environment:
Last Closed: 2013-09-18 02:36:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 881848    
Bug Blocks:    

Description Charles Crouch 2013-02-27 23:04:33 UTC
+++ This bug was initially created as a clone of Bug #881848 +++

Description of problem:
See bug 865166. After you apply the provided query, which fixes the inconsistent RHQ_RESOURCE_AVAIL table, the affected resource's availability type remains in the unknown state. Invoking the 'avail -f' command on the agent doesn't fix the state.

Version-Release number of selected component (if applicable):
3.1.2.ER2

How reproducible:
Always

Steps to Reproduce:
1. Perform a clean installation of JON; the agent is running and imported into inventory.
2. Remove the RHQ Agent's row from the RHQ_RESOURCE_AVAIL table (you can use step 4 from bug 877176), which brings the database into the inconsistent state described in bug 865166.
3. Apply the workaround from bug 865166 (the INSERT INTO RHQ_RESOURCE_AVAIL... query).
4. Refresh the page showing RHQ Agent availability -> availability unknown
5. Invoke the agent's command 'avail -f'.
6. Refresh the page showing RHQ Agent availability -> availability unknown
  
Actual results:
RHQ Agent availability remains unknown

Expected results:
After 'avail -f', the RHQ Agent's availability is UP

Additional info:
This is an edge scenario, and the issue can be worked around simply by restarting the agent or by uninventorying and re-importing the resource. But is there any chance this could lead to a more general issue in the availability scan?

The same state can be reached by following the steps in bug 877176, with one difference: upgrade to 3.1.2.ER2, not to JON 3.1.1.

--- Additional comment from John Mazzitelli on 2013-02-25 15:54:46 EST ---

This query (followed by executing the "Execute Availability Scan" operation with changes-only=false) will fix it too:

INSERT INTO RHQ_AVAILABILITY ( ID, RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE )
SELECT RHQ_AVAILABILITY_ID_SEQ.nextval, res.ID, 0, NULL, 2
 FROM RHQ_RESOURCE res
WHERE NOT EXISTS ( SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID )

--- Additional comment from John Mazzitelli on 2013-02-25 16:07:28 EST ---

The previous query was for Oracle; this one is for Postgres:

INSERT INTO RHQ_AVAILABILITY ( ID, RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE )
   SELECT nextval('RHQ_AVAILABILITY_ID_SEQ'::text), res.ID, 0, NULL, 2
   FROM RHQ_RESOURCE res
  WHERE NOT EXISTS ( SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID )

We need to add a new dbupgrade step for this.
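
The effect of the two queries above can be tried out on a throwaway database. The following is a minimal sketch, not the dbupgrade step itself: it uses Python's built-in sqlite3 module instead of Oracle or Postgres, a hypothetical schema reduced to the columns named in the query, and SQLite's automatic ID assignment in place of RHQ_AVAILABILITY_ID_SEQ. The helper names (repair, demo) are made up for illustration, and availability type 2 is assumed to mean UNKNOWN, as the later commit note suggests.

```python
import sqlite3

# Hypothetical, heavily simplified schema: only the columns named in the
# repair query. The real RHQ tables have more columns and constraints.
SCHEMA = """
CREATE TABLE RHQ_RESOURCE (ID INTEGER PRIMARY KEY);
CREATE TABLE RHQ_AVAILABILITY (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    RESOURCE_ID INTEGER NOT NULL,
    START_TIME INTEGER NOT NULL,
    END_TIME INTEGER,
    AVAILABILITY_TYPE INTEGER
);
"""

# Same shape as the repair query, minus the sequence call (SQLite assigns ID).
REPAIR = """
INSERT INTO RHQ_AVAILABILITY (RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE)
SELECT res.ID, 0, NULL, 2
  FROM RHQ_RESOURCE res
 WHERE NOT EXISTS (SELECT * FROM RHQ_AVAILABILITY WHERE RESOURCE_ID = res.ID)
"""

COUNT = "SELECT COUNT(*) FROM RHQ_AVAILABILITY"


def repair(conn):
    """Run the repair insert and return how many rows it added."""
    before = conn.execute(COUNT).fetchone()[0]
    conn.execute(REPAIR)
    conn.commit()
    return conn.execute(COUNT).fetchone()[0] - before


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO RHQ_RESOURCE (ID) VALUES (?)", [(10001,), (10002,)])
    # Resource 10002 already has an availability row (type 1); 10001 has none.
    conn.execute(
        "INSERT INTO RHQ_AVAILABILITY (RESOURCE_ID, START_TIME, END_TIME, AVAILABILITY_TYPE) "
        "VALUES (10002, 100, NULL, 1)"
    )
    first = repair(conn)   # adds the missing row for 10001
    second = repair(conn)  # nothing left to add
    rows = conn.execute(
        "SELECT RESOURCE_ID, AVAILABILITY_TYPE FROM RHQ_AVAILABILITY ORDER BY RESOURCE_ID"
    ).fetchall()
    return first, second, rows
```

Because of the NOT EXISTS guard, running the insert a second time adds nothing, so in this sketch it is safe to repeat and it leaves existing availability rows untouched.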

--- Additional comment from John Mazzitelli on 2013-02-25 16:29:20 EST ---

assigning to jay

--- Additional comment from Jay Shaughnessy on 2013-02-25 16:39:39 EST ---


If the upgrade has already been performed, the query above must be executed to repair the situation. Otherwise, the db-upgrade will take care of it (for versions that include the forthcoming commit).

A full avail report (or an agent restart) is required to move the avail from UNKNOWN to the actual avail state.

--- Additional comment from Jay Shaughnessy on 2013-02-25 20:14:29 EST ---


master commit 068f664483a2013fa84123cb6b6ba85b54ee7c5c
Jay Shaughnessy <jshaughn>
Mon Feb 25 16:36:41 2013 -0500

Ensure after upgrade that all resources, including those in the ADQ, have
at least an initial UNKNOWN Availability.


Test Notes:
See above comments.

--- Additional comment from John Mazzitelli on 2013-02-26 09:49:23 EST ---

For the record:

I tried this upgrade from JON 3.0.0 -> 3.1.2 and, rather than run the SQL query manually to correct the DB, I simply restarted the agent in the hope that it would send up a full avail report and self-correct the DB.

However, this did not happen. The RHQ_AVAILABILITY table is still missing the rows even after the agent restart. The server also spat out these messages when it got the restarted agent's avail report:

09:38:20,048 INFO  [AvailabilityManagerBean] Skipping mergeAvailabilityReport() for stale resource [Resource[id=10001, uuid=null, type=<null>, key=null, name=null, parent=<null>]]. These messages should go away after the next agent synchronization with the server.

In short, to fix this problem, you have to execute that SQL statement manually after you upgrade to 3.1.2. You only need to do this IF you committed resources AFTER the 3.1.2 upgrade where those resources were in the discovery queue pre-upgrade (that is, when 3.0.0 was running.)
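
A read-only check along the following lines could show whether a given database still has resources with no RHQ_AVAILABILITY row before the manual statement is run. This is only a sketch: the table and column names come from the repair query earlier in this report, while the function names and the reduced schema are hypothetical, and sqlite3 stands in for the real Oracle or Postgres connection.

```python
import sqlite3

# Read-only check: resources that have no row at all in RHQ_AVAILABILITY.
MISSING = """
SELECT res.ID
  FROM RHQ_RESOURCE res
 WHERE NOT EXISTS (SELECT 1 FROM RHQ_AVAILABILITY av WHERE av.RESOURCE_ID = res.ID)
 ORDER BY res.ID
"""


def resources_without_availability(conn):
    """Return the IDs of resources lacking any availability row."""
    return [row[0] for row in conn.execute(MISSING)]


def build_demo_db():
    """Create an in-memory database with a hypothetical, reduced schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE RHQ_RESOURCE (ID INTEGER PRIMARY KEY);
        CREATE TABLE RHQ_AVAILABILITY (
            ID INTEGER PRIMARY KEY,
            RESOURCE_ID INTEGER NOT NULL,
            START_TIME INTEGER,
            END_TIME INTEGER,
            AVAILABILITY_TYPE INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO RHQ_RESOURCE (ID) VALUES (?)", [(10001,), (10002,), (10003,)]
    )
    # Only resource 10002 has an availability row in this demo data.
    conn.execute(
        "INSERT INTO RHQ_AVAILABILITY (RESOURCE_ID, START_TIME, AVAILABILITY_TYPE) "
        "VALUES (10002, 100, 1)"
    )
    conn.commit()
    return conn
```

An empty result from resources_without_availability would suggest that the manual repair is not needed for that database.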

Comment 1 Charles Crouch 2013-02-27 23:08:58 UTC
Cloning this bug to make sure it gets QE'd as part of the upgrade testing for JON320. 

The key part is: "You only need to do this IF you committed resources AFTER the 3.1.2 upgrade where those resources were in the discovery queue pre-upgrade (that is, when 3.0.0 was running.)"

So steps to reproduce:
1) Install JON300
2) Start an agent, but don't import its platform, leave it in the autodiscovery queue.
3) Upgrade to JON320. Import the agent and platform from step 2), and check that their availabilities are green.

Comment 2 Larry O'Leary 2013-02-28 00:04:58 UTC
This solution does not appear to be complete and does not match the tested one. I think we may have duplicated effort somewhere. This had already been worked as https://bugzilla.redhat.com/show_bug.cgi?id=884338.

Comment 3 Jay Shaughnessy 2013-04-11 19:00:07 UTC
+1, I think this is a duplicate of Bug 884338 and should be closed.

Comment 5 Thomas Segismont 2013-09-13 14:03:55 UTC
Jay, Heiko, should this bug be closed?

Comment 6 Larry O'Leary 2013-09-18 02:36:34 UTC
Closing this as a duplicate of 884338. Even though this may have identified a new issue, this issue was addressed and also reported in the original issue (bug 884338).

*** This bug has been marked as a duplicate of bug 884338 ***