Bug 865166 - Resource with AvailabilityType of NULL is being returned to UI resulting in ResourceDatasource.copyValues throwing org_rhq_core_domain_resource_Resource_currentAvailability is null
Resource with AvailabilityType of NULL is being returned to UI resulting in R...
Product: JBoss Operations Network
Classification: JBoss
Component: Inventory, UI (Show other bugs)
JON 3.1.1
All All
high Severity urgent
: ---
: JON 3.1.2
Assigned To: Jay Shaughnessy
Mike Foley
Depends On: 865167
Blocks: 877176
  Show dependency treegraph
Reported: 2012-10-10 18:20 EDT by Larry O'Leary
Modified: 2013-09-11 07:01 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 865167 (view as bug list)
Last Closed: 2013-09-11 07:01:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 228803 None None None 2012-10-10 18:21:03 EDT

  None (edit)
Description Larry O'Leary 2012-10-10 18:20:05 EDT
Description of problem:
When accessing the *Inventory* pages such as *Platforms*, the list of resources is not created and instead the *Message Center* displays an error indicating `Globally uncaught exception` with the following stack:

    com.google.gwt.core.client.JavaScriptException:(TypeError): from.org_rhq_core_domain_resource_Resource_currentAvailability is null
    (TypeError): from.org_rhq_core_domain_resource_Resource_currentAvailability is null
    stack: org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_$copyValues__Lorg_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_2Lorg_rhq_core_domain_resource_Resource_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149635
        fileName: https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html
        lineNumber: 149635
           at Unknown.anonymous(Unknown Source)
           at Unknown.anonymous(Unknown Source)

This error is due to one or more platform resources in the RHQ_RESOURCE table not having a corresponding row in the RHQ_RESOURCE_AVAIL or RHQ_AVAILABILITY tables.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Not certain at this point but the hypothesis is that the impacted resource had been committed to inventory. Then, the resource or its parent resource was removed from inventory using *UNINVENTORY*. Then, the server was shutdown resulting in a partial deletion of the affected resource from the database.

A similar or the same scenario was reported when an upgrade of the JBoss ON server was being performed. First, with the server running (not sure if the agent was running or not) the agent's platform was removed from inventory and the server was immediately shutdown in preparation for an upgrade. The upgrade was performed. After the upgrade, the inventory pages would no longer work.

Additional info:
We know that in JBoss ON 3.1 `null` availability was no longer a valid AvailabilityType and a column constraint was added to the database table to prevent `null` from being a valid value. However, what we did not take into consideration is that many of the queries retrieve availability for a resource using a LEFT OUTER JOIN. The result of a LEFT OUTER JOIN if the resource is on the left and the resource's availability is on the right would be a result that contained an availability type value of `null`. So, perhaps this underlying partial delete issue has been present for quite some time but not detected because previously we had expected availability to be `null` and treat it as UNKNOWN.

However, the underlying issue is why the partial delete. How does the RHQ_RESOURCE table contain a resource row that does not have a corresponding RHQ_RESOURCE_AVAIL or RHQ_AVAILABILITY row. Perhaps:

 - Availability has not yet been collected for this resource or an availability report has not yet been processed by the server?

   However, in that case, we would expect the inventory status to be something other then COMMITTED. Perhaps, NEW or UNCOMMITED or DELETED, but not COMMITTED. But maybe this can and does happen? When I first import a resource I would expect that not only do I add a row to RHQ_RESOURCE but also a row to both RHQ_RESOURCE_AVAIL and RHQ_AVAILABILITY with an availability of UNKNOWN to indicate that we don't know what it is yet.

 - When removing/deleting a resource from inventory, the last place we delete is the RHQ_RESOURCE table

   The expectation would be a transaction would prevent data from being removed from dependent tables but we know that the deletion of a resource and its data (metrics, calltime, events, availability, content, etc) happens asynchronously and most likely in chunks to prevent transaction timeouts. The process is very time consuming.
Comment 1 Larry O'Leary 2012-10-17 10:46:28 EDT
A code fix has been committed to upstream that would prevent this from happening in the first place but does not fix the problem if it has already occurred. Therefore, a workaround of executing SQL against the ON database is needed to fix up the database state in the event the system has been impacted by this issue. The provided SQL has not been tested but should fix the issue.

master commit c2b37fd00701018fd5df3f99879c6cf4c2fddc4c
Jay Shaughnessy <jshaughn@redhat.com>
Tue Oct 16 14:04:41 2012 -0400

** NOTE!!! Once the issue occurs, the code fix will will not automatically solve
** the problem.  This workaround code below should be applied.  The fix will
** only help future upgrades, not upgrades that have already taken place.

Workaround SQL that can be applied to databases suffering from this issue:

    SELECT nextval('RHQ_RESOURCE_AVAIL_ID_SEQ'::text), res.ID, 2
     WHERE avail.ID IS NULL

     WHERE avail.ID IS NULL
Comment 2 Charles Crouch 2012-11-06 16:01:57 EST
This BZ is to backport the fix from master which will prevent this issue going forward and db upgrade updated with the sql below
Comment 3 Larry O'Leary 2012-11-06 16:06:41 EST
GSS has confirmed that the queries do resolve this issue once it happens. So, it is recommended that we get this added to the schema update script to ensure that users who are impacted by this will fix it.
Comment 4 John Mazzitelli 2012-11-13 10:35:52 EST
cherry picked master commit (c2b37fd00701018fd5df3f99879c6cf4c2fddc4c):


this does NOT include a db-upgrade.xml change. That has not been developed or checked into master even.
Comment 5 John Mazzitelli 2012-11-13 11:06:31 EST
the db-upgrade code has been committed to master (4b888be16b1ae2ceb876e85cc82001bbdc79e516)

this has been cherry picked to release/jon3.1.x (note: the db schema version was changed from the master version of 2.126 to version 2.122.2 in the release branch):

Comment 6 Simeon Pinder 2012-11-21 16:56:17 EST
Moving to ON_QA as available for test with build : https://brewweb.devel.redhat.com//buildinfo?buildID=244662.
Comment 7 Filip Brychta 2012-11-29 08:05:37 EST
Verified that upgrade to 3.1.2.ER2 fixes inconsistent state of rhq_resource_avail table (missing rows, NULL availability_type) and import of resources works for both CLI and GUI. (as part of bug 877176)
Also verified that inconsistent rhq_resource_avail table is fixed when a resource is imported.
Issue still occures when a resource is already imported and the rhq_resource_avail table is edited manualy. 
Verified that provided queries do resolve this issue.

Note You need to log in before you can comment on or make changes to this bug.