Bug 865166 - Resource with AvailabilityType of NULL is being returned to UI resulting in ResourceDatasource.copyValues throwing org_rhq_core_domain_resource_Resource_currentAvailability is null
Summary: Resource with AvailabilityType of NULL is being returned to UI resulting in R...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Inventory, UI
Version: JON 3.1.1
Hardware: All
OS: All
high
urgent
Target Milestone: ---
: JON 3.1.2
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On: 865167
Blocks: 877176
TreeView+ depends on / blocked
 
Reported: 2012-10-10 22:20 UTC by Larry O'Leary
Modified: 2018-11-29 19:51 UTC (History)
3 users (show)

Fixed In Version:
Clone Of:
: 865167 (view as bug list)
Environment:
Last Closed: 2013-09-11 11:01:34 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 881848 0 high ON_QA Resource availability type remains in state unknown after application of workaround from bug 865166 2022-03-31 04:28:41 UTC
Red Hat Knowledge Base (Solution) 228803 0 None None None 2012-10-10 22:21:03 UTC

Internal Links: 881848

Description Larry O'Leary 2012-10-10 22:20:05 UTC
Description of problem:
When accessing the *Inventory* pages such as *Platforms*, the list of resources is not created and instead the *Message Center* displays an error indicating `Globally uncaught exception` with the following stack:

    com.google.gwt.core.client.JavaScriptException:(TypeError): from.org_rhq_core_domain_resource_Resource_currentAvailability is null
    --- STACK TRACE FOLLOWS ---
    (TypeError): from.org_rhq_core_domain_resource_Resource_currentAvailability is null
    stack: org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_$copyValues__Lorg_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_2Lorg_rhq_core_domain_resource_Resource_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149635
        org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_copyValues__Lorg_rhq_core_domain_resource_Resource_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149754
        org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_copyValues__Ljava_lang_Object_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149758
        org_rhq_enterprise_gui_coregui_client_util_RPCDataSource_copyValues__Ljava_lang_Object_2ZLcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:64843
        org_rhq_enterprise_gui_coregui_client_util_RPCDataSource_$buildRecords__Lorg_rhq_enterprise_gui_coregui_client_util_RPCDataSource_2Ljava_util_Collection_2Z_3Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:64557
        org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource$2_onTypesLoaded__Ljava_util_Map_2V@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149846
        ...
        @https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:2078
        fileName: https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html
        lineNumber: 149635
           at Unknown.anonymous(Unknown Source)
           at Unknown.anonymous(Unknown Source)
           ...



This error is due to one or more platform resources in the RHQ_RESOURCE table not having a corresponding row in the RHQ_RESOURCE_AVAIL or RHQ_AVAILABILITY tables.

Version-Release number of selected component (if applicable):
3.1.1

How reproducible:
Always

Steps to Reproduce:
Not certain at this point but the hypothesis is that the impacted resource had been committed to inventory. Then, the resource or its parent resource was removed from inventory using *UNINVENTORY*. Then, the server was shutdown resulting in a partial deletion of the affected resource from the database.

A similar or the same scenario was reported when an upgrade of the JBoss ON server was being performed. First, with the server running (not sure if the agent was running or not) the agent's platform was removed from inventory and the server was immediately shutdown in preparation for an upgrade. The upgrade was performed. After the upgrade, the inventory pages would no longer work.

  
Additional info:
We know that in JBoss ON 3.1 `null` availability was no longer a valid AvailabilityType and a column constraint was added to the database table to prevent `null` from being a valid value. However, what we did not take into consideration is that many of the queries retrieve availability for a resource using a LEFT OUTER JOIN. The result of a LEFT OUTER JOIN if the resource is on the left and the resource's availability is on the right would be a result that contained an availability type value of `null`. So, perhaps this underlying partial delete issue has been present for quite some time but not detected because previously we had expected availability to be `null` and treat it as UNKNOWN.

However, the underlying issue is why the partial delete. How does the RHQ_RESOURCE table contain a resource row that does not have a corresponding RHQ_RESOURCE_AVAIL or RHQ_AVAILABILITY row. Perhaps:

 - Availability has not yet been collected for this resource or an availability report has not yet been processed by the server?

   However, in that case, we would expect the inventory status to be something other then COMMITTED. Perhaps, NEW or UNCOMMITED or DELETED, but not COMMITTED. But maybe this can and does happen? When I first import a resource I would expect that not only do I add a row to RHQ_RESOURCE but also a row to both RHQ_RESOURCE_AVAIL and RHQ_AVAILABILITY with an availability of UNKNOWN to indicate that we don't know what it is yet.

 - When removing/deleting a resource from inventory, the last place we delete is the RHQ_RESOURCE table

   The expectation would be a transaction would prevent data from being removed from dependent tables but we know that the deletion of a resource and its data (metrics, calltime, events, availability, content, etc) happens asynchronously and most likely in chunks to prevent transaction timeouts. The process is very time consuming.

Comment 1 Larry O'Leary 2012-10-17 14:46:28 UTC
A code fix has been committed to upstream that would prevent this from happening in the first place but does not fix the problem if it has already occurred. Therefore, a workaround of executing SQL against the ON database is needed to fix up the database state in the event the system has been impacted by this issue. The provided SQL has not been tested but should fix the issue.


master commit c2b37fd00701018fd5df3f99879c6cf4c2fddc4c
Jay Shaughnessy <jshaughn>
Tue Oct 16 14:04:41 2012 -0400


** NOTE!!! Once the issue occurs, the code fix will will not automatically solve
** the problem.  This workaround code below should be applied.  The fix will
** only help future upgrades, not upgrades that have already taken place.

Workaround SQL that can be applied to databases suffering from this issue:

postgres:
INSERT INTO RHQ_RESOURCE_AVAIL ( ID, RESOURCE_ID, AVAILABILITY_TYPE )
    SELECT nextval('RHQ_RESOURCE_AVAIL_ID_SEQ'::text), res.ID, 2
      FROM RHQ_RESOURCE res
 LEFT JOIN RHQ_RESOURCE_AVAIL avail ON res.ID = avail.RESOURCE_ID
     WHERE avail.ID IS NULL


oracle:
INSERT INTO RHQ_RESOURCE_AVAIL ( ID, RESOURCE_ID, AVAILABILITY_TYPE )
    SELECT RHQ_RESOURCE_AVAIL_ID_SEQ.nextval, res.ID, 2
      FROM RHQ_RESOURCE res
 LEFT JOIN RHQ_RESOURCE_AVAIL avail ON res.ID = avail.RESOURCE_ID
     WHERE avail.ID IS NULL

Comment 2 Charles Crouch 2012-11-06 21:01:57 UTC
This BZ is to backport the fix from master which will prevent this issue going forward and db upgrade updated with the sql below

Comment 3 Larry O'Leary 2012-11-06 21:06:41 UTC
GSS has confirmed that the queries do resolve this issue once it happens. So, it is recommended that we get this added to the schema update script to ensure that users who are impacted by this will fix it.

Comment 4 John Mazzitelli 2012-11-13 15:35:52 UTC
cherry picked master commit (c2b37fd00701018fd5df3f99879c6cf4c2fddc4c):

3b1a784430f4265e3ab49fc9ea05eb30c4f0f9b1

this does NOT include a db-upgrade.xml change. That has not been developed or checked into master even.

Comment 5 John Mazzitelli 2012-11-13 16:06:31 UTC
the db-upgrade code has been committed to master (4b888be16b1ae2ceb876e85cc82001bbdc79e516)

this has been cherry picked to release/jon3.1.x (note: the db schema version was changed from the master version of 2.126 to version 2.122.2 in the release branch):

5c168e69247d5e58adc7fb963b3f2f74803dc423

Comment 6 Simeon Pinder 2012-11-21 21:56:17 UTC
Moving to ON_QA as available for test with build : https://brewweb.devel.redhat.com//buildinfo?buildID=244662.

Comment 7 Filip Brychta 2012-11-29 13:05:37 UTC
Verified that upgrade to 3.1.2.ER2 fixes inconsistent state of rhq_resource_avail table (missing rows, NULL availability_type) and import of resources works for both CLI and GUI. (as part of bug 877176)
Also verified that inconsistent rhq_resource_avail table is fixed when a resource is imported.
Issue still occures when a resource is already imported and the rhq_resource_avail table is edited manualy. 
Verified that provided queries do resolve this issue.


Note You need to log in before you can comment on or make changes to this bug.