Description of problem: When accessing the *Inventory* pages such as *Platforms*, the list of resources is not created and instead the *Message Center* displays an error indicating `Globally uncaught exception` with the following stack: com.google.gwt.core.client.JavaScriptException:(TypeError): from.org_rhq_core_domain_resource_Resource_currentAvailability is null --- STACK TRACE FOLLOWS --- (TypeError): from.org_rhq_core_domain_resource_Resource_currentAvailability is null stack: org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_$copyValues__Lorg_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_2Lorg_rhq_core_domain_resource_Resource_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149635 org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_copyValues__Lorg_rhq_core_domain_resource_Resource_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149754 org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_copyValues__Ljava_lang_Object_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149758 org_rhq_enterprise_gui_coregui_client_util_RPCDataSource_copyValues__Ljava_lang_Object_2ZLcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:64843 org_rhq_enterprise_gui_coregui_client_util_RPCDataSource_$buildRecords__Lorg_rhq_enterprise_gui_coregui_client_util_RPCDataSource_2Ljava_util_Collection_2Z_3Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:64557 org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource$2_onTypesLoaded__Ljava_util_Map_2V@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149846 ... @https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:2078 fileName: https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html lineNumber: 149635 at Unknown.anonymous(Unknown Source) at Unknown.anonymous(Unknown Source) ... This error is due to one or more platform resources in the RHQ_RESOURCE table not having a corresponding row in the RHQ_RESOURCE_AVAIL or RHQ_AVAILABILITY tables. Version-Release number of selected component (if applicable): 3.1.1 How reproducible: Always Steps to Reproduce: Not certain at this point but the hypothesis is that the impacted resource had been committed to inventory. Then, the resource or its parent resource was removed from inventory using *UNINVENTORY*. Then, the server was shutdown resulting in a partial deletion of the affected resource from the database. A similar or the same scenario was reported when an upgrade of the JBoss ON server was being performed. First, with the server running (not sure if the agent was running or not) the agent's platform was removed from inventory and the server was immediately shutdown in preparation for an upgrade. The upgrade was performed. After the upgrade, the inventory pages would no longer work. Additional info: We know that in JBoss ON 3.1 `null` availability was no longer a valid AvailabilityType and a column constraint was added to the database table to prevent `null` from being a valid value. However, what we did not take into consideration is that many of the queries retrieve availability for a resource using a LEFT OUTER JOIN. The result of a LEFT OUTER JOIN if the resource is on the left and the resource's availability is on the right would be a result that contained an availability type value of `null`. So, perhaps this underlying partial delete issue has been present for quite some time but not detected because previously we had expected availability to be `null` and treat it as UNKNOWN. However, the underlying issue is why the partial delete. How does the RHQ_RESOURCE table contain a resource row that does not have a corresponding RHQ_RESOURCE_AVAIL or RHQ_AVAILABILITY row. Perhaps: - Availability has not yet been collected for this resource or an availability report has not yet been processed by the server? However, in that case, we would expect the inventory status to be something other then COMMITTED. Perhaps, NEW or UNCOMMITED or DELETED, but not COMMITTED. But maybe this can and does happen? When I first import a resource I would expect that not only do I add a row to RHQ_RESOURCE but also a row to both RHQ_RESOURCE_AVAIL and RHQ_AVAILABILITY with an availability of UNKNOWN to indicate that we don't know what it is yet. - When removing/deleting a resource from inventory, the last place we delete is the RHQ_RESOURCE table The expectation would be a transaction would prevent data from being removed from dependent tables but we know that the deletion of a resource and its data (metrics, calltime, events, availability, content, etc) happens asynchronously and most likely in chunks to prevent transaction timeouts. The process is very time consuming.
A code fix has been committed to upstream that would prevent this from happening in the first place but does not fix the problem if it has already occurred. Therefore, a workaround of executing SQL against the ON database is needed to fix up the database state in the event the system has been impacted by this issue. The provided SQL has not been tested but should fix the issue. master commit c2b37fd00701018fd5df3f99879c6cf4c2fddc4c Jay Shaughnessy <jshaughn> Tue Oct 16 14:04:41 2012 -0400 ** NOTE!!! Once the issue occurs, the code fix will will not automatically solve ** the problem. This workaround code below should be applied. The fix will ** only help future upgrades, not upgrades that have already taken place. Workaround SQL that can be applied to databases suffering from this issue: postgres: INSERT INTO RHQ_RESOURCE_AVAIL ( ID, RESOURCE_ID, AVAILABILITY_TYPE ) SELECT nextval('RHQ_RESOURCE_AVAIL_ID_SEQ'::text), res.ID, 2 FROM RHQ_RESOURCE res LEFT JOIN RHQ_RESOURCE_AVAIL avail ON res.ID = avail.RESOURCE_ID WHERE avail.ID IS NULL oracle: INSERT INTO RHQ_RESOURCE_AVAIL ( ID, RESOURCE_ID, AVAILABILITY_TYPE ) SELECT RHQ_RESOURCE_AVAIL_ID_SEQ.nextval, res.ID, 2 FROM RHQ_RESOURCE res LEFT JOIN RHQ_RESOURCE_AVAIL avail ON res.ID = avail.RESOURCE_ID WHERE avail.ID IS NULL
This BZ is to backport the fix from master which will prevent this issue going forward and db upgrade updated with the sql below
GSS has confirmed that the queries do resolve this issue once it happens. So, it is recommended that we get this added to the schema update script to ensure that users who are impacted by this will fix it.
cherry picked master commit (c2b37fd00701018fd5df3f99879c6cf4c2fddc4c): 3b1a784430f4265e3ab49fc9ea05eb30c4f0f9b1 this does NOT include a db-upgrade.xml change. That has not been developed or checked into master even.
the db-upgrade code has been committed to master (4b888be16b1ae2ceb876e85cc82001bbdc79e516) this has been cherry picked to release/jon3.1.x (note: the db schema version was changed from the master version of 2.126 to version 2.122.2 in the release branch): 5c168e69247d5e58adc7fb963b3f2f74803dc423
Moving to ON_QA as available for test with build : https://brewweb.devel.redhat.com//buildinfo?buildID=244662.
Verified that upgrade to 3.1.2.ER2 fixes inconsistent state of rhq_resource_avail table (missing rows, NULL availability_type) and import of resources works for both CLI and GUI. (as part of bug 877176) Also verified that inconsistent rhq_resource_avail table is fixed when a resource is imported. Issue still occures when a resource is already imported and the rhq_resource_avail table is edited manualy. Verified that provided queries do resolve this issue.