+++ This bug was initially created as a clone of JBoss ON Bug #865166 +++ Description of problem: When accessing the *Inventory* pages such as *Platforms*, the list of resources is not created and instead the *Message Center* displays an error indicating `Globally uncaught exception` with the following stack: com.google.gwt.core.client.JavaScriptException:(TypeError): from.org_rhq_core_domain_resource_Resource_currentAvailability is null --- STACK TRACE FOLLOWS --- (TypeError): from.org_rhq_core_domain_resource_Resource_currentAvailability is null stack: org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_$copyValues__Lorg_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_2Lorg_rhq_core_domain_resource_Resource_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149635 org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_copyValues__Lorg_rhq_core_domain_resource_Resource_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149754 org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource_copyValues__Ljava_lang_Object_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149758 org_rhq_enterprise_gui_coregui_client_util_RPCDataSource_copyValues__Ljava_lang_Object_2ZLcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:64843 org_rhq_enterprise_gui_coregui_client_util_RPCDataSource_$buildRecords__Lorg_rhq_enterprise_gui_coregui_client_util_RPCDataSource_2Ljava_util_Collection_2Z_3Lcom_smartgwt_client_widgets_grid_ListGridRecord_2@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:64557 org_rhq_enterprise_gui_coregui_client_inventory_resource_ResourceDatasource$2_onTypesLoaded__Ljava_util_Map_2V@https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:149846 ... @https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html:2078 fileName: https://myjon.server:7443/coregui/org.rhq.enterprise.gui.coregui.CoreGUI/916AC7929C193E9E602839783B9B0857.cache.html lineNumber: 149635 at Unknown.anonymous(Unknown Source) at Unknown.anonymous(Unknown Source) ... This error is due to one or more platform resources in the RHQ_RESOURCE table not having a corresponding row in the RHQ_RESOURCE_AVAIL or RHQ_AVAILABILITY tables. Version-Release number of selected component (if applicable): 3.1.1 How reproducible: Always Steps to Reproduce: Not certain at this point but the hypothesis is that the impacted resource had been committed to inventory. Then, the resource or its parent resource was removed from inventory using *UNINVENTORY*. Then, the server was shutdown resulting in a partial deletion of the affected resource from the database. A similar or the same scenario was reported when an upgrade of the JBoss ON server was being performed. First, with the server running (not sure if the agent was running or not) the agent's platform was removed from inventory and the server was immediately shutdown in preparation for an upgrade. The upgrade was performed. After the upgrade, the inventory pages would no longer work. Additional info: We know that in JBoss ON 3.1 `null` availability was no longer a valid AvailabilityType and a column constraint was added to the database table to prevent `null` from being a valid value. However, what we did not take into consideration is that many of the queries retrieve availability for a resource using a LEFT OUTER JOIN. The result of a LEFT OUTER JOIN if the resource is on the left and the resource's availability is on the right would be a result that contained an availability type value of `null`. So, perhaps this underlying partial delete issue has been present for quite some time but not detected because previously we had expected availability to be `null` and treat it as UNKNOWN. However, the underlying issue is why the partial delete. How does the RHQ_RESOURCE table contain a resource row that does not have a corresponding RHQ_RESOURCE_AVAIL or RHQ_AVAILABILITY row. Perhaps: - Availability has not yet been collected for this resource or an availability report has not yet been processed by the server? However, in that case, we would expect the inventory status to be something other then COMMITTED. Perhaps, NEW or UNCOMMITED or DELETED, but not COMMITTED. But maybe this can and does happen? When I first import a resource I would expect that not only do I add a row to RHQ_RESOURCE but also a row to both RHQ_RESOURCE_AVAIL and RHQ_AVAILABILITY with an availability of UNKNOWN to indicate that we don't know what it is yet. - When removing/deleting a resource from inventory, the last place we delete is the RHQ_RESOURCE table The expectation would be a transaction would prevent data from being removed from dependent tables but we know that the deletion of a resource and its data (metrics, calltime, events, availability, content, etc) happens asynchronously and most likely in chunks to prevent transaction timeouts. The process is very time consuming.
master commit c2b37fd00701018fd5df3f99879c6cf4c2fddc4c Jay Shaughnessy <jshaughn> Tue Oct 16 14:04:41 2012 -0400 I believe this is the cause of the failure. In certain cases a legacy db may not have an rhq_resource_avail record for a resource. It is now eagerly created but in the past it was possible to have resource without the record. The old codepath for initializing resource avail for these resources had an issue. ** NOTE!!! Once the issue occurs, the code fix will will not automatically solve ** the problem. This workaround code below should be applied. The fix will ** only help future upgrades, not upgrades that have already taken place. Workaround SQL that can be applied to databases suffering from this issue: postgres: INSERT INTO RHQ_RESOURCE_AVAIL ( ID, RESOURCE_ID, AVAILABILITY_TYPE ) SELECT nextval('RHQ_RESOURCE_AVAIL_ID_SEQ'::text), res.ID, 2 FROM RHQ_RESOURCE res LEFT JOIN RHQ_RESOURCE_AVAIL avail ON res.ID = avail.RESOURCE_ID WHERE avail.ID IS NULL oracle: INSERT INTO RHQ_RESOURCE_AVAIL ( ID, RESOURCE_ID, AVAILABILITY_TYPE ) SELECT RHQ_RESOURCE_AVAIL_ID_SEQ.nextval, res.ID, 2 FROM RHQ_RESOURCE res LEFT JOIN RHQ_RESOURCE_AVAIL avail ON res.ID = avail.RESOURCE_ID WHERE avail.ID IS NULL Test Notes: Not really easy to test. I'd suggest probably just moving this to VERIFIED if the relevant customer cases are resolved.
DevNote: If this resolves the issues, the query above could be moved to the db-upgrade.xml script and the codepath could be eliminated completely.
As per 865166 db upgrade needs to be updated with the sql below
(In reply to comment #3) > As per 865166 db upgrade needs to be updated with the sql below git commit to master 4b888be16b1ae2ceb876e85cc82001bbdc79e516 adds the sql to the db-upgrade script.
Bulk closing of issues in old RHQ releases that are in production for a while now. Please open a new issue when running into an issue.