Created attachment 969594
RHQ server log exception

Description of problem:
After upgrading to 4.13, an UnsupportedOperationException is thrown in the log on a regular basis. It seems to be thrown every 10 minutes and to relate to agent inventory reports. I'm not sure whether it is related to operation history not being purged; it appears at about the same interval (10 minutes). The full stack trace is attached. There seem to be two different exceptions with the same error at about the same time.

Version-Release number of selected component (if applicable):
RHQ 4.13
Database Product Version: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0
Driver Version: 11.2.0.3.0

How reproducible:
Always?

Steps to Reproduce:
1.
2.
3.

Actual results:
n/a

Expected results:
n/a

Additional info:
This is likely due to the server processing a Resource provided by the Agent. Either the Agent's recent optimizations or some plugin discovery code is setting a parent Resource's childResources impl to CopyOnWriteArraySet. That parent is passed to the server as an "addedRoot", and the server code does not protect itself from unexpected Set impls. The server-side code is old; the regression comes from it not handling the CopyOnWriteArraySet impl, which does not support Iterator.remove(). This fix will likely also need to be applied to JON 3.3.1; clone the BZ as needed.
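To illustrate the Set-impl point above, here is a minimal Java sketch (not RHQ code) showing that the same Iterator.remove() pruning loop works on a HashSet but throws UnsupportedOperationException on a CopyOnWriteArraySet, whose iterators do not support mutative operations. The class name IteratorRemoveDemo, its helper methods and the "unknown:" prefix (standing in for a child whose resource type is missing on the server) are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

public class IteratorRemoveDemo {

    // Hypothetical pruning loop using Iterator.remove(), the pattern the
    // old server-side code is assumed to rely on.
    static void pruneWithIterator(Set<String> children) {
        Iterator<String> it = children.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith("unknown:")) {
                it.remove(); // CopyOnWriteArraySet's iterator throws here
            }
        }
    }

    // Returns false if the Set impl rejected Iterator.remove().
    static boolean tryPrune(Set<String> children) {
        try {
            pruneWithIterator(children);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>(Arrays.asList("tomcat", "unknown:jboss-cache"));
        Set<String> cowSet = new CopyOnWriteArraySet<>(Arrays.asList("tomcat", "unknown:jboss-cache"));
        System.out.println("HashSet pruned: " + tryPrune(hashSet));            // true
        System.out.println("CopyOnWriteArraySet pruned: " + tryPrune(cowSet)); // false
    }
}
```

The loop is identical in both cases; only the Set impl handed in by the caller decides whether it succeeds.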
Jay, is there any possible workaround to avoid having the exception in the log constantly?
Stian, the more I look at it, the stranger this issue seems. I think it occurs only when a resource discovered on the Agent has a resource type that does not exist on the Server. That would mean, I think, that the Agent has a plugin that is not actually on the Server, or perhaps one that is not enabled on the Server. Is that possible in your problem environment? If so, I think getting the plugin situation normalized will make this go away. Regardless, we'll need a code change to protect against this problem in the future.
Created https://github.com/rhq-project/rhq/pull/155 with a proposed fix. Will ask for review...
Jay, that's a good idea. I know we have had problems with old plugins not being purged on the agent side after an update, sometimes because we use snapshots. I had a look at some of the servers where the exception is thrown, and there are only 4.13 plugins:

/opt/rhq/rhq-agent/plugins$ ls
jopr-hibernate-plugin-4.13.0.jar
jopr-jboss-as-5-plugin-4.13.0.jar
jopr-jboss-as-plugin-4.13.0.jar
jopr-jboss-cache-plugin-4.13.0.jar
jopr-jboss-cache-v3-plugin-4.13.0.jar
jopr-tomcat-plugin-4.13.0.jar
rhq-agent-plugin-4.13.0.jar
rhq-ant-bundle-plugin-4.13.0.jar
rhq-apache-plugin-4.13.0.jar
rhq-augeas-plugin-4.13.0.jar
rhq-cassandra-plugin-4.13.0.jar
rhq-filetemplate-bundle-plugin-4.13.0.jar
rhq-jboss-as-7-plugin-4.13.0.jar
rhq-jmx-plugin-4.13.0.jar
rhq-netservices-plugin-4.13.0.jar
rhq-platform-plugin-4.13.0.jar
rhq-rhqserver-plugin-4.13.0.jar
rhq-rhqstorage-plugin-4.13.0.jar
rhq-script-plugin-4.13.0.jar

I'm not sure why some are still called jopr-*, but I think some of these are old JBoss plugins we need for Hibernate statistics. There is, however, an inconsistency here: the jboss-cache plugins are on the Agent side but are marked Disabled in RHQ. Might this be the cause of the exceptions? I will try enabling them in RHQ to see if the error goes away, and then disable them, including Hibernate (we don't really need it). Could it be that plugins disabled on the server are not actually purged on the Agent side?
I'm not sure why you would want to disable that plugin; I think the jboss-as (AS4) plugin depends on it, although I'm not certain. Also, I don't think disabled plugins are purged from the agents; that would need to be investigated. Anyway, it sounds suspicious, but I'm not an expert on the plugin disable/delete/purge logic. I see that the code fix has been merged into master, so the fix should be in the next release, but I think you may be able to work around this by playing with the plugins. Let us know if you succeed...
I'm just disabling the old plugins for JBoss AS 3, 4, 5 and 6, since we don't actually run those any more. I tried enabling and disabling them, restarting the agent and so on, but nothing seems to stop the exception from being logged. Maybe I should turn on debug logging for org.rhq.enterprise.server.discovery?
It turns out this is more than an annoyance that spams the logs: I think it's related to RHQ being unable to discover deployments that have been uninventoried. The exception is thrown when running a discovery scan on the platform.
The fix as provided in the PR should solve the problem.
In master:

commit 9c374100de109ffa2f759a16b96da5ec20dc8a9a
Author: Jay Shaughnessy <jshaughn>
Date: Wed Dec 17 16:36:36 2014 -0500

[1174841] UnsupportedOperationException on mergeInventoryReport()

Avoid the use of Iterator.remove() because Resources coming from the Agent may be using a customized impl for Resource.childResources (like CopyOnWriteArraySet). The solution "lazily protects" because the problem scenario is rare (restype reported by agent is not present on the server) and we don't want to do any unnecessary work (like changing the Set impl in advance).
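One possible reading of the "lazily protects" approach in the commit message, as a minimal Java sketch: children are pruned without Iterator.remove() by collecting the offending ones first and removing them with Set.remove() afterwards, and the temporary list is only allocated when an unknown child is actually found, so the common case does no extra work. This is not the actual mergeInventoryReport() code; LazyPrune, pruneUnknown and the "unknown:" marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.function.Predicate;

public class LazyPrune {

    // Hypothetical helper: removes children rejected by isKnownType without
    // calling Iterator.remove(), so it works for any Set impl that supports
    // remove(Object), including CopyOnWriteArraySet. Returns the count removed.
    static <T> int pruneUnknown(Set<T> children, Predicate<? super T> isKnownType) {
        List<T> toRemove = null; // allocated lazily, only if needed
        for (T child : children) {
            if (!isKnownType.test(child)) {
                if (toRemove == null) {
                    toRemove = new ArrayList<>();
                }
                toRemove.add(child);
            }
        }
        if (toRemove == null) {
            return 0; // common case: every child type is known, nothing to do
        }
        // Remove after iteration has finished, via Set.remove(Object).
        for (T child : toRemove) {
            children.remove(child);
        }
        return toRemove.size();
    }

    public static void main(String[] args) {
        Set<String> children = new CopyOnWriteArraySet<>(
                Arrays.asList("platform-child", "unknown:jboss-cache", "tomcat"));
        int removed = pruneUnknown(children, c -> !c.startsWith("unknown:"));
        // prints "removed=1 remaining=[platform-child, tomcat]"
        System.out.println("removed=" + removed + " remaining=" + children);
    }
}
```

Because removal happens only after the loop completes, the same helper also avoids ConcurrentModificationException when the children are held in a plain HashSet.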