Bug 1174841 - UnsupportedOperationException on mergeInventoryReport()
Summary: UnsupportedOperationException on mergeInventoryReport()
Status: ON_QA
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.13
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: GA
Target Release: RHQ 4.14
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 1187645
 
Reported: 2014-12-16 15:22 UTC by Stian Lund
Modified: 2015-11-02 00:45 UTC (History)
Clone Of:
: 1187645 (view as bug list)
Last Closed:


Attachments
RHQ serverlog exception (36.99 KB, text/plain)
2014-12-16 15:22 UTC, Stian Lund

Description Stian Lund 2014-12-16 15:22:31 UTC
Created attachment 969594
RHQ serverlog exception

Description of problem:

After upgrading to 4.13, an UnsupportedOperationException is thrown in the log on a regular basis. It seems to be thrown every 10 minutes and to relate to agent inventory reports.

I am not sure whether it is related to operation history not being purged, but it appears at about the same interval (10 minutes).

Attached is the full stack trace. There seem to be two different exceptions with the same error at about the same time.

Version-Release number of selected component (if applicable):

RHQ 4.13

Database Product Version : Oracle Database 11g Enterprise Edition Release 11.2.0.3.0
Driver Version : 11.2.0.3.0

How reproducible:

Always?

Steps to Reproduce:
1.
2.
3.

Actual results:
n/a

Expected results:
n/a

Additional info:

Comment 1 Jay Shaughnessy 2014-12-16 17:22:11 UTC
This is likely due to the server processing a Resource provided by the Agent.  Either the Agent's recent optimizations, or some plugin discovery code, is setting a parent Resource.childResources impl to CopyOnWriteArraySet.  That parent is being passed to the server as an "addedRoot" and the server code is not protecting itself from unexpected Set impls.  The server-side code is old.  The regression is due to it not handling the CopyOnWriteArraySet impl, whose iterator does not support Iterator.remove.

This fix will likely also need to be applied to JON 3.3.1; clone the BZ as needed.
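
The failure mode described in this comment can be reproduced outside RHQ. The following is a minimal, standalone sketch, not the RHQ server code: the class and method names (IteratorRemoveDemo, removeViaIterator) are made up for illustration, and plain strings stand in for child Resources. It only shows that Iterator.remove() works on a HashSet but throws UnsupportedOperationException on a CopyOnWriteArraySet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical demo class; not part of RHQ.
public class IteratorRemoveDemo {

    /**
     * Tries to remove every element equal to target using Iterator.remove().
     * Returns false if the Set's iterator does not support removal.
     */
    public static boolean removeViaIterator(Set<String> children, String target) {
        try {
            for (Iterator<String> it = children.iterator(); it.hasNext();) {
                if (target.equals(it.next())) {
                    it.remove(); // unsupported by CopyOnWriteArraySet's snapshot iterator
                }
            }
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> cowSet = new CopyOnWriteArraySet<>(Arrays.asList("a", "b"));
        System.out.println("HashSet: " + removeViaIterator(hashSet, "a"));
        System.out.println("CopyOnWriteArraySet: " + removeViaIterator(cowSet, "a"));
    }
}
```

Code that removes elements while iterating therefore depends on the concrete Set implementation it is handed, which is why an unexpected impl arriving from the Agent can break older server-side code.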

Comment 2 Stian Lund 2014-12-16 18:56:57 UTC
Jay, is there any possible workaround for this, to avoid having the exception logged constantly?

Comment 3 Jay Shaughnessy 2014-12-17 15:51:48 UTC
Stian,

The more I look at it, this is sort of a strange issue.  I think this occurs only when a resource discovered on the Agent does not have the resource type on the Server.  Which would mean, I think, that the agent has a plugin that is not actually on the Server.  Or maybe not enabled on the Server?  Is that possible in your problem environment?

In that case, if you can get the plugin situation normalized I think this will go away.

Regardless, we'll need a code change to protect against this problem in the future.

Comment 4 Jay Shaughnessy 2014-12-17 21:39:04 UTC
Created https://github.com/rhq-project/rhq/pull/155 with proposed fix.  Will ask for review...

Comment 5 Stian Lund 2014-12-18 09:21:57 UTC
Jay, that's a good idea, and I know we have had problems with old plugins not being purged on the agent side after an update, because we sometimes use snapshots.

I had a look at some of the servers where the exception is thrown and there are only the 4.13 plugins:

/opt/rhq/rhq-agent/plugins$ ls 
jopr-hibernate-plugin-4.13.0.jar
jopr-jboss-as-5-plugin-4.13.0.jar
jopr-jboss-as-plugin-4.13.0.jar
jopr-jboss-cache-plugin-4.13.0.jar
jopr-jboss-cache-v3-plugin-4.13.0.jar
jopr-tomcat-plugin-4.13.0.jar
rhq-agent-plugin-4.13.0.jar
rhq-ant-bundle-plugin-4.13.0.jar
rhq-apache-plugin-4.13.0.jar
rhq-augeas-plugin-4.13.0.jar
rhq-cassandra-plugin-4.13.0.jar
rhq-filetemplate-bundle-plugin-4.13.0.jar
rhq-jboss-as-7-plugin-4.13.0.jar
rhq-jmx-plugin-4.13.0.jar
rhq-netservices-plugin-4.13.0.jar
rhq-platform-plugin-4.13.0.jar
rhq-rhqserver-plugin-4.13.0.jar
rhq-rhqstorage-plugin-4.13.0.jar
rhq-script-plugin-4.13.0.jar

I am not sure why some are still called jopr-*, but I think some of these are old JBoss plugins we need for Hibernate statistics.

There is, however, an inconsistency here: the jboss-cache plugins are present on the Agent side but are marked Disabled in RHQ. Might this be the cause of the exceptions?

I will try to enable them in RHQ and see if the error goes away, then disable them, including Hibernate (we don't really need it).

It would seem that plugins disabled on the server are not actually purged on the Agent side?

Comment 6 Jay Shaughnessy 2014-12-18 14:47:16 UTC
I'm not sure why you would want to disable that plugin. I think the jboss-as (AS4) plugin depends on it, although I'm not certain.  Also, I don't think disabled plugins are purged from the agents; that would need to be investigated.  Anyway, it sounds suspicious.  I'm not an expert on the plugin disable/delete/purge logic.  I see that the code fix has been merged into master, so a fix should be in the next release, but I think you may be able to work around this by playing with the plugins.  Let us know if you succeed...

Comment 7 Stian Lund 2014-12-19 09:17:57 UTC
I'm just disabling the old plugins for JBoss AS 3, 4, 5 and 6, since we don't actually run those any more. I tried enabling and disabling them, restarting the agent and so on, but nothing seems to stop the exception from being logged.

Maybe I should turn on debug for org.rhq.enterprise.server.discovery ?

Comment 8 Stian Lund 2015-01-30 09:55:48 UTC
I thought this was only an annoyance that spammed the logs, but it turns out it may be related to RHQ being unable to discover deployments that have been uninventoried.

The exception is thrown when running a discovery scan on the platform.

Comment 9 Jay Shaughnessy 2015-01-30 14:42:00 UTC
The fix as provided in the PR should solve the problem.

Comment 10 Libor Zoubek 2015-01-30 14:50:51 UTC
In master:

commit 9c374100de109ffa2f759a16b96da5ec20dc8a9a
Author: Jay Shaughnessy <jshaughn@redhat.com>
Date:   Wed Dec 17 16:36:36 2014 -0500

    [1174841] UnsupportedOperationException on mergeInventoryReport()
    Avoid the use of Iterator.remove() because Resources coming from the
    Agent may be using a customized impl for Resource.childResources (like
    CopyOnWriteArraySet).  The solution "lazily protects" because the
    problem scenario is rare (restype reported by agent is not
    present on the server) and we don't want to do any unnecessary work (like
    changing the Set impl in advance).
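
One way to avoid Iterator.remove() while doing no extra work in the common case is sketched below. This is an illustration under stated assumptions, not the code from the commit above: the class name LazyChildPruner, the method prune, and the use of a Predicate are all hypothetical, and strings stand in for Resources. Elements to drop are collected during a read-only iteration, and the Set is only modified (through Set.removeAll, which CopyOnWriteArraySet supports) if something actually needs to be removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper; not the actual RHQ fix.
public class LazyChildPruner {

    /**
     * Removes all children matching shouldDrop without calling Iterator.remove().
     * Returns the number of children that were selected for removal.
     */
    public static <T> int prune(Set<T> children, Predicate<T> shouldDrop) {
        List<T> toDrop = null; // allocated only when the rare case occurs
        for (T child : children) {
            if (shouldDrop.test(child)) {
                if (toDrop == null) {
                    toDrop = new ArrayList<>();
                }
                toDrop.add(child);
            }
        }
        if (toDrop == null) {
            return 0; // common case: nothing to remove, Set left untouched
        }
        children.removeAll(toDrop); // works for HashSet and CopyOnWriteArraySet alike
        return toDrop.size();
    }
}
```

A call such as `LazyChildPruner.prune(children, c -> c.startsWith("unknown"))` would then behave the same whether `children` is a HashSet or a CopyOnWriteArraySet. The sketch still assumes the Set is modifiable; an unmodifiable Set would need to be copied first.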

