Bug 824401 - Missing parent resource container for parent resource
Summary: Missing parent resource container for parent resource
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Plugin Container, Plugins
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHQ 4.5.0
Assignee: Charles Crouch
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: jon310-sprint11, rhq44-sprint11, 825019
Reported: 2012-05-23 12:05 UTC by Libor Zoubek
Modified: 2015-11-02 00:42 UTC (History)

Fixed In Version: 4.5
Clone Of:
: 825019 (view as bug list)
Environment:
Last Closed: 2013-09-01 10:03:48 UTC
Embargoed:


Attachments
agent.log (1.35 MB, application/octet-stream)
2012-05-23 12:07 UTC, Libor Zoubek

Description Libor Zoubek 2012-05-23 12:05:33 UTC
Description of problem: After about 12 hours of running my test setup, I found one of my two agents (AUTO) in a broken state.

My setup:
JON 3.1.ER4 server
Agent AUTO
Agent MANUAL

Each agent has EAP6 running in both domain and standalone mode (so two instances per agent). I run automation on agent AUTO.

Version-Release number of selected component (if applicable):
Version: 3.1.0.ER4 
Build Number: 1783b86:2b8d25d 

How reproducible: very hard


Steps to Reproduce:

I am not aware of any steps that trigger the issue initially. Once it has appeared, however, it is easy to reproduce again on the running agent:

1. Remove any platform child from inventory (I removed RHQ Server AS).
2. Import it again. It takes longer than usual for the resource to appear in the discovery queue.

Actual results: During or after the import, a number of exceptions like the following appear in the agent log:

org.rhq.core.clientapi.agent.PluginContainerException: Failed to obtain classloader for resource: Resource[id=0, uuid=d89d5c8b-03bd-42c9-8ad6-a0fd1087eefc, type={JBossAS7}SocketBindingGroup, key=socket-binding-group=standard-sockets, name=standard-sockets, parent=EAP Domain Controller (0.0.0.0:8990)]

Caused by: org.rhq.core.clientapi.agent.PluginContainerException: [Warning] Missing parent resource container for parent resource=Resource[id=11071, uuid=277964d5-e328-432a-b5e1-95df0b439d05, type={JBossAS7}JBossAS7 Host Controller, key=/home/hudson/jbas-instances/jboss-eap6-domain/domain, name=EAP Domain Controller (0.0.0.0:8990), parent=dhcp-31-185.brq.redhat.com, version=EAP 6.0.0.GA]



Additional info:
1. EAP Domain Controller was NOT the resource being inventoried.
2. If you repeat the steps to reproduce, it is not always 'key=socket-binding-group=standard-sockets, name=standard-sockets' that fails.
3. The issue is possibly related to the AS7 plugin.

Comment 1 Libor Zoubek 2012-05-23 12:07:34 UTC
Created attachment 586320 [details]
agent.log

Comment 2 Charles Crouch 2012-05-23 14:04:05 UTC
Setting to urgent for further investigation.

Comment 3 Jay Shaughnessy 2012-05-23 14:16:13 UTC
(9:40:47 AM) jshaughn: The BZ from Libor looks like the same thing I saw in Ian's log
(9:40:54 AM) jshaughn: I think I can explain this
(9:41:46 AM) jshaughn: It has to do, I think, with an agent sync happening, that includes uninventoried resources, at the same time the hierarchy is being traversed
(9:43:11 AM) jshaughn: to get a resource classloader you need the parent container, which, I think may have gone away due to the sync
(9:43:31 AM) jshaughn: that's the theory at least
(10:03:11 AM) lzoubek: jshaughn, so this might happen when I uninventory a resource during service scan on one of its children? If you do not need to see my broken setup, I'll clean it up and try reproducing
(10:07:01 AM) jshaughn: lzoubek: yes
(10:07:16 AM) jshaughn: that's basically the theory I have so far
(10:07:45 AM) jshaughn: The agent can run a sync at the same time as executing an avail scan, or a discovery scan, for example.
(10:08:01 AM) lzoubek: ok, I'll play with it
(10:08:02 AM) jshaughn: those scans typically recurse through the inventory
(10:08:29 AM) jshaughn: but if we alter the inventory during that time it is possible that a parent could disappear
(10:08:40 AM) jshaughn: if the sync removes resources
(10:09:06 AM) jshaughn: so, we may need to do something to protect against this, or to gracefully accept it
(10:09:14 AM) jshaughn: probably the latter
(10:09:29 AM) jshaughn: as to protect would probably mean adding more locking
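
The theory in this chat can be illustrated with a small, deterministic Java sketch. It is not RHQ code: the class InventoryRaceSketch and its methods are hypothetical names invented for illustration, and the "sync" is simulated by a plain method call rather than a second thread. It only shows why a child's classloader lookup fails once the parent's container has been removed between two steps of a traversal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the agent's inventory; not RHQ's real classes.
class InventoryRaceSketch {
    // resource id -> parent resource id
    final Map<Integer, Integer> parentOf = new ConcurrentHashMap<>();
    // resource id -> container name, for resources that currently have a container
    final Map<Integer, String> containers = new ConcurrentHashMap<>();

    void add(int id, int parentId, String name) {
        parentOf.put(id, parentId);
        containers.put(id, name);
    }

    // Simulates a server sync that uninventories a resource and drops its container.
    void syncRemove(int id) {
        containers.remove(id);
    }

    // Resolving a child's classloader needs the parent's container (assumed model).
    String classLoaderFor(int childId) {
        int parentId = parentOf.get(childId);
        String parent = containers.get(parentId);
        if (parent == null) {
            throw new IllegalStateException(
                "Missing parent resource container for parent resource id=" + parentId);
        }
        return "classloader-of-" + parent;
    }
}
```

If a scan calls classLoaderFor on a child before syncRemove runs on its parent, the lookup succeeds; if the removal happens first, the same call throws. In a real agent the two would run on different threads, so the ordering would depend on timing.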

Comment 4 John Mazzitelli 2012-05-23 14:54:10 UTC
org.rhq.core.clientapi.agent.PluginContainerException: Failed to obtain classloader for resource: Resource[id=0, uuid=d89d5c8b-03bd-42c9-8ad6-a0fd1087eefc, type={JBossAS7}SocketBindingGroup, key=socket-binding-group=standard-sockets, name=standard-sockets, parent=EAP Domain Controller (0.0.0.0:8990)]

The id on this resource is 0, which tells me it's in the NEW inventory state (hasn't been committed). I wonder if we don't assign classloaders to resources not yet committed?

In any case, this is a new resource (id=0 means it hasn't been synced with the server, so it can't have reached the COMMITTED state yet; the agent doesn't COMMIT by itself). And we shouldn't be doing anything with new resources.
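
As a rough illustration of the point about id=0, the sketch below filters out resources that have not yet been synced with the server before any further work is done on them. The class NewResourceGuardSketch, its nested Res type, and the method names are hypothetical; the only thing taken from this comment is the assumption that id 0 marks a resource still in the NEW state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; RHQ's real Resource class and inventory states are richer.
class NewResourceGuardSketch {
    // Minimal resource model with only the fields this sketch needs.
    static final class Res {
        final int id;
        final String name;

        Res(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Assumption from the comment above: id == 0 means not yet synced with the server.
    static boolean isNew(Res r) {
        return r.id == 0;
    }

    // Returns the names of resources that are safe to process, skipping NEW ones.
    static List<String> namesEligibleForScan(List<Res> resources) {
        List<String> names = new ArrayList<>();
        for (Res r : resources) {
            if (!isNew(r)) {
                names.add(r.name);
            }
        }
        return names;
    }
}
```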

Comment 5 Libor Zoubek 2012-05-23 15:18:57 UTC
Jay, it looks like your theory is correct:

I:
 * ran ./rhq-agent.sh --purgedata
 * imported both EAPs
 * removed both EAP resources from inventory right after that
At this point I got a bunch of classloader exceptions in agent.log.

I can now reproduce it 100%. 

I also tried this:
 * ran ./rhq-agent.sh --purgedata
 * imported both EAPs
 * waited 20 minutes
 * removed both EAP resources from inventory
This time there were no classloader errors.

Comment 6 Ian Springer 2012-05-24 17:14:02 UTC
Handle the condition of a missing parent resourceContainer more gracefully in a few places, since it is normal in situations where the corresponding resource was just uninventoried: we now log a DEBUG message rather than an ERROR message plus stack trace. Also add a PC integration test that verifies Resource uninventory works:

[master http://git.fedorahosted.org/git?p=rhq/rhq.git;a=commitdiff;h=5c4322c]
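
The general shape of such a fix might look like the sketch below. It is not the code from the commit: GracefulLookupSketch and its methods are hypothetical, and java.util.logging is used only to keep the example self-contained (its FINE level is roughly comparable to DEBUG). A missing parent container is treated as an expected outcome, logged quietly, and reported to the caller as null instead of an exception.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of "accept it gracefully"; not the actual RHQ change.
class GracefulLookupSketch {
    private static final Logger LOG =
        Logger.getLogger(GracefulLookupSketch.class.getName());

    // resource id -> container name
    private final Map<Integer, String> containers = new ConcurrentHashMap<>();

    void addContainer(int resourceId, String name) {
        containers.put(resourceId, name);
    }

    void removeContainer(int resourceId) {
        containers.remove(resourceId);
    }

    // Returns the parent's container, or null if it was uninventoried in the meantime.
    String findParentContainer(int parentId) {
        String container = containers.get(parentId);
        if (container == null) {
            // Expected after an uninventory, so log quietly and without a stack trace.
            LOG.log(Level.FINE,
                "Parent container for resource id={0} is gone; it was probably just uninventoried",
                Integer.toString(parentId));
        }
        return container;
    }
}
```

Callers would then need to check for null and skip the child, which keeps scans running without adding extra locking around the inventory.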

Comment 8 Heiko W. Rupp 2013-09-01 10:03:48 UTC
Bulk closing of items that are ON_QA in old RHQ releases that have been out for a long time, where the issue has not been re-opened since.

