Bug 824401
Summary: | Missing parent resource container for parent resource | |
---|---|---|---
Product: | [Other] RHQ Project | Reporter: | Libor Zoubek <lzoubek>
Component: | Plugin Container, Plugins | Assignee: | Charles Crouch <ccrouch>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | 4.4 | CC: | ccrouch, hbrock, hrupp, jshaughn, loleary, mazz, theute
Target Milestone: | --- | |
Target Release: | RHQ 4.5.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | 4.5 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 825019 | Environment: |
Last Closed: | 2013-09-01 10:03:48 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 782579, 825019 | |
Attachments: |
|
Description
Libor Zoubek
2012-05-23 12:05:33 UTC
Created attachment 586320
agent.log
Setting to urgent for further investigation.

(9:40:47 AM) jshaughn: The BZ from Libor looks like the same thing I saw in Ian's log
(9:40:54 AM) jshaughn: I think I can explain this
(9:41:46 AM) jshaughn: It has to do, I think, with an agent sync happening, that includes uninventoried resources, at the same time the hierarchy is being traversed
(9:43:11 AM) jshaughn: to get a resource classloader you need the parent container, which, I think may have gone away due to the sync
(9:43:31 AM) jshaughn: that's the theory at least
(10:03:11 AM) lzoubek: jshaughn, so this might happen when I uninventory a resource during service scan on one of its children? If you do not need to see my broken setup, I'll clean it up and try reproducing
(10:07:01 AM) jshaughn: lzoubek: yes
(10:07:16 AM) jshaughn: that's basically the theory I have so far
(10:07:45 AM) jshaughn: The agent can run a sync at the same time as executing an avail scan, or a discovery scan, for example.
(10:08:01 AM) lzoubek: ok, I'll play with it
(10:08:02 AM) jshaughn: those scans typically recurse through the inventory
(10:08:29 AM) jshaughn: but if we alter the inventory during that time it is possible that a parent could disappear
(10:08:40 AM) jshaughn: if the sync removes resources
(10:09:06 AM) jshaughn: so, we may need to do something to protect against this, or to gracefully accept it
(10:09:14 AM) jshaughn: probably the latter
(10:09:29 AM) jshaughn: as to protect would probably mean adding more locking

org.rhq.core.clientapi.agent.PluginContainerException: Failed to obtain classloader for resource: Resource[id=0, uuid=d89d5c8b-03bd-42c9-8ad6-a0fd1087eefc, type={JBossAS7}SocketBindingGroup, key=socket-binding-group=standard-sockets, name=standard-sockets, parent=EAP Domain Controller (0.0.0.0:8990)]

The id on this resource is 0, which tells me it's in the NEW inventory state (hasn't been committed). I wonder if we don't assign classloaders to resources not yet committed? In any case, this is a new resource (since id=0 means it hasn't been synced with the server, so it can't have been in the COMMITTED state yet; the agent doesn't COMMIT by itself). And we shouldn't be doing anything with new resources.

Jay, it looks like your theory is correct. I did the following:
* ./rhq-agent.sh --purgedata
* imported both EAPs
* right after that, removed both EAP resources from inventory

At this point I got a bunch of classloader exceptions in agent.log. I can now reproduce it 100%.

I've also tried this:
* ./rhq-agent.sh --purgedata
* imported both EAPs
* waited 20 minutes
* removed both EAP resources from inventory

And there are no classloader errors.

Handle the condition of a missing parent resourceContainer more gracefully in a few places, since it's normal in situations where the corresponding resource was just uninventoried. We now log a DEBUG message, rather than an ERROR message + stack trace. Also add a PC integration test that verifies Resource uninventory works:
[master http://git.fedorahosted.org/git?p=rhq/rhq.git;a=commitdiff;h=5c4322c]

Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.
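
For readers who have not seen the commit, the "gracefully accept it" approach discussed in this bug can be illustrated with a minimal, self-contained sketch. This is not the RHQ code: `InventorySketch`, `Container`, `NO_PARENT`, `add`, `uninventory` and `classLoaderFor` are hypothetical names invented for illustration, and `java.util.logging`'s `FINE` level stands in for the agent's DEBUG level. The idea is that a scan looking up a classloader through the parent container treats a missing container as a normal result of a concurrent uninventory, rather than throwing and logging an ERROR with a stack trace.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the agent's inventory; none of these names are RHQ's real API.
final class InventorySketch {
    private static final Logger LOG = Logger.getLogger(InventorySketch.class.getName());

    /** Marker used in this sketch for a root resource that has no parent. */
    static final int NO_PARENT = -1;

    /** Hypothetical container: remembers the parent's id and a classloader. */
    static final class Container {
        final int parentId;
        final ClassLoader classLoader;

        Container(int parentId, ClassLoader classLoader) {
            this.parentId = parentId;
            this.classLoader = classLoader;
        }
    }

    // Concurrent map so a "sync" thread can remove entries while a "scan" thread reads.
    private final Map<Integer, Container> containers = new ConcurrentHashMap<>();

    void add(int resourceId, int parentId) {
        containers.put(resourceId, new Container(parentId, ClassLoader.getSystemClassLoader()));
    }

    /** Simulates a sync that uninventories a resource, possibly while a scan is running. */
    void uninventory(int resourceId) {
        containers.remove(resourceId);
    }

    /**
     * Resolves the classloader for a resource via its parent container.
     * A missing container is logged at FINE and reported as an empty result
     * instead of being raised as an error.
     */
    Optional<ClassLoader> classLoaderFor(int resourceId) {
        Container self = containers.get(resourceId);
        if (self == null) {
            LOG.log(Level.FINE, "No container for resource {0}; it was probably just uninventoried", resourceId);
            return Optional.empty();
        }
        if (self.parentId == NO_PARENT) {
            return Optional.ofNullable(self.classLoader);
        }
        Container parent = containers.get(self.parentId);
        if (parent == null) {
            LOG.log(Level.FINE, "Parent container {0} is missing for resource {1}; skipping",
                    new Object[] { self.parentId, resourceId });
            return Optional.empty();
        }
        return Optional.ofNullable(parent.classLoader);
    }
}
```

A caller such as a recursive availability or discovery scan would then skip any resource for which `classLoaderFor` returns an empty `Optional` and continue with the rest of the hierarchy. This avoids the extra locking that the chat log mentions as the alternative.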