Red Hat Bugzilla – Bug 773031
Component start() called more than once without an intervening stop()
Last modified: 2013-09-01 06:04:29 EDT
It seems that at agent startup, or after a container restart (e.g.
after a 'plugins update' agent command), a resource component
may have its start() method invoked more than once without an
intervening stop() call. This breaks the lifecycle expectations.
I have seen this with the AS4 server resource, but it may affect
other or all resources; I'm not sure.
Created attachment 551933
stack traces for the two start() calls
Attaching a file that shows the stack traces for the two start() calls. Two threads are involved in each call, so the first start() call produces the first two thread stack traces and the second start() call produces the second two.
I think this might be the cause of the Augeas leaks in resource components that inherit from AugeasConfigurationComponent (i.e. all the Augeas plugins except Apache).
I have seen this using https://github.com/rhq-project/samples/tree/master/agent/debug-tools/augeas-leak-detector but didn't spend enough time to get to the bottom of it.
Bug 766959 contains example output of that leak detection, capturing the create locations that weren't closed inside the start() methods of various Augeas-based resource components.
master commit 3a40aefab45f851facda766e766bb312675d1e63
A variety of auto-formatting changes took place; actual changes are in:
The resource component state could be STOPPED or STARTED. There was a
large window, while a component was actually starting, during which a
call to prepareResourceForActivation would happily allow the component
to be activated again. So, added an actual STARTING state that can block
Additionally, when forcing reactivation, ensure a STARTED component is
stopped before being restarted.
- remove unnecessary and bad lazy state check
- comment out some unused/debug code
- remove warnings
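To illustrate the idea in the commit message above, here is a minimal sketch of a STARTING state that closes the window between "not started" and "started". This is not the RHQ code: GuardedContainer, LifecycleComponent, ComponentState, activate() and deactivate() are hypothetical names, and start() is simplified to throw only unchecked exceptions. It assumes the STARTING state is meant to reject a second activation while the first is still in progress.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical names throughout; a sketch of the idea, not the RHQ source.
enum ComponentState { STOPPED, STARTING, STARTED }

// Simplified stand-in for a resource component's lifecycle methods.
interface LifecycleComponent {
    void start();
    void stop();
}

final class GuardedContainer {
    private final AtomicReference<ComponentState> state =
        new AtomicReference<>(ComponentState.STOPPED);
    private final LifecycleComponent component;

    GuardedContainer(LifecycleComponent component) {
        this.component = component;
    }

    ComponentState getState() {
        return state.get();
    }

    /**
     * Returns true if this call started the component, false if the
     * component was already STARTING or STARTED and was left alone.
     */
    boolean activate(boolean forceReactivation) {
        // When forcing, claim a STARTED component so it can be stopped first.
        boolean wasStarted = forceReactivation
            && state.compareAndSet(ComponentState.STARTED, ComponentState.STARTING);
        // Otherwise only a STOPPED component may move to STARTING.
        if (!wasStarted
            && !state.compareAndSet(ComponentState.STOPPED, ComponentState.STARTING)) {
            return false; // another activation is in progress or already done
        }
        try {
            if (wasStarted) {
                component.stop(); // stop before restart when forcing
            }
            component.start();
            state.set(ComponentState.STARTED);
            return true;
        } catch (RuntimeException e) {
            state.set(ComponentState.STOPPED); // failed start leaves it stopped
            throw e;
        }
    }

    /** Stops the component only if it is currently STARTED. */
    boolean deactivate() {
        if (state.compareAndSet(ComponentState.STARTED, ComponentState.STOPPED)) {
            component.stop();
            return true;
        }
        return false;
    }
}
```

Because the state moves to STARTING before start() is invoked, a concurrent activate(false) call fails the compare-and-set and returns without calling start() again, whereas a check against only STOPPED and STARTED would let it through while the first start() was still running.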
Not easily verifiable. Keep a general lookout for seemingly related
issues at agent startup (like resources not showing as available),
or errors in the agent log after an agent shutdown/start sequence.
If this is truly the reason for that Augeas leak (and if so, maybe it's also the cause of those agent crashes on shutdown due to Augeas?), I would put this in.
Talked to Jay after I did some peer review, and I think we are going to make some minor additions to this code. Nothing that affects the core behavior, but more for better handling of possible error conditions.
I tweaked the code after some peer review and discussion with Jay. There really isn't any functionality change here, so I'm unsure whether QA wants to retest. It's really an internal code change for better exception handling.
master commit: f819631220c83f1ae8874953035e60720e61cff4
Added a couple of asserts in ResourceUpgradeTest to make sure the component state is set properly; see git commit 16224a4 in the master branch.
This was pushed to the release/jon3.0.x branch.
Oops, forgot the git commit SHA: 841dcce in release/jon3.0.x.
I cloned this to track it in the release/jon3.0.x branch.
Bulk closing of items that are on_qa and in old RHQ releases, which have been out for a long time and where the issue has not been re-opened since.