Bug 773031

Summary: Component start() called more than once without an intervening stop()
Product: [Other] RHQ Project Reporter: Jay Shaughnessy <jshaughn>
Component: AgentAssignee: John Mazzitelli <mazz>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.3CC: hrupp, lkrejci, mazz
Target Milestone: ---   
Target Release: RHQ 4.3.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: 4.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 787290 (view as bug list) Environment:
Last Closed: 2013-09-01 10:04:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 760116, 787290    
Attachments:
Description Flags
stack traces for the two start() calls none

Description Jay Shaughnessy 2012-01-10 17:43:39 UTC
It seems that at agent startup, or after a container restart (e.g
after a 'plugins update' agent command), that a resource component
may have its start() method invoked more than once without an
intervening stop() call.  This breaks the lifecycle expectations.
I have seen this with the AS4 server resource but it may affect
other/all resources, I'm not sure.

Comment 1 John Mazzitelli 2012-01-10 20:48:08 UTC
Created attachment 551933 [details]
stack traces for the two start() calls

attaching a file that shows the stack traces for the two start() calls. There are two threads involved in each call. So the first start() triggers the first two thread stack traces; the second start() call triggers the second two thread stack traces

Comment 2 Lukas Krejci 2012-01-11 12:09:06 UTC
I think this might be the cause for Augeas leaks in resource components that inherit from AugeasConfigurationComponent (i.e. all the augeas plugins except Apache).

I have seen this using the https://github.com/rhq-project/samples/tree/master/agent/debug-tools/augeas-leak-detector but didn't spend enough time to get to the bottom of it.

The bug 766959 contains an example output of that leak detection capturing the create locations that weren't closed inside the start() methods of various Augeas-based resource components.

Comment 3 Jay Shaughnessy 2012-01-11 22:00:43 UTC
master commit 3a40aefab45f851facda766e766bb312675d1e63
A variety of auto-formatting changes took place, actual changes are in:
  InventoryManager.prepareResourceForActivation
  InventoryManager.activateResource

The resource component state could be STOPPED or STARTED.  There was a
large window while a component was actually starting that a call
to prepareResourceForActivation would happily allow the component to
again be activated. So, added an actual STARTING state that can block
redundant activation.

Additionally, if forcing reactivation ensure a STARTED component is
stopped before being restarted.

Also:
- remove unnecessary and bad lazy state check
- comment out some unused/debug code
- remove warnings


Test Notes
Not easily verifiable.  General lookout for seemingly related 
issues at agent startup (like resources not showing as available),
or errors in the agent log after an agent shutdown/start sequence.

Comment 8 John Mazzitelli 2012-02-02 13:22:36 UTC
if this is truly the reason for that augeas leak (and if that is so, maybe its also the cause of those agent crashes on shutdown due to augeas??) I would put this in.

Comment 10 John Mazzitelli 2012-02-02 17:29:23 UTC
talked to jay after I did some peer review and I think we are going to make some minor additions to this code. Nothing that affects the core behavior, but more for better handing of possible error conditions.

Comment 11 John Mazzitelli 2012-02-02 20:29:17 UTC
I tweeked the code after some peer review and discussion with jay. There really isn't any functionality change here, so unsure if QA wants to retest or not. Its really an internal code change for better exception handling.

master commit: f819631220c83f1ae8874953035e60720e61cff4

Comment 12 John Mazzitelli 2012-02-03 18:12:11 UTC
added a couple asserts in ResourceUpgradeTest to make sure the component state is set properly... see git commit 16224a4 in master branch

Comment 13 John Mazzitelli 2012-02-03 20:17:16 UTC
this was pushed to the release/jon3.0.x branch

Comment 14 John Mazzitelli 2012-02-03 20:18:17 UTC
opps... forgot git commit sha - 841dcce in release/jon3.0.x

Comment 15 John Mazzitelli 2012-02-03 20:20:10 UTC
i cloned this for the tracking of it in the release/jon3.0.x branch.

Comment 16 Heiko W. Rupp 2013-09-01 10:04:29 UTC
Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.