Bug 773031 - Component start() called more than once without an intervening stop()
Summary: Component start() called more than once without an intervening stop()
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.3
Hardware: All
OS: All
urgent
high
Target Milestone: ---
: RHQ 4.3.0
Assignee: John Mazzitelli
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: jon30-sprint10, rhq43-sprint10 787290
TreeView+ depends on / blocked
 
Reported: 2012-01-10 17:43 UTC by Jay Shaughnessy
Modified: 2013-09-01 10:04 UTC (History)
3 users (show)

Fixed In Version: 4.3
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 787290 (view as bug list)
Environment:
Last Closed: 2013-09-01 10:04:29 UTC
Embargoed:


Attachments (Terms of Use)
stack traces for the two start() calls (4.27 KB, text/plain)
2012-01-10 20:48 UTC, John Mazzitelli
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 766959 0 medium CLOSED Memory leak in agent; likely use of native code 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 787290 0 urgent CLOSED Component start() called more than once without an intervening stop() 2021-02-22 00:41:40 UTC

Internal Links: 766959 787290

Description Jay Shaughnessy 2012-01-10 17:43:39 UTC
It seems that at agent startup, or after a container restart (e.g
after a 'plugins update' agent command), that a resource component
may have its start() method invoked more than once without an
intervening stop() call.  This breaks the lifecycle expectations.
I have seen this with the AS4 server resource but it may affect
other/all resources, I'm not sure.

Comment 1 John Mazzitelli 2012-01-10 20:48:08 UTC
Created attachment 551933 [details]
stack traces for the two start() calls

attaching a file that shows the stack traces for the two start() calls. There are two threads involved in each call. So the first start() triggers the first two thread stack traces; the second start() call triggers the second two thread stack traces

Comment 2 Lukas Krejci 2012-01-11 12:09:06 UTC
I think this might be the cause for Augeas leaks in resource components that inherit from AugeasConfigurationComponent (i.e. all the augeas plugins except Apache).

I have seen this using the https://github.com/rhq-project/samples/tree/master/agent/debug-tools/augeas-leak-detector but didn't spend enough time to get to the bottom of it.

The bug 766959 contains an example output of that leak detection capturing the create locations that weren't closed inside the start() methods of various Augeas-based resource components.

Comment 3 Jay Shaughnessy 2012-01-11 22:00:43 UTC
master commit 3a40aefab45f851facda766e766bb312675d1e63
A variety of auto-formatting changes took place, actual changes are in:
  InventoryManager.prepareResourceForActivation
  InventoryManager.activateResource

The resource component state could be STOPPED or STARTED.  There was a
large window while a component was actually starting that a call
to prepareResourceForActivation would happily allow the component to
again be activated. So, added an actual STARTING state that can block
redundant activation.

Additionally, if forcing reactivation ensure a STARTED component is
stopped before being restarted.

Also:
- remove unnecessary and bad lazy state check
- comment out some unused/debug code
- remove warnings


Test Notes
Not easily verifiable.  General lookout for seemingly related 
issues at agent startup (like resources not showing as available),
or errors in the agent log after an agent shutdown/start sequence.

Comment 8 John Mazzitelli 2012-02-02 13:22:36 UTC
if this is truly the reason for that augeas leak (and if that is so, maybe its also the cause of those agent crashes on shutdown due to augeas??) I would put this in.

Comment 10 John Mazzitelli 2012-02-02 17:29:23 UTC
talked to jay after I did some peer review and I think we are going to make some minor additions to this code. Nothing that affects the core behavior, but more for better handing of possible error conditions.

Comment 11 John Mazzitelli 2012-02-02 20:29:17 UTC
I tweeked the code after some peer review and discussion with jay. There really isn't any functionality change here, so unsure if QA wants to retest or not. Its really an internal code change for better exception handling.

master commit: f819631220c83f1ae8874953035e60720e61cff4

Comment 12 John Mazzitelli 2012-02-03 18:12:11 UTC
added a couple asserts in ResourceUpgradeTest to make sure the component state is set properly... see git commit 16224a4 in master branch

Comment 13 John Mazzitelli 2012-02-03 20:17:16 UTC
this was pushed to the release/jon3.0.x branch

Comment 14 John Mazzitelli 2012-02-03 20:18:17 UTC
opps... forgot git commit sha - 841dcce in release/jon3.0.x

Comment 15 John Mazzitelli 2012-02-03 20:20:10 UTC
i cloned this for the tracking of it in the release/jon3.0.x branch.

Comment 16 Heiko W. Rupp 2013-09-01 10:04:29 UTC
Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.


Note You need to log in before you can comment on or make changes to this bug.