Bug 757201 - RFE: DriftManager (and maybe others) should guard against calls when shutdown
Summary: RFE: DriftManager (and maybe others) should guard against calls when shutdown
Keywords:
Status: NEW
Alias: None
Product: RHQ Project
Classification: Other
Component: drift
Version: 4.2
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-11-25 18:54 UTC by Heiko W. Rupp
Modified: 2022-03-31 04:28 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-28 20:25:21 UTC
Embargoed:


Attachments (Terms of Use)

Description Heiko W. Rupp 2011-11-25 18:54:57 UTC
Saw this fly by once only

18:23:50,485 ERROR [JPADriftServerBean] Failed to store drift changeset [JPADriftChangeSet [id=10014, resource=Resource[id=10001, type=Mac OS X, key=snert, name=snert, parent=<null>, version=MacOSX 10.6.8], version=0]]
java.lang.NullPointerException
        at org.rhq.core.pc.drift.DriftManager.ackChangeSet(DriftManager.java:634)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.rhq.enterprise.communications.command.impl.remotepojo.server.RemotePojoInvocationCommandService.execute(RemotePojoInvocationCommandService.java:184)
        at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

Comment 1 Jay Shaughnessy 2011-11-28 20:25:21 UTC
Depending on which build this occurred on it most likely occurred on
one of two potential lines of code. Either way, the NPE seems odd.

In one case the DriftManager seems like it was not fully initialized,
which would be very unlikely as the remoting is not started until after
initialization (meaning the server could not have made the call until
after initialization). 

In the second case it would mean that the coverage changeset was missing
from the data directory. This shouldn't really happen unless the data
directory (or at least the relevant changeset file in the data dir) was 
somehow wiped while the agent was running, or somehow the file was
not present.

I'm going to add some null handling and logging to protect here, not
sure how to explain or reproduce.

master commit 4a6ec5f97390cc5b6a1956b0b904162ea7b2b319

This is not really testable.  Closing.

Comment 2 Jay Shaughnessy 2011-11-29 14:19:13 UTC
Re-opening this to include more discussion.

After more evaluation it seems that what may have happened is that the
plugin container was shut down (probably as part of an agent shutdown).
PC shutdown currently takes place while the comm layer is still up,
meaning that it's possible for the server to make an agent service
call to a manager (like DriftManager, which happens to be the first of
several managers to get shutdown, giving it the biggest window while
the others are subsequently shutdown).

The observed NPE could certainly happen if "AckChangeSet" is invoked
after shutdown, as several global vars are nulled out.

The question is what, if anything, to do about this. Although the NPE 
is ugly it in effect probably did the same thing as a more graceful
exception would have done.  If the comm layer was shut down then
contacting the agent service would have failed in that way, or, if
the DriftManager services guarded on state then AckChangeSet would have
sent an IllegalStateException or something like that.  In all cases the
agent service would fail.

Since the plugin container, can, I think be shut down outside of a
complete agent shutdown, perhaps it's not feasible to stop the comm
layer on PC shutdown.  But, personally, I think we need a mechanism that
can block agent service calls in a blanket fashion when the PC is
being shut down, and it should start prior to the shutdown of the
individual managers.

Comment 3 John Mazzitelli 2011-11-29 14:37:39 UTC
Remember that a "remote agent service" is nothing more than the PC's managers. For example, the "DiscoveryAgentService" *is* the InventoryManager.

So when you say "shut down the comm layer", that could mean one of two things - the actual socket comm stuff (jboss/remoting) and the remote agent service endpoints. But leaving jboss/remoting aside, we technicaly DO shut down the remote agent services when we shutdown the PC because the PC managers ARE in fact the remote agent service endpoints. So when InventoryManager.shutdown is called (which it is when PluginContanier is shutdown) then the remote agent service is shutdown.

I don't think there is much more we can do here. No matter what, as you said, "Although the NPE is ugly it in effect probably did the same thing as a more graceful exception would have done."

The exceptions might be different, but SOME exception will have to result.

Comment 4 Jay Shaughnessy 2011-11-29 15:02:33 UTC
It seems then that the way to go is that the individual manager may need
to simply protect itself, if it sees the need, from calls executing
when the manager is down.  For example,

initialize() {
   ...
   initialized = true;
}

shutdown() {
   ...
   initialized = false;
}

someService {
   if (!initialized) {
      thrown IllegalStateException( "XXX Manager currently not available" );
   }
}


I'm renaming this BZ and taking it away as a blocker.


Note You need to log in before you can comment on or make changes to this bug.