Bug 1052390 - Agent NullPointerException (NPE) in org.jboss.remoting.Client.invoke method
Summary: Agent NullPointerException (NPE) in org.jboss.remoting.Client.invoke method
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: GA
Target Release: RHQ 4.10
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Duplicates: 1024145
Depends On:
Blocks:
Reported: 2014-01-13 18:29 UTC by Elias Ross
Modified: 2014-04-23 12:30 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-04-23 12:30:29 UTC
Embargoed:


Attachments
Patch for RHQ_4_9_0 (28.54 KB, patch)
2014-01-28 19:22 UTC, Elias Ross

Description Elias Ross 2014-01-13 18:29:18 UTC
Description of problem:

There appears to be a NPE in the agent communications.

2014-01-13 12:57:59,746 ERROR [ClientCommandSenderTask Timer Thread #89264] (JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: java.lang.NullPointerException:null. Cause: java.lang.NullPointerException

Also:

2014-01-13 12:57:59,746 WARN  [InventoryManager.availability-1] (InventoryManager)- Could not transmit availability report to server
java.lang.NullPointerException
        at org.jboss.remoting.Client.invoke(Client.java:2084)
        at org.jboss.remoting.Client.invoke(Client.java:879)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.rawSend(JBossRemotingRemoteCommunicator.java:514)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutCallbacks(JBossRemotingRemoteCommunicator.java:456)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.sendWithoutInitializeCallback(JBossRemotingRemoteCommunicator.java:475)
        at org.rhq.enterprise.agent.AgentMain.sendConnectRequestToServer(AgentMain.java:2112)
        at org.rhq.enterprise.agent.ConnectAgentInitializeCallback.sendingInitialCommand(ConnectAgentInitializeCallback.java:43)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.invokeInitializeCallbackIfNeeded(JBossRemotingRemoteCommunicator.java:579)
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.send(JBossRemotingRemoteCommunicator.java:491)
        at org.rhq.enterprise.communications.command.client.AbstractCommandClient.invoke(AbstractCommandClient.java:143)
        at org.rhq.enterprise.communications.command.client.ClientCommandSender.send(ClientCommandSender.java:1084)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.send(ClientCommandSenderTask.java:229)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:107)
        at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:55)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Code:

public class Client implements Externalizable {
...
   private Object invoke(Object param, Map metadata, InvokerLocator callbackServerLocator)
         throws Throwable
   {
      if (isConnected())
      {
         return invoker.invoke(new InvocationRequest(sessionId, subsystem, param,
                                                     metadata, null, callbackServerLocator));

^^^ Seems that invoker is set to null. So it thinks it is connected but 'invoker' is somehow null.

Version-Release number of selected component (if applicable): 4.9


How reproducible: Unclear

Comment 1 Elias Ross 2014-01-18 01:20:10 UTC
The biggest problem is that the errors repeat over and over again and the agent never connects. The only way to recover is to restart the agent.

A couple of ideas:
1. Client isn't really thread safe (member variables aren't volatile, etc.), so it could be a thread visibility issue. However, since the error repeats indefinitely, it looks more like a persistent bad state.
2. A disconnect is happening while the invoker is in use. I'm not sure this would result in repeating errors.
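
To make both ideas concrete, here is a minimal sketch of a wrapper that avoids the "connected, but invoker is null" window. It is not the JBoss Remoting code and not the attached patch: SafeClientSketch and its nested Invoker interface are hypothetical names used only for illustration. The field is volatile so a disconnect on one thread is visible to senders on another, and invoke reads the field once into a local so a concurrent disconnect cannot null it between the check and the call.

```java
final class SafeClientSketch {
    /** Hypothetical stand-in for the remoting invoker; not a JBoss Remoting type. */
    interface Invoker {
        Object invoke(Object param);
    }

    // volatile: a write from the disconnecting thread is visible to sending threads
    private volatile Invoker invoker;

    public void connect(Invoker newInvoker) {
        this.invoker = newInvoker;
    }

    public void disconnect() {
        this.invoker = null;
    }

    // Connection state is derived from the same field that invoke uses,
    // so the two can never disagree.
    public boolean isConnected() {
        return invoker != null;
    }

    public Object invoke(Object param) {
        // Read the field exactly once; later changes do not affect this call.
        Invoker current = invoker;
        if (current == null) {
            // Fail with a clear message instead of a NullPointerException.
            throw new IllegalStateException("client is not connected");
        }
        return current.invoke(param);
    }
}
```

With this shape a caller either gets a result from the invoker that was current when the call started, or an IllegalStateException it can handle by reconnecting. It does not stop a disconnect from landing while an invoke is already in flight.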

Looking at the code, it's clear some refactoring is in order. I'll post my patch when tested and approved.

Comment 2 Elias Ross 2014-01-28 19:22:51 UTC
Created attachment 856790
Patch for RHQ_4_9_0

I've tested this fix with about 1500 agents in two different environments.

At the very least, I no longer see the above NullPointerException. I'm not sure whether there are regressions, but the NPE issue went away.

Comment 3 Jay Shaughnessy 2014-01-29 23:06:22 UTC
reviewing...

Comment 4 Jay Shaughnessy 2014-01-30 17:00:25 UTC
master commit 37263f7ece17f28541702666009fe057a28452c1
Author: Jay Shaughnessy <jshaughn>
Date:   Thu Jan 30 11:11:46 2014 -0500

    BZ 1052390 - Clean up remoting wrapper to avoid race conditions if possible
    
    Some unused or rarely used constructors and methods were dropped.
    
    The biggest change is in the client caching. The cache code is guaranteed to
    call disconnect when a client is 'thrown away'. There is still a possibility
    disconnect can happen in the middle of an invoke.
    
    Original Author: Elias Ross <elias_ross>
    Signed-off-by: Jay Shaughnessy <jshaughn>
    Applying this patch as-is, I see no issues with it and it cleans some



QA Test Notes:
This is not directly testable and barring identified runtime regressions can be set to Verified.  It is covered by unit testing.
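
The caching behaviour described in the commit message (disconnect is always called on a client that is thrown away) could be sketched roughly as follows. This is an assumption-laden illustration, not the committed code: ClientCacheSketch and RemoteClient are hypothetical names, and the real wrapper may be structured quite differently.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ClientCacheSketch {
    /** Hypothetical stand-in for a remoting client; not a JBoss Remoting type. */
    interface RemoteClient {
        void disconnect();
    }

    private final AtomicReference<RemoteClient> cached = new AtomicReference<>();

    /** Installs a new client and disconnects whichever client it replaces. */
    public void replace(RemoteClient fresh) {
        // getAndSet is atomic, so each discarded client is seen by exactly one caller.
        RemoteClient old = cached.getAndSet(fresh);
        if (old != null && old != fresh) {
            old.disconnect();
        }
    }

    /** Returns the currently cached client, or null if none is installed. */
    public RemoteClient current() {
        return cached.get();
    }
}
```

As the commit message notes, a swap like this still allows a disconnect to happen while another thread is in the middle of an invoke on the old client.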

Comment 5 Elias Ross 2014-02-19 03:24:38 UTC
*** Bug 1024145 has been marked as a duplicate of this bug. ***

Comment 6 Heiko W. Rupp 2014-04-23 12:30:29 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.

