Bug 995424 - Agent is not able to communicate with server
Summary: Agent is not able to communicate with server
Keywords:
Status: ON_DEV
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent, Communications Subsystem, Core Server
Version: 4.8
Hardware: Unspecified
OS: Unspecified
medium
urgent
Target Milestone: ---
: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-08-09 10:23 UTC by Filip Brychta
Modified: 2022-03-31 04:28 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:


Attachments (Terms of Use)
agent log (212.25 KB, text/x-log)
2013-08-09 10:23 UTC, Filip Brychta
no flags Details
server log (714.11 KB, text/x-log)
2013-08-09 10:24 UTC, Filip Brychta
no flags Details
retestedLogs.tar.gz (440.96 KB, application/x-gzip)
2013-08-12 12:27 UTC, Filip Brychta
no flags Details

Description Filip Brychta 2013-08-09 10:23:20 UTC
Created attachment 784789 [details]
agent log

Version-Release number of selected component (if applicable):
Version: 4.9.0-SNAPSHOT
Build Number: 9632b96

How reproducible:
3/3

Steps to Reproduce:
1. install rhq: ./rhqctl install


Actual results:
Local agent is registered but not available. I can see following error in agent log:
2013-08-09 05:57:34,948 ERROR [ClientCommandSenderTask Timer Thread #1] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: org.rhq.enterprise.communications.util.NotProcessedException:null. Cause: org.rhq.enterprise.communications.util.NotProcessedException
2013-08-09 05:57:34,948 WARN  [InventoryManager.availability-1] (rhq.core.pc.inventory.InventoryManager)- Could not transmit availability report to server
org.rhq.enterprise.communications.util.NotProcessedException
        at org.rhq.enterprise.communications.GlobalSuspendCommandListener.receivedCommand(GlobalSuspendCommandListener.java:55)
        at org.rhq.enterprise.communications.command.server.CommandProcessor.notifyListenersOfReceivedCommand(CommandProcessor.java:402)
        at org.rhq.enterprise.communications.command.server.CommandProcessor.handleIncomingInvocationRequest(CommandProcessor.java:242)
        at org.rhq.enterprise.communications.command.server.CommandProcessor.invoke(CommandProcessor.java:184)
        at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:967)
        at org.jboss.remoting.transport.servlet.ServletServerInvoker.processRequest(ServletServerInvoker.java:416)
        at sun.reflect.GeneratedMethodAccessor141.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)



Additional info:
complete agent.log attached

Comment 1 Filip Brychta 2013-08-09 10:24:31 UTC
Created attachment 784790 [details]
server log

Comment 2 Filip Brychta 2013-08-09 13:52:50 UTC
After more testing the issue seems to be nondeterministic -> unable to provide reliable repro steps -> lowering severity

Comment 3 John Mazzitelli 2013-08-09 14:02:23 UTC
this is because the server is in maintenance mode. probably due to the new code stefan added - server goes in MM if storage node is not accessible.

Comment 4 John Mazzitelli 2013-08-09 15:18:07 UTC
stefan is making changes in the code around here (has to do with the server startup/initialization code and when we flip the server to Maintenance Mode if it needs to be.

I think he is going to fix this, so I'm assigning this to him.

Comment 5 Stefan Negrea 2013-08-09 21:41:16 UTC
Please retest, the startup sequence was updated to avoid going into maintenance mode by initializing the storage cluster connection first.

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=d1aa8e0eaf0cc2ac72b09599ddc406f9928cbdd9

Comment 6 Filip Brychta 2013-08-12 12:26:18 UTC
Retested on 
Version: 4.9.0-SNAPSHOT
Build Number: 34e1c7b

Scenario:
1 - i have a clean machine (rm -rf .java/.userPrefs/rhq-*, rm -rf /var/lib/rhq/*, no rhq agent or server, rhq database is clean)
2- ./rhqctl install

Result:
2 out of 3 tries had following result:
- agent was registered
- platform resource was NOT available -> Storage Node resource not available as well
- server was in maintenance mode (Administration->Topology->Servers-> my server)
- exceptions in the agent log 
2013-08-12 06:32:20,653 ERROR [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: org.rhq.enterprise.communications.util.NotProcessedException:null. Cause: org.rhq.enterprise.communications.util.NotProcessedException
- after some time (don't know exact time but it was more then 10 minutes) the server went to normal mode (visible in GUI administration) but platform resource was still NOT available and the agent was still throwing exceptions


See attached retestedLogs.tar.gz with full agent and server logs (server with debug level).

Comment 7 Filip Brychta 2013-08-12 12:27:10 UTC
Created attachment 785668 [details]
retestedLogs.tar.gz

Comment 8 Filip Brychta 2013-08-12 14:22:11 UTC
Would it be possible to add some informative agent logging about server being in  maintenance mode? Something like 'Unable to connect to the server - server may be in maintenance mode'


Note You need to log in before you can comment on or make changes to this bug.