Created attachment 784789 [details] agent log Version-Release number of selected component (if applicable): Version: 4.9.0-SNAPSHOT Build Number: 9632b96 How reproducible: 3/3 Steps to Reproduce: 1. install rhq: ./rhqctl install Actual results: Local agent is registered but not available. I can see following error in agent log: 2013-08-09 05:57:34,948 ERROR [ClientCommandSenderTask Timer Thread #1] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: org.rhq.enterprise.communications.util.NotProcessedException:null. Cause: org.rhq.enterprise.communications.util.NotProcessedException 2013-08-09 05:57:34,948 WARN [InventoryManager.availability-1] (rhq.core.pc.inventory.InventoryManager)- Could not transmit availability report to server org.rhq.enterprise.communications.util.NotProcessedException at org.rhq.enterprise.communications.GlobalSuspendCommandListener.receivedCommand(GlobalSuspendCommandListener.java:55) at org.rhq.enterprise.communications.command.server.CommandProcessor.notifyListenersOfReceivedCommand(CommandProcessor.java:402) at org.rhq.enterprise.communications.command.server.CommandProcessor.handleIncomingInvocationRequest(CommandProcessor.java:242) at org.rhq.enterprise.communications.command.server.CommandProcessor.invoke(CommandProcessor.java:184) at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:967) at org.jboss.remoting.transport.servlet.ServletServerInvoker.processRequest(ServletServerInvoker.java:416) at sun.reflect.GeneratedMethodAccessor141.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) Additional info: complete agent.log attached
Created attachment 784790 [details] server log
After more testing the issue seems to be nondeterministic -> unable to provide reliable repro steps -> lowering severity
this is because the server is in maintenance mode. probably due to the new code stefan added - server goes in MM if storage node is not accessible.
stefan is making changes in the code around here (has to do with the server startup/initialization code and when we flip the server to Maintenance Mode if it needs to be. I think he is going to fix this, so I'm assigning this to him.
Please retest, the startup sequence was updated to avoid going into maintenance mode by initializing the storage cluster connection first. https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=d1aa8e0eaf0cc2ac72b09599ddc406f9928cbdd9
Retested on Version: 4.9.0-SNAPSHOT Build Number: 34e1c7b Scenario: 1 - i have a clean machine (rm -rf .java/.userPrefs/rhq-*, rm -rf /var/lib/rhq/*, no rhq agent or server, rhq database is clean) 2- ./rhqctl install Result: 2 out of 3 tries had following result: - agent was registered - platform resource was NOT available -> Storage Node resource not available as well - server was in maintenance mode (Administration->Topology->Servers-> my server) - exceptions in the agent log 2013-08-12 06:32:20,653 ERROR [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: org.rhq.enterprise.communications.util.NotProcessedException:null. Cause: org.rhq.enterprise.communications.util.NotProcessedException - after some time (don't know exact time but it was more then 10 minutes) the server went to normal mode (visible in GUI administration) but platform resource was still NOT available and the agent was still throwing exceptions See attached retestedLogs.tar.gz with full agent and server logs (server with debug level).
Created attachment 785668 [details] retestedLogs.tar.gz
Would it be possible to add some informative agent logging about server being in maintenance mode? Something like 'Unable to connect to the server - server may be in maintenance mode'