Bug 836137 - agent rpm - agent via service start cannot be connected to separated server
Summary: agent rpm - agent via service start cannot be connected to separated server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: JON 3.1.0
Assignee: Deon Ballard
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 837381
 
Reported: 2012-06-28 07:52 UTC by Armine Hovsepyan
Modified: 2015-09-03 00:01 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-20 16:51:48 UTC
Embargoed:



Description Armine Hovsepyan 2012-06-28 07:52:11 UTC
Description of problem:
Old infrastructure: the server is installed on one machine and the agent on another. After agent installation, "service jon-agent start" doesn't connect to the server even though the configuration is set correctly, while starting via the wrapper script does work correctly.

Version-Release number of selected component (if applicable):
org.jboss.on-jboss-on-parent-3.1.0.GA-8

How reproducible:
always

Steps to Reproduce:
1. Install the server on RHEL 6 via zip
2. Install the agent on another RHEL 6 machine via jar installation
3. Configure the agent so that it connects to the server
4. Stop the agent
5. Install the agent through rpm
6. Start the agent via the wrapper script
7. Stop the agent
8. Start the agent via service jon-agent
  
Actual results:
agent never connects to server - auto detection is disabled

Expected results:
agent is connected to the "separated" server

Additional info:
agent configuration file contains correct server host, and wrapper script starts agent correctly and connects to server.


I've set the severity to medium, but in my eyes it's really high.

Comment 1 Charles Crouch 2012-07-03 14:51:45 UTC
This should at least be investigated for jon311

Comment 2 Stefan Negrea 2012-07-05 15:32:14 UTC
Starting the agent first via the wrapper could be the reason the agent cannot start as a service. This could be a duplicate of bug 835892.

Please repeat the test with the following steps:
1) Install the RPM
2) Update the configuration
3) Start the service without first using the wrapper directly


Please attach the agent log files in case of any failures.

Comment 3 Zhengping Jin 2012-07-09 20:03:23 UTC
Performed an experiment with Wireshark to capture the packets.
With "service jon-agent start": no packets captured between server and agent. Result: the agent could not connect to the server.
With "rhq-agent.sh": packets captured successfully. Result: the agent connected to the server.
Further analysis is required.

Comment 4 Armine Hovsepyan 2012-07-10 14:47:55 UTC
I have changed the server address in both the agent-configuration.xml and rhq-agent-env.sh files, and service start still doesn't work; the agent cannot register with the server.

Log contains:

service start with non-root user

[hudson@dhcp-31-221 logs]$ tail -f -n 200 agent.log 
2012-07-09 11:42:33,490 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.identify-version}Version=[RHQ 4.4.0.JON310GA], Build Number=[a53e41e], Build Date=[Jun 8, 2012 9:48 AM]
2012-07-09 11:42:33,595 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.agent-name-auto-generated}The name of this agent was not predefined so it was auto-generated. The agent name is now [dhcp-31-221.brq.redhat.com]
2012-07-09 11:42:33,800 INFO  [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.global-concurrency-limit-disabled}Global concurrency limit has been disabled - there is no limit to the number of incoming commands allowed
2012-07-09 11:42:33,933 INFO  [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.started}Service container started - ready to accept incoming commands
2012-07-09 11:42:33,933 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.no-auto-detect}Server auto-detection is not enabled - starting the poller immediately
2012-07-09 11:43:33,956 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.waiting-to-be-registered-begin}The agent will now wait until it has registered with the server...
2012-07-09 11:46:42,693 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.identify-version}Version=[RHQ 4.4.0.JON310GA], Build Number=[a53e41e], Build Date=[Jun 8, 2012 9:48 AM]
2012-07-09 11:46:43,007 INFO  [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.global-concurrency-limit-disabled}Global concurrency limit has been disabled - there is no limit to the number of incoming commands allowed
2012-07-09 11:46:43,122 INFO  [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.started}Service container started - ready to accept incoming commands
2012-07-09 11:46:43,122 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.no-auto-detect}Server auto-detection is not enabled - starting the poller immediately




service start with root user

[root@dhcp-31-221 logs]# tail -f -n 200 agent.log 
2012-07-09 11:50:58,117 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.identify-version}Version=[RHQ 4.4.0.JON310GA], Build Number=[a53e41e], Build Date=[Jun 8, 2012 9:48 AM]
2012-07-09 11:50:58,378 INFO  [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.global-concurrency-limit-disabled}Global concurrency limit has been disabled - there is no limit to the number of incoming commands allowed
2012-07-09 11:50:58,514 INFO  [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.started}Service container started - ready to accept incoming commands
2012-07-09 11:50:58,514 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.no-auto-detect}Server auto-detection is not enabled - starting the poller immediately
2012-07-09 11:51:58,542 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.waiting-to-be-registered-begin}The agent will now wait until it has registered with the server...
2012-07-09 13:51:35,923 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.identify-version}Version=[RHQ 4.4.0.JON310GA], Build Number=[a53e41e], Build Date=[Jun 8, 2012 9:48 AM]
2012-07-09 13:51:36,217 INFO  [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.global-concurrency-limit-disabled}Global concurrency limit has been disabled - there is no limit to the number of incoming commands allowed
2012-07-09 13:51:36,334 INFO  [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.started}Service container started - ready to accept incoming commands
2012-07-09 13:51:36,335 INFO  [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.no-auto-detect}Server auto-detection is not enabled - starting the poller immediately
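
To compare runs like the two above at a glance, the log can be summarized by message key. This is only a sketch: the function names count_key and summarize_agent_log are made up for illustration, and the only keys it looks for are the two that appear in the excerpts above.

```shell
#!/bin/bash
# Count how many lines of a log contain a given message key, e.g.
# {AgentMain.identify-version}. -F treats the key as a fixed string.
count_key() {
    local key="$1" log="$2"
    grep -c -F "{$key}" "$log"
}

# Print how many times the agent started and how many times it reached
# the "waiting to be registered" stage. Both keys are taken from the
# log excerpts in this comment; nothing else about the log is assumed.
summarize_agent_log() {
    local log="$1"
    local starts waits
    starts=$(count_key "AgentMain.identify-version" "$log")
    waits=$(count_key "AgentMain.waiting-to-be-registered-begin" "$log")
    echo "starts=$starts waiting_for_registration=$waits"
}
```

It would be called as summarize_agent_log agent.log from the logs directory.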

Comment 5 Zhengping Jin 2012-07-10 21:28:52 UTC
Enabled DEBUG in rhq-agent-env.sh.

Console output: The agent will now wait until it has registered with the server...

and part of the log shows:

There is no security token yet - the server will not accept commands from this agent until the agent is registered.

Unable to retrieve response message java.net.ConnectException: Connection refused

Failed to successfully poll the server. This is normally due to the server not being up yet. You can usually ignore this message since it will be tried again later, however, you should ensure this failure was not really caused by a misconfiguration. Cause: org.jboss.remoting.CannotConnectException:Can not connect http client invoker. Connection refused. -> java.net.ConnectException:Connection refused

The agent will now wait until it has registered with the server...

There is no security token yet - the server will not accept commands from this agent until the agent is registered.


For complete reference, please see (available for one month):

http://pastebin.test.redhat.com/96430    (console outputs)
http://pastebin.test.redhat.com/96431    (log)

This debug output explains why the agent doesn't register with the server: its connection attempts are refused.
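
Since the failure is "Connection refused", a quick check from the agent machine can show whether the configured server address accepts TCP connections at all. This is a minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command; the function name can_connect is made up, and the host and port must be whatever agent-configuration.xml actually contains (7080 in the usage line is only an assumed port).

```shell
#!/bin/bash
# Return 0 if a TCP connection to host:port can be opened within the
# given number of seconds (default 3), non-zero otherwise.
# Uses bash's /dev/tcp pseudo-device; no data is sent.
can_connect() {
    local host="$1" port="$2" secs="${3:-3}"
    # host and port are passed as $0 and $1 of the inner shell so they
    # are never interpolated into the script text.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (hypothetical host and assumed port):
#   can_connect server.example.com 7080 && echo reachable || echo refused
```

If this check succeeds while the service-started agent still reports "Connection refused", the agent is probably using a different address than the one in the edited configuration file.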

Comment 6 Stefan Negrea 2012-07-11 21:30:16 UTC
Updated the init scripts (ec2 and regular) to bypass the wrapper and call the rhq-agent script directly, to allow reconfiguration and user prompts when the service is invoked with 'service jon-agent config'.


The following agent options are used simultaneously by the script for the config option:
--cleanconfig  (clean the previous config)
--nostart (do not start the agent at the end of the configuration)
--daemon (combined with nostart makes the agent quit at the end of the setup)
--setup (forces the agent to prompt for configuration)
--advanced (combined with setup forces the agent to prompt for advanced configuration)
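
As a rough illustration of how a config action could combine the options above: this is a sketch only, not the actual init script. AGENT_SCRIPT, its default path, and the function name config_command are hypothetical; only the five options come from the list above.

```shell
#!/bin/bash
# Hypothetical location of the agent launcher; the real RPM layout may differ.
AGENT_SCRIPT="${AGENT_SCRIPT:-/opt/rhq-agent/bin/rhq-agent.sh}"

# The five options listed above, passed together for the config action.
CONFIG_OPTS=(--cleanconfig --nostart --daemon --setup --advanced)

# Print the command line the config action would run, so it can be
# inspected before anything is executed.
config_command() {
    printf '%s' "$AGENT_SCRIPT"
    printf ' %s' "${CONFIG_OPTS[@]}"
    printf '\n'
}

# An init script could then run it for the "config" action, e.g.:
#   "$AGENT_SCRIPT" "${CONFIG_OPTS[@]}"
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without an agent installed.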

Comment 7 Stefan Negrea 2012-07-11 21:33:11 UTC
Documentation updates needed for config usage and possible scenarios where this option is useful.

Comment 8 Armine Hovsepyan 2012-07-12 09:25:39 UTC
service jon-agent config fixed everything; now the agent can be started via service start and can connect to the remote/separated server.

verified!


A bug for the documentation has been created: https://bugzilla.redhat.com/show_bug.cgi?id=839547

