Bug 782890
Summary: | -D option on rhq-agent startup is not working | |||
---|---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Mike Foley <mfoley> | |
Component: | Agent | Assignee: | RHQ Project Maintainer <rhq-maint> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | unspecified | CC: | akostadi, hrupp, mazz | |
Target Milestone: | --- | Keywords: | Reopened | |
Target Release: | RHQ 4.3.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 783876 783877 | Environment: | ||
Last Closed: | 2013-08-31 10:16:58 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 783876, 783877 |
Description
Mike Foley
2012-01-18 19:35:38 UTC
Did you already have a failover list? The failover list probably overrides any server address/port you provided. To test, do what you did, but before you enter the rhq-agent.sh command, delete the data/failover-list.dat file. Note: if I start the agent with -n (--nostart; this won't start any comm or plugin container), the port is changed. I will look more into this to see what happens at startup that might change this. I suspect it's the failover list.
$ ./rhq-agent.sh -n -Drhq.agent.server.bind-port=7090
RHQ 4.3.0-SNAPSHOT [b295126] (Fri Jan 20 11:17:50 EST 2012)
> getconfig rhq.agent.server.bind-port
rhq.agent.server.bind-port=7090
More tests: if you have a clean/new agent, the 7090 port is used:
rhq-agent.sh -l -Drhq.agent.server.bind-port=7090
The setup questions are asked, and the default server port will be 7090. OK, this is working as expected. I will close this issue as such. Here's the explanation.
I have an agent that has previously registered with the server and as such got assigned a failover-list (see data/failover-list.dat). I shut the agent down and restart it. Here's some cmdline shell output (my shell has a current working directory of my agent's bin directory):
$ cat ../data/failover-list.dat
mazztower:7080/7443
$ ./rhq-agent.sh -Drhq.agent.server.bind-port=12345
RHQ 4.3.0-SNAPSHOT [b295126] (Fri Jan 20 11:17:50 EST 2012)
> getconfig rhq.agent.server.bind-port
rhq.agent.server.bind-port=7080
What is the magic, you ask? Take a look at your agent log messages and you'll see:
2012-01-20 12:50:29,721 INFO [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.changing-endpoint}Communicator is changing endpoint from [InvokerLocator [servlet://mazztower:12345/jboss-remoting-servlet-invoker/ServerInvokerServlet]] to [InvokerLocator [servlet://mazztower:7080/jboss-remoting-servlet-invoker/ServerInvokerServlet]]
So, here you can see it DID use the override port number specified by -D (in my case, 12345). BUT! The agent also sees it's a bogus port: it can't talk to the server there, so it's smart enough to immediately begin its failover backup plan. It says, "OK, this server endpoint is down; I will look for a failover list, and if I have one, go to the next server in the list."
Well, as you can see in my command-line shell output above, I DO have a failover-list.dat, and it has "mazztower:7080/7443" as a server that it should try next.
And the agent does. It immediately switches over as you see in the next agent log message:
2012-01-20 12:50:29,843 INFO [RHQ Server Polling Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.failed-over-to-server}The agent has triggered its failover mechanism and switched to server [servlet://mazztower:7080/jboss-remoting-servlet-invoker/ServerInvokerServlet]
So, this is working as expected. You tried to start the agent with a server port that is down, and the agent, quickly sensing the problem, immediately tried to go to another server in its failover list.
If you did NOT have a failover list, you WOULD get a connection problem (because, obviously, the agent wouldn't know where to go next), so it would just sit and wait for that server on port 7090 (in your case) to come back.
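The failover decision described above can be modeled roughly as follows. This is a simplified sketch only, not RHQ's implementation (the real logic lives in the agent's Java code). The function names `to_locator` and `next_failover_locator` are hypothetical. The sketch also assumes that each line of failover-list.dat has the form `host:port/secure-port`, as in the single entry `mazztower:7080/7443` shown in this bug.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of RHQ): a simplified model of how the agent
# picks the next server after the configured endpoint turns out to be down.
# Assumption: each failover-list.dat line looks like "host:port/secure-port".

# Print the JBoss Remoting locator URL in the form seen in the agent log.
to_locator() {
  printf 'servlet://%s:%s/jboss-remoting-servlet-invoker/ServerInvokerServlet\n' "$1" "$2"
}

# Given the unreachable "host:port" and a failover list file, print the
# locator of the first listed server that differs from the failed one.
# Returns 1 when there is no readable list or no alternative entry; in that
# case the real agent just keeps waiting for the configured server.
next_failover_locator() {
  local failed=$1 list_file=$2 entry host_port host port
  [ -r "$list_file" ] || return 1
  while IFS= read -r entry || [ -n "$entry" ]; do
    [ -z "$entry" ] && continue          # skip blank lines
    host_port=${entry%%/*}               # drop the "/secure-port" suffix
    [ "$host_port" = "$failed" ] && continue
    host=${host_port%%:*}
    port=${host_port##*:}
    to_locator "$host" "$port"
    return 0
  done < "$list_file"
  return 1
}
```

With a list containing only `mazztower:7080/7443`, calling `next_failover_locator mazztower:12345 <list>` prints `servlet://mazztower:7080/jboss-remoting-servlet-invoker/ServerInvokerServlet`. That is the endpoint the agent switched to in the log above. Without a list file, the function returns 1, which mirrors the "sit and wait" case.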
So, that is how you can do your tests. Just delete data/failover-list.dat before you run the agent with the bad port.
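The test procedure can be wrapped in a small helper, sketched below. The function name `start_agent_without_failover` and the agent-home argument are hypothetical. The sketch also renames the file to `failover-list.dat.bak` instead of deleting it, which departs from the thread (the thread simply deletes the file) so that the list can be restored afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RHQ): move the agent's failover list aside
# so a -D server override is not immediately replaced by failover, then start
# the agent with the given arguments from its bin directory.
start_agent_without_failover() {
  local agent_home=$1; shift
  local list="$agent_home/data/failover-list.dat"
  if [ -f "$list" ]; then
    # Keep a backup rather than deleting outright (assumption, see above).
    mv -- "$list" "$list.bak" || return 1
  fi
  # Run from bin/, matching the shell sessions shown in this bug.
  (cd -- "$agent_home/bin" && ./rhq-agent.sh "$@")
}
```

For example, `start_agent_without_failover /opt/rhq-agent -Drhq.agent.server.bind-port=7090` (the path is illustrative) starts the agent with the bad port and no failover list. The connection problem described above should then appear.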
BTW: for giggles, I tried to set my security token to an invalid one (since that was also mentioned in this issue):
$ ./rhq-agent.sh -Drhq.agent.security-token=ABC
RHQ 4.3.0-SNAPSHOT [b295126] (Fri Jan 20 11:17:50 EST 2012)
The server has rejected the agent registration request. Cause: [org.rhq.core.clientapi.server.core.AgentRegistrationException:The agent asking for registration under the name [mazztower] provided an invalid security token. This request will fail. Please consult an administrator to reconfigure this agent with its proper security token.]
Will retry the agent registration request soon...
So this is now working as expected as well.
Assigning to ON_QA so that QE can verify they are seeing what Mazz describes when they are testing builds for RHQ 4.3, i.e. builds off of master.
I am seeing -D commands passed to the agent ... on JON 3.01 RC#1 ...
Change was cherry-picked from master.
Bulk close of old bugs in VERIFIED state.