Bug 585334
Summary: | support having agent change hostname without human intervention | |
---|---|---|---
Product: | [Other] RHQ Project | Reporter: | Charles Crouch <ccrouch>
Component: | No Component | Assignee: | John Sanda <jsanda>
Status: | CLOSED WONTFIX | QA Contact: | Mike Foley <mfoley>
Severity: | medium | Priority: | low
Version: | unspecified | CC: | bmozaffa, hbrock, jshaughn, mazz, sreichar, twilkins
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2014-05-29 15:41:10 UTC

Description
Charles Crouch
2010-04-23 18:20:20 UTC
So to clarify how the agent/server registration should work:

Assume the agent has initially registered successfully with the server. If the agent starts up with the same "agent name" and same "agent token" as when it first registered, but it has a different hostname, then the agent should be able to negotiate with the server to update the hostname and maintain correct communication. The key part of this is *not* to run cleanconfig when the agent starts up with the new hostname.

When the agent starts up the first time and registers with the server, it writes its entire configuration into Java preferences, stored in the home directory of the user who started the agent. [Note: running agents as a user whose home directory is on a shared drive will therefore not work by default, since each agent will overwrite the configuration of the other.] The next time the agent starts, it will not read any settings from ./conf/agent-configuration.xml; instead it will get all its settings from these Java preferences.

If you run cleanconfig on the agent, these Java preferences, in particular the "agent token", will get blown away. Without the token the server will think this is an entirely new agent, not the old agent trying to reconnect.

It sounds like, in the automated setup you have, you should be running with a pre-configured agent-configuration.xml that has a unique agent name set (obviously not the hostname, which will change) and that points to where your JON server is located.

Can you attach the agent-configuration.xml you are using and the *exact* commands you give to the agent to start it initially (never been run on this machine) and when it is started subsequently with a different hostname (if different)?

Created attachment 408717 [details]
agent-configuration.xml_ra-130-249
Created attachment 408718 [details]
rhq-agent-wrapper.sh_ra-130-249
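
The description above says the agent persists its configuration, including the agent token, in Java preferences under the home directory of the user who started it, and that the token decides whether the server sees a returning agent or a new one. A minimal shell sketch for checking whether a token is present before restarting or cloning a guest follows. The preferences path (`~/.java/.userPrefs/rhq-agent/default/prefs.xml`) and the key name (`rhq.agent.security-token`) are assumptions that are not stated in this report; verify both against your installation.

```shell
#!/bin/sh
# ASSUMPTION: default location of Java user preferences on Linux and the
# preference node used by the RHQ agent. Neither is given in this report.
PREFS_FILE="${PREFS_FILE:-$HOME/.java/.userPrefs/rhq-agent/default/prefs.xml}"
# ASSUMPTION: name of the preference key that holds the agent token.
TOKEN_KEY="${TOKEN_KEY:-rhq.agent.security-token}"

# Succeeds (exit status 0) when the given file exists and mentions the token key.
has_agent_token() {
    [ -f "$1" ] && grep -q -F "$TOKEN_KEY" "$1"
}

if has_agent_token "$PREFS_FILE"; then
    echo "token found: the server should treat this as the existing agent"
else
    echo "no token found: the server will treat this as a new agent"
fi
```

Run it as the same user that starts the agent, since the preferences live in that user's home directory.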
The agent-configuration.xml file and the calling script /etc/init.d/rhq-agent-wrapper.sh have been attached.

There's a little bit more to this attempt that may still be a factor. The issue was initially noticed in our first attempts to automate the registration at startup. The agent name somehow defaulted to localhost.localdomain and registered with the JON server under that agent name rather than the hostname we wanted. Subsequent attempts to re-register the node have led to this issue. We have yet to get the agent name to default to the hostname as expected.

This feature might be useful to you: https://bugzilla.redhat.com/show_bug.cgi?id=535783

"by default, the agent's registration server (that is the server it registers with at startup) used to be 127.0.0.1 unless you preconfigured the agent. Now it will first perform a DNS lookup for a machine called "rhqserver" - if there is one defined, THAT will be the server it will connect to (you still have to define the port and transport - the defaults remain the same - servlet and 7080). If there is no "rhqserver", the default remains the localhost."

I can't remember if this made it into the latest JON release or not.

It must have made it in because it worked. We had been trying to preconfigure the agent but that continued to fail until rhqserver was defined, thanks for mentioning that. I assumed specifying the server using rhq.agent.server.bind-address would have been the answer but it did not succeed. So now a newly provisioned jboss system can be discovered by the JON server as the jboss system boots.

We still have two hurdles to clear ...

1) What will occur if a jboss system is down for a while, its IP lease expires, another system acquires that IP, and the jboss server boots to obtain a different IP?
2) Our next goal is to duplicate the newly registered jboss server via templates so I will have to 'sysprep', if you will, the system to be ready for cloning.
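
The default quoted earlier in this comment thread (look up a host named "rhqserver" in DNS, otherwise fall back to the local machine) can be approximated from the shell to check what a freshly provisioned guest will see at boot. This only illustrates the lookup order and is not the agent's own code; the function name is hypothetical, and the sketch assumes `getent` is available, as it is on most Linux systems.

```shell
#!/bin/sh
# Hypothetical helper mirroring the quoted lookup order:
# use the well-known name if it resolves, otherwise fall back to localhost.
pick_registration_server() {
    candidate="${1:-rhqserver}"
    if getent hosts "$candidate" >/dev/null 2>&1; then
        echo "$candidate"
    else
        echo "127.0.0.1"
    fi
}

echo "agent would register with: $(pick_registration_server)"
```

Per the quoted note, the port and transport are still configured separately; only the server address comes from this lookup.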
Will the cleanconfig option do what I need in this instance?

So far any VMs cloned from template do indeed fail to be discovered by JON. JBoss and the JON agent are running, so I'm assuming that my attempts to prep the VM for templating (stop agent, run the agent as the same user who registered it, only with --cleanconfig) are not succeeding. This is almost expected because my attempts at cleanconfig ...

rhq-agent.sh --cleanconfig -c ../conf/agent-configuration.xml

... do not return to the command line prompt but instead sit with a prompt as if the command is incomplete ...

> [root@ra-131-247 ~]# rhq-agent/bin/rhq-agent.sh --cleanconfig -c /root/rhq-agent/conf/agent-configuration.xml
> RHQ 1.3.1.GA [5295] (Wed Feb 24 18:46:23 EST 2010)
>>

... until I eventually ^C out of it and the agent shuts down ...

> RHQ 1.3.1.GA [5295] (Wed Feb 24 18:46:23 EST 2010)
>>
>> Shutting down...
> The agent will wait for [0] threads to die
> Shutdown complete - agent will now exit.

It is assumed that --cleanconfig never actually occurs. What am I doing wrong regarding that option?

See the second yellow box at: http://rhq-project.org/display/JOPR2/RHQ+Agent+Installation#RHQAgentInstallation-ConfiguretheRHQAgent that starts with "If the agent fails to register with the server..."

If the agent seems to just "hang", look at the agent log file and it will probably tell you what it's waiting for. I suspect it's trying to register with a Server that it cannot communicate with. This is the Server specified in the configuration preferences rhq.agent.server.* - the agent log should tell you more.

Actually, it looks like I've been put into a shell for rhq-agent ...

> help
avail: Get availability of inventoried resources
config: Manages the agent configuration
debug: Provides features to help debug the agent.
discovery: Asks a plugin to run a server scan discovery
download: Downloads a file from the RHQ Server
dumpspool: Shows the entries found in the command spool file
exit: Shuts down the agent's communications services and kills the agent
failover: Provides HA failover functionality
getconfig: Displays one, several or all agent configuration preferences
help: Shows help for a given command
identify: Asks to identify a remote server
inventory: Provides information about the current inventory of resources
log: Configures some settings for the log messages
metrics: Shows the agent metrics
native: Obtains native system information
pc: Starts and stops the plugin container and all deployed plugins
ping: Pings the RHQ Server
piql: Executes a PIQL query to search for running processes
plugins: Updates the agent plugins with the latest versions from the server
quit: Shuts down the agent's communications services and kills the agent
register: Registers this agent with the RHQ Server
sender: Controls the command sender to start or stop sending commands
setconfig: Sets an agent configuration preference
setup: Sets up the agent configuration by asking a series of questions
shutdown: Shuts down all communications services without killing the agent
sleep: Puts the agent prompt to sleep for a given amount of seconds.
start: Starts the agent comm services so it can accept remote requests
timer: Times how long it takes to execute another prompt command
update: Provides agent update functionality
version: Shows information on agent version and agent environment

The agent log looks to have connected to the server successfully and updated its plugins ...

2010-04-27 16:09:04,213 INFO [main] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.started}Service container started - ready to accept incoming commands
2010-04-27 16:09:05,361 INFO [RHQ Agent Registration Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.agent-registration-results}Agent has successfully registered with the server. The results are: [AgentRegistrationResults: [agent-token=1272308665969-1833291284-4956338588785322629]]
2010-04-27 16:09:05,485 ERROR [ClientCommandSenderTask Timer Thread #0] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.time-not-synced}The server and agent clocks are not in sync. Server=[1272413394595][April 27, 2010 8:09:54 PM EDT], Agent=[1272398945484][April 27, 2010 4:09:05 PM EDT]
2010-04-27 16:09:05,492 INFO [RHQ Server Polling Thread] (org.rhq.enterprise.agent.PluginUpdate)- {PluginUpdate.updating-complete}Completed updating the plugins to their latest versions.
2010-04-27 16:09:05,493 INFO [RHQ Server Polling Thread] (enterprise.communications.command.client.ServerPollingThread)- {ServerPollingThread.server-online}The server has come back online; client has been told to start sending commands again
2010-04-27 16:09:07,523 INFO [main] (org.rhq.core.pc.PluginContainer)- Initializing Plugin Container v1.3.1.GA...
>
2010-04-27 16:09:09,821 INFO [main] (rhq.core.pc.inventory.InventoryManager)- Initializing Inventory Manager...
2010-04-27 16:09:09,841 INFO [main] (rhq.core.pc.inventory.InventoryManager)- Detected new Platform [Resource[id=0, type=Linux, key=ra-130-249.ra.rh.com, name=ra-130-249.ra.rh.com, parent=<null>, version=Linux 2.6.18-191.el5]] - adding to local inventory...
2010-04-27 16:09:09,846 INFO [main] (rhq.core.pc.inventory.InventoryManager)- Inventory Manager initialized.
2010-04-27 16:09:09,850 INFO [main] (rhq.core.pc.inventory.ResourceFactoryManager)- Initializing
2010-04-27 16:09:09,850 INFO [main] (rhq.core.pc.content.ContentManager)- Initializing Content Manager...
2010-04-27 16:09:09,851 INFO [main] (rhq.core.pc.content.ContentManager)- Initializing scheduled content discovery...
2010-04-27 16:09:09,851 INFO [main] (rhq.core.pc.content.ContentManager)- Content Manager initialized...
2010-04-27 16:09:09,852 INFO [main] (org.rhq.core.pc.PluginContainer)- Plugin Container initialized.
2010-04-27 16:09:09,854 INFO [RHQ Primary Server Switchover Thread] (org.rhq.enterprise.agent.AgentMain)- {PrimaryServerSwitchoverThread.started}The primary server switchover thread has started.
2010-04-27 16:09:19,849 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Executing server discovery scan...
2010-04-27 16:09:19,931 INFO [ResourceDiscoveryComponent.invoker.daemon-1] (org.rhq.plugins.agent.AgentDiscoveryComponent)- Discovering RHQ Agent...
2010-04-27 16:09:19,940 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Detected new Server [Resource[id=0, type=RHQ Agent, key=ra-130-249.ra.rh.com RHQ Agent, name=ra-130-249.ra.rh.com RHQ Agent, parent=<null>, version=1.3.1.GA]] - adding to local inventory...
2010-04-27 16:09:19,982 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Sending [server] inventory report to Server...
2010-04-27 16:09:20,052 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Syncing local inventory with Server inventory...
2010-04-27 16:09:20,052 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Got unknown resource: 10281
2010-04-27 16:09:20,171 INFO [ResourceContainer.invoker.daemon-1] (org.rhq.plugins.platform.LinuxPlatformComponent)- Internal yum server is disabled.
2010-04-27 16:09:20,687 INFO [ResourceContainer.invoker.daemon-2] (org.rhq.plugins.jmx.JMXServerComponent)- Starting connection to JMX Server ra-130-249.ra.rh.com RHQ Agent
2010-04-27 16:09:20,706 INFO [ResourceContainer.invoker.daemon-2] (ems.impl.jmx.connection.DConnection)- Querying MBeanServer for all MBeans
2010-04-27 16:09:20,707 INFO [ResourceContainer.invoker.daemon-2] (ems.impl.jmx.connection.DConnection)- Found 28 MBeans, starting load
2010-04-27 16:09:20,722 INFO [ResourceContainer.invoker.daemon-2] (org.rhq.plugins.jmx.JMXServerComponent)- Starting connection to JMX Server InternalVM
2010-04-27 16:09:20,723 INFO [ResourceContainer.invoker.daemon-2] (ems.impl.jmx.connection.DConnection)- Querying MBeanServer for all MBeans
2010-04-27 16:09:20,723 INFO [ResourceContainer.invoker.daemon-2] (ems.impl.jmx.connection.DConnection)- Found 28 MBeans, starting load
2010-04-27 16:09:20,814 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Found 1 servers.
2010-04-27 16:09:20,832 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...
2010-04-27 16:09:25,815 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.RuntimeDiscoveryExecutor)- Running runtime discovery scan rooted at [platform]
2010-04-27 16:09:25,850 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Version of [Resource[id=10298, type=CPU, key=1, name=CPU 1, parent=ra-130-249.ra.rh.com, version=QEMU Virtual CPU version 0.9.1]] changed from [] to [QEMU Virtual CPU version 0.9.1]
2010-04-27 16:09:25,857 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Version of [Resource[id=10297, type=CPU, key=0, name=CPU 0, parent=ra-130-249.ra.rh.com, version=QEMU Virtual CPU version 0.9.1]] changed from [] to [QEMU Virtual CPU version 0.9.1]
2010-04-27 16:09:25,869 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Version of [Resource[id=10301, type=RHQ Agent JVM, key=InternalVM, name=RHQ Agent JVM, parent=ra-130-249.ra.rh.com RHQ Agent, version=1.6.0]] changed from [] to [1.6.0]
2010-04-27 16:09:25,903 INFO [ResourceDiscoveryComponent.invoker.daemon-1] (org.rhq.plugins.agent.AgentEnvironmentScriptDiscoveryComponent)- Discovering RHQ Agent's environment setup script...
2010-04-27 16:09:25,917 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Version of [Resource[id=10304, type=Environment Setup Script, key=environment-setup-script, name=rhq-agent-env.sh, parent=ra-130-249.ra.rh.com RHQ Agent, version=1.3.1.GA]] changed from [] to [1.3.1.GA]
2010-04-27 16:09:25,917 INFO [ResourceDiscoveryComponent.invoker.daemon-1] (org.rhq.plugins.agent.AgentLauncherScriptDiscoveryComponent)- Discovering RHQ Agent's launcher script service...
2010-04-27 16:09:25,925 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Version of [Resource[id=10305, type=Launcher Script, key=launcherscript, name=RHQ Agent Launcher Script, parent=ra-130-249.ra.rh.com RHQ Agent, version=1.3.1.GA]] changed from [] to [1.3.1.GA]
2010-04-27 16:09:25,926 INFO [ResourceDiscoveryComponent.invoker.daemon-1] (org.rhq.plugins.agent.AgentJavaServiceWrapperDiscoveryComponent)- Discovering RHQ Agent's JSW service...
2010-04-27 16:09:25,929 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.RuntimeDiscoveryExecutor)- Scanned [0] servers and found [0] total descendant Resources.
2010-04-27 16:09:25,929 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Sending [runtime] inventory report to Server...
2010-04-27 16:09:25,951 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Syncing local inventory with Server inventory...
2010-04-27 16:09:29,849 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.RuntimeDiscoveryExecutor)- Running runtime discovery scan rooted at [platform]
2010-04-27 16:09:29,905 INFO [ResourceDiscoveryComponent.invoker.daemon-1] (org.rhq.plugins.agent.AgentEnvironmentScriptDiscoveryComponent)- Discovering RHQ Agent's environment setup script...
2010-04-27 16:09:29,906 INFO [ResourceDiscoveryComponent.invoker.daemon-1] (org.rhq.plugins.agent.AgentLauncherScriptDiscoveryComponent)- Discovering RHQ Agent's launcher script service...
2010-04-27 16:09:29,906 INFO [ResourceDiscoveryComponent.invoker.daemon-1] (org.rhq.plugins.agent.AgentJavaServiceWrapperDiscoveryComponent)- Discovering RHQ Agent's JSW service...
2010-04-27 16:09:29,910 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.RuntimeDiscoveryExecutor)- Scanned [0] servers and found [0] total descendant Resources.
2010-04-27 16:09:29,910 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Sending [runtime] inventory report to Server...
2010-04-27 16:09:29,934 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Syncing local inventory with Server inventory...
2010-04-27 16:10:05,517 ERROR [RHQ Server Polling Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.time-not-synced}The server and agent clocks are not in sync. Server=[1272413454627][April 27, 2010 8:10:54 PM EDT], Agent=[1272399005516][April 27, 2010 4:10:05 PM EDT]
2010-04-27 16:10:09,855 INFO [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [123] metrics took 551ms - sending report to Server...
2010-04-27 16:10:39,855 INFO [MeasurementManager.sender-2] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [5] metrics took 3ms - sending report to Server...

If the functionality of --cleanconfig is to restart the agent immediately after having cleared its persistent preferences, then this will not fit the bill for our cloning by template needs. The wiped config would need to leave the rhq-agent stopped and force itself into setup mode (which is hopefully preconfigured) when the cloned guest is started and not before. If this is so, we can change how we invoke cleanconfig to run it immediately after the newly cloned VM has started for the first time but we have not found a way out of the apparent shell we enter by executing the command as shown above.

We tried the following to see if we could exit the shell ...

[root@ra-130-249 ~]# /root/rhq-agent/bin/rhq-agent.sh --cleanconfig << eof
> quit
> eof
RHQ 1.3.1.GA [5295] (Wed Feb 24 18:46:23 EST 2010)
> Agent no longer accepting input at prompt.
Shutting down...
The agent will wait for [0] threads to die
Shutdown complete - agent will now exit.

... but this shuts down the agent when done and I'm not sure the cleanconfig has occurred. I will test this further. Still digging.

You are using rhq-agent.sh - it assumes the agent is to be run in the foreground as a console app.
Thus, the agent will provide a "shell" prompt (i.e. it will read stdin and you can enter keyboard input, such as you did there with the "help" option).

If you want to run the agent in the background, you typically will use the init.d script that we provide - rhq-agent-wrapper.sh. You can use that as-is (execute "rhq-agent-wrapper.sh" without args for syntax help) or you can install it as an init.d script for boot-time launching. You can export the env var RHQ_AGENT_CMDLINE_OPTS if you want to pass cmd line args to the agent via rhq-agent-wrapper.sh. Read the comments at the top of rhq-agent-wrapper.sh (and rhq-agent.sh) for more info on all the different environment variables they accept.

Note that if you want to run rhq-agent.sh and put it into the background via "&" (as opposed to using rhq-agent-wrapper.sh), make sure you at least pass in --daemon as a cmdline argument (this tells the Java agent it will be in the background and not to listen for stdin input). The rhq-agent-wrapper.sh does this for you.

If you want to see help on the cmdline args, "rhq-agent.sh help" provides you some help. Or see this: http://rhq-project.org/display/JOPR2/RHQ+Agent+Command+Line+Options - specifically you'll see "--daemon" and its description there.

Have you read the docs on how to start and configure the agent such that it can run in the background the first time you start it? See these docs, which explain a lot of this:
http://rhq-project.org/display/JOPR2/Running+the+RHQ+Agent#RunningtheRHQAgent-RunningonUnix
http://rhq-project.org/display/JOPR2/RHQ+Agent+Installation

> You are using rhq-agent.sh - it assumes the agent is to be run in the
> foreground as a console app.

Thanks. I had tried using the wrapper first but did not see the RHQ_AGENT_CMDLINE_OPTS var to help with engaging cleanconfig.

Is it possible that cleanconfig is not doing what we require? Although --cleanconfig is included in RHQ_AGENT_CMDLINE_OPTS, I am not sure it is engaging unless this line in the log is that action ...

Agent has been asked to start up clean - cleaning out the data directory: data

I'm still trying to determine if the attempt to cleanconfig should occur on the original VM (before a template is produced) OR if the agent config can remain in place, be template cloned into a new VM and then have that new VM run cleanconfig on its first boot so it can register. So far I am still seeing remnants of the first incarnation of the guest [ra-131-247] in JON preventing its new cloned identity [ra-130-223] from registering.

2010-04-28 09:56:56,365 ERROR [RHQ Agent Registration Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.agent-registration-rejected}The server has rejected the agent registration request. Cause: [org.rhq.core.clientapi.server.core.AgentRegistrationException:The agent asking for registration is trying to register the same address/port [172.20.130.223:16163] that is already registered under a different name [ra-131-247.ra.rh.com]; if this new agent is actually the same as the original, then re-register with the same name]

> Have you read the docs on how to start and configure the agent such that it
> can run in the background the first time you start it?

Yes, at least by use of the rhq.agent.configuration-setup-flag so the activity is not interactive, if that's what you mean... but not via the same URLs you provided ...

http://www.redhat.com/docs/en-US/JBoss_ON/2.3/html/Installation_Guide/Installation_Guide-JON_Agent_Installation_Guide-Preconfiguring_the_JON_Agent.html
http://www.redhat.com/docs/en-US/JBoss_ON/html/JON_Agent_Guide/index.html
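
The advice earlier in this thread about running the agent in the background can be sketched as a small boot-time script: export RHQ_AGENT_CMDLINE_OPTS and let rhq-agent-wrapper.sh launch the agent. The paths are the ones mentioned in the comments above; the `start` argument and the decision to list `--daemon` explicitly are assumptions, since the thread only says to run the wrapper without arguments for syntax help and that the wrapper normally supplies `--daemon` itself. Whether `--cleanconfig` is the right option for a cloned guest is the open question of this report, so this is a starting point for testing rather than a fix.

```shell
#!/bin/sh
# Paths taken from the comments in this report; adjust to your layout.
AGENT_HOME="${AGENT_HOME:-/root/rhq-agent}"
WRAPPER="${WRAPPER:-/etc/init.d/rhq-agent-wrapper.sh}"

# Options the wrapper should hand to the agent.
# ASSUMPTION: --daemon is listed explicitly in case setting this variable
# replaces the wrapper's own defaults; check the comments at the top of
# rhq-agent-wrapper.sh to confirm.
RHQ_AGENT_CMDLINE_OPTS="--daemon --cleanconfig -c $AGENT_HOME/conf/agent-configuration.xml"
export RHQ_AGENT_CMDLINE_OPTS

if [ -x "$WRAPPER" ]; then
    # ASSUMPTION: the wrapper accepts the usual init-script "start" argument.
    "$WRAPPER" start
else
    echo "wrapper not found at $WRAPPER; options would have been: $RHQ_AGENT_CMDLINE_OPTS" >&2
fi
```

Because the description warns that cleanconfig discards the agent token, a script like this would presumably run only on a clone's first boot, not on every restart.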