Bug 768706 - Agent needs to use a unique Java Preference node name
Status: CLOSED WONTFIX
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.2
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: JON 3.1.0
Assigned To: RHQ Project Maintainer
QA Contact: Mike Foley
Docs Contact:
Depends On:
Blocks: JON3-45/PRODMGT-131/PRODMGT-417
Reported: 2011-12-18 01:11 EST by Larry O'Leary
Modified: 2014-05-02 11:58 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
   JON 2.4.1
   Multiple agents installed (1 per host)
   Same user/system account on all hosts
   User/system account uses a shared home directory (NFS)
Last Closed: 2012-06-05 10:58:34 EDT
Description Larry O'Leary 2011-12-18 01:11:48 EST
Description of problem:
When multiple agents are using the same system/service account and the system's home directory is in a shared location, JON fails to execute properly without any clear indication of what is going wrong.

This configuration is common and is the recommended practice within an enterprise system:
   Single system account to run a specific service (rhq-agent for example)
   Shared system and home directories

The result is that when an agent is configured it stores its Java Preferences in a node named 'default'. When the next agent is configured, it does the same thing. However, if it is sharing the same Java Preference store, it will overwrite the previous agent's configuration. As a consequence, the last installed agent works but all previously installed agents will fail to start the next time a restart is performed.
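
The overwrite can be illustrated with the standard java.util.prefs API. This is only a minimal sketch and not the agent's actual code: the parent node name "rhq-agent-example" and the key "bind-address" are hypothetical, and both "agents" run inside one JVM to stand in for two hosts that share a single preference store.

```java
import java.util.prefs.Preferences;

public class SharedPrefsCollision {
    // Hypothetical parent node; the real agent's node path may differ.
    static final String PARENT = "rhq-agent-example";

    // Resolve the user-preference node for a given node name.
    static Preferences agentNode(String nodeName) {
        return Preferences.userRoot().node(PARENT).node(nodeName);
    }

    // Simulate "configuring" an agent: store one setting under its node.
    static void configure(String nodeName, String bindAddress) {
        agentNode(nodeName).put("bind-address", bindAddress);
    }

    public static void main(String[] args) {
        configure("default", "agent-host-01"); // first agent
        configure("default", "agent-host-02"); // second agent, same node name
        // The first agent now sees the second agent's value.
        System.out.println(agentNode("default").get("bind-address", "unset")); // prints "agent-host-02"
    }
}
```

Giving each agent a distinct node name (for example via -p/--pref) keeps the two configure calls from touching the same node.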

As the home directory is not tied to an RHQ installation, external forces could have a catastrophic effect on the RHQ system. For example, an RHQ system has been set up and configured to run. Then a system-wide change request is submitted to relocate the system home directories to a shared location to make maintenance easier. Without the knowledge that the RHQ agent uses configuration persisted outside its installation and outside the system-wide accepted locations of /etc, /Windows/System32, or the Windows Registry, impact analysis on such a system change would miss the effect on the RHQ system.

Version-Release number of selected component (if applicable):
JON 2.4.1

How reproducible:
Always

Steps to Reproduce:
1. Create NFS (or other network attached) mount for /home
2. Create a system account for the RHQ agent (rhq-agent)
3. Install the RHQ agent on agent-host-01
4. Install the RHQ agent on agent-host-02
5. Restart the agent on agent-host-01
  
Actual results:
The agent installed on agent-host-01 fails to restart due to it picking up the configuration of agent-host-02

Expected results:
Both agents should be capable of running with their own configuration without the need to add the -p/--pref agent command-line argument.

Additional info:
It is accepted that the -p/--pref agent command-line argument can control the Java Preference node name. However, this should only be needed if a non-standard name is going to be used. Instead, by default, the agent should use a node name based on its host name or IP address. If a node already exists with such a name, configuring the agent should require the user to confirm the replacement or provide a --forceconfig option.

The goal is that the user should only have to use -p/--pref if they want a name different than the unique default, or if, for example, the host name or IP changes at a later time and they still want to use the existing (old) configuration.
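
The proposed default could look roughly like the sketch below. It is an illustration only: the class, the Action enum, the "rhq-agent-example" parent node, and the handling of --forceconfig (an option proposed in this report, not an existing one) are all assumptions.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

public class DefaultNodeName {
    enum Action { CREATE, OVERWRITE, ASK_USER }

    // Suggest a node name from the canonical host name; fall back to the legacy name.
    static String suggestNodeName() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "default";
        }
    }

    // An existing node is only replaced when forced; otherwise the user must confirm.
    static Action decide(boolean nodeExists, boolean forceConfig) {
        if (!nodeExists) {
            return Action.CREATE;
        }
        return forceConfig ? Action.OVERWRITE : Action.ASK_USER;
    }

    public static void main(String[] args) throws BackingStoreException {
        String name = suggestNodeName();
        // Hypothetical parent node name, used for illustration only.
        boolean exists = Preferences.userRoot().nodeExists("rhq-agent-example/" + name);
        boolean force = args.length > 0 && "--forceconfig".equals(args[0]);
        System.out.println(name + " -> " + decide(exists, force));
    }
}
```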
Comment 1 Charles Crouch 2012-01-23 11:39:12 EST
Apparently there will be changes in RHEL in the future which may increase the chance of customers running into this issue. Asked Larry to provide more info on that (links, timelines, etc.); once we've got that, we can triage and see if this is a JON3.x or 4.x thing.
Comment 2 Larry O'Leary 2012-01-23 14:28:24 EST
Perhaps my reference to the changes in RHEL was misleading. They do not reflect actual changes in the RHEL product but changes in how systems are managed in a shared/cloud environment. This seems to go back to RHEL 4 as well and can be seen in the section entitled "NFS shared home directories" from the book A Practical Guide to Fedora and Red Hat Enterprise Linux.

Additionally, FreeIPA, AutoFS, and Red Hat GFS promote use of shared file systems across a data center as a means to improve performance, reliability, and maintainability. One common use case is using AutoFS and LDAP to centralize account management across multiple machines within an enterprise environment. 

The use of shared directories across virtual and physical machines is a practice that has been in use for a few years and the push for making use of shared partitions more mainstream will occur in Fedora 17 with implementation of merging into /usr (http://fedoraproject.org/wiki/Features/UsrMove).

So, what this BZ describes is a real-world problem that has impacted "real" customers using the JON product. Although only a handful of users have reported an issue within a production system, it is impossible to determine how many users are currently experiencing issues related to the agents sharing the same preferences, or have experienced the issue and resolved it using data available to them from GSS.
Comment 4 Charles Crouch 2012-02-01 14:12:36 EST
I'd like to start the discussion about addressing this bug in JON3.1
First off, how practical is it? What should be used for the unique identifier for the preferences? I'm not convinced IP address is the right answer, given it's not unique across agents on a single machine and it can change (while the pref node name won't), making it potentially confusing.

How are we going to handle upgrades? e.g. we can't leave all the people with the "default" pref node with only manual upgrade options for instance.

Let's get agreement on an approach on rhq-devel before coding.

Mazz needs to review proposals, given his intimate knowledge of the agent.
Comment 5 John Mazzitelli 2012-02-01 14:43:13 EST
This has always been a known use-case that people need to be aware of (specifically, the issue, at its core, is that there is a shared location that multiple agents use for their preferences store).

See the FAQ on this for how we, today, tell people to work around it:

http://docs.redhat.com/docs/en-US/JBoss_Operations_Network/2.3/html/FAQ/sect-FAQs-Agent-Wrong_Address.html

http://rhq-project.org/display/JOPR2/FAQ#FAQ-Iwanttorunagentsonallmymachines%2CbutonlyonestartsOKtherestfailduetobindingtoawrongaddress

As Charles mentions, using hostname, or worse IP address, for the preference store location would be bad. The whole reason why we don't force the agent name to be IP or hostname is because we want to avoid the issues that occur when, say, DHCP changes the agent's IP or when a hostname or domain name is changed (this was a problem in the old Hyperic/JON 1.x days when the agent name WAS the hostname which resulted in those kinds of problems). If you make the preference store reliant on IP or hostname, the same sets of problems re-emerge.

Today, you CAN tell each individual agent a different preference store location, if you so choose. Java provides a standard mechanism (on Linux implementations) to point to a different location for the preference store. By default it's the user's $HOME/.java directory, but you can change this. So, for example, in your agent's rhq-agent-env.sh, you can define a different location via:

RHQ_AGENT_ADDITIONAL_JAVA_OPTS="-Djava.util.prefs.userRoot=/etc/rhq-agent-prefs"

where /etc/rhq-agent-prefs can be any path you want to point to (and in that example, it is in /etc which is, presumably, not a shared, NFS mounted directory).

So, I think we already can support this use-case - it just requires the RHQ installer to predefine this additional -D option on their agents.
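
For reference, a small sketch of how the preference store location follows from that option. To my understanding, OpenJDK's file-based backend (used on Linux/Unix) places user preferences under <java.util.prefs.userRoot>/.java/.userPrefs, falling back to user.home when the property is unset. The class and method names here are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PrefsLocation {
    // Directory where the file-based backend is expected to keep user preferences:
    // <java.util.prefs.userRoot, else user.home>/.java/.userPrefs
    static Path userStoreDir(String userRootProp, String userHome) {
        String base = (userRootProp != null) ? userRootProp : userHome;
        return Paths.get(base, ".java", ".userPrefs");
    }

    public static void main(String[] args) {
        Path dir = userStoreDir(System.getProperty("java.util.prefs.userRoot"),
                                System.getProperty("user.home"));
        System.out.println("User preferences are stored under: " + dir);
    }
}
```

Running this with -Djava.util.prefs.userRoot=/etc/rhq-agent-prefs should report a directory below /etc/rhq-agent-prefs rather than below the shared home directory.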
Comment 7 John Mazzitelli 2012-02-02 11:27:19 EST
Using the agent name is a potential solution. However, I am concerned about a catch-22 situation. The agent name itself is stored in the configuration. So in order to find the agent name, you need to look up the configuration. But in order to look up the configuration, you need to know the agent name. One way around this would be to assume the agent name is "special" and is not stored in the preference config but rather in, say, the /data directory or the /conf directory, in some special file "agentname.txt".

Note that anything we do must be backwards compatible and not rely on user intervention when upgrading. I think my idea of having an agentname.txt in conf/ might help with that. The agent could look for "agentname.txt" in conf/ and, if it finds it, use that as the default preferences node name. If it's missing, fall back to the old default pref name, which is "default". Note that the --pref cmdline option would still override all of that, again to allow the user to define their own preference node (we use --pref in our agent spawn stuff too, so we have to make sure we don't break any of that agentspawn or agentcopy stuff).
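
The lookup order described above could be sketched like this. It is a rough illustration under stated assumptions: the class and method names are hypothetical, and the file is assumed to hold just the name as plain UTF-8 text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class NodeNameLookup {
    static final String LEGACY_DEFAULT = "default";

    // Order: --pref value, then conf/agentname.txt, then the legacy "default" node.
    static String resolve(String prefOption, Path confDir) {
        if (prefOption != null && !prefOption.trim().isEmpty()) {
            return prefOption.trim();
        }
        Path file = confDir.resolve("agentname.txt");
        if (Files.isRegularFile(file)) {
            try {
                String name = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                if (!name.isEmpty()) {
                    return name;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return LEGACY_DEFAULT;
    }

    public static void main(String[] args) throws IOException {
        Path conf = Files.createTempDirectory("conf");
        System.out.println(resolve(null, conf));      // prints "default"
        Files.write(conf.resolve("agentname.txt"), "agent-01\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(resolve(null, conf));      // prints "agent-01"
        System.out.println(resolve("custom", conf));  // prints "custom"
    }
}
```

An upgraded agent without the file keeps using "default", so no user intervention is needed.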
Comment 8 John Mazzitelli 2012-02-02 17:21:47 EST
I can take a stab at this. Allowing the agent name to live somewhere other than prefs while maintaining backward compatibility would require changes to the setup questions code, the agent upgrade code, and the agent configuration code.
Comment 9 Larry O'Leary 2012-02-02 18:29:11 EST
Perhaps the agent name could be stored in the agent.properties file? I know we do not use this in a configured agent, but I would hate to see us introduce another file into the file system when this is essentially an agent property. Perhaps the agent could load the properties defined in agent.properties, then overwrite those with its preferences if they can be found. If not, agent configuration would be prompted. 

Additionally, at configuration time, if the agentname specified already exists, a warning prompt should be presented informing the user that an agent configuration appears to already exist at that location and that it will be overwritten.
Comment 10 John Mazzitelli 2012-02-02 18:57:53 EST
(In reply to comment #9)
> Perhaps the agent name could be stored in the agent.properties file? I know we
> do not use this in a configured agent, but I would hate to see us introduce
> another file into the file system when this is essentially an agent property.
> Perhaps the agent could load the properties defined in agent.properties, then
> overwrite those with its preferences if they can be found. If not, agent
> configuration would be prompted. 

I think you mean agent-configuration.xml (there is no agent.properties file). And we DO actually have the ability to have the agent name in that xml file (because, as mentioned earlier, agent name is nothing more than one of the normal preference settings - thus, it can also be placed in here). Today, by default, it is commented out, but to support pre-configured agents, we have a place in there for people to uncomment and use.

      <!--
         <entry key="rhq.agent.name" value="my.hostname.com"/>
       -->

We could do the same thing that we do with the preference node name. If you pass in --pref, we actually edit that .xml file and put the pref node name directly in the XML. We could do something similar by adding a new cmdline option --agentname, so you can specify it on the command line and we could put the agent name right in the xml file based on what they specified.
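
As a rough sketch of reading that entry back, the following uses the JDK's DOM parser on a simplified fragment. The <map> wrapper and the class and method names are assumptions for illustration; the real agent-configuration.xml has more structure around the entries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AgentNameFromXml {
    // Returns the value of <entry key="..."/>, or null if no such entry is active
    // (for example because it is still commented out).
    static String entryValue(String xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                if (key.equals(entry.getAttribute("key"))) {
                    return entry.getAttribute("value");
                }
            }
            return null;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse configuration XML", ex);
        }
    }

    public static void main(String[] args) {
        String commentedOut =
            "<map><!-- <entry key=\"rhq.agent.name\" value=\"my.hostname.com\"/> --></map>";
        String enabled =
            "<map><entry key=\"rhq.agent.name\" value=\"my.hostname.com\"/></map>";
        System.out.println(entryValue(commentedOut, "rhq.agent.name")); // prints "null"
        System.out.println(entryValue(enabled, "rhq.agent.name"));      // prints "my.hostname.com"
    }
}
```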


> Additionally, at configuration time, if the agentname specified already exists,
> a warning prompt should be presented informing the user that an agent
> configuration appears to already exist at that location and that it will be
> overwritten.

I think we need to worry about making things more confusing.  Today, it's an easy rule - if the agent is already configured, there is nothing to think about - the .xml file is completely ignored.

It sounds like there are lots of ideas floating around. Before we implement anything, I think we should probably write up a design page on the wiki to flesh out all the different scenarios of use-cases, issues to worry about like backwards compatibility and the like.
Comment 11 Larry O'Leary 2012-02-02 19:23:05 EST
(In reply to comment #10)
> I think you mean agent-configuration.xml (there is no agent.properties file).

Correct. Sorry about that.

> 
> > Additionally, at configuration time, if the agentname specified already exists,
> > a warning prompt should be presented informing the user that an agent
> > configuration appears to already exist at that location and that it will be
> > overwritten.
> 
> I think we need to worry about making things more confusing.  Today, it's an
> easy rule - if the agent is already configured, there is nothing to think about
> - the .xml file is completely ignored.
> 
> It sounds like there are lots of ideas floating around. Before we implement
> anything, I think we should probably write up a design page on the wiki to
> flesh out all the different scenarios of use-cases, issues to worry about like
> backwards compatibility and the like.

Correct. My concern is that if a user installs an agent and specifies a name that already exists as a preference node that a warning is displayed. This way, if I am using my own naming schema such as agent-01, agent-02, agent-03, etc. and install agent-04 but specify agent-03 during configuration, a warning would prompt me to confirm that I really want to overwrite the existing agent-03 preferences. If in fact I am really "re-installing/configuring" agent-03 then obviously I would confirm/continue. But if I didn't expect that, such a warning would give me a chance to figure out why the agent thought agent-03 agentname and preference node was already there.
Comment 12 John Mazzitelli 2012-02-03 11:19:14 EST
see design wiki page on this:

http://rhq-project.org/display/RHQ/Design-AgentUniquePrefsLocation
Comment 13 Larry O'Leary 2012-02-09 16:49:17 EST
Just to make clear what I was originally proposing:

My suggestion is to store and retrieve the Java Preference Node name within a configuration file located in the agent's installation directory. For example <RHQ_AGENT_HOME>/conf/agent-configuration.xml. 

When the agent starts up, it would attempt to read the Java Preference node name (node name) from the agent's command-line or environment (such as the -p/--pref option). If the value was not specified on the command-line or as part of the agent's environment, the agent will then look for the node name value in the expected file location (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml). 

Keep in mind, at this point, the agent has not attempted to read/initialize a java.util.prefs.Preferences object.

If for some reason the node name has not been defined, the default "default" will be used for backwards/legacy compatibility.

Next, the agent must determine if a Java Preference node already exists using the node name. If so, the preferences are loaded and things continue on as normal. However, if it does not exist, one of the following paths would be taken:

   Path A: If the node name value is null, an empty string, or is "default"
      1) Using the same block of code which determines the suggested default agent name (or perhaps the agent name provided in agent-configuration.xml), set the node name to what would be determined to be the default agent name.
      2) Using the suggested node name, check to see if a Java Preference node already exists using this new node name. If so, the preferences are loaded and the node name value is stored in the expected location (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml). However, if it does not exist, go to Path B.

   Path B: If in interactive mode, ask user to confirm and/or input node name value
      1) If value entered is different than the determined node name, check to see if a Java Preference node exists and if it does, load those values for use with the rest of the interactive prompts.
      2) If value entered is the same as the determined node name, create a new Java Preference node and pre-populate it with the default values loaded from agent-configuration.xml.
      3) Continue with the interactive configuration prompts using the Java Preference node loaded in 1) or created in 2).
      
   Path C: If not in interactive mode
      1) Create a new Java Preference node and pre-populate it with the values loaded from agent-configuration.xml
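
The start-up flow and Paths A to C above can be summarized in a small decision sketch. This is only one interpretation: existing preference nodes are modelled as a plain set of names, all class, enum, and parameter names are hypothetical, and a non-interactive agent that falls out of Path A is assumed to continue with Path C using the suggested name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NodeNameFlow {
    enum Outcome { LOAD_EXISTING, PROMPT_USER, CREATE_FROM_XML }

    static final class Decision {
        final String nodeName;
        final Outcome outcome;
        Decision(String nodeName, Outcome outcome) {
            this.nodeName = nodeName;
            this.outcome = outcome;
        }
    }

    static boolean blank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Command line/environment first, then the value stored in the file, then legacy "default".
    static String initialNodeName(String cmdLine, String fromFile) {
        if (!blank(cmdLine)) return cmdLine.trim();
        if (!blank(fromFile)) return fromFile.trim();
        return "default";
    }

    static Decision decide(String cmdLine, String fromFile, String suggestedAgentName,
                           Set<String> existingNodes, boolean interactive) {
        String name = initialNodeName(cmdLine, fromFile);
        if (existingNodes.contains(name)) {
            return new Decision(name, Outcome.LOAD_EXISTING);
        }
        // Path A: no usable name was given, so try the suggested default agent name.
        if ("default".equals(name)) {
            name = suggestedAgentName;
            if (existingNodes.contains(name)) {
                return new Decision(name, Outcome.LOAD_EXISTING);
            }
        }
        // Path B (interactive prompt) or Path C (create node from agent-configuration.xml).
        return new Decision(name, interactive ? Outcome.PROMPT_USER : Outcome.CREATE_FROM_XML);
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("default", "agent-01"));
        Decision d = decide(null, "agent-01", "host.example.com", existing, false);
        System.out.println(d.nodeName + " " + d.outcome); // prints "agent-01 LOAD_EXISTING"
    }
}
```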



Scenario 1: Fresh Agent installation in interactive mode (interactive)
   - The agent prompts the user for the agent/node name and suggests the fully qualified host name by default
   - The user accepts all default input
   - The agent stores its configuration in the Java Preference node named the same as the agent's name (fully qualified host name)
   - The agent stores its agent/node name in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
   
   
Scenario 2: Agent already installed using Scenario 1 (interactive)
   - The agent finds its agent/node name stored in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
   - The agent successfully loads its configuration from the existing Java Preference node
   

Scenario 3: Agent already installed using Scenario 1 but someone deleted the .java directory in user's home directory (interactive)
   - The agent finds its agent/node name stored in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
   - The agent fails to load its configuration from the existing Java Preference node
   - Agent prompts user to confirm/enter the agent/node name
   - The user accepts all default input
   - The agent stores its configuration in the Java Preference node named the same as the agent's name (fully qualified host name)
   - The agent stores its agent/node name in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
      
   
Scenario 4: Agent already installed using Scenario 1 but agent/node name is now missing (interactive)
   - The agent DOES NOT find its agent/node name stored in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
   - The agent finds a Java Preference node named with the agent's fully qualified host name
   - The agent successfully loads its configuration from the existing Java Preference node
   - The agent stores its agent/node name in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)


Scenario 5: Agent already installed using Scenario 1 and now is being auto-upgraded (non-interactive)
   - The agent invokes the extraction of the new agent and copies its environment script, agent-configuration.xml, and agent/node name configuration file (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml) to the new installation
   - The new agent is started
   - The new agent finds its agent/node name stored in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
   - The new agent successfully loads its configuration from the existing Java Preference node


Scenario 6: Agent already installed using Scenario 1 and now is being manually-upgraded (non-interactive)
   - The user executes the agent installer telling it to perform an upgrade
   - The agent installer invokes the extraction of the new agent and copies the environment script, agent-configuration.xml, and agent/node name configuration file (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml) from the prior agent to the new agent
   - The new agent is started
   - The new agent finds its agent/node name stored in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
   - The new agent successfully loads its configuration from the existing Java Preference node


Scenario 7: Agent being installed on host which already has/had an agent installed using Scenario 1 (interactive)
   - The agent DOES NOT find its agent/node name stored in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
   - The agent finds a Java Preference node named with the agent's fully qualified host name
   - The agent successfully loads its configuration from the existing Java Preference node
   - The agent stores its agent/node name in a configuration file located in its file system (such as <RHQ_AGENT_HOME>/conf/agent-configuration.xml)
Comment 14 John Mazzitelli 2012-02-10 11:53:50 EST
We had a conf call discussing this... assigning to Ian to see if he can come up with an implementation to do this.
Comment 15 Charles Crouch 2012-02-15 08:30:13 EST
Resetting assignee. Proposals for how to move forward should go to rhq-devel list
Comment 16 Charles Crouch 2012-03-06 16:23:45 EST
Taking this out of current sprint based on prioritization from PM
