Red Hat Bugzilla – Bug 472960
rg_test script does not provide adequate debugging of a faulty cluster.conf
Last modified: 2010-01-21 14:38:30 EST
Created attachment 324658 [details]
rg_test being run against a valid cluster.conf does not produce a "No errors detected" message.
Description of problem:
Part of the rgmanager package, rg_test can be employed to debug a problematic cluster configuration file. The documentation located at http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Cluster_Administration/s1-clust-rsc-testing-config-CA.html gives the appropriate syntax to run this program and explains that it will "Test a configuration (and /usr/share/cluster) for errors or redundant resource agents."
Running the script with the specified syntax (rg_test test /etc/cluster/cluster.conf) does not diagnose or display potential syntax errors in the cluster configuration in any detail. Purposefully introducing typos or other errors only occasionally produces meaningful output.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Introduce errors into a known-good /etc/cluster/cluster.conf
2. Run "rg_test test /etc/cluster/cluster.conf"
3. Observe the output for errors present (or not).
Created attachment 324659 [details]
Introducing a typo and subsequent display of expected output.
I would expect that when a typo is introduced (intentionally or otherwise) into a cluster configuration, the rg_test script alerts the user to the problem it finds. That is the case in this example.
Created attachment 324661 [details]
Introducing several typos which are not found by rg_test
I would expect rg_test to alert the user to the misspelling of "ref" in the cluster configuration. However, it doesn't seem to be as thorough as it should be in checking the XML syntax.
"Valid" and "correct" are very different. Typos don't necessarily make a cluster.conf invalid, but they can make it semantically wrong.
In this particular case, sometimes, it's not what's written, but what isn't. The service structure's output is flat-out *wrong* if you add the typo.
To make the difference clearer, the XML output without the typo would have looked like this:
<service name="test" autostart="1" hardrecovery="0" exclusive="0"
<lvm name="armstrong2-oracle-vol" vg_name="armstrong2vg"
lv_name="oracle" nfslock="0" />
fstype="ext3" force_unmount= "1"
self_fence="1" nfslock="0" fsid="47123"
<ip address="18.104.22.168" monitor_link="1" nfslock="0" />
With the typo added...
<service name="test" autostart="1" hardrecovery="0"
exclusive="0" nfslock="0" recovery="restart"
We can detect some things easily (say, a mistyped "reff"). This check is fairly simple to add, and I agree that it should be added.
However, reporting errors for unrecognized attribute names is more difficult and would probably get us into trouble: we obsolete attribute names from time to time, but this doesn't make the configuration invalid (or even incorrect!).
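As a rough illustration of the kind of check described above, the following Python sketch flags attributes that look like a misspelling of "ref". It is not how rg_test is implemented; the function name find_ref_typos, the similarity cutoff, and the sample service fragment are all assumptions made up for this example.

```python
import difflib
import xml.etree.ElementTree as ET


def find_ref_typos(xml_text, expected="ref", cutoff=0.7):
    """Return (tag, attribute) pairs whose attribute name is close to,
    but not exactly, the expected name (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    suspects = []
    for elem in root.iter():
        for attr in elem.attrib:
            if attr == expected:
                continue  # spelled correctly, nothing to report
            # get_close_matches returns [expected] when attr is similar enough
            if difflib.get_close_matches(attr, [expected], n=1, cutoff=cutoff):
                suspects.append((elem.tag, attr))
    return suspects


# Made-up fragment for illustration; not taken from the attachments.
SAMPLE = """
<service name="test">
  <lvm ref="armstrong2-oracle-vol"/>
  <fs reff="oracle-fs"/>
  <ip address="192.0.2.10" monitor_link="1"/>
</service>
"""

for tag, attr in find_ref_typos(SAMPLE):
    print(f'warning: <{tag}> has attribute "{attr}"; did you mean "ref"?')
```

A fuzzy match against a single well-known name keeps the check narrow: it does not need to know which other attributes an agent accepts.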
Because the names of attributes and such are from the resource agents' metadata, and are not something that rg_test is aware of at compile-time (only run-time), maintaining a list of "obsolete" attributes within rg_test is just begging for problems.
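To illustrate why such a check can only happen at run time, here is a minimal Python sketch that compares a resource's attributes against parameter names read from agent metadata. The metadata below is a simplified, assumed shape rather than real agent output, the helper names are hypothetical, and the result is reported as a warning only, since an unlisted attribute may simply be obsolete rather than wrong.

```python
import xml.etree.ElementTree as ET


def known_parameters(metadata_xml):
    """Collect parameter names declared in (assumed) agent metadata."""
    root = ET.fromstring(metadata_xml)
    return {p.get("name") for p in root.iter("parameter")}


def unknown_attributes(resource_xml, metadata_xml, always_allowed=("ref",)):
    """Return the resource's attributes that the metadata does not declare."""
    elem = ET.fromstring(resource_xml)
    allowed = known_parameters(metadata_xml) | set(always_allowed)
    return sorted(a for a in elem.attrib if a not in allowed)


# Simplified, made-up metadata for illustration only.
IP_METADATA = """
<resource-agent name="ip">
  <parameters>
    <parameter name="address"/>
    <parameter name="monitor_link"/>
    <parameter name="nfslock"/>
  </parameters>
</resource-agent>
"""

RESOURCE = '<ip address="192.0.2.10" monitor_lnk="1"/>'

for attr in unknown_attributes(RESOURCE, IP_METADATA):
    # Warn rather than fail: the attribute may be obsolete, not incorrect.
    print(f"warning: attribute '{attr}' not found in agent metadata")
```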
We are working on updating the cluster.conf schema to be more robust.
Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.