Red Hat Bugzilla – Bug 247521
Luci returns misleading errors if cluster.conf has a syntax error
Last modified: 2009-04-16 18:58:48 EDT
Description of problem:
Luci will return errors of all sorts if there is a syntax error in the
cluster.conf file. My particular error was an "UnboundLocalError: local variable
'model' referenced before assignment"
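For illustration only, here is a minimal Python sketch of how a swallowed parse failure can surface as an UnboundLocalError instead of an XML error. This is not luci's actual code: the function name load_model and the sample XML string are hypothetical, and the real code path in luci may differ.

```python
import xml.dom.minidom


def load_model(xml_text):
    """Hypothetical loader: 'model' is only bound when parsing succeeds."""
    try:
        model = xml.dom.minidom.parseString(xml_text)
    except Exception:
        # The parse error is swallowed here, so 'model' is never assigned.
        pass
    # If parsing failed, this line raises UnboundLocalError rather than
    # reporting the underlying XML syntax problem.
    return model


if __name__ == "__main__":
    try:
        load_model("<cluster><clusternodes></cluster>")  # missing closing tag
    except UnboundLocalError as err:
        print("misleading error:", err)
```

The point of the sketch is that the user sees an error about a local variable, with no hint that the configuration file itself is malformed.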
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. inject syntax error in /etc/cluster/cluster.conf (such as a missing
2. start luci and add cluster to the managed clusters.
3. attempt to click on the node-name in the cluster tab, or any member of the
Actual results:
luci returns errors
Expected results:
Luci should have some basic syntax checking on the XML file and return errors
(such as those produced by xmllint) indicating a syntax error.
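A rough sketch of the kind of basic check being requested, assuming Python's standard xml.etree.ElementTree module. The function name check_conf_syntax and the message wording are hypothetical, and this only tests that the XML is well-formed; it does not validate against the cluster.conf schema.

```python
import xml.etree.ElementTree as ET


def check_conf_syntax(xml_text):
    """Return None if xml_text is well-formed XML, else a readable message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple for the failure.
        line, column = err.position
        return ("cluster.conf is not well-formed XML "
                "(line %d, column %d): %s" % (line, column, err))
    return None


if __name__ == "__main__":
    # Hypothetical sample with a missing closing tag.
    print(check_conf_syntax("<cluster><clusternodes></cluster>"))
```

Running a check like this before building any model from the file would let the interface report the syntax error directly instead of failing later with an unrelated exception.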
Additional info: jparsons asked me to enter this bug after spending much time
assisting me with my silly syntax problem, which was complicated because luci was
leading me down a confusing path.
Thanks for this ticket - but the fact that this deployment was on a classified
network where the conf file couldn't be shared was the REAL issue that sucked
down so much of our time :)
Luci needs to check the conf file and display a meaningful error message if it
does not pass a schema check.
Two other things, please:
1) It is always possible to check the cluster.conf file if it is suspect, or if
it has been edited by hand and errors follow. The xmllint command can be run
against the file; this checks basic XML well-formedness, such as missing tags.
In addition, if system-config-cluster is installed, the RELAX NG
cluster.conf schema specification can be found in
/usr/share/system-config-cluster/misc/cluster.ng . Run 'xmllint --relaxng
2) Why was it necessary to edit the conf file by hand? Was there something that
the luci interface was unable to configure? Marking NEEDINFO, hoping to learn
whether luci is insufficient in some manner.
Answer to 2) When a cluster node is down (Status: Unknown State) the only option
Luci offers is to "Fence this node". I had to manually edit cluster.conf to
delete the node (which was not returning to service within the cluster). I would
have deleted the node prior to its decommissioning had I known of this
limitation. Maybe there could be an "are you sure you want to delete this
unresponsive node" checkbox along with a delete option.
(In reply to comment #3)
> Answer to 2) When a cluster node is down (Status: Unknown State) the only option
> Luci offers is to "Fence this node". I had to manually edit cluster.conf to
> delete the node (which was not returning to service within the cluster). I would
> have deleted the node prior to its decommissioning had I known of this
> limitation. Maybe there could be an "are you sure you want to delete this
> unresponsive node" checkbox along with a delete option.
We've added support for doing this that will be in the 5.1 release.
(In reply to comment #2)
> 1) It is always possible to check the cluster.conf file if it is suspect or if
> it has been edited by hand and errors follow. The xmllint command can be run
Further, ccs_tool will check the XML before propagating it, so there should be
no way a cluster.conf with invalid XML can be used by the cluster.
I think this bug must have been hit as a result of ricci reading in an edited
configuration file that contained invalid XML from /etc/cluster/cluster.conf,
even though it wasn't the configuration in use by the cluster.
The particular error that luci hit (the UnboundLocalError exception) is fixed in
the current build, and I've also improved the error messages that show up when
this occurs.
I'm going to close this since it wasn't a bug in the first place... I haven't
had any issues since and conga has improved error messages.