Description of problem:
Luci returns errors of all sorts if there is a syntax error in the cluster.conf file. My particular error was "UnboundLocalError: local variable 'model' referenced before assignment".

Version-Release number of selected component (if applicable):
0.9.2

How reproducible:
Easily

Steps to Reproduce:
1. Inject a syntax error into /etc/cluster/cluster.conf (such as a missing </clusternode>).
2. Start luci and add the cluster to the managed clusters.
3. Attempt to click on the node name in the cluster tab, or on any member of the cluster.

Actual results:
luci returns errors

Expected results:
Luci should do some basic syntax checking on the XML file and return errors (such as those produced by xmllint) indicating a syntax error.

Additional info:
jparsons asked me to enter this bug after spending much time assisting me with my silly syntax problem, which was complicated because luci was leading me down a confusing path.
Thanks for this ticket - but the fact that this deployment was on a classified network where the conf file couldn't be shared was the REAL issue that sucked down so much of our time :) Luci needs to check the conf file and display a meaningful error message if it does not pass a schema check.
Two other things, please:
1) It is always possible to check the cluster.conf file if it is suspect, or if it has been edited by hand and errors follow. The xmllint command can be run against the file; this checks basic XML well-formedness, such as missing tags. In addition, if system-config-cluster is installed, the RELAX NG schema for cluster.conf can be found in /usr/share/system-config-cluster/misc/cluster.ng. Run 'xmllint --relaxng cluster.ng /etc/cluster/cluster.conf'.
2) Why was it necessary to edit the conf file by hand? Was there something that the luci interface was unable to configure?
Marking NEEDINFO, hoping to learn whether luci is insufficient in some manner.
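For anyone who wants the same basic well-formedness check from a script rather than from xmllint, here is a minimal sketch using only the Python standard library. The function name check_well_formed is made up for illustration and is not part of luci or ricci. It only catches XML syntax errors such as a missing closing tag; it does not validate against the cluster.ng RELAX NG schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's message.

    Hypothetical helper for illustration only. This is a syntax check,
    not a schema check.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return str(err)
    return None


if __name__ == "__main__":
    # Same kind of mistake as in the report: a missing </clusternode>.
    broken = (
        '<cluster name="demo"><clusternodes>'
        '<clusternode name="node1">'
        '</clusternodes></cluster>'
    )
    print(check_well_formed(broken))
```

To check the real file, read /etc/cluster/cluster.conf and pass its contents to the function; a None result means only that the XML parses.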
Answer to 2) When a cluster node is down (Status: Unknown State), the only option Luci offers is to "Fence this node". I had to manually edit cluster.conf to delete the node (which was not returning to service within the cluster). I would have deleted the node prior to its decommissioning had I known of this limitation. Maybe there could be an "are you sure you want to delete this unresponsive node" checkbox along with a delete option.
(In reply to comment #3)
> Answer to 2) When a cluster node is down (Status: Unknown State) the only option
> Luci offers is to "Fence this node". I had to manually edit cluster.conf to
> delete the node (which was not returning to service within the cluster). I would
> have deleted the node prior to its decommissioning had I known of this
> limitation. Maybe there could be a "are you sure you want to delete this
> unresponsive node" checkbox along with a delete option.

We've added support for doing this; it will be in the 5.1 release.
(In reply to comment #2)
> 1) It is always possible to check the cluster.conf file if it is suspect or if
> it has been edited by hand and errors follow. the xmllint command can be run

Further, ccs_tool will check the XML before propagating it, so there should be no way a cluster.conf with invalid XML can be used by the cluster. I think this bug must have been hit as a result of ricci reading in an edited configuration file that contained invalid XML from /etc/cluster/cluster.conf, even though it wasn't the configuration in use by the cluster. The particular error that luci hit (the UnboundLocalError exception) is fixed in the current build, and I've also improved the error messages that show up when this occurs.
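For readers wondering how invalid XML can surface as an UnboundLocalError at all, here is a generic Python sketch of the pattern. This is not luci's actual code; both function names are hypothetical. The assumption is only that a parse failure is swallowed somewhere, leaving a local variable unassigned.

```python
import xml.etree.ElementTree as ET


def load_model_buggy(xml_text):
    """Hypothetical illustration of the failure pattern, not luci code."""
    try:
        model = ET.fromstring(xml_text)
    except ET.ParseError:
        pass  # parse error swallowed, so 'model' is never assigned
    return model  # raises UnboundLocalError when parsing failed


def load_model(xml_text):
    """Same load, but a parse failure is reported as what it is."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise ValueError("cluster.conf is not well-formed XML: %s" % err)
```

With the second form the user sees a message that points at the configuration file instead of at an internal variable.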
I'm going to close this since it wasn't a bug in the first place... I haven't had any issues since and conga has improved error messages.