Bug 615929

Summary: luci generated cluster.conf with fence_scsi fails to validate
Product: Red Hat Enterprise Linux 6
Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: luci
Assignee: Chris Feist <cfeist>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 6.0
CC: bbrock, ccaulfie, cfeist, cluster-maint, lhh, rmccabe, rpeterso, teigland
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: luci-0.22.2-9.el6
Doc Type: Bug Fix
Last Closed: 2010-11-10 22:12:05 UTC

Description Fabio Massimo Di Nitto 2010-07-19 09:44:52 UTC
<?xml version="1.0"?>
<cluster config_version="8" name="fabbione-rhel6">
        <clusternodes>
                <clusternode name="rhel6-node1" nodeid="1" votes="1">
                        <fence>
                                <method name="default">
                                        <device name="scsi_fence"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="rhel6-node2" nodeid="2" votes="1">
                        <fence>
                                <method name="default">
                                        <device name="scsi_fence2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
        <fencedevices>
                <fencedevice agent="fence_scsi" name="scsi_fence" node="rhel6-node1"/>
                <fencedevice agent="fence_scsi" name="scsi_fence2" node="rhel6-node2"/>
        </fencedevices>
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <quorumd label="qdisk">
                <heuristic program="/bin/ping -c 1 vultus5.int.fabbione.net"/>
        </quorumd>
</cluster>


Starting cman with this configuration produces the following error:

   Starting cman... Relax-NG validity error : Extra element fencedevices in interleave
tempfile:23: element fencedevices: Relax-NG validity error : Element cluster failed to validate content
tempfile:14: element device: validity error : IDREF attribute name references an unknown ID "scsi_fence2"
Configuration fails to validate

Comment 2 Lon Hohberger 2010-07-19 13:54:59 UTC
Jing says:

[lhh@localhost jing-20090818]$ java -jar bin/jing.jar ~/cluster.rng.in ./cluster.conf_fail
/sandbox/lhh/jing/jing-20090818/./cluster.conf_fail:25:21: error: attribute "node" not allowed here; expected attribute "action", "aptpl", "auth", "channel_address", "cipher", "cmd_prompt", "community", "cserver", "debug", "device", "devices", "domain", "drac_version", "exec", "hash", "help", "hmc_version", "identity_file", "inet4_only", "inet6_only", "io_fencing", "ip_family", "ipaddr", "ipport", "key", "key_file", "lanplus", "logfile", "login", "managed", "method", "module_name", "multicast_address", "nodename", "option", "partition", "passwd", "passwd_script", "port", "retrans", "ribcl", "rpowerpath", "secure", "separator", "serial_device", "serial_params", "servers", "snmp_auth_prot", "snmp_priv_passwd", "snmp_priv_passwd_script", "snmp_priv_prot", "snmp_sec_level", "snmp_version", "ssl", "switch", "timeout", "udpport", "use_uuid", "verbose", "version", "vmware_datacenter" or "vmware_type"
/sandbox/lhh/jing/jing-20090818/./cluster.conf_fail:27:21: error: attribute "node" not allowed here; expected attribute "action", "aptpl", "auth", "channel_address", "cipher", "cmd_prompt", "community", "cserver", "debug", "device", "devices", "domain", "drac_version", "exec", "hash", "help", "hmc_version", "identity_file", "inet4_only", "inet6_only", "io_fencing", "ip_family", "ipaddr", "ipport", "key", "key_file", "lanplus", "logfile", "login", "managed", "method", "module_name", "multicast_address", "nodename", "option", "partition", "passwd", "passwd_script", "port", "retrans", "ribcl", "rpowerpath", "secure", "separator", "serial_device", "serial_params", "servers", "snmp_auth_prot", "snmp_priv_passwd", "snmp_priv_passwd_script", "snmp_priv_prot", "snmp_sec_level", "snmp_version", "ssl", "switch", "timeout", "udpport", "use_uuid", "verbose", "version", "vmware_datacenter" or "vmware_type"

Looking at the fence_scsi script, the parameter "node" should be "nodename".  Because the <fencedevice> elements fail validation for this reason, the failure cascades into the IDREF error reported by xmllint.

Actually, I don't think "node" is ever used by a fencing agent; it's always "nodename".
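
The diagnosis above can be checked with a small script. This is only a sketch using Python's standard xml.etree.ElementTree, not part of luci or cman; the function names are made up for illustration and SAMPLE is a trimmed copy of the failing cluster.conf from the description. It lists the <fencedevice> elements that carry the rejected "node" attribute and confirms that every <device name="..."> does resolve to a defined <fencedevice>, which fits the reading that the IDREF error is a side effect rather than a real dangling reference.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the failing cluster.conf from the description.
SAMPLE = """<cluster config_version="8" name="fabbione-rhel6">
  <clusternodes>
    <clusternode name="rhel6-node1" nodeid="1" votes="1">
      <fence><method name="default"><device name="scsi_fence"/></method></fence>
    </clusternode>
    <clusternode name="rhel6-node2" nodeid="2" votes="1">
      <fence><method name="default"><device name="scsi_fence2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_scsi" name="scsi_fence" node="rhel6-node1"/>
    <fencedevice agent="fence_scsi" name="scsi_fence2" node="rhel6-node2"/>
  </fencedevices>
</cluster>"""


def find_node_attributes(root):
    """Return names of <fencedevice> elements that carry a "node" attribute."""
    return [fd.get("name") for fd in root.iter("fencedevice") if "node" in fd.attrib]


def unresolved_device_refs(root):
    """Return <device name> values that match no <fencedevice name>."""
    defined = {fd.get("name") for fd in root.iter("fencedevice")}
    return [d.get("name") for d in root.iter("device") if d.get("name") not in defined]


root = ET.fromstring(SAMPLE)
bad = find_node_attributes(root)
dangling = unresolved_device_refs(root)
print("fencedevices using 'node':", bad)
print("unresolved device references:", dangling)
```

Both fence devices are flagged for "node", and the list of unresolved references is empty.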

Comment 4 Lon Hohberger 2010-07-19 14:17:26 UTC
Changing node -> nodename causes the above cluster.conf to validate correctly.
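
For an already generated file, the same rename could be scripted. The following is a minimal sketch, again with Python's standard xml.etree.ElementTree; rename_node_attribute is a hypothetical helper, not something luci provides, and it only touches fence_scsi devices on the assumption that those are the ones affected here.

```python
import xml.etree.ElementTree as ET


def rename_node_attribute(root):
    """Rename node -> nodename on fence_scsi <fencedevice> elements.

    Returns the number of elements changed.
    """
    changed = 0
    for fd in root.iter("fencedevice"):
        if fd.get("agent") == "fence_scsi" and "node" in fd.attrib:
            # Move the value over to the attribute name the schema accepts.
            fd.set("nodename", fd.attrib.pop("node"))
            changed += 1
    return changed


# The <fencedevices> section from the failing cluster.conf.
SAMPLE = """<fencedevices>
  <fencedevice agent="fence_scsi" name="scsi_fence" node="rhel6-node1"/>
  <fencedevice agent="fence_scsi" name="scsi_fence2" node="rhel6-node2"/>
</fencedevices>"""

root = ET.fromstring(SAMPLE)
count = rename_node_attribute(root)
fixed = ET.tostring(root, encoding="unicode")
print(count, "fencedevice element(s) updated")
print(fixed)
```

The sketch works on an in-memory string; applying it to a real cluster.conf would mean parsing and rewriting the file, then validating the result as in the comments above.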

Comment 5 Fabio Massimo Di Nitto 2010-07-20 09:59:10 UTC
The fix committed in luci.git 3177c4dd861060b6bec1b3510fcf041334c1fff7
does not solve the issue.

Adding a fence_scsi device with Node name = foo returns an error:

No value for required attribute "node" was given for fence "scsi_fence"

Comment 6 Chris Feist 2010-07-22 22:04:28 UTC
It looks like the fix was not complete.

It should be fully completed in 6afbe5ac3ed48b589ba4375065a4d8aa35eb3927

Fabio,

can you verify that this works?

Thanks,
Chris

Comment 7 Fabio Massimo Di Nitto 2010-07-23 06:34:31 UTC
Confirmed that the new fix works as expected.

Comment 11 Chris Feist 2010-08-26 19:58:48 UTC
The config file listed above is a bad one, generated by an earlier version of luci that had this bug.  If you create a new cluster in luci and add a fence_scsi device, it should use the "nodename" attribute inside <fencedevice>.

So if you create a cluster.conf, add a fence_scsi device, and the resulting lines look something like this:

...
        <fencedevices>
                <fencedevice agent="fence_scsi" name="scsi_fence" node="rhel6-node1"/>
                <fencedevice agent="fence_scsi" name="scsi_fence2" node="rhel6-node2"/>
        </fencedevices>
...

In that case luci is doing it wrong; it should look like this (note node -> nodename):

...
        <fencedevices>
                <fencedevice agent="fence_scsi" name="scsi_fence" nodename="rhel6-node1"/>
                <fencedevice agent="fence_scsi" name="scsi_fence2" nodename="rhel6-node2"/>
        </fencedevices>
...

Comment 12 Brian Brock 2010-08-27 21:00:33 UTC
Verified.

Gotcha: the parameter name is "nodename" in the cluster.conf currently generated by luci.

Comment 13 releng-rhel@redhat.com 2010-11-10 22:12:05 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.