Bug 613880

Summary: cluster.conf fails to validate when <totem ... > is set.
Product: [Fedora] Fedora
Reporter: Madison Kelly <mkelly>
Component: cluster
Assignee: Lon Hohberger <lhh>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 13
CC: agk, cfeist, fdinitto, lhh, swhiteho
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-07-15 14:29:28 UTC
Attachments:
attachment 431346: the cluster.conf file that fails to validate. (flags: none)
attachment 431347: The cluster.rng file used to validate. (flags: none)
attachment 431548: Fixed cluster.conf which validates. (flags: none)

Description Madison Kelly 2010-07-13 04:56:30 UTC
Created attachment 431346 [details]
the cluster.conf file that fails to validate.

Description of problem:

I added the following to '/etc/cluster/cluster.conf':
-------------------------------------------------------
<cluster name="an-cluster" config_version="2">
        <totem rppmode="passive" version="2" secauth="off" threads="off">
                <interface ringnumber="0" bindnetaddr="10.0.1.0" mcastaddr="226.94.1.1" mcastport="5405" />
                <interface ringnumber="1" bindnetaddr="10.0.0.0" mcastaddr="226.94.1.2" mcastport="5405" />
        </totem>
        ...
</cluster>
-------------------------------------------------------

When I then tried to validate it with 'ccs_config_validate' I got:
-------------------------------------------------------
Relax-NG validity error : Extra element totem in interleave
tempfile:3: element totem: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate
-------------------------------------------------------


Version-Release number of selected component (if applicable):

See attached cluster.conf and cluster.rng files. Please note that the cluster.rng in use contains a modification from source to add support for a custom fence device. The cluster.conf failed validation against a stock cluster.rng as well.

How reproducible:

Seems to be 100%

Steps to Reproduce:
1. Add the above <totem ...> syntax
2. Try to validate.
  
Actual results:

Validation failed.

Expected results:

Validation passed.

Additional info:

Comment 1 Madison Kelly 2010-07-13 04:57:22 UTC
Created attachment 431347 [details]
The cluster.rng file used to validate.

Comment 2 Lon Hohberger 2010-07-13 18:28:19 UTC
The following keyword is incorrect:

   rppmode="active"

Should be:

   rrp_mode="passive"

The following two keywords are not supported at this time by cman-preconfig; hence they are not valid as part of cluster.conf at this time:

   version="2"
   threads="off"

This should work:

<cluster name="an-cluster" config_version="2">
        <totem rrp_mode="passive" secauth="off">
                <interface ringnumber="0" bindnetaddr="10.0.1.0" mcastaddr="226.94.1.1" mcastport="5405" />
                <interface ringnumber="1" bindnetaddr="10.0.0.0" mcastaddr="226.94.1.2" mcastport="5405" />
        </totem>
        ...
</cluster>
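
A misspelled attribute name such as rppmode (for rrp_mode) is the kind of typo a quick pre-check can catch before running ccs_config_validate. What follows is only a minimal sketch in Python. It assumes a hypothetical allowlist holding just the two <totem> attributes used in the working example; the real set of accepted attributes is defined by cluster.rng, not by this list, and the function name is made up for illustration.

```python
import difflib
import xml.etree.ElementTree as ET

# ASSUMPTION: illustrative allowlist taken from the working example in this
# comment, not from cluster.rng; the schema is the authority on what is valid.
KNOWN_TOTEM_ATTRS = ["rrp_mode", "secauth"]


def check_totem_attrs(conf_xml, known=KNOWN_TOTEM_ATTRS):
    """Return {unknown_attr: closest_known_name_or_None} for <totem>."""
    root = ET.fromstring(conf_xml)
    totem = root.find("totem")
    if totem is None:
        return {}
    report = {}
    for name in totem.attrib:
        if name not in known:
            # Suggest the nearest known spelling, if any is similar enough.
            close = difflib.get_close_matches(name, known, n=1)
            report[name] = close[0] if close else None
    return report


# The <totem> line from the original report, reduced to its attributes.
SAMPLE = """
<cluster name="an-cluster" config_version="2">
  <totem rppmode="passive" version="2" secauth="off" threads="off"/>
</cluster>
"""

if __name__ == "__main__":
    for attr, hint in check_totem_attrs(SAMPLE).items():
        print(attr, "->", hint or "no close match")
```

A check like this only flags names; it says nothing about whether a value is acceptable, so it complements schema validation rather than replacing it.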

Comment 3 Lon Hohberger 2010-07-13 18:29:04 UTC
(In reply to comment #2)
> The following keyword is incorrect:
> 
>    rppmode="active"
> 

Oops, I meant:

     rppmode="passive"

Comment 4 Lon Hohberger 2010-07-13 18:33:35 UTC
There is also no handling of the quiet="1" parameter in the <fencedevice> tags at this point.

Comment 5 Lon Hohberger 2010-07-13 18:34:50 UTC
Created attachment 431548 [details]
Fixed cluster.conf which validates.

Comment 6 Madison Kelly 2010-07-13 18:39:14 UTC
(In reply to comment #4)
> There is also no handling of the quiet="1" parameter in the <fencedevice> tags
> at this point.    

I built a new fence device, and that argument is used by the fence agent I added. I also added an entry to cluster.rng so that the configuration validates against it properly. I am working on adding support upstream.

Comment 7 Madison Kelly 2010-07-13 18:40:28 UTC
(In reply to comment #5)
> Created an attachment (id=431548) [details]
> Fixed cluster.conf which validates.    

I will test this tonight. I suspect it will work, so this bug is probably safe to close.

Is there a comprehensive list of which openais/corosync arguments are and are not currently supported by cman's cluster.conf?

Comment 8 Madison Kelly 2010-07-14 04:06:58 UTC
I made the changes (used the cluster.conf attached here and marked as fixed) and it validates. However, on starting cman, I get this:

Jul 14 00:04:20 an-node01 kernel: DLM (built Jul  6 2010 22:33:59) installed
Jul 14 00:04:20 an-node01 corosync[2364]:   [MAIN  ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Jul 14 00:04:20 an-node01 corosync[2364]:   [MAIN  ] Corosync built-in features: nss rdma
Jul 14 00:04:20 an-node01 corosync[2364]:   [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Jul 14 00:04:20 an-node01 corosync[2364]:   [MAIN  ] Successfully parsed cman config
Jul 14 00:04:20 an-node01 corosync[2364]:   [MAIN  ] Successfully configured openais services to load
Jul 14 00:04:20 an-node01 corosync[2364]:   [MAIN  ] parse error in config: No multicast address specified
Jul 14 00:04:20 an-node01 corosync[2364]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1430.

Comment 9 Fabio Massimo Di Nitto 2010-07-14 04:20:55 UTC
(In reply to comment #2)
> The following keyword is incorrect:
> 
>    rppmode="active"
> 
> Should be:
> 
>    rrp_mode="passive"
> 
> The following two keywords are not supported at this time by cman-preconfig;
> hence they are not valid as part of cluster.conf at this time:
> 
>    version="2"
>    threads="off"
> 

This shouldn't be a problem at all. cman-preconfig copies the corosync config bits unmodified from within <cluster> to the top level of the objdb, where corosync can access them.

So, theoretically, any corosync config option can be changed from within cluster.conf; whether it makes sense to do so is a question that still stands.
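
To make the pass-through behaviour described here concrete, the following is a toy model in Python. It is not cman-preconfig's actual code. It copies one child section of <cluster>, attributes untouched, into a top-level dictionary standing in for the objdb. The function name, the dictionary layout, and the choice of "totem" as the copied section are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def copy_section_to_top_level(conf_xml, section="totem"):
    """Toy stand-in for the objdb (hypothetical layout):
    {section: {"attrs": {...}, "children": [(tag, {...}), ...]}}
    """
    root = ET.fromstring(conf_xml)
    objdb = {}
    node = root.find(section)
    if node is not None:
        objdb[section] = {
            # Attributes are copied verbatim; nothing is filtered or renamed.
            "attrs": dict(node.attrib),
            "children": [(child.tag, dict(child.attrib)) for child in node],
        }
    return objdb


# The <totem> block from the original report.
CONF = """
<cluster name="an-cluster" config_version="2">
  <totem rppmode="passive" version="2" secauth="off" threads="off">
    <interface ringnumber="0" bindnetaddr="10.0.1.0" mcastaddr="226.94.1.1" mcastport="5405"/>
    <interface ringnumber="1" bindnetaddr="10.0.0.0" mcastaddr="226.94.1.2" mcastport="5405"/>
  </totem>
</cluster>
"""

if __name__ == "__main__":
    for key, value in copy_section_to_top_level(CONF)["totem"]["attrs"].items():
        print(key, "=", value)
```

In this model nothing stops version or threads (or a misspelled name) from reaching the top level, which is the point being made; whether corosync then acts on a given key is a separate matter.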

Comment 10 Lon Hohberger 2010-07-14 17:37:49 UTC
It's trivial to add corosync bits to cluster.conf schema; whatever we decide is fine.

I didn't realize cman-preconfig would just pass things up, so it's my error.

Comment 11 Madison Kelly 2010-07-14 17:57:43 UTC
(In reply to comment #10)
> It's trivial to add corosync bits to cluster.conf schema; whatever we decide is
> fine.
> 
> I didn't realize cman-preconfig would just pass things up, so it's my error.    

Will this lead to an updated cluster.rng?

Comment 12 Lon Hohberger 2010-07-15 14:29:28 UTC
As it turns out, according to bug 614697, you can't use cluster.conf to configure RRP mode using corosync directives when using a CMAN cluster:

https://bugzilla.redhat.com/show_bug.cgi?id=614697

See here for the correct way to configure RRP mode with CMAN-managed clusters:

http://sources.redhat.com/cluster/wiki/MultiHome

*** This bug has been marked as a duplicate of bug 614697 ***