Bug 529498

Summary: /etc/init.d/cman fails in set_networking_params with 3.0.2 and 3.0.3
Product: [Fedora] Fedora
Reporter: Thomas Sjolshagen <thomas.sjolshagen>
Component: cluster
Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 11
CC: agk, cfeist, fdinitto, gianluca.cecchi, lhh, swhiteho
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-10-19 04:48:31 UTC
Attachments:
  Description: /etc/sysconfig/cman file
  Flags: none

  Description: Log from "bash -x /etc/init.d/cman start" run
  Flags: none

  Description: proposed fix
  Flags: none

Description Thomas Sjolshagen 2009-10-17 17:40:09 UTC
Description of problem:

When running the /etc/init.d/cman startup script, it fails while executing the set_networking_params() function on both members of my Fedora 11 based cluster.

Version-Release number of selected component (if applicable):

cman-3.0.3-1.fc11.x86_64
openaislib-1.1.0-1.fc11.x86_64
openais-1.1.0-1.fc11.x86_64
rgmanager-3.0.3-1.fc11.x86_64
gfs2-utils-3.0.3-1.fc11.x86_64
lvm2-cluster-2.02.48-2.fc11.x86_64
corosynclib-1.1.0-1.fc11.x86_64
corosync-1.1.0-1.fc11.x86_64
kernel-2.6.30.8-64.fc11.x86_64

How reproducible:

Every time

Steps to Reproduce:
1. Boot cluster node with /etc/init.d/cman enabled

Or
1. service cman start

  
Actual results:

"Setting network parameters...        [FAILED]"

and cman script stops executing resulting in the cluster member not joining the cluster.

Expected results:

"Setting network parameters...        [OK]"

and cman script completing with the node having joined the cluster.

Additional info:

I am attaching a log file showing that, because the existing default /proc/sys/net/core/rmem_max value is _greater_ than the expected value, setting it to the value the cluster wants fails.

I would think the test should check whether rmem_max (and rmem_default) are already set to a value greater than or equal to what the cluster stack needs. If they are, startup proceeds; if not, the values get raised. Other (3rd party) applications may require a higher default network read buffer value than the cluster software stack needs on its own.
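
The check suggested above could look roughly like the sketch below. This is not the code from the attached patch or from the upstream commit. The function name ensure_min_value is made up for illustration, and the required buffer size in the usage comment is a placeholder, since this report does not state the value the cluster stack asks for.

```shell
#!/bin/sh
# Hypothetical helper, not taken from /etc/init.d/cman or the attached patch.
# Usage: ensure_min_value FILE MINIMUM
# Raises the integer stored in FILE to MINIMUM only when the current value
# is lower; an equal or higher value is left untouched.
ensure_min_value() {
    file=$1
    minimum=$2

    # Read the current setting; fail if the file cannot be read.
    current=$(cat "$file") || return 1

    if [ "$current" -lt "$minimum" ]; then
        # Current value is too small for the cluster stack, so raise it.
        echo "$minimum" > "$file" || return 1
    fi
    return 0
}

# Example use. The size below is a placeholder assumption:
# required=4194304
# ensure_min_value /proc/sys/net/core/rmem_default "$required" &&
# ensure_min_value /proc/sys/net/core/rmem_max "$required"
```

With this approach a node whose rmem_max was already raised for another application would pass the check unchanged instead of failing the start action.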

Comment 1 Thomas Sjolshagen 2009-10-17 17:41:23 UTC
Created attachment 365130 [details]
/etc/sysconfig/cman file

Comment 2 Thomas Sjolshagen 2009-10-17 17:46:17 UTC
Created attachment 365131 [details]
Log from "bash -x /etc/init.d/cman start" run

Log file showing failed /etc/init.d/cman start.

Comment 3 Fabio Massimo Di Nitto 2009-10-17 17:57:05 UTC
Created attachment 365133 [details]
proposed fix

Please patch /etc/init.d/cman and test.

The patch should address the issue.

Thanks

Comment 4 Thomas Sjolshagen 2009-10-18 16:49:17 UTC
Tested the patch. The cman service now starts with set_networking_params enabled as part of the start action.

Comment 5 Fabio Massimo Di Nitto 2009-10-19 04:48:31 UTC
Fix is now upstream.

git commit 1ece3abed41a6debf4175201c4061108e9034e68

Fabio

Comment 6 Gianluca Cecchi 2009-10-21 13:58:10 UTC
Works for me as well.
I had the same problem after updating from version 3.0.2-1.fc11.x86_64 to 3.0.3-1.fc11.x86_64.
Without the proposed patch I get:
[root@r]# service cman start
Starting cluster:
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Setting network parameters... FATAL: Module lock_dlm not found.
                                                           [FAILED]

Now with the proposed patch all is ok.
Thanks,
Gianluca

Comment 7 Fabio Massimo Di Nitto 2009-10-21 14:11:03 UTC
Update packages for F11 are available in koji and bodhi.

They should be available "soonish" (it's a manual process) in f10 and f11 updates channels.

Fabio