Bug 732991

Summary: dlm_controld errors during startup with redundant ring setup
Product: Red Hat Enterprise Linux 6
Component: cluster
Version: 6.2
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: ---
Reporter: Jaroslav Kortus <jkortus>
Assignee: David Teigland <teigland>
QA Contact: Cluster QE <mspqa-list>
CC: ccaulfie, cluster-maint, djansa, fdinitto, lhh, rpeterso, teigland
Fixed In Version: cluster-3.0.12.1-16.el6
Doc Type: Bug Fix
Doc Text:
Cause: corosync redundant ring configuration detected by dlm_controld.
Consequence: dlm_controld would log harmless EEXIST errors "mkdir failed: 17".
Fix: remove the error message.
Result: no more error message.
Last Closed: 2011-12-06 14:53:03 UTC

Description Jaroslav Kortus 2011-08-24 12:03:55 UTC
Description of problem:
Aug 24 06:58:17 marathon-03 corosync[7251]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 24 06:58:20 marathon-03 fenced[7307]: fenced 3.0.12.1 started
Aug 24 06:58:20 marathon-03 dlm_controld[7333]: dlm_controld 3.0.12.1 started
Aug 24 06:58:21 marathon-03 dlm_controld[7333]: /sys/kernel/config/dlm/cluster/comms/1: mkdir failed: 17
Aug 24 06:58:21 marathon-03 dlm_controld[7333]: /sys/kernel/config/dlm/cluster/comms/2: mkdir failed: 17
Aug 24 06:58:21 marathon-03 dlm_controld[7333]: /sys/kernel/config/dlm/cluster/comms/3: mkdir failed: 17
Aug 24 06:58:21 marathon-03 dlm_controld[7333]: /sys/kernel/config/dlm/cluster/comms/4: mkdir failed: 17
Aug 24 06:58:21 marathon-03 dlm_controld[7333]: /sys/kernel/config/dlm/cluster/comms/5: mkdir failed: 17
Aug 24 06:58:22 marathon-03 gfs_controld[7381]: gfs_controld 3.0.12.1 started

These errors do not appear if only one ring is used.
The required setup is the same as for bug 722469 (see https://fedorahosted.org/cluster/wiki/MultiHome).

SELinux is in permissive mode.

Version-Release number of selected component (if applicable):
cman-3.0.12.1-14.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Set up a cluster with a redundant ring
2. service cman start
Actual results:
errors as above

Expected results:
no errors

Additional info:
$ ll /sys/kernel/config/dlm/cluster/comms
total 0
drwxr-xr-x. 2 root root 0 Aug 24 06:58 1
drwxr-xr-x. 2 root root 0 Aug 24 06:58 2
drwxr-xr-x. 2 root root 0 Aug 24 06:58 3
drwxr-xr-x. 2 root root 0 Aug 24 06:58 4
drwxr-xr-x. 2 root root 0 Aug 24 06:58 5

Comment 2 David Teigland 2011-08-24 15:38:38 UTC
We just need to silence those messages (17/EEXIST is not a problem).
Will push out the change if/when this bz is approved.
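
To illustrate why error 17 is benign here: errno 17 is EEXIST, which mkdir returns when the target directory is already present, so the directory the caller wanted exists either way. The following is a minimal Python sketch of that "create, or accept that it already exists" pattern. It is not the dlm_controld code (which is written in C), and the helper name ensure_dir is hypothetical.

```python
import errno
import os
import tempfile


def ensure_dir(path, mode=0o777):
    """Create path, treating an already-existing directory as success.

    Hypothetical helper for illustration only; not taken from dlm_controld.
    Returns True if the directory was created, False if it already existed.
    """
    try:
        os.mkdir(path, mode)
        return True
    except OSError as exc:
        # Only "already exists" (EEXIST) is benign; re-raise anything else.
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            return False
        raise


if __name__ == "__main__":
    # Use a scratch directory rather than the real configfs tree.
    base = tempfile.mkdtemp()
    comm = os.path.join(base, "1")
    print(ensure_dir(comm))   # first call creates the directory
    print(ensure_dir(comm))   # second call hits EEXIST and is ignored
    print(errno.EEXIST, os.strerror(errno.EEXIST))
```

Errors other than EEXIST (for example EACCES or ENOENT) still propagate in this sketch, so only the harmless case is silenced.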

Comment 3 David Teigland 2011-08-29 15:55:05 UTC
Pushed the change that removes the error messages to cluster.git RHEL6.

To test, use the steps above (set up RRP and run service cman start).

Comment 5 Jaroslav Kortus 2011-08-31 10:50:48 UTC
Marking as verified; the messages no longer appear with cman-3.0.12.1-16.el6.x86_64.
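
A verification like this can also be scripted. The sketch below scans syslog lines for the dlm_controld message format quoted in the description and decodes the errno value. The function name find_mkdir_errors and the regular expression are assumptions written for this example, not part of any cluster tooling.

```python
import errno
import re

# Matches lines such as:
#   ... dlm_controld[7333]: /sys/kernel/config/dlm/cluster/comms/1: mkdir failed: 17
PATTERN = re.compile(
    r"dlm_controld\[(?P<pid>\d+)\]: (?P<path>\S+): mkdir failed: (?P<err>\d+)"
)


def find_mkdir_errors(lines):
    """Return (path, errno value, errno name) for each matching log line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            code = int(match.group("err"))
            name = errno.errorcode.get(code, "UNKNOWN")
            hits.append((match.group("path"), code, name))
    return hits


if __name__ == "__main__":
    # Sample lines copied from the description of this bug.
    sample = [
        "Aug 24 06:58:20 marathon-03 dlm_controld[7333]: dlm_controld 3.0.12.1 started",
        "Aug 24 06:58:21 marathon-03 dlm_controld[7333]: "
        "/sys/kernel/config/dlm/cluster/comms/1: mkdir failed: 17",
    ]
    for path, code, name in find_mkdir_errors(sample):
        print(path, code, name)
```

On a fixed package the function should return an empty list for the startup window. To run it against a live node, pass it the lines of the system log; the log location (commonly /var/log/messages on RHEL 6) is an assumption and depends on the syslog configuration.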

Comment 6 David Teigland 2011-10-27 14:39:33 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: corosync redundant ring configuration detected by dlm_controld.
Consequence: dlm_controld would log harmless EEXIST errors "mkdir failed: 17".
Fix: remove the error message.
Result: no more error message.

Comment 7 errata-xmlrpc 2011-12-06 14:53:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1516.html