Bug 163587 - kernel: CMANsendmsg failed: -101
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Hardware: All Linux
Priority: medium, Severity: low
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2005-07-19 03:39 EDT by Christine Caulfield
Modified: 2009-04-16 16:00 EDT (History)

See Also:
Fixed In Version: RHBA-2005-734
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-10-07 12:46:48 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Christine Caulfield 2005-07-19 03:39:55 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.7.8) Gecko/20050524 Fedora/1.0.4-4 Firefox/1.0.4

Description of problem:
There are a couple of things that can cause this message. The easiest is to simply down the interface that cman is using and watch the messages scroll up until the node gets fenced.

In some more extreme circumstances it can prevent reboot of the machine (though I don't seem to be able to reproduce this with more recent kernels).

In any case it's a tatty message. cman should either wait quietly to be fenced or quit if all its channels of communication have been cut.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. start cman
2. ifconfig eth0 down
3. watch messages
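
For step 3, a minimal sketch of counting the repeated message instead of watching it scroll by. It assumes only that each log line contains the text from the bug title ("CMANsendmsg failed: " followed by the error code); the function name and the sample lines are made up for illustration, and where the kernel log is stored on a given system is not assumed.

```python
import re
from collections import Counter

# Message text as it appears in the bug title; whatever syslog puts
# in front of it (timestamp, hostname) is deliberately not assumed.
SENDMSG_FAILED = re.compile(r"CMANsendmsg failed: (-?\d+)")


def count_sendmsg_failures(lines):
    """Count 'CMANsendmsg failed' messages, grouped by error code."""
    counts = Counter()
    for line in lines:
        match = SENDMSG_FAILED.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input, not taken from a real log.
    sample = [
        "kernel: CMANsendmsg failed: -101",
        "kernel: CMANsendmsg failed: -101",
        "kernel: some unrelated message",
    ]
    for code, count in count_sendmsg_failures(sample).items():
        print(f"error {code}: seen {count} time(s)")
```

Any iterable of strings works as input, so an open log file can be passed directly.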
  

Additional info:

Normally this is not a problem. If it gets to the point where it prevents a reboot, that's usually a configuration error (cman not being shut down by the init scripts).
Comment 1 Lon Hohberger 2005-07-19 16:24:21 EDT
I think it's CMAN being shut down with no network connectivity, which causes the
problem.  The simulation is a bonded interface losing all connectivity
simultaneously, followed by a non-powercycle-fence.

Non-powercycle-fence events are generally non-recoverable.  That is, the node
can't rejoin the cluster by itself -- it requires manual intervention of some
form, because it could still have things waiting to be flushed (which are only
prevented by the fact that the node has been fenced off...).

Here's how to get around this:
(a) Instead of typing "reboot", try "reboot -fn"

(b) Press the power button and hold it for 5 seconds, release, then press it
again for 1 second.

(c) Press the reset button ;)
Comment 2 Christine Caulfield 2005-07-20 10:42:59 EDT
I've checked in a fix to the STABLE branch. If you can, please let me know how
you get on with it.

Checking in cnxman.c;
/cvs/cluster/cluster/cman-kernel/src/cnxman.c,v  <--  cnxman.c
new revision: 1.42.2.12.4.1.2.1; previous revision: 1.42.2.12.4.1
done
Comment 3 Christine Caulfield 2005-08-02 11:14:21 EDT
Also committed to RHEL4 branch

Checking in cnxman.c;
/cvs/cluster/cluster/cman-kernel/src/cnxman.c,v  <--  cnxman.c
new revision: 1.42.2.13; previous revision: 1.42.2.12
done
Comment 4 Corey Marthaler 2005-08-31 15:42:45 EDT
I'm still seeing these messages running revolver (see bz165160), is there still
a case which can cause this message to occur?
Comment 5 Christine Caulfield 2005-09-01 03:19:49 EDT
errno -101 is ENETUNREACH. I've only seen it when the network interface is downed,
but I suppose it can also happen if the route or IP address is changed such that
cman can't send a packet to its broadcast address.
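
As a quick way to check what a code like -101 means, here is a small sketch using Python's standard errno table. The kernel reports the error as a negative value, so the sign is dropped before the lookup. The function name is made up for illustration, and the numeric values are platform-specific; on Linux, 101 is ENETUNREACH.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a kernel-style negative return value to (symbolic name, text)."""
    number = abs(code)  # the kernel reports -ERRNO, the tables use ERRNO
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    name, text = describe_kernel_error(-101)
    print(f"-101 -> {name}: {text}")
```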

So, the message isn't going to go away completely because the condition that
causes it to happen is external to CMAN. 

What this bug was (originally) about was the looping and the prevention of a clean reboot.
cman now shuts itself down. Even without this it would be fenced out of the
cluster by the other nodes because the heartbeat messages are not reaching the
network.
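
To illustrate the behaviour described above (stop once the network is unreachable rather than loop and log forever), here is a rough user-space sketch. It is not the change that went into cnxman.c, which is kernel code and is not shown in this report; `send_heartbeats` and the `send` callable are hypothetical names used only for this example.

```python
import errno


def send_heartbeats(send, payloads):
    """Send payloads in order and stop at the first ENETUNREACH.

    Returns (number_sent, stopped). Other errors are re-raised,
    since only 'network unreachable' is treated as a shutdown signal.
    """
    sent = 0
    for payload in payloads:
        try:
            send(payload)
        except OSError as exc:
            if exc.errno == errno.ENETUNREACH:
                # No point retrying in a loop: give up quietly and
                # let the caller shut down (or wait to be fenced).
                return sent, True
            raise
        sent += 1
    return sent, False


if __name__ == "__main__":
    def unreachable(_payload):
        # Stand-in for a socket send on a downed interface.
        raise OSError(errno.ENETUNREACH, "Network is unreachable")

    print(send_heartbeats(unreachable, [b"hello"]))
```

Passing the send function in as a parameter keeps the sketch testable without touching a real network interface.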
Comment 7 Red Hat Bugzilla 2005-10-07 12:46:48 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-734.html
