Bug 155393 - RFE: option to disable ARP checking on connections
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: clumanager
Version: 3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Depends On: 149311
Blocks: 162166
Reported: 2005-04-19 17:40 EDT by Lon Hohberger
Modified: 2009-04-16 16:17 EDT
Fixed In Version: RHBA-2005-676
Doc Type: Bug Fix
Last Closed: 2005-09-30 10:56:17 EDT

Attachments
Patch which kills ARP checking (999 bytes, patch)
2005-05-25 12:42 EDT, Lon Hohberger
Corrected patch: Kill ARP table checking (2.32 KB, patch)
2005-05-25 12:48 EDT, Lon Hohberger
Patch fully implementing RFE. (3.29 KB, patch)
2005-06-17 15:24 EDT, Lon Hohberger
rebuilt clumanager package (519.71 KB, application/octet-stream)
2005-06-24 10:25 EDT, Brent Fox

Description Lon Hohberger 2005-04-19 17:40:12 EDT
Description of problem:

When a node's network is misconfigured, or configured in such a way that ARP
requests and/or information are blocked, clumanager won't converge on a view of
membership.

Reported problems where the cluster does not "converge" even though the nodes
"see" each other are difficult to track down.

With sufficient tinkering, it's possible to get the cluster working.  However,
in some cases, it may be desirable to disable the ARP checks to get the cluster
up and running, saving the network configuration problem(s) for later.

Furthermore, the ARP check presumes that the user is using an ethernet network,
which has never, to my knowledge, been a stated requirement of RHCS.
Comment 2 Lon Hohberger 2005-05-25 12:42:16 EDT
Created attachment 114843
Patch which kills ARP checking
Comment 3 Lon Hohberger 2005-05-25 12:48:01 EDT
Created attachment 114845
Corrected patch: Kill ARP table checking
Comment 4 Lon Hohberger 2005-05-25 12:54:57 EDT
The introduction of this patch will have a couple of outcomes:

(a) Incorrectly configured service IPs which take over the default route due to
incorrect subnet masks will make the cluster cease to function.  Ex: If the main
cluster IP is 192.168.0.1/24 and we add 192.168.0.2/16 as a service IP, bringing
up that service IP will kill the cluster node's traffic.  In part, this case is
why we added the ARP check in the first place: so a misconfiguration didn't
cause a node outage.

(b) Incorrectly configured network stuff will mysteriously start working again.
 The "Denied connect from Foo, not in subnet" messages will go away.
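
For illustration only, case (a) can be caught ahead of time with a small check.
The sketch below is not part of clumanager; it uses Python's standard ipaddress
module, and the helper name mask_mismatch is hypothetical.  It flags a service
IP that sits inside the cluster interface's subnet but carries a different
netmask, using the addresses from the example above.

```python
import ipaddress

# Addresses taken from the example in case (a).
cluster_if = ipaddress.ip_interface("192.168.0.1/24")
service_if = ipaddress.ip_interface("192.168.0.2/16")


def mask_mismatch(cluster, service):
    """Hypothetical helper: True if `service` lies inside the cluster
    interface's subnet but its netmask yields a different network."""
    same_segment = service.ip in cluster.network
    return same_segment and service.network != cluster.network


if __name__ == "__main__":
    if mask_mismatch(cluster_if, service_if):
        print("service IP %s does not match cluster network %s"
              % (service_if, cluster_if.network))
```

With the example addresses the check reports a mismatch; a service IP of
192.168.0.2/24 would pass.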
Comment 5 Brent Fox 2005-05-25 16:39:10 EDT
Lon, that patch works fine.  I rebuilt clumanager-1.2.26.1 with the ARP patch,
and it fixes the network tiebreaker bug as well.  What we need now is an
official hotfix package that we can support.  
Comment 6 Lon Hohberger 2005-05-25 17:09:22 EDT
Adding PM for further evaluation.
Comment 10 Lon Hohberger 2005-06-07 12:34:38 EDT
Downgrading to initscripts-7.31.9.EL-1 also seems to fix this problem; there's
a possible regression in the initscripts code which apparently causes problems
with bonded routing.  We're still evaluating this.
Comment 15 Lon Hohberger 2005-06-17 14:55:04 EDT
I'm going to add the option to toggle this -- just in case we don't have a fix
for the initscripts package any time soon.  The change is low impact.
Comment 16 Lon Hohberger 2005-06-17 15:24:07 EDT
Created attachment 115636
Patch fully implementing RFE.

This adds a configuration key:

      cluster%msgsvc_noarp

When set to "yes" or "1", the cluster software will not use its internal ARP
table to validate new connections (nor will it ask the kernel for any ARP
information).  This should quash "not in subnet" messages regardless of the
version of initscripts installed.
Comment 22 Brent Fox 2005-06-24 10:25:06 EDT
Created attachment 115934
rebuilt clumanager package

Attaching a rebuilt clumanager with this patch.  

Note: this version is for testing the noarp patch only.  It is NOT intended for
production systems at this point.
Comment 24 Brent Fox 2005-06-24 15:52:14 EDT
Lon, can you explain in which file the user is supposed to set the
"cluster%msgsvc_noarp" configuration key?  
Comment 25 Lon Hohberger 2005-06-24 16:02:31 EDT
Yes, the patch should have updated the cludb man page ("man cludb").

To block arp checking:

cludb -p cluster%msgsvc_noarp 1
cludb -p cluster%msgsvc_noarp yes

To re-enable it:

cludb -p cluster%msgsvc_noarp 0
cludb -p cluster%msgsvc_noarp no



Comment 26 Lon Hohberger 2005-06-24 16:05:05 EDT
Sorry - to be clear: you only have to run one of the two commands in each pair to change it.

i.e. "1" is the same as "yes", and "0" is the same as "no"
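
As a rough sketch of the value handling described above (this is not the actual
clumanager code): the helper name noarp_enabled is hypothetical, and treating an
unset key or any other value as "ARP checking stays on" is an assumption, not
something stated in this report.

```python
def noarp_enabled(value):
    """Hypothetical helper: interpret the cluster%msgsvc_noarp value.

    "yes" and "1" disable ARP checking, as described in this report.
    Assumption: an unset key (None) or any other value leaves it enabled.
    """
    return value in ("yes", "1")


if __name__ == "__main__":
    for setting in ("1", "yes", "0", "no", None):
        state = "skipped" if noarp_enabled(setting) else "performed"
        print("msgsvc_noarp=%r: ARP check %s" % (setting, state))
```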
Comment 39 Lon Hohberger 2005-07-11 18:00:54 EDT
I'm worried that this conflicts with another problem that Chris Kloiber saw.

Is there any way we can have the customer(s) run "clurgmgrd -fd" and get the
output up to the point where it says "Cluster I/F: xxxxxx" ?

Comment 41 Lon Hohberger 2005-07-15 10:27:17 EDT
1.2.27pre1-1 is available from:

http://people.redhat.com/lhh/packages.html

It should address this issue.  This package should be used for testing only.
Comment 50 Red Hat Bugzilla 2005-09-30 10:56:19 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-676.html
