Bug 557911

Summary:	Multi-homed servers grant NLM lock requests via wrong IP
Product:	Red Hat Enterprise Linux 5	Reporter:	Ray Van Dolson <rvandolson>
Component:	kernel	Assignee:	Jeff Layton <jlayton>
Status:	CLOSED DUPLICATE	QA Contact:	Red Hat Kernel QE team <kernel-qe>
Severity:	high	Docs Contact:
Priority:	low
Version:	5.4	CC:	jlayton, regulus22, steved
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2010-01-23 14:44:49 UTC	Type:	---

Description Ray Van Dolson 2010-01-22 19:31:38 UTC

We have a two-node NFS cluster backed by a GFS2 filesystem.  We've
noticed that lock requests from RHEL NFS clients always hang, unless the
clients access the "passive" node of the cluster directly.

The problem appears to stem from the fact that when the NLM on the
primary node transmits its "GRANT" response to the client, it does so
via an asynchronous callback -- meaning that a new connection is
established to the client.  It appears that this connection is initiated
from the machine's primary IP, and not from the "cluster" IP over which
the client first asked for the lock.

The client, rightly, rejects this response and continues blocking
forever.
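
To illustrate the mechanism described above, here is a minimal userspace
sketch in Python. It is not lockd code; the function names
(open_callback, grant_is_acceptable, demo) and the 192.0.2.x addresses
are hypothetical and chosen only for illustration. It shows that a new
outgoing connection uses a kernel-chosen source address unless the
socket is bound to a specific local address first, and that a client
comparing the peer address against the server address it mounted would
reject a callback from any other IP.

```python
import socket


def open_callback(dest, source_ip=None):
    """Open a TCP connection to dest, optionally pinning the source address.

    With source_ip=None the kernel picks the source address from its
    routing table, which on a multi-homed host is typically the primary IP.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if source_ip is not None:
        # Port 0 lets the kernel pick an ephemeral port but fixes the address.
        sock.bind((source_ip, 0))
    sock.connect(dest)
    return sock


def grant_is_acceptable(expected_server_ip, peer_ip):
    """Assumed client-side check: accept a callback only from the IP
    the lock request was originally sent to."""
    return peer_ip == expected_server_ip


def demo():
    """Loopback demonstration: return the source IP the listener observes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    callback = open_callback(listener.getsockname(), source_ip="127.0.0.1")
    conn, peer = listener.accept()
    conn.close()
    callback.close()
    listener.close()
    return peer[0]
```

In this sketch, a callback that arrives from the primary IP (say
192.0.2.10) while the client talked to the cluster IP (say 192.0.2.100)
fails grant_is_acceptable, which matches the hang we see. Binding the
callback socket to the cluster IP is the kind of behaviour we would
expect a fix to provide, though I have not seen the actual patch.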

As an aside, it seems that Solaris 10 NFS clients are not as "secure"
and happily accept a GRANT from any IP under the sun (no pun intended).

This post[1] to linux-nfs seems to indicate there is a kernel patch to
address this.  I have been unable to find the kernel commit, but am
curious if this has been backported to RHEL5's kernel or not.

This is a show-stopper for us and I will be filing an SR as well.  It
sounds like this is a known (and already resolved) issue, but I can
attach a packet dump and steps to reproduce the problem if needed.

[1] http://markmail.org/message/nd4lvfpiv6gkacio
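
As a rough client-side probe for the hang described above, something
like the following Python sketch could be run against a file on the NFS
mount. The helper name try_blocking_lock and the timeout value are
assumptions made for illustration, not part of any tool we actually
use. It requests a blocking exclusive lock (which goes through NLM on
an NFSv3 mount) and gives up after a timeout instead of blocking
forever. It must be run from the main thread because it relies on
SIGALRM.

```python
import fcntl
import signal


class LockTimeout(Exception):
    """Raised by the alarm handler when the lock request takes too long."""


def _on_alarm(signum, frame):
    raise LockTimeout()


def try_blocking_lock(path, timeout_s=10.0):
    """Return True if an exclusive lock on path is granted within
    timeout_s seconds, False if the request is still blocked by then."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout_s)
    try:
        with open(path, "a+") as handle:
            # Blocking request; the raised LockTimeout interrupts the wait.
            fcntl.lockf(handle, fcntl.LOCK_EX)
            fcntl.lockf(handle, fcntl.LOCK_UN)
        return True
    except LockTimeout:
        return False
    finally:
        # Cancel the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        if previous is None:
            previous = signal.SIG_DFL
        signal.signal(signal.SIGALRM, previous)
```

A False result for a file reached through the cluster IP, combined with
a True result when going to the other node directly, would be
consistent with the behaviour reported here.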

Comment 1 Ray Van Dolson 2010-01-22 19:34:12 UTC

I should note the following:

Servers are running RHEL 5.4 kernel 2.6.18-164.6.1.el5 with
nfs-utils-1.0.9-42.el5.

Clients are RHEL 5.4 as well -- fully patched and latest kernels.

I know our server kernel isn't the latest; we just haven't rebooted in a
while.

Comment 2 Ray Van Dolson 2010-01-22 19:45:16 UTC

Opened SR #1988432 for this issue.

Comment 3 Jeff Layton 2010-01-23 14:44:49 UTC

I believe this is a duplicate of bug 500653. Closing as such. Please reopen if I've misunderstood the problem you're having.

*** This bug has been marked as a duplicate of bug 500653 ***