557911 – Multi-homed servers grant NLM lock requests via wrong IP

Summary: Multi-homed servers grant NLM lock requests via wrong IP

Keywords:
Status:	CLOSED DUPLICATE of bug 500653
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jeff Layton
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-01-22 19:31 UTC by Ray Van Dolson
Modified:	2014-06-18 07:39 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-01-23 14:44:49 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Ray Van Dolson 2010-01-22 19:31:38 UTC

We have a two-node NFS cluster backed by a GFS2 filesystem.  We've
noticed that RHEL NFS clients that request locks always hang unless they
access the "passive" node of the cluster directly.

The problem appears to stem from the fact that when the NLM on the
primary node transmits its "GRANT" response to the client, it does so
via an asynchronous callback, meaning that a new connection is
established to the client.  It appears that this connection is initiated
from the machine's primary IP, and not from the "cluster" IP over which
the client first asked for the lock.
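
The source-address behavior described above can be reproduced in user space.
The following is a minimal Python sketch, not lockd code: an outgoing TCP
connection that is not bound first gets whatever source address the kernel
selects, while calling bind() before connect() pins the source to a chosen
address (the "cluster" IP in this scenario).  The function names and the
loopback addresses are assumptions made purely for illustration.

```python
import socket
import threading


def observe_peer(listener, result):
    """Accept one connection and record the source IP the server side sees."""
    conn, peer = listener.accept()
    result.append(peer[0])
    conn.close()


def connect_from(dest, source_ip=None):
    """Open a TCP connection to dest, optionally pinning the source address.

    With source_ip=None the kernel chooses the source address itself;
    otherwise bind() fixes it before connect().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if source_ip is not None:
        sock.bind((source_ip, 0))  # port 0: let the kernel pick a free port
    sock.connect(dest)
    return sock


def source_seen_by_peer(source_ip=None):
    """Return the source IP a local listener observes for one connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    seen = []
    worker = threading.Thread(target=observe_peer, args=(listener, seen))
    worker.start()
    client = connect_from(listener.getsockname(), source_ip)
    worker.join()
    client.close()
    listener.close()
    return seen[0]
```

On a Linux host, where the whole 127.0.0.0/8 range is normally local,
calling source_seen_by_peer("127.0.0.2") should report the pinned address,
while the unbound call reports the kernel's default choice.  That is the
same split as a GRANT callback leaving from the primary IP instead of the
cluster IP.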

The client, rightly, rejects this response and continues blocking
forever.
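
Why a GRANT from the wrong address leaves the client blocked can be shown
with a simplified model.  This sketch assumes the client matches an incoming
GRANT against its waiting requests by server address and lock identity; the
class, its methods, and the 192.0.2.x addresses are hypothetical and are not
taken from the actual lockd implementation.

```python
class BlockedLockTable:
    """Hypothetical model of a client's list of blocked lock requests."""

    def __init__(self):
        # Each waiter is remembered as (server_ip, lock_id).
        self._waiting = []

    def add(self, server_ip, lock_id):
        """Record a lock request that is blocked waiting for a GRANT."""
        self._waiting.append((server_ip, lock_id))

    def handle_grant(self, source_ip, lock_id):
        """Return True if the GRANT matched a waiter and released it."""
        entry = (source_ip, lock_id)
        if entry in self._waiting:
            self._waiting.remove(entry)
            return True
        return False  # no match: the waiter stays blocked

    def still_blocked(self, server_ip, lock_id):
        """Report whether a request is still waiting for its GRANT."""
        return (server_ip, lock_id) in self._waiting


# Assumed addresses: 192.0.2.10 is the cluster IP, 192.0.2.1 the primary IP.
table = BlockedLockTable()
table.add("192.0.2.10", "lock-1")
granted_from_primary = table.handle_grant("192.0.2.1", "lock-1")
```

In this model the GRANT arriving from the primary IP matches nothing, so
the request made via the cluster IP remains blocked until a GRANT arrives
from the address the client originally contacted.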

As an aside, it seems that Solaris 10 NFS clients are not as "secure"
and happily accept a GRANT from any IP under the sun (no pun intended).

This post[1] to linux-nfs seems to indicate there is a kernel patch to
address this.  I have been unable to find the kernel commit, but am
curious if this has been backported to RHEL5's kernel or not.

This is a show-stopper for us and I will be filing an SR as well.  It
sounds like this is a known (and already resolved) issue, but I can
attach a packet dump and steps to reproduce the problem if needed.

[1] http://markmail.org/message/nd4lvfpiv6gkacio

Comment 1 Ray Van Dolson 2010-01-22 19:34:12 UTC

I should note the following:

Servers are running RHEL 5.4 kernel 2.6.18-164.6.1.el5 with
nfs-utils-1.0.9-42.el5.

Clients are RHEL 5.4 as well -- fully patched and latest kernels.

I know our server kernel isn't the latest; we just haven't rebooted in a
while.

Comment 2 Ray Van Dolson 2010-01-22 19:45:16 UTC

Opened SR #1988432 for this issue.

Comment 3 Jeff Layton 2010-01-23 14:44:49 UTC

I believe this is a duplicate of bug 500653. Closing as such. Please reopen if I've misunderstood the problem you're having.

*** This bug has been marked as a duplicate of bug 500653 ***
