|Summary:||Multi-homed servers grant NLM lock requests via wrong IP|
|Product:||Red Hat Enterprise Linux 5||Reporter:||Ray Van Dolson <rvandolson>|
|Component:||kernel||Assignee:||Jeff Layton <jlayton>|
|Status:||CLOSED DUPLICATE||QA Contact:||Red Hat Kernel QE team <kernel-qe>|
|Version:||5.4||CC:||jlayton, regulus22, steved|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2010-01-23 14:44:49 UTC||Type:||---|
Description Ray Van Dolson 2010-01-22 19:31:38 UTC
We have a two-node NFS cluster backed by a GFS2 filesystem. We've noticed that RHEL NFS clients that request locks always hang unless they access the "passive" node of the cluster directly.

The problem appears to stem from the fact that when the NLM on the primary node transmits its "GRANT" response to the client, it does so via an asynchronous callback, meaning that a new connection is established to the client. It appears that this connection is initiated from the machine's primary IP, not the "cluster" IP over which the client first asked for the lock. The client, rightly, rejects this response and continues blocking forever.

As an aside, it seems that Solaris 10 NFS clients are not as "secure" and happily accept a GRANT from any IP under the sun (no pun intended).

This post to linux-nfs seems to indicate there is a kernel patch to address this: http://markmail.org/message/nd4lvfpiv6gkacio

I have been unable to find the kernel commit, but am curious whether it has been backported to RHEL5's kernel. This is a show-stopper for us and I will be filing an SR as well. It sounds like this is a known (and already resolved) issue, but I can attach a packet dump and steps to reproduce the problem if needed.
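To illustrate the source-address mismatch described above, here is a minimal user-space sketch in Python. It is not the kernel patch and not how lockd is implemented; the function names (`grant_is_accepted`, `open_callback`) and the addresses in the comments are hypothetical. It only shows the general idea: a callback connection takes whatever source address the routing table picks unless the socket is explicitly bound to the intended (cluster) address first, and a strict client compares that source address against the one it sent the lock request to.

```python
import socket


def grant_is_accepted(lock_server_ip, callback_source_ip):
    """Hypothetical strict client check: accept a GRANT callback only if
    it arrives from the address the lock was originally requested from."""
    return callback_source_ip == lock_server_ip


def open_callback(source_ip, client_addr):
    """Open a TCP connection to client_addr that originates from source_ip.

    Binding before connect() pins the source address. Without the bind(),
    the kernel chooses the source address itself, which on a multi-homed
    host is typically the primary IP rather than a floating cluster IP.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((source_ip, 0))  # port 0: let the kernel pick an ephemeral port
    sock.connect(client_addr)
    return sock


# Illustrative addresses only (documentation range, not from this report):
#   cluster IP 192.0.2.50, server primary IP 192.0.2.11
#   grant_is_accepted("192.0.2.50", "192.0.2.11") is False, so the client
#   keeps blocking; a callback opened with open_callback("192.0.2.50", ...)
#   would pass the same check.
```

On a real server, `source_ip` would be the cluster address the client mounted from, and it must be configured on a local interface or `bind()` will fail.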
Comment 1 Ray Van Dolson 2010-01-22 19:34:12 UTC
I should note the following: Servers are running RHEL 5.4 kernel 2.6.18-164.6.1.el5 with nfs-utils-1.0.9-42.el5. Clients are RHEL 5.4 as well, fully patched and running the latest kernels. I know our server kernel isn't the latest; we just haven't rebooted in a while.
Comment 2 Ray Van Dolson 2010-01-22 19:45:16 UTC
Opened SR #1988432 for this issue.