We have a two-node NFS cluster backed by a GFS2 filesystem. We've
noticed that RHEL NFS clients that request locks always hang, unless they
access the "passive" node of the cluster directly.
The problem appears to stem from the fact that when the NLM on the
primary node transmits its "GRANT" response to the client, it does so
via an asynchronous callback -- meaning that a new connection is
established to the client. It appears that this connection is initiated
from the machine's primary IP, and not from the "cluster" IP to which the
client first sent its lock request.
The client, rightly, rejects this response and continues blocking.
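To make the mismatch concrete, here is a simplified Python model of what I think is happening. It is not the actual lockd code. It assumes the client matches a GRANT callback to a blocked request by comparing the callback's source address with the address the request was sent to. The names BlockedLock and find_blocked_lock, the owner label, and the 192.0.2.x addresses are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockedLock:
    """A lock request the client is still waiting on (illustrative model)."""
    server_ip: str  # address the client sent its LOCK request to
    owner: str      # hypothetical label identifying the lock owner


def find_blocked_lock(blocked, grant_source_ip, owner):
    """Return the blocked request that a GRANT callback matches, or None.

    Assumption: a callback only matches if it arrives from the same
    address the original request was sent to.
    """
    for request in blocked:
        if request.server_ip == grant_source_ip and request.owner == owner:
            return request
    return None


# Placeholder addresses from the documentation range, not our real ones.
CLUSTER_IP = "192.0.2.10"  # floating "cluster" IP the client mounted
NODE_IP = "192.0.2.11"     # primary IP of the active node

blocked = [BlockedLock(server_ip=CLUSTER_IP, owner="pid-4242")]

# GRANT sent from the cluster IP: matches, so the lock would be granted.
from_cluster = find_blocked_lock(blocked, CLUSTER_IP, "pid-4242")

# GRANT sent from the node's primary IP: no match, client keeps blocking.
from_node = find_blocked_lock(blocked, NODE_IP, "pid-4242")
```

In this model the second lookup returns None, which corresponds to the hang we see. A client that skipped the address comparison would accept both callbacks.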
As an aside, it seems that Solaris 10 NFS clients are not as "secure"
and happily accept a GRANT from any IP under the sun (no pun intended).
This post to linux-nfs seems to indicate there is a kernel patch to
address this. I have been unable to find the kernel commit, but am
curious if this has been backported to RHEL5's kernel or not.
This is a show-stopper for us and I will be filing an SR as well. It
sounds like this is a known (and already resolved) issue, but I can
attach a packet dump and steps to reproduce the problem if needed.
I should note the following:
Servers are running RHEL 5.4 kernel 2.6.18-164.6.1.el5 with
Clients are RHEL 5.4 as well -- fully patched and latest kernels.
I know our server kernel isn't the latest, we just haven't rebooted in a
Opened SR #1988432 for this issue.
I believe this is a duplicate of bug 500653. Closing as such. Please reopen if I've misunderstood the problem you're having.
*** This bug has been marked as a duplicate of bug 500653 ***