Bug 1120850 - unable recover NFSv3 locks NLM_DENIED_NOLOCK
Summary: unable recover NFSv3 locks NLM_DENIED_NOLOCK
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.1
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: rc
: ---
Assignee: J. Bruce Fields
QA Contact: Yin.JianHong
URL:
Whiteboard:
Depends On:
Blocks: 1069341 1091101 1118356 1134854 1150889 1168400
TreeView+ depends on / blocked
 
Reported: 2014-07-17 21:00 UTC by David Vossel
Modified: 2015-03-05 12:30 UTC (History)
13 users (show)

Fixed In Version: kernel-3.10.0-191.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1150889 (view as bug list)
Environment:
Last Closed: 2015-03-05 12:30:10 UTC


Attachments (Terms of Use)
[PATCH] lockd: allow rebinding to statd (1.48 KB, patch)
2014-08-20 21:34 UTC, J. Bruce Fields
no flags Details | Diff
[PATCH] lockd: allow rebinding to statd (2.48 KB, patch)
2014-08-28 17:53 UTC, J. Bruce Fields
no flags Details | Diff
[PATCH] lockd: allow rebinding to statd (2.16 KB, patch)
2014-09-19 16:51 UTC, J. Bruce Fields
no flags Details | Diff
statd_port_change_recovery.patch (1.91 KB, patch)
2014-09-23 19:41 UTC, Benjamin Coddington
no flags Details | Diff


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0290 normal SHIPPED_LIVE Important: kernel security, bug fix, and enhancement update 2015-03-05 16:13:58 UTC

Description David Vossel 2014-07-17 21:00:37 UTC
Description of problem:

If we restart an NFS server without rebooting the node, it is expected that clients can reclaim their previously held locks after the NFS server after sm-notify sends out the status notifications.

Version-Release number of selected component (if applicable):

nfs-utils.x86_64                       1:1.3.0-0.el7

How reproducible:
100%

Steps to Reproduce:

You need two nodes.
node1 is server
node2 is nfs client

1. start NFS server on node1. Note, the modprobe is necessary because of another unrelated bug in the nfs systemd unit files that involves not requiring the /proc/fs/nfsd mount before nfsserver starts.  I am using STATDARG=--no-notify in order to prevent nfs-lock from doing sm-notifications automatically.


echo STATDARG="--no-notify" > /etc/sysconfig/nfs
modprobe nfsd
systemctl start nfs-lock
systemctl start nfs-server
mkdir /root/testnfs
touch /root/testnfs/file
exportfs -v fsid=0,rw,sync,no_root_squash 192.168.122.0/255.255.255.0:/root/testnfs


2. on node2 mount the client, grab a file lock.

mkdir /root/testmount
mount -v -o "vers=3" rhel7-alt1:/root/testnfs /root/testmount
flock /root/testmount/file -c "sleep 10000"

3. Start wireshark on node2 to view the NLM lock traffic.

tshark -V -i eth0 -Y nlm

4. Restart the nfs server on node1. Note that I am recreating the statd state file. In this scenario this is required for the client to attempt to reclaim the lock. Without creating a new state file different from the previous one, the nfs client will ignore the sm-notification.

exportfs -v -u 192.168.122.0/255.255.255.0:/root/testnfs
systemctl stop nfs-server
systemctl stop nfs-lock
sleep 2
dd if=/dev/urandom of=/var/lib/nfs/statd/state bs=1 count=4 &> /dev/null
systemctl start nfs-lock
systemctl start nfs-server
exportfs -v fsid=0,rw,sync,no_root_squash 192.168.122.0/255.255.255.0:/root/testnfs
sm-notify -f

5. Watch the NLM traffic on node2, After node2 attempts to reclaim its lock, instead of seeing the server grant that lock reclaim because it occurred during the server's grace period, you'll see NLM_DENIED_NOLOCK


From that point forward, any attempt to lock the file on the client results in this message

# flock /root/testmount/file -c "sleep 1"
flock: /root/testmount/file: No locks available

/var/log/messages shows this on node2

"Jul 17 16:38:56 rhel7-base kernel: lockd: failed to reclaim lock for pid 2093 (errno 0, status 2)"

On the server we see this message in /var/log/messages
Jul 17 16:38:56 rhel7-base kernel: lockd: cannot monitor rhel7-alt2


Actual results:
Client can not reclaim active locks on restarted NFS server during grace period.

Expected results:
Client can reclaim locks after NFS server restart.

Additional info:
This works fine in rhel6.

Comment 6 J. Bruce Fields 2014-08-08 20:09:49 UTC
(In reply to David Vossel from comment #0)
> 4. Restart the nfs server on node1. Note that I am recreating the statd
> state file. In this scenario this is required for the client to attempt to
> reclaim the lock. Without creating a new state file different from the
> previous one, the nfs client will ignore the sm-notification.

This, by the way, is surprising--sm-notify has code to update the state file itself (unless you tell it not to with the -n option), so the sm-notify at the end here:

> exportfs -v -u 192.168.122.0/255.255.255.0:/root/testnfs
> systemctl stop nfs-server
> systemctl stop nfs-lock
> sleep 2
> dd if=/dev/urandom of=/var/lib/nfs/statd/state bs=1 count=4 &> /dev/null
> systemctl start nfs-lock
> systemctl start nfs-server
> exportfs -v fsid=0,rw,sync,no_root_squash
> 192.168.122.0/255.255.255.0:/root/testnfs
> sm-notify -f

should have done the job.

Comment 7 David Vossel 2014-08-11 14:59:07 UTC
(In reply to J. Bruce Fields from comment #6)
> (In reply to David Vossel from comment #0)
> > 4. Restart the nfs server on node1. Note that I am recreating the statd
> > state file. In this scenario this is required for the client to attempt to
> > reclaim the lock. Without creating a new state file different from the
> > previous one, the nfs client will ignore the sm-notification.
> 
> This, by the way, is surprising--sm-notify has code to update the state file
> itself (unless you tell it not to with the -n option), so the sm-notify at
> the end here:
> 
> > exportfs -v -u 192.168.122.0/255.255.255.0:/root/testnfs
> > systemctl stop nfs-server
> > systemctl stop nfs-lock
> > sleep 2
> > dd if=/dev/urandom of=/var/lib/nfs/statd/state bs=1 count=4 &> /dev/null
> > systemctl start nfs-lock
> > systemctl start nfs-server
> > exportfs -v fsid=0,rw,sync,no_root_squash
> > 192.168.122.0/255.255.255.0:/root/testnfs
> > sm-notify -f
> 
> should have done the job.

Perhaps that is the case now. In rhel6 I know if I didn't re-create the statefile after stop/start of the nfs-server, the statefile would not be updated and the SM_NOTIFY requests would be ignored by the clients because it would look like the service instance never changed.

-- Vossel

Comment 8 David Vossel 2014-08-12 16:17:56 UTC
This still doesn't work.

After further testing, it turns out using fnctl has the exact same behavior as flock.  My previous tests showed that locks grabbed using fnctl would in fact failover between nodes, but this actually already worked with flock as well. The original steps to reproduce this issue still fail in the exact same way regardless if flock or fnctl is used.

Below are the updated steps (using fnctl) required to reproduce this issue standalone, outside of pacemaker or the HA resource-agents.

You need two nodes.
node1 is server
node2 is nfs client

1. start NFS server on node1. Note, the modprobe is necessary because of another unrelated bug in the nfs systemd unit files that involves not requiring the /proc/fs/nfsd mount before nfsserver starts.  I am using STATDARG=--no-notify in order to prevent nfs-lock from doing sm-notifications automatically.


echo STATDARG="--no-notify" > /etc/sysconfig/nfs
modprobe nfsd
systemctl start nfs-lock
systemctl start nfs-server
mkdir /root/testnfs
touch /root/testnfs/file
exportfs -v -o fsid=0,rw,sync,no_root_squash 192.168.122.0/255.255.255.0:/root/testnfs


2. on node2 mount the client, grab a file lock. You won't be able to establish the lock until after the server's grace period expires. (90s by default)

mkdir /root/testmount
mount -v -o "vers=3" rhel7-auto1:/root/testnfs /root/testmount
wget https://raw.githubusercontent.com/davidvossel/phd/master/misc/fnctl_locker.c
gcc fnctl_locker.c
./a.out -f /root/testmount/file


3. Start wireshark on node2 to view the NLM lock traffic.

tshark -V -i eth0 -Y nlm

4. Restart the nfs server on node1. Note that I am recreating the statd state file. In this scenario this is required for the client to attempt to reclaim the lock. Without creating a new state file different from the previous one, the nfs client will ignore the sm-notification.

exportfs -v -u 192.168.122.0/255.255.255.0:/root/testnfs
systemctl stop nfs-server
systemctl stop nfs-lock
sleep 2
dd if=/dev/urandom of=/var/lib/nfs/statd/state bs=1 count=4 &> /dev/null
systemctl start nfs-lock
systemctl start nfs-server
exportfs -v -o fsid=0,rw,sync,no_root_squash 192.168.122.0/255.255.255.0:/root/testnfs
sm-notify -f

5. Watch the NLM traffic on node2, After node2 attempts to reclaim its lock, instead of seeing the server grant that lock reclaim because it occurred during the server's grace period, you'll see NLM_DENIED_NOLOCK

From that point forward, any attempt to lock the file on the client results in this message

./a.out -f /root/testmount/file 
Attempting to get write lock...
fcntl: No locks available


Note: in step 4 we discussed that updating the state file isn't necessary. I've reproduced this problem with and without the state file update. The results are the same.

-- Vossel

Comment 9 J. Bruce Fields 2014-08-15 20:55:42 UTC
I can reproduce this, apologies for being slow to get to it.

One other odd thing I notice about the directions is the exportfs ordering--we want to set up the exports before starting the server.  (See nfs-utils/README.)

Comment 10 J. Bruce Fields 2014-08-15 20:56:03 UTC
(But the exportfs ordering is unrelated to this issue.)

Comment 11 David Vossel 2014-08-15 21:37:04 UTC
(In reply to J. Bruce Fields from comment #9)
> I can reproduce this, apologies for being slow to get to it.
> 
> One other odd thing I notice about the directions is the exportfs
> ordering--we want to set up the exports before starting the server.  (See
> nfs-utils/README.)


Yes, we are aware of this. We're using a floating IP though, so we have more control over when requests are received than the standalone use-case.

The directions you're referring to.

"C/ exportfs -av ; rpc.mountd
       It is important that exportfs be run before mountd so that
       mountd is working from current information (in
       /var/lib/nfs/etab).
       It is also important that both of these are run before
       rpc.nfsd.
       If not, any NFS requests that arrive before mountd is started
       will get replied to with a 'Stale NFS File handle' error.
"

In the HA NFS use-cases, no NFS requests can arrive to mountd before an export is started. We can be assured of this because the floating IP clients use to make requests to the server doesn't start until after the nfs daemons and exports have started.

I didn't put the floating ip in the steps to reproduce this issue because it is irrelevant.  All the other steps reflect exactly what we're doing in the HA NFS active/passive use-case during startup and recovery.

-- Vossel

Comment 12 J. Bruce Fields 2014-08-19 20:58:06 UTC
If I turn on nsm debugging with "rpcdebug -m nlm -s all", among other things I see this

  lockd: NSM upcall RPC failed, status=-111

logged (111 = ECONNREFUSED), which seems suspicious.

Comment 13 J. Bruce Fields 2014-08-20 19:52:42 UTC
From "rpcinfo" run on the server, before restarting:

  100024    1    tcp       0.0.0.0.148.181        status     29

and after restarting:

  100024    1    tcp       0.0.0.0.175.82         status     29

And capturing traffic on the server's "lo", I see failed attempts to connect to port 38069 (== 148.181).  So the server's attempting to contact the new statd without doing a new portmap call.

Comment 14 J. Bruce Fields 2014-08-20 21:34:11 UTC
Created attachment 928949 [details]
[PATCH] lockd: allow rebinding to statd

It looks like this might do the job, but I haven't tested it yet.

Comment 16 J. Bruce Fields 2014-08-21 14:59:48 UTC
That test kernel is still failing my test.  I haven't figured out why yet.

Comment 17 J. Bruce Fields 2014-08-28 17:53:02 UTC
Created attachment 932433 [details]
[PATCH] lockd: allow rebinding to statd

The RPC_CLNT_CREATE_AUTOBIND flag isn't working as I expected.  The attached appears to do the job partly by just adding a force_rebind in one more place in the client rpc code, but I'm not at all sure this is correct.

There are some known bugs in the rpc client's error handling in this area (see https://bugzilla.redhat.com/show_bug.cgi?id=1134911) so it may be worth retrying withe AUTOBIND approach with those patches.

But it might be simpler just for the nsm code to deal with this by hand, somehow, which is what the nlm code seems to do.

Comment 19 J. Bruce Fields 2014-09-18 19:39:33 UTC
Trond had one more suggestion:

http://mid.gmane.org/<CAHQdGtRZ43npC---LK1-kZy91V9PCk63g_QAR_5DsrfFM-2u4Q@mail.gmail.com>

I'll do some more testing.

Comment 21 J. Bruce Fields 2014-09-19 16:51:49 UTC
Created attachment 939392 [details]
[PATCH] lockd: allow rebinding to statd

I tried adding RPC_CLNT_CREATE_HARDRTRY (attached), but that doesn't seem to do it.

The later ENOLCKs are gone, but the reclaim still isn't happening and I haven't figured out why yet.

(My test is:

# One-time setup, on server:

mkdir /exports
chmod a+rwx /exports
touch /exports/file
echo "/exports $CLIENT(rw)" >/etc/exports
echo STATDARG="--no-notify" > /etc/sysconfig/nfs

# On client:

mount -v -o "vers=3" $SERVER:/exports /mnt
flock /mnt/file sleep 10000

# On server:

systemctl stop nfs-server
systemctl stop nfs-lock
sleep 2
systemctl start nfs-lock
systemctl start nfs-server
sm-notify -f
sleep 2
grep $(stat -c'%i' /exports/file) /proc/locks)

The final grep should show a line for the lock something like (details may vary):
5: POSIX  ADVISORY  WRITE 11307 fd:00:135822054 0 EOF

Comment 22 Benjamin Coddington 2014-09-23 19:41:13 UTC
Created attachment 940590 [details]
statd_port_change_recovery.patch

I think this should allow re-connection to statd.  I've submitted it upstream for review:

http://marc.info/?l=linux-nfs&m=141148958603889&w=2

Comment 28 Jarod Wilson 2014-10-22 16:18:57 UTC
Patch(es) available on kernel-3.10.0-191.el7

Comment 36 Yin.JianHong 2015-01-13 04:33:33 UTC
Verified:
  bkr job: https://beaker.engineering.redhat.com/jobs/847291

  test log:
-----------
[12:02:05 root@ ~~]# service_nfs stop
Redirecting to /bin/systemctl stop  nfs.service
:: [   PASS   ] :: Running 'service_nfs stop' (Expected 0, got 0)
--------------------------------------------------------------------------------
[12:02:05 root@ ~~]# service_nfslock stop
Redirecting to /bin/systemctl stop  nfs-lock.service
:: [   PASS   ] :: Running 'service_nfslock stop' (Expected 0, got 0)
--------------------------------------------------------------------------------
[12:02:05 root@ ~~]# sleep 2
:: [   PASS   ] :: Running 'sleep 2' (Expected 0, got 0)
--------------------------------------------------------------------------------
[12:02:07 root@ ~~]# service_nfslock start
Redirecting to /bin/systemctl start  nfs-lock.service
:: [   PASS   ] :: Running 'service_nfslock start' (Expected 0, got 0)
--------------------------------------------------------------------------------
[12:02:08 root@ ~~]# service_nfs start
Redirecting to /bin/systemctl start  nfs.service
{Info} nfs rpcinfo:
    100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
    100003  4,3       udp6,tcp6,udp,tcp                nfs         superuser
:: [   PASS   ] :: Running 'service_nfs start' (Expected 0, got 0)
--------------------------------------------------------------------------------
[12:02:10 root@ ~~]# sm-notify -f
:: [   PASS   ] :: Running 'sm-notify -f' (Expected 0, got 0)
--------------------------------------------------------------------------------
[12:02:10 root@ ~~]# sleep 80
:: [   PASS   ] :: Running 'sleep 80' (Expected 0, got 0)
--------------------------------------------------------------------------------
[12:03:30 root@ ~~]# grep $(stat -c"%i" $expdir/testfile) /proc/locks
1: POSIX  ADVISORY  WRITE 15830 fd:00:1802629 0 EOF
:: [   PASS   ] :: Running 'grep $(stat -c"%i" $expdir/testfile) /proc/locks' (Expected 0, got 0)
--------------------------------------------------------------------------------
[12:03:30 root@ ~~]# ls -i $expdir/testfile
1802629 /exportDir-bz1120850/testfile

Comment 39 errata-xmlrpc 2015-03-05 12:30:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html


Note You need to log in before you can comment on or make changes to this bug.