A bug in the RHEL 5 NFSv4 client code causes the sequence id to be incremented even when the operation it belongs to is never executed on the NFS server. An example is an OPEN that fails because the PUTFH preceding it in the same compound returns an error on the server. Since the OPEN was never processed, the sequence id should not have been incremented. The client kernel then logs the following message:

NFS: v4 server returned a bad sequence-id error!

Upstream patch:

commit c1d519312dcdf11532fed9f99a8ecc3547ffd9d6
Author: Trond Myklebust <Trond.Myklebust>
Date:   Mon Apr 7 13:20:54 2008 -0400

    NFSv4: Only increment the sequence id if the server saw it

    It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
    before the actual stateful operation has been executed (for instance in the
    PUTFH call). There is no way to tell from the overall status result which
    operations were executed from the COMPOUND.

    The fix is to move incrementing of the sequence id into the XDR layer, so
    that we do it as we process the results from the stateful operation.

    Signed-off-by: Trond Myklebust <Trond.Myklebust>
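The drift between client and server described in this report can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions: the status values are placeholders rather than the protocol's numbers, the server is reduced to a single expected counter, and every name in it (server_open, buggy_client_open, and so on) is hypothetical and not taken from the kernel.

```c
#include <stdbool.h>

/* Placeholder status codes; these are NOT the real NFSv4 values. */
enum status { ST_OK = 0, ST_STALE = 1, ST_BAD_SEQID = 2 };

struct server { unsigned expected_seqid; };
struct client { unsigned seqid; };

/*
 * Server side of a PUTFH + OPEN compound. If PUTFH fails, OPEN is never
 * evaluated, so the server's expected sequence id does not move.
 */
enum status server_open(struct server *s, bool fh_valid, unsigned seqid)
{
    if (!fh_valid)
        return ST_STALE;          /* compound stops at PUTFH */
    if (seqid != s->expected_seqid)
        return ST_BAD_SEQID;      /* OPEN ran, but the seqid is wrong */
    s->expected_seqid++;          /* OPEN ran and was accepted */
    return ST_OK;
}

/*
 * Models the buggy behaviour: the client bumps its counter after every
 * compound, whether or not the server ever reached the OPEN.
 */
enum status buggy_client_open(struct client *c, struct server *s, bool fh_valid)
{
    enum status st = server_open(s, fh_valid, c->seqid);
    c->seqid++;
    return st;
}
```

In this model, one compound that fails at PUTFH leaves the client one step ahead of the server, so the next OPEN from the same owner is rejected with a bad sequence id.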
Created attachment 459135
Reproducer

To reproduce:

1) Mount an NFSv4 share on a RHEL 5 client.

2) Compile the attached reproducer and place the binary in /tmp:

gcc -o /tmp/open_cycle open_cycle.c

3) Create 3 directories, each containing one test file:

# mkdir t1 t2 t3; for i in t1 t2 t3; do echo 123 > $i/file1; done; sync

4) Open one terminal on the client and one on the server.

On the client, run the command

client# cat t2/file1; sleep 5; /tmp/open_cycle t1/file1 t2/file1 t3/file1

While the client pauses for 5 seconds, delete the directory t2 on the server

server# rm -rf t2

The client outputs the following

# cat t2/file1; sleep 5; /tmp/open_cycle t1/file1 t2/file1 t3/file1
123
t2/file1: open: No such file or directory
t3/file1:

and logs the following in /var/log/messages

kernel: NFS: v4 server returned a bad sequence-id error!

The test commands run on the client do the following:

cat t2/file1 caches the filehandle for the directory t2.

sleep 5 pauses the client for 5 seconds.

/tmp/open_cycle t1/file1 t2/file1 t3/file1 first opens t1/file1 and, while that file is still open, attempts to open t2/file1. That open fails at PUTFH because the underlying directory is no longer there. The program then attempts to open t3/file1. It uses the same lock owner in all 3 cases. Since the sequence id was incorrectly incremented when the open of t2/file1 failed, the first attempt to open t3/file1 fails with NFS4ERR_BAD_SEQID.
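The attached open_cycle.c is not reproduced in this report. The following is a minimal sketch of what such a reproducer might look like, assuming it simply opens each path in order and keeps the earlier descriptors open so that all opens come from the same process. The function names open_all and close_all are hypothetical, and the real attachment may differ.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Open every path read-only, in order, without closing the earlier ones.
 * fds[i] receives the descriptor for paths[i], or -1 if the open failed.
 * Returns the number of paths that could not be opened.
 */
int open_all(const char *const *paths, int n, int *fds)
{
    int failures = 0;

    for (int i = 0; i < n; i++) {
        fds[i] = open(paths[i], O_RDONLY);
        if (fds[i] < 0) {
            /* Report in the "<path>: open: <reason>" form. */
            fprintf(stderr, "%s: open: %s\n", paths[i], strerror(errno));
            failures++;
        }
    }
    return failures;
}

/* Close whatever open_all() managed to open. */
void close_all(int *fds, int n)
{
    for (int i = 0; i < n; i++) {
        if (fds[i] >= 0) {
            close(fds[i]);
            fds[i] = -1;
        }
    }
}
```

A main() that passes argv + 1 and argc - 1 to open_all gives the command-line form used in step 4.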
Created attachment 460024
Proposed patch

Based on the upstream commit
--
commit c1d519312dcdf11532fed9f99a8ecc3547ffd9d6
Author: Trond Myklebust <Trond.Myklebust>
Date:   Mon Apr 7 13:20:54 2008 -0400

    NFSv4: Only increment the sequence id if the server saw it

    It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
    before the actual stateful operation has been executed (for instance in the
    PUTFH call). There is no way to tell from the overall status result which
    operations were executed from the COMPOUND.

    The fix is to move incrementing of the sequence id into the XDR layer, so
    that we do it as we process the results from the stateful operation.

    Signed-off-by: Trond Myklebust <Trond.Myklebust>
--

The patch adds pointers to the seqid elements to the res structures for locking operations. The sequence id is then updated from the corresponding decode_* functions for each of the locking functions.

This was compiled and tested against the reproducer attached to the bugzilla. tcpdump was also used to confirm that the sequence id was not incremented wrongly when PUTFH failed in an OPEN COMPOUND request, and that no BAD_SEQID errors were received from the NFS server.
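The shape of the fix, a result structure that carries a pointer to the seqid and a decode step that advances it only when the stateful operation's result is actually present in the reply, can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: the structure and function names are hypothetical, the reply is modeled as a plain array, and the sketch ignores the error statuses for which the protocol says the sequence id must not advance even though the operation was reached.

```c
#include <stddef.h>

enum op { OP_PUTFH, OP_OPEN };

struct seqid { unsigned counter; };

/* One per-operation result from a COMPOUND reply (simplified). */
struct op_result {
    enum op op;
    int status;   /* 0 on success, nonzero on error */
};

/* Result structure for an OPEN call, now carrying a pointer to the seqid. */
struct open_res {
    struct seqid *seqid;
    int status;
};

/*
 * Walk the per-operation results in order. A COMPOUND reply ends at the
 * first failed operation, so an OPEN result is only present if the server
 * actually reached the OPEN. The counter is bumped at that point and
 * nowhere else.
 */
int decode_open_compound(const struct op_result *ops, size_t n,
                         struct open_res *res)
{
    res->status = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].op == OP_OPEN)
            res->seqid->counter++;   /* the server saw this OPEN */
        if (ops[i].status != 0) {
            res->status = ops[i].status;
            break;                   /* nothing follows a failed op */
        }
    }
    return res->status;
}
```

With this arrangement, a reply that ends at a failed PUTFH leaves the counter untouched, which matches the tcpdump observation above.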
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-246.el5

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Reproduced in 2.6.18-245.el5 and verified in 2.6.18-246.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html