Bug 651409
Summary: BAD SEQID error messages returned by the NFS server
Product: Red Hat Enterprise Linux 5
Component: 4Suite
Version: 5.5
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Sachin Prabhu <sprabhu>
Assignee: Sachin Prabhu <sprabhu>
QA Contact: Petr Beňas <pbenas>
CC: bfields, jiali, jlayton, pbenas, pstehlik, qcai, rwheeler, steved
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-07-21 10:14:49 UTC
Description
Sachin Prabhu
2010-11-09 14:34:00 UTC
Created attachment 459135
Reproducer
To reproduce:
1) Mount an NFSv4 share on a RHEL 5 client.
2) Compile the attached reproducer and place the binary in /tmp
gcc -o /tmp/open_cycle open_cycle.c
3) Create 3 directories, each containing one test file.
# mkdir t1 t2 t3; for i in t1 t2 t3; do echo 123 > $i/file1; done;sync
4) You will need one terminal open on the client and one on the server.
On the client, run the command
client# cat t2/file1; sleep 5; /tmp/open_cycle t1/file1 t2/file1 t3/file1
While the client pauses for 5 seconds, delete the directory t2 on the server
server# rm -rf t2
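The attached open_cycle.c is not reproduced in this report. As a rough sketch only, assuming the program simply holds its first argument open while opening each remaining argument in turn, it might look something like the following. The function name `open_cycle` and its return convention are hypothetical and are not taken from the attachment.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for open_cycle.c (the real attachment is not shown).
 * Holds paths[0] open, then opens and closes each remaining path in turn.
 * Returns the number of later paths that failed to open, or -1 if there is
 * no first path or it cannot be opened.
 */
int open_cycle(int npaths, const char *const paths[])
{
    int failures = 0;
    int first;
    int fd;
    int i;

    if (npaths < 1)
        return -1;

    /* Keep the first file open while the others are opened. */
    first = open(paths[0], O_RDONLY);
    if (first < 0) {
        perror(paths[0]);
        return -1;
    }

    for (i = 1; i < npaths; i++) {
        fd = open(paths[i], O_RDONLY);
        if (fd < 0) {
            perror(paths[i]);   /* report the failure and carry on */
            failures++;
            continue;
        }
        close(fd);
    }

    close(first);
    return failures;
}
```

A `main` that passes `argc - 1` and `argv + 1` to this function would give the command-line form used in step 4.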
The client outputs the following
# cat t2/file1; sleep 5; /tmp/open_cycle t1/file1 t2/file1 t3/file1
123
t2/file1: open: No such file or directory
t3/file1:
and logs the following in /var/log/messages
kernel: NFS: v4 server returned a bad sequence-id error!
The test commands run on the client do the following:
cat t2/file1 caches the filehandle for the directory t2.
sleep 5 pauses the client for 5 seconds.
/tmp/open_cycle t1/file1 t2/file1 t3/file1
opens t1/file1 first and, while that file is still open, attempts to open t2/file1. This OPEN compound fails at PUTFH because the underlying directory no longer exists. The program then attempts to open t3/file1. The same owner, and therefore the same sequence id counter, is used in all 3 cases. Because the client incorrectly incremented the sequence id after the failed open of t2/file1, the first attempt to open t3/file1 fails with NFS4ERR_BAD_SEQID.
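The mismatch described above can be illustrated with a toy model of the two counters. This is a simplified sketch, not kernel code: every `sim_*` name and the status values are invented for illustration, and the server is reduced to a single "next expected seqid" counter.

```c
#include <stdint.h>

/* Illustrative status codes for this toy model; not the real NFSv4 values. */
enum sim_status { SIM_OK, SIM_PUTFH_FAILED, SIM_BAD_SEQID };

struct sim_server { uint32_t expected_seqid; };
struct sim_client { uint32_t seqid; };

/*
 * Toy server: a compound is PUTFH followed by OPEN. If PUTFH fails, the
 * OPEN is never executed, so the server neither checks nor advances seqid.
 */
enum sim_status sim_server_open(struct sim_server *srv, uint32_t seqid,
                                int putfh_ok)
{
    if (!putfh_ok)
        return SIM_PUTFH_FAILED;
    if (seqid != srv->expected_seqid)
        return SIM_BAD_SEQID;
    srv->expected_seqid++;
    return SIM_OK;
}

/* Pre-patch behaviour as described in this report: bump after every OPEN compound. */
enum sim_status sim_buggy_client_open(struct sim_client *cl,
                                      struct sim_server *srv, int putfh_ok)
{
    enum sim_status st = sim_server_open(srv, cl->seqid, putfh_ok);

    cl->seqid++;    /* wrong when the server never saw the OPEN */
    return st;
}
```

Calling `sim_buggy_client_open` three times with `putfh_ok` set to 1, 0, 1 mirrors the t1, t2, t3 sequence: the second call leaves the client one step ahead of the server, so the third call is rejected.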
Created attachment 460024
Proposed patch
Based on the upstream commit
--
commit c1d519312dcdf11532fed9f99a8ecc3547ffd9d6
Author: Trond Myklebust <Trond.Myklebust>
Date: Mon Apr 7 13:20:54 2008 -0400
NFSv4: Only increment the sequence id if the server saw it
It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
before the actual stateful operation has been executed (for instance in the
PUTFH call). There is no way to tell from the overall status result which
operations were executed from the COMPOUND.
The fix is to move incrementing of the sequence id into the XDR layer,
so that we do it as we process the results from the stateful operation.
Signed-off-by: Trond Myklebust <Trond.Myklebust>
--
The patch adds pointers to the seqid elements to the res structures for locking operations. The sequence id is then updated from the corresponding decode_* functions for each of the locking functions.
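The shape of that change can be sketched outside the kernel. The following is a minimal model under stated assumptions: all `sim_*` names are hypothetical, the reply is reduced to a list of per-operation statuses, and no attempt is made to model how individual OPEN error codes are treated. It only shows the idea of a result structure carrying a seqid pointer that the decode step advances when it reaches the stateful operation's result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; the real patch changes the kernel's NFSv4 XDR code. */
enum sim_op { SIM_OP_PUTFH, SIM_OP_OPEN };

struct sim_op_result {
    enum sim_op op;
    int status;             /* 0 on success, non-zero on failure */
};

struct sim_seqid { uint32_t counter; };

/* Result structure carrying a pointer to the seqid, as the patch description says. */
struct sim_open_res {
    struct sim_seqid *seqid;
    int status;
};

/*
 * Walk the results the server actually returned. A compound stops at the
 * first failing operation, so an OPEN result is present only if the server
 * executed OPEN. Only then is the sequence id advanced.
 */
int sim_decode_open_compound(const struct sim_op_result *results,
                             size_t nresults, struct sim_open_res *res)
{
    size_t i;

    res->status = 0;
    for (i = 0; i < nresults; i++) {
        res->status = results[i].status;
        if (results[i].op == SIM_OP_OPEN)
            res->seqid->counter++;      /* the server saw the OPEN */
        if (results[i].status != 0)
            break;                      /* later operations were not executed */
    }
    return res->status;
}
```

With a reply that contains only a failed PUTFH result, the counter is left alone; with PUTFH followed by OPEN, it advances by one.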
This was compiled and tested against the reproducer attached to this bug. tcpdump was also used to confirm that the sequence id was no longer incremented when PUTFH failed in an OPEN COMPOUND request, and that no BAD_SEQID errors were received from the NFS server.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-246.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.
Reproduced in 2.6.18-245.el5 and verified in 2.6.18-246.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-1065.html