Bug 651409 - BAD SEQID error messages returned by the NFS server
Summary: BAD SEQID error messages returned by the NFS server
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: 4Suite
Version: 5.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Sachin Prabhu
QA Contact: Petr Beňas
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-11-09 14:34 UTC by Sachin Prabhu
Modified: 2018-11-14 17:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-21 10:14:49 UTC
Target Upstream Version:


Attachments
Reproducer (618 bytes, text/x-csrc)
2010-11-09 15:00 UTC, Sachin Prabhu
Proposed patch (7.10 KB, patch)
2010-11-12 11:48 UTC, Sachin Prabhu


Links
System: Red Hat Product Errata
ID: RHSA-2011:1065
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update
Last Updated: 2011-07-21 09:21:37 UTC

Description Sachin Prabhu 2010-11-09 14:34:00 UTC
A bug in the RHEL 5 NFSv4 client code causes the sequence id to be incremented even when the operation it belongs to is never executed on the NFS server.

An example is an OPEN compound which fails because the PUTFH operation ahead of it returns an error on the server. Since the OPEN operation itself was never executed, the sequence id should not have been incremented.

This results in the following message on the NFS client:
kernel: NFS: v4 server returned a bad sequence-id error!

Upstream Patch:

commit c1d519312dcdf11532fed9f99a8ecc3547ffd9d6
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Mon Apr 7 13:20:54 2008 -0400

    NFSv4: Only increment the sequence id if the server saw it
    
    It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
    before the actual stateful operation has been executed (for instance in the
    PUTFH call). There is no way to tell from the overall status result which
    operations were executed from the COMPOUND.
    
    The fix is to move incrementing of the sequence id into the XDR layer,
    so that we do it as we process the results from the stateful operation.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
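
The failure mode can be modelled in a few lines of C. This is only a simplified sketch, not kernel code: the struct, enum and function names are hypothetical, there is a single owner with one counter on each side, and error recovery after a BAD_SEQID reply is not modelled. The `fixed` flag switches between the behaviour described in this report and the behaviour the upstream commit aims for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one owner, one sequence counter on each side. */
struct seqid_model {
    uint32_t client_next;   /* seqid the client will send next */
    uint32_t server_expect; /* seqid the server expects next */
};

enum open_result { OPEN_OK, OPEN_PUTFH_FAILED, OPEN_BAD_SEQID };

/*
 * Send one OPEN compound (PUTFH followed by OPEN).
 * putfh_ok: whether the server accepts the directory filehandle.
 * fixed:    true models a client that only counts seqids the server saw,
 *           false models the behaviour described in this bug.
 */
static enum open_result send_open(struct seqid_model *m, bool putfh_ok, bool fixed)
{
    uint32_t sent = m->client_next;

    if (!putfh_ok) {
        /* Compound aborted at PUTFH: the server never saw the OPEN seqid. */
        if (!fixed)
            m->client_next++; /* bug: increment based on the overall status */
        return OPEN_PUTFH_FAILED;
    }

    if (sent != m->server_expect)
        return OPEN_BAD_SEQID; /* server executed OPEN and rejected the seqid */

    /* OPEN was executed: both sides advance together. */
    m->server_expect++;
    m->client_next++;
    return OPEN_OK;
}

/* Three opens with the same owner; the second one fails at PUTFH. */
static enum open_result third_open_result(bool fixed)
{
    struct seqid_model m = { .client_next = 1, .server_expect = 1 };

    send_open(&m, true, fixed);         /* first file opens normally */
    send_open(&m, false, fixed);        /* second file: PUTFH fails */
    return send_open(&m, true, fixed);  /* third file */
}
```

In this model `third_open_result(false)` ends in `OPEN_BAD_SEQID` because the client counter has run one ahead of the server, while `third_open_result(true)` ends in `OPEN_OK`.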

Comment 1 Sachin Prabhu 2010-11-09 15:00:18 UTC
Created attachment 459135
Reproducer

To reproduce:

1) Mount an NFSv4 share on a RHEL 5 client.

2) Compile attached reproducer and copy it to /tmp
gcc -o /tmp/open_cycle open_cycle.c

3) On the mounted share, create 3 directories with one test file in each.
# mkdir t1 t2 t3; for i in t1 t2 t3; do echo 123 > $i/file1; done;sync

4) You will need one terminal open on the client and one on the server.

On the client, run the command:
client# cat t2/file1; sleep 5; /tmp/open_cycle t1/file1 t2/file1 t3/file1

While the client pauses for 5 seconds, delete the directory t2 on the server:
server# rm -rf t2

The client outputs the following:

# cat t2/file1; sleep 5; /tmp/open_cycle t1/file1 t2/file1 t3/file1 
123
t2/file1: open: No such file or directory
t3/file1: 

and logs the following in /var/log/messages:
kernel: NFS: v4 server returned a bad sequence-id error!




The test commands run on the client do the following:

cat t2/file1 caches the filehandle for the directory t2.

sleep 5 pauses the client for 5 seconds.

/tmp/open_cycle t1/file1 t2/file1 t3/file1
first opens t1/file1 and, while that file is still open, attempts to open t2/file1. This OPEN compound fails at PUTFH since the underlying directory is no longer there. The program then attempts to open t3/file1. The same open owner is used in all 3 cases. Since the client incorrectly incremented the sequence id when it failed to open t2/file1, the first attempt to open t3/file1 fails with an NFS4ERR_BAD_SEQID.
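
The attached open_cycle.c is not reproduced in this report. Going only by the behaviour described above (open each file in turn, keep the earlier ones open, report failures as "path: open: error"), a guess at its core logic might look like the sketch below. The function name `open_cycle_sketch` and the `SKETCH_MAX_FILES` limit are assumptions, and the real attachment may well differ.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_MAX_FILES 16 /* arbitrary limit chosen for this sketch */

/*
 * Open every path in turn and keep the earlier descriptors open, so that
 * all opens come from one process while previous files are still held.
 * Returns the number of files that were opened successfully.
 */
static int open_cycle_sketch(int npaths, const char *const paths[])
{
    int fds[SKETCH_MAX_FILES];
    int nopen = 0;
    int opened;

    for (int i = 0; i < npaths && nopen < SKETCH_MAX_FILES; i++) {
        int fd = open(paths[i], O_RDONLY);

        if (fd < 0) {
            /* Report the failure and carry on with the next path. */
            fprintf(stderr, "%s: open: %s\n", paths[i], strerror(errno));
            continue;
        }
        fds[nopen++] = fd;
    }

    opened = nopen;
    while (nopen > 0) /* close everything only after all opens were tried */
        close(fds[--nopen]);
    return opened;
}
```

A small main() wrapper could pass argc - 1 and argv + 1 to this function so that it is invoked the same way as /tmp/open_cycle in the steps above.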

Comment 2 Sachin Prabhu 2010-11-12 11:48:03 UTC
Created attachment 460024
Proposed patch

Based on the upstream commit c1d519312dcdf11532fed9f99a8ecc3547ffd9d6 ("NFSv4: Only increment the sequence id if the server saw it"), quoted in full in the description above.

The patch adds pointers to the seqid elements to the res structures for locking operations. The sequence id is then updated from the corresponding decode_* functions for each of the locking functions.
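
As a rough illustration of that structure (not the patch itself), the sketch below gives a result struct a pointer to the owner's sequence counter and increments it from the decode routine only when the result of the stateful operation is actually present in the reply. All `sketch_` names are hypothetical stand-ins, and the sketch increments for any decoded OPEN result, whereas the real rule also depends on the status code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct sketch_seqid {
    uint32_t counter;
};

struct sketch_open_res {
    struct sketch_seqid *seqid; /* points at the owner's sequence counter */
    int status;                 /* status of the OPEN operation itself */
};

enum sketch_op { SKETCH_OP_PUTFH, SKETCH_OP_OPEN };

/* One per-operation result as it appears in the COMPOUND reply. */
struct sketch_op_result {
    enum sketch_op op;
    int status; /* 0 on success, non-zero on error */
};

static void sketch_increment_seqid(int status, struct sketch_seqid *seqid)
{
    (void)status; /* simplification: the real rule inspects the status */
    if (seqid != NULL)
        seqid->counter++;
}

/*
 * Walk the per-operation results of a reply. The server stops at the first
 * failing operation, so an OPEN result is only present if OPEN was executed.
 * The seqid is incremented at that point and nowhere else.
 */
static int sketch_decode_open_compound(const struct sketch_op_result *ops,
                                       size_t nops,
                                       struct sketch_open_res *res)
{
    for (size_t i = 0; i < nops; i++) {
        if (ops[i].op == SKETCH_OP_OPEN) {
            res->status = ops[i].status;
            sketch_increment_seqid(ops[i].status, res->seqid);
        }
        if (ops[i].status != 0)
            return ops[i].status; /* nothing after this was executed */
    }
    return 0;
}
```

With this arrangement a reply that ends at a failed PUTFH never reaches the OPEN branch, so the counter is left untouched.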

This was compiled and tested against the reproducer attached to this bugzilla. tcpdump was also used to confirm that the sequence id was not incremented wrongly when PUTFH failed in an OPEN COMPOUND request, and that no BAD_SEQID errors were received from the NFS server.

Comment 6 RHEL Product and Program Management 2011-02-01 16:50:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Jarod Wilson 2011-03-03 20:33:45 UTC
in kernel-2.6.18-246.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 17 Petr Beňas 2011-03-04 11:36:14 UTC
Reproduced in 2.6.18-245.el5 and verified in 2.6.18-246.el5.

Comment 18 errata-xmlrpc 2011-07-21 10:14:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

