Bug 651409

Summary: BAD SEQID error messages returned by the NFS server
Product: Red Hat Enterprise Linux 5
Reporter: Sachin Prabhu <sprabhu>
Component: 4Suite
Assignee: Sachin Prabhu <sprabhu>
Status: CLOSED ERRATA
QA Contact: Petr Beňas <pbenas>
Severity: medium
Docs Contact:
Priority: high
Version: 5.5
CC: bfields, jiali, jlayton, pbenas, pstehlik, qcai, rwheeler, steved
Target Milestone:
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-21 10:14:49 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:

Description	Flags
Reproducer	none
Proposed patch	none

Description Sachin Prabhu 2010-11-09 14:34:00 UTC

A bug in the RHEL 5 NFSv4 client code causes the sequence id to be incremented even when the operation for which it was sent is never executed on the NFS server.

One example is an OPEN compound that fails because the preceding PUTFH operation returns an error on the server. In such cases the OPEN operation itself was never executed, so the sequence id should not have been incremented.

The next stateful request then carries a sequence id the server does not expect, which results in the following message on the NFS client:
kernel: NFS: v4 server returned a bad sequence-id error!

Upstream Patch:

commit c1d519312dcdf11532fed9f99a8ecc3547ffd9d6
Author: Trond Myklebust <Trond.Myklebust>
Date:   Mon Apr 7 13:20:54 2008 -0400

    NFSv4: Only increment the sequence id if the server saw it
    
    It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
    before the actual stateful operation has been executed (for instance in the
    PUTFH call). There is no way to tell from the overall status result which
    operations were executed from the COMPOUND.
    
    The fix is to move incrementing of the sequence id into the XDR layer,
    so that we do it as we process the results from the stateful operation.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust>

Comment 1 Sachin Prabhu 2010-11-09 15:00:18 UTC

Created attachment 459135 [details]
Reproducer

To reproduce:

1) Mount an NFSv4 share on a RHEL 5 client.

2) Compile the attached reproducer into /tmp
gcc -o /tmp/open_cycle open_cycle.c

3) Create 3 directories, each containing one test file.
# mkdir t1 t2 t3; for i in t1 t2 t3; do echo 123 > $i/file1; done;sync

4) You will need one terminal open on the client and one on the server.

On the client, run the command:
client# cat t2/file1; sleep 5; /tmp/open_cycle t1/file1 t2/file1 t3/file1

While the client pauses for 5 seconds, delete the directory t2 on the server:
server# rm -rf t2

The client outputs the following:

# cat t2/file1; sleep 5; /tmp/open_cycle t1/file1 t2/file1 t3/file1 
123
t2/file1: open: No such file or directory
t3/file1: 

and logs the following in /var/log/messages:
kernel: NFS: v4 server returned a bad sequence-id error!




The test commands run on the client do the following:

cat t2/file1 caches the filehandle for the directory t2.

sleep 5 pauses the client for 5 seconds, leaving time to delete t2 on the server.

/tmp/open_cycle t1/file1 t2/file1 t3/file1
first opens t1/file1. While that file is still open, it attempts to open t2/file1. This OPEN compound fails at PUTFH because the underlying directory no longer exists. It then attempts to open t3/file1. All three opens use the same open owner. Because the sequence id was incorrectly incremented when the open of t2/file1 failed, the first attempt to open t3/file1 fails with NFS4ERR_BAD_SEQID.

Comment 2 Sachin Prabhu 2010-11-12 11:48:03 UTC

Created attachment 460024 [details]
Proposed patch

Based on the upstream commit

--
commit c1d519312dcdf11532fed9f99a8ecc3547ffd9d6
Author: Trond Myklebust <Trond.Myklebust>
Date:   Mon Apr 7 13:20:54 2008 -0400

    NFSv4: Only increment the sequence id if the server saw it
    
    It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
    before the actual stateful operation has been executed (for instance in the
    PUTFH call). There is no way to tell from the overall status result which
    operations were executed from the COMPOUND.
    
    The fix is to move incrementing of the sequence id into the XDR layer,
    so that we do it as we process the results from the stateful operation.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust>
--

The patch adds a pointer to the seqid element to the res structure of each locking operation. The sequence id is then updated from the corresponding decode_* function for that operation, so it only changes when the reply actually contains a result for the operation.

This was compiled and tested against the reproducer attached to this bugzilla. tcpdump was also used to confirm that the sequence id was no longer incremented wrongly when PUTFH failed in an OPEN COMPOUND request, and that no BAD_SEQID errors were received from the NFS server.

Comment 6 RHEL Program Management 2011-02-01 16:50:47 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Jarod Wilson 2011-03-03 20:33:45 UTC

The fix is in kernel-2.6.18-246.el5.
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 17 Petr Beňas 2011-03-04 11:36:14 UTC

Reproduced in 2.6.18-245.el5 and verified in 2.6.18-246.el5.

Comment 18 errata-xmlrpc 2011-07-21 10:14:49 UTC

An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html