Bug 476084

Summary: kernel: NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff880052c57228!
Product: [Fedora] Fedora
Reporter: Louis Lagendijk <louis>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 10
CC: fcdanilo, kernel-maint
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-08-12 19:40:47 UTC

Description Louis Lagendijk 2008-12-11 20:29:51 UTC
Description of problem:
I am using NFSv4 against a CentOS 5 server. The export is mounted with Kerberos security. The relevant fstab entry is:
nest.pheasant:/home1    /home/home1             nfs4    sec=krb5        0 0

At some point (mainly after heavy NFS activity) the kernel starts repeating "NFS: v4 server returned a bad sequence-id error" and does not recover. A reboot seems to be the only way to recover.

The Fedora 9 kernel occasionally produces the same type of error, but that kernel recovers from it, while the Fedora 10 kernel does not.

Version-Release number of selected component (if applicable):
kernel-2.6.27.7-134.fc10.x86_64
nfs-utils-1.1.4-4.fc10.x86_64

How reproducible:
Triggered by heavy NFS activity. I noticed this error especially while installing a virtual machine in VirtualBox, but I have also seen it while using Evolution.

Steps to Reproduce:
1. Heavy NFS activity
  
Actual results:
The errors keep repeating in /var/log/messages.

Expected results:
The system recovers (or, even better, no such errors occur at all).

Additional info:

Comment 1 Louis Lagendijk 2008-12-11 21:40:58 UTC
I forgot to show the error messages on the server side. Here is what the CentOS 5 server reports (from a recent Fedora 9 client; I am back on Fedora 9 for the time being because Fedora 10 is not usable):

NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 2, got 3)
NFSD: preprocess_seqid_op: bad seqid (expected 2, got 3)
NFSD: preprocess_seqid_op: bad seqid (expected 2, got 3)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)
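
To summarize a long run of these server messages, a small script can tally the distinct (expected, got) pairs. This is only a sketch: it assumes the message format shown in the log above, and the names `SEQID_RE` and `tally_bad_seqids` are made up for illustration.

```python
import re
from collections import Counter

# Matches the nfsd message format seen in the log above (assumed stable).
SEQID_RE = re.compile(r"bad seqid \(expected (\d+), got (\d+)\)")


def tally_bad_seqids(lines):
    """Count each (expected, got) pair found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SEQID_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the log above; a real run would read the
    # server's messages file instead.
    sample = [
        "NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)",
        "NFSD: preprocess_seqid_op: bad seqid (expected 2, got 3)",
        "NFSD: preprocess_seqid_op: bad seqid (expected 8305, got 8306)",
    ]
    for (expected, got), n in sorted(tally_bad_seqids(sample).items()):
        print(f"expected={expected} got={got} delta={got - expected} count={n}")
```

In the log above only two pairs occur, and in both the sequence id received is exactly one higher than the one the server expected.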

Comment 2 Louis Lagendijk 2008-12-27 12:54:03 UTC
With the latest kernel:
Linux travel.pheasant 2.6.27.9-159.fc10.x86_64 #1 SMP Tue Dec 16 14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
the hangs disappeared. This may be related to the kernel update on my CentOS server, which now runs:
Linux nest.pheasant 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

Comment 3 Steve Dickson 2008-12-29 15:25:37 UTC
So we can close this bug?

Comment 4 Louis Lagendijk 2008-12-29 15:59:05 UTC
The bug hit again today. A reboot resolved it. I still do not know exactly how to trigger it. Here is an excerpt from the Fedora 10 messages file:
Dec 29 10:27:13 travel ntpd[3052]: synchronized to 81.171.44.131, stratum 2
Dec 29 10:27:11 travel ntpd[3052]: time reset -1.518701 s
Dec 29 10:27:11 travel ntpd[3052]: kernel time sync status change 0001
Dec 29 10:30:51 travel ntpd[3052]: synchronized to 81.171.44.131, stratum 2
Dec 29 10:58:01 travel yum: Updated: gstreamer-plugins-ugly-0.10.10-1.fc10.x86_64
Dec 29 12:01:04 travel kernel: NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff880053409a28!
Dec 29 12:01:04 travel kernel: NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff880053409628!
Dec 29 12:01:04 travel kernel: NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff880053409a28!
Dec 29 12:01:04 travel kernel: NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff880053409628!
Dec 29 12:01:04 travel kernel: NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff880053409a28!
and then it goes on and on.
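
To see how many distinct sequences are involved on the client side, the messages can be grouped by the address printed at the end of each line. This is a rough sketch that assumes the kernel message format shown in the excerpt above; `SEQ_RE` and `count_by_sequence` are illustrative names.

```python
import re
from collections import Counter

# Captures the hex address at the end of the client-side kernel message.
SEQ_RE = re.compile(
    r"bad sequence-id error on an unconfirmed sequence ([0-9a-f]+)!"
)


def count_by_sequence(lines):
    """Count client-side bad sequence-id errors per sequence address."""
    counts = Counter()
    for line in lines:
        match = SEQ_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Path is an assumption; adjust to wherever kernel messages are logged.
    with open("/var/log/messages", errors="replace") as log:
        for address, n in count_by_sequence(log).most_common():
            print(address, n)
```

In the excerpt above the errors alternate between just two addresses, ffff880053409a28 and ffff880053409628.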


And this is what the CentOS log file has:
Dec 29 11:48:27 nest nmbd[4557]:   Cannot get workgroup name.
Dec 29 11:56:25 nest kernel: NFSD: preprocess_seqid_op: bad seqid (expected 27286, got 27287)
Dec 29 11:56:25 nest kernel: NFSD: preprocess_seqid_op: bad seqid (expected 2, got 3)
Dec 29 12:01:02 nest xinetd[4160]: START: hotwayd pid=8583 from=10.0.0.1
It then repeats the following line many times:

Dec 29 12:01:03 nest kernel: NFSD: preprocess_seqid_op: bad seqid (expected 27286, got 27287)

CentOS is now running the Xen kernel:
Linux nest.pheasant 2.6.18-92.1.22.el5xen #1 SMP Tue Dec 16 12:26:32 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Apart from this, nothing changed.

Comment 5 Louis Lagendijk 2009-08-03 18:54:11 UTC
For a long time I ran a standard 2.6.28 kernel on my CentOS box and no longer had the problem. I have now booted the stock CentOS kernel 2.6.18-128.2.1.el5xen (x86_64) to play around with Xen, and the problem reappeared. It appears to be a CentOS/RHEL5 problem, not a Fedora one. Should I file a new BZ against RHEL5?

Comment 6 Steve Dickson 2009-08-04 11:49:39 UTC
yes... Close this one and please open a RHEL5 bz...

Comment 7 Danilo Câmara 2009-08-06 20:29:44 UTC
Today I installed a Fedora 11 client on my network and noticed this bug. My server also serves NFSv4 with Kerberos:

Server CentOS: kernel-2.6.18-128.2.1.el5
Client Fedora: kernel-2.6.29.6-217.2.3.fc11.x86_64

Comment 8 Louis Lagendijk 2009-08-12 19:40:06 UTC
Closing this BZ. I have not seen the bug since installing the experimental RHEL kernel 2.6.18-162.el5xen from dzickus. I will report it against the RHEL kernel only if the problem reappears.

Comment 9 Louis Lagendijk 2009-08-15 11:21:33 UTC
Created a BZ for RHEL5:
https://bugzilla.redhat.com/show_bug.cgi?id=517629