Bug 161868

Summary: get "nfs4_map_errors could not handle NFSv4 error 10026"
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Reporter: Andrew Schultz <ajschult784>
Assignee: Jeff Layton <jlayton>
QA Contact: Brian Brock <bbrock>
CC: jbaron, jlayton, njoly, staubach, steved
Doc Type: Bug Fix
Last Closed: 2007-08-10 19:50:21 UTC
Attachments:
  Description: /var/log/messages snippet from kernel panic
  Flags: none

Description Andrew Schultz 2005-06-27 22:28:33 UTC
After switching to NFS4, I've been getting "nfs4_map_errors could not handle
NFSv4 error 10026" in the message log a few times a day.  Google told me that
10026 is NFS4ERR_BAD_SEQID but is not forthcoming on what is actually wrong.
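
For reference, a minimal lookup sketch for decoding the number in the log message. The table is a small subset of the state-related status codes defined in the NFSv4 specification (RFC 3530); the helper name `nfs4_error_name` is hypothetical and is not part of any kernel or nfs-utils interface.

```python
# Subset of NFSv4 status codes from the NFSv4 specification (RFC 3530).
# Only the state/sequence-related codes near 10026 are listed here.
NFS4_ERRORS = {
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
    10024: "NFS4ERR_OLD_STATEID",
    10025: "NFS4ERR_BAD_STATEID",
    10026: "NFS4ERR_BAD_SEQID",
}


def nfs4_error_name(code):
    """Return the symbolic name for a code, or a placeholder if not in the subset."""
    return NFS4_ERRORS.get(code, "unknown NFSv4 error %d" % code)
```

Calling `nfs4_error_name(10026)` returns the symbolic name that the log message omits.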

Server and client are both running kernel-smp-2.6.9-11.EL and nfs-utils-1.0.6-46.

Comment 1 Peter Staubach 2005-06-28 20:51:33 UTC
These messages should not be occurring, because they indicate that an attempt
is being made to pass an NFSv4 protocol error up to the user level.  This
should not happen.  In this case, the error indicates that the client is
not handling sequence numbers correctly.
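
To illustrate what "handling sequence numbers" means here, the following is a simplified model of the per-owner seqid check that the NFSv4 specification describes for state-changing operations such as OPEN and LOCK. It is a sketch only, not the kernel's code: `OwnerState` and `check_seqid` are hypothetical names, and the model ignores reply caching and the rules for when the client may advance its seqid.

```python
NFS4_OK = 0
NFS4ERR_BAD_SEQID = 10026


class OwnerState:
    """Hypothetical server-side record for one open-owner or lock-owner."""

    def __init__(self, last_seqid):
        # Sequence number of the last request the server processed for this owner.
        self.last_seqid = last_seqid


def check_seqid(owner, seqid):
    """Classify an incoming seqid as a new request, a replay, or an error."""
    expected = (owner.last_seqid + 1) & 0xFFFFFFFF  # seqids are 32-bit and wrap
    if seqid == expected:
        owner.last_seqid = seqid
        return "new", NFS4_OK
    if seqid == owner.last_seqid:
        # Retransmission of the previous request: a real server replays its cached reply.
        return "replay", NFS4_OK
    # Anything else means client and server disagree about the owner's state.
    return "error", NFS4ERR_BAD_SEQID
```

In this model, a client that skips ahead or falls behind by more than one request gets NFS4ERR_BAD_SEQID, which is the 10026 seen in the log.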

Is there any idea of what is happening when the message appears?  This
might give some clue to the code path which is not correct.

Comment 2 Andrew Schultz 2005-06-28 21:47:33 UTC
It seems to be Mozilla that causes the problems.  I'm not seeing the problem on
other (headless) workstations, only on my desktop.  I got the error once when I
received new mail and another time when I deleted a bunch of messages and
compacted a folder.  Unfortunately, Mozilla is probably enough of a beast that
this won't help narrow it down much.

Is there some way to get the kernel to be more verbose about the error?

Comment 3 Steve Dickson 2005-09-07 10:17:18 UTC
It appears the client and server state are getting out
of sync, but it's not clear why. It sounds like your mail
folders are on a v4 mount; is this the case?

Comment 4 Andrew Schultz 2005-09-07 12:51:33 UTC
Yes.

Comment 5 Need Real Name 2007-03-01 15:13:49 UTC
Any status on this? I am seeing this on a large system (200+ nodes) running
RHEL4.4 kernel rev 42.0.2.  Any tips would be appreciated.

Comment 6 Peter Staubach 2007-03-13 19:52:05 UTC
I could use some help in identifying a reproducible testcase, so, if
someone can identify a sequence of commands/whatever which will provoke
the messages, I would appreciate it.

Comment 7 Jeff Layton 2007-07-17 20:01:11 UTC
There are quite a few NFSv4 patches going into 4.6. Some of them may help this
situation. If you have a non-critical place to do so, please test the kernels on
my people page:

http://people.redhat.com/jlayton

and let me know if the issue is still reproducible.


Comment 8 Andrew Schultz 2007-08-10 18:36:38 UTC
I upgraded the kernel to 2.6.9-55.16.EL.jtltest.14 on the server and a couple
clients and switched back to NFS v4 with no problems.  I switched to NFS v4 on
my workstation (forgetting to upgrade the kernel) and I got a kernel panic
within a couple hours.  I upgraded the kernel and haven't had any problems in 4
days.

Comment 9 Jeff Layton 2007-08-10 19:31:45 UTC
Excellent. This is likely a duplicate of an existing bug, but without more info,
it's tough to know which. Do you happen to have the oops message from the kernel
panic? That may help us to determine the cause.

Comment 10 Andrew Schultz 2007-08-10 19:43:28 UTC
Created attachment 161077
/var/log/messages snippet from kernel panic

I got this after a bunch of messages like:
> nfs4_map_errors could not handle NFSv4 error 10026
> rpciod waiting on sync task!
> NFS: v4 server returned a bad sequence-id error!

Hopefully the call trace is helpful.  The rest doesn't make much sense to me. :)
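
A short script along these lines could tally the three messages quoted above in a log file, which may help correlate them with client activity. It is a sketch: the message texts are taken from this report, while the function name `summarize` and the tally keys are hypothetical, and it assumes each message appears on a single log line.

```python
import re
from collections import Counter

# Message texts as quoted in this report; everything else on the line is ignored.
PATTERNS = {
    "map_errors": re.compile(r"nfs4_map_errors could not handle NFSv4 error (\d+)"),
    "rpciod_wait": re.compile(r"rpciod waiting on sync task"),
    "bad_seqid": re.compile(r"NFS: v4 server returned a bad sequence-id error"),
}


def summarize(lines):
    """Count occurrences of each known message; map_errors is split by error code."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Only the map_errors pattern has a capture group (the error code).
                key = "%s:%s" % (name, match.group(1)) if match.groups() else name
                counts[key] += 1
    return counts
```

Passing an open file object for /var/log/messages to `summarize` yields a per-message tally; reading that file usually requires root.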

Comment 11 Jeff Layton 2007-08-10 19:50:21 UTC
Thanks. Closing this as a duplicate...

*** This bug has been marked as a duplicate of 209419 ***

Comment 12 Jeff Layton 2007-08-10 19:55:39 UTC
Technically, I think this is really 2 separate issues. This is also related to
bug #228292, which deals with the "rpciod waiting on sync task" problem. The
panic you've seen is related to the bug that this was closed against. Both
problems should be addressed in 4.6.