Red Hat Bugzilla – Bug 426198
LTE: Drop into xmon on uli03 during stress test run
Last modified: 2008-08-01 12:49:16 EDT
NOTE: Please see https://bugzilla.redhat.com/show_bug.cgi?id=402581#c9 by Jeff.
This should be assigned to him (email@example.com). Thanks!
Miao Tao Feng <firstname.lastname@example.org> - 2007-12-18 22:10 EDT
After the stress testing (BASE, IO, TCP, and NFS focus areas) had run for about
30 hours on uli03, the system dropped into xmon, and dmesg shows a lot of error
messages of the form "nfs4_reclaim_open_state: unhandled error -116. Zeroing state".
The kernel on uli03 is 2.6.9-68.1.EL.jtltest.25, a test kernel that attempts to
fix bug 37993. We got it from http://people.redhat.com/jlayton/ .
cpu 0x3: Vector: 300 (Data Access) at [c0000000d4e6fbd0]
pc: d0000000004f7e54: .nfs4_reclaim_open_state+0x144/0x184 [nfs]
lr: d0000000004f7e3c: .nfs4_reclaim_open_state+0x12c/0x184 [nfs]
current = 0xc0000001d33c35a0
paca = 0xc0000000003fc000
pid = 10886, comm = 188.8.131.52-rec
[c0000000d4e6fef0] d0000000004f8014 .reclaimer+0x180/0x2cc [nfs]
[c0000000d4e6ff90] c000000000018e48 .kernel_thread+0x4c/0x6c
R00 = 0000000000000000 R16 = 0000000000000000
R01 = c0000000d4e6fe50 R17 = 0000000000000000
R02 = d00000000052e3e8 R18 = 0000000000000000
R03 = 0000000000000040 R19 = 0000000000000000
R04 = 8000000000009032 R20 = 0000000000230000
R05 = 0000000000000000 R21 = 0000000000000000
R06 = 0000000000000080 R22 = 00000000001cb800
R07 = 0000000000000000 R23 = 0000000000000000
R08 = 0000000000000018 R24 = c0000000003fa800
R09 = c00000000043aec0 R25 = 0000000001300000
R10 = c00000000117bbd8 R26 = c0000001dd610b50
R11 = c00000000043aec0 R27 = c0000001dd610b00
R12 = 0000000044000028 R28 = c0000001dd62f860
R13 = c0000000003fc000 R29 = 0000000000100100
R14 = 0000000000000000 R30 = d00000000052b3c8
R15 = 0000000000000000 R31 = c000000129de86a0
pc = d0000000004f7e54 .nfs4_reclaim_open_state+0x144/0x184 [nfs]
lr = d0000000004f7e3c .nfs4_reclaim_open_state+0x12c/0x184 [nfs]
msr = 8000000000009032 cr = 24000024
ctr = c00000000005d9b8 xer = 000000000000000e trap = 300
Machine Type = IVM lpar of P6 blade (JS22)
Contact Information = Miao Tao Feng, email@example.com
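For anyone reading the pc/lr and backtrace lines in the xmon output above: each entry has the form address, symbol+0xOFFSET/0xSIZE, and module. The following is a minimal sketch of pulling those fields apart and computing where the function starts. The helper name parse_frame and the regular expression are illustrative assumptions, not part of xmon or any kernel tool.

```python
import re

# Hypothetical helper: matches "ADDRESS[:] [.]symbol+0xOFF/0xSIZE [module]"
# as it appears in the dump above. The optional leading "." is the dot
# that ppc64 puts in front of function entry symbols.
FRAME_RE = re.compile(
    r"(?P<addr>[0-9a-f]{16})[: ]+\.?(?P<sym>\w+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

def parse_frame(line):
    """Return the fields of one pc/lr/backtrace line, or None if no match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "symbol": m.group("sym"),
        "module": m.group("mod"),
        "address": addr,
        "offset": off,                    # bytes into the function
        "size": int(m.group("size"), 16), # total size of the function
        "function_start": addr - off,     # load address of the function
    }

pc = parse_frame("pc: d0000000004f7e54: .nfs4_reclaim_open_state+0x144/0x184 [nfs]")
lr = parse_frame("lr: d0000000004f7e3c: .nfs4_reclaim_open_state+0x12c/0x184 [nfs]")
print(hex(pc["function_start"]), hex(lr["function_start"]))
```

Both lines resolve to the same function start, so the faulting instruction is 0x144 bytes into nfs4_reclaim_open_state, 0x18 bytes past the address held in lr.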
------- Comment From firstname.lastname@example.org 2008-01-07 12:39 EDT-------
Jeff, any update on this?
------- Comment From email@example.com 2008-01-28 12:40 EDT-------
Since this was found on an unofficial kernel, and the parent bug 37993 is
marked for acceptance into a maintenance release, I am going to reject this for
now. When the maintenance release comes out, if this bug still shows up there,
we can reopen this bug.
-116 == -ESTALE
This was seen in the context of the state recovery thread. Work generally gets
queued to that thread when the server returns an error and we need to "reset"
the open/lock state on the file. When we tried to recover the state here, we got
back -ESTALE, so the filehandle was no longer any good. It's possible that the
same issue that caused us to get an ESTALE when trying to recover the state was
what originally caused the state recovery attempt in the first place.
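As a quick illustration of the "-116 == -ESTALE" mapping above, here is a minimal sketch that turns a negative kernel-style return value into its symbolic name. The table is a small hand-copied subset of the generic Linux errno values, and the names LINUX_ERRNO and describe_kernel_error are made up for this example; on a Linux host, Python's errno.errorcode gives the full mapping.

```python
# Hand-copied subset of generic Linux errno values (assumption: the
# client kernel uses these generic numbers, as the -116 in dmesg suggests).
LINUX_ERRNO = {
    2: "ENOENT",
    5: "EIO",
    13: "EACCES",
    116: "ESTALE",
}

def describe_kernel_error(ret):
    """Map a negative kernel return value such as -116 to "-ESTALE"."""
    if ret >= 0:
        return "no error"
    name = LINUX_ERRNO.get(-ret)
    if name is None:
        return "unknown error %d" % ret
    return "-" + name

print(describe_kernel_error(-116))
```

The value in the dmesg message decodes to -ESTALE, which matches the reading that the filehandle was stale when the recovery thread tried to reclaim the open state.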
Did something happen on the server at or maybe a little while before this
What kind of server are you testing against, and what sort of tests were you
Do we have a coredump from this testing? If so that might be a way to gather a
bit more info about what sort of error we got back from the server...
No info in several months and IT is now closed. Closing BZ with resolution of