Red Hat Bugzilla – Bug 121804
(NFSv4) Kernel panic; do_IRQ: stack overflow:360
Last modified: 2007-11-30 17:10:41 EST
Description of problem:
Got a crash today:
do_IRQ: stack overflow:360
Version-Release number of selected component (if applicable):
kernel-smp-2.6.5-1.327 on a UP machine running with
nmi_watchdog=1, using LVM2, XFS and NFSv4.
Created attachment 99855
Stack overflow call trace, showing fault in NFS->IP path
Call trace/oops for the case where the machine did not fault repeatedly. The NFS
mount is TCP, but the problem occurs with a UDP mount too.
Created attachment 99856
Tail of output of continuously repeating oops
This is the tail of what was in my scroll buffer for the case where the machine
faulted, I think similarly to the other trace (I saw ip_finish and sock_sendmsg
scroll by), and then faulted continuously while trying to print the oops,
until I power-cycled it.
I have a similar problem with kernel-2.6.5-1.322 and
kernel-2.6.5-1.343. I can reliably trigger a stack overflow by running
apt-get dist-upgrade: as apt reads in its package lists, which are located
on an NFS mount, the kernel blows up with a stack overflow.
AFAICT the overflow happens somewhere along the nfs_writepage -> ... ->
inet_sendmsg -> ... -> ip_finish_output path each time. However, I can't
say for 100% sure, as I have only managed to capture one oops from the
beginning (see below): 9 times out of 10 the kernel ends up oopsing over
and over again, with show_registers ... do_fault ... show_registers
(etc.) in the stack traces. The trace I have where it did not oops
repeatedly appears to be similar to what I see as the first oops on
screen in the repeating cases, but I can't say for sure; it scrolls too fast.
See attached for what was left in my scroll buffer for the case of a
repeating fault, where I did not manage to catch the beginning of the
series of faults. Also attached is the case where it did not fault
repeatedly, showing what is probably similar to the initial fault in
the repeating cases.
Initially I had /var/cache/apt NFS-mounted over TCP, and the trace is
from the TCP mount, but the problem occurs with UDP mounts too. I have
also tried booting with acpi=off and vdso=0, just to rule those out; the
problem still occurs.
The problem does not occur with older kernel versions, e.g.
The machine is a Compaq Deskpro 4000, Pentium 233MMX, 192MB RAM, 200MB
swap on local disk, onboard ThunderLAN NIC, with the last available
ROMPaq (BIOS) from Compaq for the Deskpro series installed.
Is there any chance an 8kB-stack version of the kernel could be made
available? Or could the config option for 8kB stacks at least be
reinstated, so that I could build such a kernel from the source RPM?
NFSv4 has a known big stack abuser; Dave Jones has a fix pending for that.
The newer 2.6.5-1.358 kernel appears to have resolved this issue. Thanks!