Bug 121804 - (NFSv4) Kernel panic; do_IRQ: stack overflow:360
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-04-27 22:51 UTC by Carl-Johan Kjellander
Modified: 2007-11-30 22:10 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-05-17 11:03:21 UTC
Type: ---
Embargoed:


Attachments
Stack overflow call trace, showing fault in NFS->IP path (15.02 KB, text/plain)
2004-05-01 05:37 UTC, Paul Jakma
Tail of output of continuously repeating oops (25.32 KB, text/plain)
2004-05-01 05:42 UTC, Paul Jakma

Description Carl-Johan Kjellander 2004-04-27 22:51:22 UTC
Description of problem:
Got a crash today:

do_IRQ: stack overflow:360

call trace:
  [<02107aa0>]  do_IRQ+0x46/0x225
  [<022111b7>]

Nothing else.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.5-1.327 on a UP machine running with
nmi_watchdog=1, with lvm2, xfs and NFSv4.
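
For context on the message above: in 2.6-era i386 kernels built with stack-overflow debugging, do_IRQ checks how much of the current kernel stack is still free and prints this warning when the margin is small; the number printed is the remaining byte count. The following is only a rough user-space model of that arithmetic, not the kernel code. The 4kB stack size, the 1kB warning threshold, and the 52-byte size of the per-thread structure at the base of the stack are all assumptions chosen for illustration, and the function names are hypothetical.

```python
# Rough model of the free-stack check behind "do_IRQ: stack overflow".
# All three constants are illustrative assumptions, not values taken
# from the kernel build discussed in this report.
THREAD_SIZE = 4096       # assumption: 4kB kernel stacks
THREAD_INFO_SIZE = 52    # hypothetical size of the struct at the stack base
WARN_THRESHOLD = 1024    # assumption: warn when less than 1kB is free


def stack_bytes_free(esp: int) -> int:
    """Bytes between the stack pointer and the per-thread struct.

    The stack grows downwards from the top of a THREAD_SIZE-aligned
    region, so the offset of esp within that region is what is left.
    """
    offset_in_stack = esp & (THREAD_SIZE - 1)
    return offset_in_stack - THREAD_INFO_SIZE


def overflow_warning(esp: int):
    """Return the warning text if the stack is nearly exhausted, else None."""
    free = stack_bytes_free(esp)
    if free < WARN_THRESHOLD:
        return f"do_IRQ: stack overflow: {free}"
    return None


if __name__ == "__main__":
    # A stack pointer 412 bytes above the base of its 4kB region leaves
    # 412 - 52 = 360 bytes under the assumptions above.
    print(overflow_warning(0xC1234000 + 412))
```

Under these assumptions, a reading of 360 means an interrupt arrived when only a few hundred bytes of kernel stack were left, so any deeper call chain would run off the end of the stack.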

Comment 1 Paul Jakma 2004-05-01 05:37:45 UTC
Created attachment 99855
Stack overflow call trace, showing fault in NFS->IP path

Call trace/oops for the case where the machine did not repeatedly fault. The NFS
mount is over TCP, but the problem occurs with a UDP mount too.

Comment 2 Paul Jakma 2004-05-01 05:42:01 UTC
Created attachment 99856
Tail of output of continuously repeating oops

This is the tail of what was in my scroll buffer for the case where the machine
faulted, I think similarly to the other trace (I saw ip_finish and sock_sendmsg
scroll by), and then repeatedly and continuously faulted trying to print the
oops, until I power cycled.

Comment 3 Paul Jakma 2004-05-01 05:44:23 UTC
I have a similar problem with kernel-2.6.5-1.322 and
kernel-2.6.5-1.343. I can reliably trigger a stack overflow by running
apt-get dist-upgrade: as apt reads in its package lists, which are
located on an NFS mount, the kernel blows up with a stack overflow.
AFAICT it happens somewhere down the nfs_writepage -> ... ->
inet_sendmsg -> ... -> ip_finish_output path each time. However, I
can't say for sure, as I have only managed to capture one oops from
the beginning (see below), because 9 times out of 10 the kernel ends
up oopsing over and over again with show_registers ... do_fault ...
show_registers (etc.) in the stack traces. The trace I have where it
did not repeatedly oops appears to be similar to what I see as the
first oops on screen in the repeating cases, but I can't say for sure;
it scrolls too fast.

See attached for what was left in my scroll buffer for the case of a
repeating fault, where I did not manage to catch the beginning of the
series of faults. Also attached is the case where it did not
repeatedly fault, showing what probably is similar to the initial
fault for the repeating cases.

Initially I had the /var/cache/apt NFS mount mounted over TCP, and the
trace is with the TCP mount, but the problem occurs with UDP mounts
too. I have also tried booting with acpi=off and vdso=0, just to rule
those out; the problem still occurs.

The problem does not occur with older kernel versions, e.g.
kernel-2.6.3-1.96.

The machine is a Compaq Deskpro 4000, Pentium 233MMX, 192MB RAM, 200M
swap on local disk, ThunderLAN onboard NIC, with the last available
ROMPaq (BIOS) from Compaq for the series of Deskpro installed.




Comment 4 Paul Jakma 2004-05-06 15:36:43 UTC
Is there any chance an 8kB stack version of the kernel could be made
available? Or at least that the config option for 8kB stack could be
reinstated so that I could build such a kernel from the src rpm?

Comment 5 Arjan van de Ven 2004-05-07 06:51:14 UTC
NFSv4 has a known big stack abuser; Dave Jones has a fix pending for that.

Comment 6 Paul Jakma 2004-05-16 23:46:01 UTC
The newer 2.6.5-1.358 kernel appears to have resolved this issue. Thanks!

