Red Hat Bugzilla – Bug 222152
stack overflow in do_IRQ under heavy disk load with HPT ide driver md lvm reiser fs
Last modified: 2008-02-07 23:24:12 EST
Description of problem:
I am using four 160GB drives attached to a HighPoint RAID controller. The controller has 4 IDE devices, all on the same IRQ. I used the primary channel on each.
Each disk has a single partition, and they are all joined into a RAID 5 array.
The raid array was added to LVM as a single PV.
I created a 120GB logical volume and used mkreiserfs to put down the FS metadata. After mounting the file system, I tried to copy about 16GB of data to the new file system. The system locked up solid after about 3 to 4 GB was copied.
I found that slowing down the copy by running it through compress/decompress in a pipeline allowed me to get the whole 16GB copied. I then tried copying the 16GB off. Same result: the system locked up hard. I was able to copy the data off using the compress/decompress trick.
I searched the Internet and the driver source and found the ideX=serialize options. Using ide2=serialize ide3=serialize ide4=serialize ide5=serialize helped. Switching to the ext3 file system also helped.
I'm now using the ideX=serialize options and the ext3 file system. The system can still be made to crash, but I have to try a lot harder.
My suggestion is to have all devices on all ide interfaces being serviced by the hpt34x driver be in the same 'hwgroup'.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create an md RAID 5 device (I had 4 disks on a HighPoint controller).
2. Create a volume group and a logical volume.
3. Run mkreiserfs on the logical volume and mount it.
4. Copy a huge amount of data to the new file system.
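For anyone trying to reproduce this, the steps above might look roughly like the following. This is only a sketch: the report does not give device names, volume names, or paths, so the disk names (chosen to match ide2 through ide5), md0, vg_test, lv_test, /mnt/test and the source path are all assumptions. The script only prints the commands, since running them destroys data on the listed disks.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Every device, volume and path name below is an assumption, not from the report.
DISKS="/dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1"   # one partition per disk
MD=/dev/md0
VG=vg_test
LV=lv_test
MNT=/mnt/test
SRC=/path/to/16GB-of-data

repro_steps() {
    # 1. md RAID 5 across the four single-partition disks
    echo "mdadm --create $MD --level=5 --raid-devices=4 $DISKS"
    # 2. the array becomes the only PV; carve a 120GB logical volume from it
    echo "pvcreate $MD"
    echo "vgcreate $VG $MD"
    echo "lvcreate -L 120G -n $LV $VG"
    # 3. reiserfs on the logical volume, then mount it
    echo "mkreiserfs /dev/$VG/$LV"
    echo "mount /dev/$VG/$LV $MNT"
    # 4. heavy sequential write load
    echo "cp -a $SRC $MNT/"
}

repro_steps
```

Review the printed commands and adjust the names before running any of them by hand as root.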
System locked up hard, needing hardware reset button to clear.
Adding ide2=serialize ide3=serialize ide4=serialize ide5=serialize helped.
Switching to ext3 also helped. Lockups are now few and far between.
I think this is a 4k stacks/do_IRQ stack overflow symptom. I have a similar
setup, with 4 750GB Seagate SATA drives connected to NV ports on Asus A8N-SLI
Premium. Proprietary NVIDIA driver uninstalled. Been booting into runlevel 3,
logging on the console and turning off screenblanking (setterm -blank 0
-powersave off). I was able to see only "do_IRQ: stack overflow: " followed by
496 or 504 or other various similar numbers. No stack dump was displayed,
system locked hard and needed reset. Only occurred under high I/O load, caused
locally or via NFS.
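A note on the numbers: as far as I understand the i386 do_IRQ debug check in kernels of this era, the value printed is the number of bytes still free on the kernel stack, and the message fires once that drops below one eighth of the stack size. The sketch below just does that arithmetic. The 4096-byte stack size and the one-eighth threshold are assumptions about the kernel build, not something stated in this report.

```shell
#!/bin/sh
# Assumption: i386 kernel built with 4K stacks, so the stack is 4096 bytes.
THREAD_SIZE=4096
# Assumption: the warning threshold is one eighth of the stack size.
STACK_WARN=$((THREAD_SIZE / 8))

# Succeeds when a reported free-byte count is under the warning threshold.
below_warn() {
    [ "$1" -lt "$STACK_WARN" ]
}

# 496 and 504 are the values quoted in the comment above.
for free in 496 504; do
    if below_warn "$free"; then
        echo "$free bytes free: under the ${STACK_WARN}-byte threshold"
    fi
done
```

Under those assumptions the threshold is 512 bytes, so values such as 496 and 504 mean an interrupt arrived with almost no stack left. With 8K stacks the same rule would give a 1024-byte threshold.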
I was originally using XFS and found xfs maintainers admit RAID5 + LVM + XFS
causes stack overflows (XFS uses too much stack space).
Switched to Reiserfs and it was better, but still happens occasionally at the
Hate to say it, but a self-compiled 2.6.20-rc6 with reiserfs has been rock solid
for a week and I've pounded the box hard on occasion (syncing homedirs with
Unison was my latest repeatable crash method).
I am seeing similar problems with 2.6.19-1.2911 running xen0. Systems with 3
and 4 drives fail when moving large amounts of data on and off of reiserfs.
Sometimes the system will reboot and sometimes hang. Here is a typical message:
h-xxxx.easyco.net login: do_IRQ: stack overflow: 480
(XEN) (file=x86_emulate.c, line=1152) Cannot emulate 57
(XEN) domain_crash_sync called from entry.S (ff1611d9)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.0.3-0-1.2911.fc6 x86_32p debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) EIP: 0061:[<c061b83f>]
(XEN) EFLAGS: 00010296 CONTEXT: guest
(XEN) eax: e93e8008 ebx: ed749190 ecx: 0000007b edx: 00000000
(XEN) esi: c0684b54 edi: c061b83e ebp: 00000011 esp: e93e8000
(XEN) cr0: 8005003b cr4: 000006f0 cr3: 13633000 cr2: e93e7ffc
(XEN) ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0069 cs: 0061
(XEN) Guest stack trace from esp=e93e8000:
(XEN) Stack empty.
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) AMD SVM Extension is disabled.
The crash messages vary. When running Xen, this is the most detail I get.
Sometimes when not running Xen, I will get full stack dumps. Sometimes it loops
forever giving stack dumps. Sometimes it only displays the stack overflow line.
Sometimes it displays nothing. This is from a serial console, so perhaps it is
too deep in the IRQ handler to keep the serial alive.
I have recompiled with the 4K stacks option unchecked, and this seems to help a lot.
Perhaps 4K stacks is a dangerous default.
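The (XEN) register dump above seems consistent with that reading. The sketch below uses only the esp and cr2 values from the dump. It assumes cr2 holds the faulting address and that the kernel stack is one 4 KiB page, so treat the interpretation as a guess rather than a diagnosis.

```shell
#!/bin/sh
# Values copied from the (XEN) register dump in this report.
ESP=0xe93e8000
CR2=0xe93e7ffc

# Offset of the stack pointer within its 4 KiB page (0 means page-aligned).
esp_page_offset=$(($ESP & 0xfff))
# Distance from the faulting address up to the stack pointer, in bytes.
fault_below_esp=$(($ESP - $CR2))

echo "esp offset in page: $esp_page_offset"
echo "cr2 is $fault_below_esp bytes below esp"
```

This prints an offset of 0 and a distance of 4: the stack pointer sits exactly on a page boundary and the fault is one 4-byte word below it. That is what a push off the bottom of a completely full 4K stack would look like, though other explanations are possible.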
(This is a mass-update to all current FC6 kernel bugs in NEW state)
I'm reviewing this bug list as part of the kernel bug triage project, an attempt
to isolate current bugs in the Fedora kernel.
I am CC'ing myself to this bug, however this version of Fedora is no longer
Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.
Thanks for using Fedora!
Per the previous comment in this bug, I am closing it as INSUFFICIENT_DATA,
since no information has been lodged for over 30 days.
Please re-open this bug or file a new one if you can provide the requested data,
and thanks for filing the original report!