Description of problem:
This bug is eerily similar to #250266. When running 2.6.24.3-29.el5rt.i386 on a dell-pe1950 machine, the megasas driver gets corrupted after the machine does some work. You can make it happen with:

cat /dev/sdX > /dev/null

If the kernel is booted with the noapic option, this doesn't happen. I am attaching portions of dmesg with the relevant messages.

Version-Release number of selected component (if applicable):
2.6.24.3-29.el5rt.i386

How reproducible:
Very

Steps to Reproduce:
1. Boot the 2.6.24.3-29.el5rt.i386 kernel on hardware that has a megasas controller.
2. Run "cat /dev/sdX > /dev/null".
3.

Actual results:

Expected results:

Additional info:
Created attachment 299054
dmesg messages
Note to myself: port preempt-irqs-x86-64-ioapic-mask-quirk-jcm.patch to i386 and see if it fixes it.
Created attachment 299347
ioapic quirk ported to i386

Straightforward port of preempt-irqs-x86-64-ioapic-mask-quirk-jcm.patch for 32-bit kernels. I'm testing on dell-pe1850-01 in RHTS (I could not find a pe1950). With 2.6.24.4-30.el5rt I was getting:

megaraid: aborting-6695 cmd=2a <c=1 t=0 l=0>
megaraid abort: 6695:13[255:128], fw owner
megaraid: aborting-6696 cmd=2a <c=1 t=0 l=0>
megaraid abort: 6696:9[255:128], fw owner
megaraid: aborting-6699 cmd=2a <c=1 t=0 l=0>
megaraid abort: 6699:0[255:128], fw owner
megaraid: aborting-6700 cmd=28 <c=1 t=0 l=0>
megaraid abort: 6700:18[255:128], fw owner
megaraid: 4 outstanding commands. Max wait 300 sec
megaraid mbox: Wait for 2 commands to complete:300
megaraid mbox: reset sequence completed sucessfully

With the patch it hasn't happened yet.

Scratch kernel build with the patch: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1231815
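When comparing a patched and an unpatched kernel, it can help to count the abort messages rather than eyeball them. The following is only a minimal sketch: the function name count_megaraid_aborts is hypothetical, and it assumes the log text carries the "megaraid abort:" prefix seen in the messages quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper: count "megaraid abort:" lines in log text read
# from stdin. The pattern is taken from the messages quoted in this bug.
count_megaraid_aborts() {
    # grep -c prints the count even when it is 0, but exits 1 in that
    # case; "|| true" keeps a clean log from looking like a failure.
    grep -c 'megaraid abort:' || true
}
```

Typical use would be "dmesg | count_megaraid_aborts" before and after the read test; a count that stays at 0 is consistent with the quirk helping, but does not prove it.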
Verified this with 2.6.24.4-30.m1.el5rt.i386 kernel ...
Created attachment 302329
ioapic quirk for i386

We still do not have the patch applied in CVS. Here's the patch against the current 2.6.24.4-41. jcm noticed that previously the ioapic quirks in arch/x86/kernel/quirks.c were used only on x86_64. This version of the patch removes the '#ifdef CONFIG_X86_64' so the automatic quirks are used on both archs.

(Gurhan, I think setting this bug to VERIFIED after testing the scratch build was a mistake. In the usual BZ workflow it means the fix was applied and passed QA.)
(In reply to comment #5)
> Created an attachment (id=302329)
> ioapic quirk for i386
>
> We still do not have the patch applied in CVS. Here's the patch against the
> current 2.6.24.4-41. jcm noticed that previously the ioapic quirks in
> arch/x86/kernel/quirks.c were used only on x86_64. This version of the patch
> removes the '#ifdef CONFIG_X86_64' so the automatic quirks are used on both
> archs.
>
> (Gurhan, I think setting this bug to VERIFIED after testing the scratch build
> was a mistake. In the usual BZ workflow it means the fix was applied and passed
> QA.)

Eekk. Sorry about creating confusion; I guess I still don't know small corner cases of the BZ workflow such as this one. Anyway, feel free to change the status of the bug, and just let me know when it goes into the main -rt tree and I'll test it.
It's now in kernel-rt-2.6.24.4-42.el5rt.
(In reply to comment #7)
> It's now in kernel-rt-2.6.24.4-42.el5rt.

Ok, in that case I'll just revive this bug. There are some issues with -42; I can't even get this to be tested. Here are some backtraces from dmesg:

BUG: scheduling with irqs disabled: auditd/0x00000000/12737
caller is rt_spin_lock_slowlock+0xcf/0x14f
Pid: 12737, comm: auditd Not tainted 2.6.24.4-42.el5rt #1
[<c0633b59>] schedule+0x8e/0x114
[<c06343ef>] rt_spin_lock_slowlock+0xcf/0x14f
[<c0634a65>] __rt_spin_lock+0x4c/0x4e
[<c0634a6f>] rt_spin_lock+0x8/0xa
[<c04770bf>] page_address+0x4d/0x80
[<c047730d>] kmap_high+0xf0/0x463
[<c0403226>] ? __switch_to+0xa3/0x125
[<c042a515>] ? finish_task_switch+0x29/0xc4
[<c042c6cd>] ? __wake_up+0x34/0x4f
[<c0477191>] ? kunmap_high+0x9f/0xa1
[<c0424140>] ? kunmap+0x52/0x54
[<c0424180>] kmap+0x3e/0x49
[<c0421398>] kmap_atomic_func+0x12/0x15
[<c0423802>] gup_pte_range+0x4f/0x135
[<c0423a86>] fast_gup+0x19e/0x264
[<c0449c96>] get_futex_key+0x70/0xa2
[<c044b1e5>] do_futex+0x383/0xa4b
[<c048e0bd>] ? do_readv_writev+0x16d/0x178
[<c044b994>] sys_futex+0xe7/0xfa
[<c048e5d8>] ? sys_writev+0x58/0x8f
[<c0404226>] syscall_call+0x7/0xb
=======================
WARNING: at arch/x86/kernel/smp_32.c:580 native_smp_call_function_mask()
Pid: 13593, comm: automount Not tainted 2.6.24.4-42.el5rt #1
[<c0419d61>] ? do_flush_tlb_all+0x0/0x3f
[<c0419f4e>] native_smp_call_function_mask+0x4c/0x12a
[<c0419d61>] ? do_flush_tlb_all+0x0/0x3f
[<c0476f78>] ? __set_page_address+0x8b/0x95
[<c0419d61>] ? do_flush_tlb_all+0x0/0x3f
[<c0419d61>] ? do_flush_tlb_all+0x0/0x3f
[<c041b4e5>] smp_call_function+0x1e/0x22
[<c0434059>] on_each_cpu+0x24/0x5c
[<c0419c75>] flush_tlb_all+0x1e/0x20
[<c04774ba>] kmap_high+0x29d/0x463
[<c0424180>] kmap+0x3e/0x49
[<c0421398>] kmap_atomic_func+0x12/0x15
[<c0423802>] gup_pte_range+0x4f/0x135
[<c047bb7e>] ? handle_mm_fault+0xb7a/0xbb3
[<c0423a86>] fast_gup+0x19e/0x264
[<c0449c96>] get_futex_key+0x70/0xa2
[<c044a302>] futex_wake+0x38/0xbe
[<c044aee7>] do_futex+0x85/0xa4b
[<c0493d0a>] ? pipe_write+0x36f/0x3c4
[<c046fcf8>] ? __pagevec_free+0x17/0x1e
[<c048d9fb>] ? do_sync_write+0xc5/0x102
[<c044b994>] sys_futex+0xe7/0xfa
[<c0468c99>] ? __delayacct_add_tsk+0x205/0x210
[<c042cdd0>] mm_release+0x84/0x8b
[<c0430c01>] exit_mm+0x15/0xfd
[<c0432128>] do_exit+0x213/0x716
[<c04070a8>] ? do_syscall_trace+0x14c/0x198
[<c04326bb>] complete_and_exit+0x0/0x16
[<c0404226>] syscall_call+0x7/0xb
=======================
I opened a new bug 442595 for the BUG in -42, as it does not look related to the ioapic quirk.
So does your 32-bit ioapic quirk patch fix this bug?

Clark
Gurhan, does -47 boot on the machine? And does the ioapic quirk help?
(In reply to comment #11)
> Gurhan, does -47 boot on the machine? And does the ioapic quirk help?

I've never tried the -47 kernel on this machine. I'll be on PTO until April 29th and back in the office on the 30th. I doubt I'll get a chance to try this before then, especially given that the machine has no remote management console set up. I will update the BZ once I try it.
Update in response to Clark's email on the rt list: this issue is still open. I didn't get a chance to try the -47 kernel before the office move. The box is not hooked up in the new lab yet; I am hoping it will be today. Once it's online I'll test this.
The -47 kernel booted fine, and I ran "cat /dev/sda > /dev/null" 50 times without any damage.
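For anyone re-running this check, the repeated read can be scripted. This is only a sketch, not the exact procedure used for the verification: the function name repeat_read and its arguments are hypothetical, and the default of 50 runs simply mirrors the count mentioned in this comment.

```shell
#!/bin/sh
# Hypothetical helper: read a device (or any readable file) end to end
# N times, stopping at the first read that fails.
repeat_read() {
    dev=$1
    runs=${2:-50}   # default mirrors the 50 runs mentioned in this bug
    i=1
    while [ "$i" -le "$runs" ]; do
        cat "$dev" > /dev/null || {
            echo "read $i of $runs failed on $dev" >&2
            return 1
        }
        i=$((i + 1))
    done
    echo "completed $runs reads of $dev"
}
```

For example, "repeat_read /dev/sda 50" as root, followed by a look at dmesg for megaraid abort messages. A clean run only shows the problem did not reproduce in that many passes.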
closing