Bug 438878
Summary: megaraid SAS driver corruption on 2.6.24.3-29.el5rt.i386 kernel
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: 1.0
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Gurhan Ozen <gozen>
Assignee: Michal Schmidt <mschmidt>
CC: acme, bhu, gozen, jburke, jcm, mschmidt, pzijlstr
Fixed In Version: 2.6.24.7-47.el5rt
Doc Type: Bug Fix
Last Closed: 2008-05-27 22:04:26 UTC

Description
Gurhan Ozen 2008-03-25 18:06:50 UTC

Created attachment 299054
dmesg messages

Note to myself: port preempt-irqs-x86-64-ioapic-mask-quirk-jcm.patch to i386 and see if it fixes it.

Created attachment 299347
ioapic quirk ported to i386

Straightforward port of preempt-irqs-x86-64-ioapic-mask-quirk-jcm.patch for 32-bit kernels.

I'm testing on dell-pe1850-01 in RHTS (I could not find a pe1950). With 2.6.24.4-30.el5rt I was getting:

megaraid: aborting-6695 cmd=2a <c=1 t=0 l=0>
megaraid abort: 6695:13[255:128], fw owner
megaraid: aborting-6696 cmd=2a <c=1 t=0 l=0>
megaraid abort: 6696:9[255:128], fw owner
megaraid: aborting-6699 cmd=2a <c=1 t=0 l=0>
megaraid abort: 6699:0[255:128], fw owner
megaraid: aborting-6700 cmd=28 <c=1 t=0 l=0>
megaraid abort: 6700:18[255:128], fw owner
megaraid: 4 outstanding commands. Max wait 300 sec
megaraid mbox: Wait for 2 commands to complete:300
megaraid mbox: reset sequence completed sucessfully

With the patch it hasn't happened yet.

Scratch kernel build with the patch:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1231815

Verified this with 2.6.24.4-30.m1.el5rt.i386 kernel ...

Created attachment 302329
ioapic quirk for i386
We still do not have the patch applied in CVS. Here's the patch against the
current 2.6.24.4-41. jcm noticed that previously the ioapic quirks in
arch/x86/kernel/quirks.c were used only on x86_64. This version of the patch
removes the '#ifdef CONFIG_X86_64' so the automatic quirks are used on both
archs.
(Gurhan, I think setting this bug to VERIFIED after testing the scratch build
was a mistake. In the usual BZ workflow it means the fix was applied and passed
QA.)
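
When comparing kernels with and without the quirk, it can help to count the abort messages in a saved dmesg capture instead of eyeballing them. The following is a minimal sketch in Python, assuming the log lines have exactly the shape quoted earlier in this report (`megaraid: aborting-<serial> cmd=<opcode> <c=... t=... l=...>`). The function name and the returned structure are hypothetical and not part of any driver tooling.

```python
import re
from collections import Counter

# Matches lines such as: megaraid: aborting-6695 cmd=2a <c=1 t=0 l=0>
# The pattern is inferred only from the messages quoted in this bug.
ABORT_RE = re.compile(
    r"megaraid: aborting-(?P<serial>\d+) cmd=(?P<cmd>[0-9a-fA-F]+) "
    r"<c=(?P<chan>\d+) t=(?P<target>\d+) l=(?P<lun>\d+)>"
)


def summarize_megaraid_aborts(log_text):
    """Return (serial numbers of aborted commands, abort count per SCSI opcode)."""
    serials = []
    per_cmd = Counter()
    for match in ABORT_RE.finditer(log_text):
        serials.append(int(match.group("serial")))
        per_cmd[match.group("cmd").lower()] += 1
    return serials, per_cmd


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg > capture.txt; python this_script.py capture.txt
    with open(sys.argv[1], "r", errors="replace") as handle:
        found, by_opcode = summarize_megaraid_aborts(handle.read())
    print(len(found), "aborted commands", dict(by_opcode))
```

In the messages above, opcode 2a is a SCSI WRITE(10) and 28 is a READ(10), so the per-opcode count shows whether the aborted commands were mostly writes or reads.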
(In reply to comment #5)
> Created an attachment (id=302329)
> ioapic quirk for i386
>
> We still do not have the patch applied in CVS. Here's the patch against the
> current 2.6.24.4-41. jcm noticed that previously the ioapic quirks in
> arch/x86/kernel/quirks.c were used only on x86_64. This version of the patch
> removes the '#ifdef CONFIG_X86_64' so the automatic quirks are used on both
> archs.
>
> (Gurhan, I think setting this bug to VERIFIED after testing the scratch build
> was a mistake. In the usual BZ workflow it means the fix was applied and passed
> QA.)

Eek. Sorry for creating confusion; I guess I still don't know the small corner cases of the BZ workflow such as this one. I didn't realize this. Anyway, feel free to change the status of the bug and just let me know when it goes into the main -rt tree and I'll test it.

It's now in kernel-rt-2.6.24.4-42.el5rt.

(In reply to comment #7)
> It's now in kernel-rt-2.6.24.4-42.el5rt.

OK, in that case I'll just revive this bug. There are some issues with -42, and I can't even get this tested. Here are some backtraces from dmesg:

BUG: scheduling with irqs disabled: auditd/0x00000000/12737
caller is rt_spin_lock_slowlock+0xcf/0x14f
Pid: 12737, comm: auditd Not tainted 2.6.24.4-42.el5rt #1
[<c0633b59>] schedule+0x8e/0x114
[<c06343ef>] rt_spin_lock_slowlock+0xcf/0x14f
[<c0634a65>] __rt_spin_lock+0x4c/0x4e
[<c0634a6f>] rt_spin_lock+0x8/0xa
[<c04770bf>] page_address+0x4d/0x80
[<c047730d>] kmap_high+0xf0/0x463
[<c0403226>] ? __switch_to+0xa3/0x125
[<c042a515>] ? finish_task_switch+0x29/0xc4
[<c042c6cd>] ? __wake_up+0x34/0x4f
[<c0477191>] ? kunmap_high+0x9f/0xa1
[<c0424140>] ? kunmap+0x52/0x54
[<c0424180>] kmap+0x3e/0x49
[<c0421398>] kmap_atomic_func+0x12/0x15
[<c0423802>] gup_pte_range+0x4f/0x135
[<c0423a86>] fast_gup+0x19e/0x264
[<c0449c96>] get_futex_key+0x70/0xa2
[<c044b1e5>] do_futex+0x383/0xa4b
[<c048e0bd>] ? do_readv_writev+0x16d/0x178
[<c044b994>] sys_futex+0xe7/0xfa
[<c048e5d8>] ? sys_writev+0x58/0x8f
[<c0404226>] syscall_call+0x7/0xb
=======================

WARNING: at arch/x86/kernel/smp_32.c:580 native_smp_call_function_mask()
Pid: 13593, comm: automount Not tainted 2.6.24.4-42.el5rt #1
[<c0419d61>] ? do_flush_tlb_all+0x0/0x3f
[<c0419f4e>] native_smp_call_function_mask+0x4c/0x12a
[<c0419d61>] ? do_flush_tlb_all+0x0/0x3f
[<c0476f78>] ? __set_page_address+0x8b/0x95
[<c0419d61>] ? do_flush_tlb_all+0x0/0x3f
[<c0419d61>] ? do_flush_tlb_all+0x0/0x3f
[<c041b4e5>] smp_call_function+0x1e/0x22
[<c0434059>] on_each_cpu+0x24/0x5c
[<c0419c75>] flush_tlb_all+0x1e/0x20
[<c04774ba>] kmap_high+0x29d/0x463
[<c0424180>] kmap+0x3e/0x49
[<c0421398>] kmap_atomic_func+0x12/0x15
[<c0423802>] gup_pte_range+0x4f/0x135
[<c047bb7e>] ? handle_mm_fault+0xb7a/0xbb3
[<c0423a86>] fast_gup+0x19e/0x264
[<c0449c96>] get_futex_key+0x70/0xa2
[<c044a302>] futex_wake+0x38/0xbe
[<c044aee7>] do_futex+0x85/0xa4b
[<c0493d0a>] ? pipe_write+0x36f/0x3c4
[<c046fcf8>] ? __pagevec_free+0x17/0x1e
[<c048d9fb>] ? do_sync_write+0xc5/0x102
[<c044b994>] sys_futex+0xe7/0xfa
[<c0468c99>] ? __delayacct_add_tsk+0x205/0x210
[<c042cdd0>] mm_release+0x84/0x8b
[<c0430c01>] exit_mm+0x15/0xfd
[<c0432128>] do_exit+0x213/0x716
[<c04070a8>] ? do_syscall_trace+0x14c/0x198
[<c04326bb>] complete_and_exit+0x0/0x16
[<c0404226>] syscall_call+0x7/0xb
=======================

I opened a new bug 442595 for the BUG in -42, as it does not look related to the ioapic quirk.

So does your 32-bit ioapic quirk patch fix this bug?

Clark

Gurhan, does -47 boot on the machine? And does the ioapic quirk help?

(In reply to comment #11)
> Gurhan, does -47 boot on the machine? And does the ioapic quirk help?

I've never tried the -47 kernel on this machine. I'll be on PTO until April 29th and back in the office on the 30th. I doubt I'll get a chance to try this before then, especially given that the machine has no remote management console set up. I will update the BZ once I try it.

Update in response to Clark's email on the rt list: this issue is still open. I didn't get a chance to try the -47 kernel before the office move. The box is not hooked up in the new lab yet; I am hoping that it will be today. Once it's online I'll test this.

The -47 kernel booted fine, and I ran "cat /dev/sda > /dev/null" 50 times without any damage.

Closing.
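
The repeated whole-disk read used as the final check can also be scripted so that each pass is checksummed and compared. A corrupted read would then show up as a mismatching digest instead of relying on the kernel log alone. This is a minimal sketch in Python, assuming nothing writes to the device during the test (on a mounted, in-use disk the digests can differ legitimately) and that the script has permission to read it. `read_passes` and `all_passes_match` are hypothetical names, not an existing tool.

```python
import hashlib


def read_passes(path, passes=50, chunk_size=1 << 20):
    """Read `path` end to end `passes` times; return one SHA-256 hex digest per pass."""
    digests = []
    for _ in range(passes):
        digest = hashlib.sha256()
        with open(path, "rb") as source:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break  # end of file or device reached
                digest.update(chunk)
        digests.append(digest.hexdigest())
    return digests


def all_passes_match(digests):
    """True when every pass produced the same digest (or there were no passes)."""
    return len(set(digests)) <= 1


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python this_script.py /dev/sda
    results = read_passes(sys.argv[1])
    print("consistent" if all_passes_match(results) else "MISMATCH between passes")
```

Hashing every pass costs some CPU time compared with a plain `cat`, but the storage path exercised by the read is the same.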