Bug 45207
Summary: | Megaraid module will not load after unloading on seawolf IA64 gold | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Matt Domsch <matt_domsch> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED RAWHIDE | QA Contact: | Brock Organ <borgan> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.3 | CC: | afom_m, dale_kaisner, john_hull, matt_domsch, michael_e_brown, peterj |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | ia64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2001-11-15 21:44:59 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Matt Domsch
2001-06-20 20:34:13 UTC
AMI suspects a hardware problem with their card in this case, per the megaraid 1.17 driver release notes. They're continuing their investigation. This defect is considered MUST-FIX for the Fairfax gold release.

I'm wondering if it's at all related to the sometimes-seen qla12160 controller lock-up on IA-64 systems. If you repeatedly reboot the system and start Linux, the qla12160 driver sometimes fails to properly initialize the qla12160, and the system hangs. I've asked AMI to include this thought in their investigation. Maybe we can kill two birds with one stone.

Reproduced on pe7150 running the 2.4.7-0.3 kernel, machine BIOS X15, using a Perc3DC (taos) RAID controller.

Reproduced on Fairfax RC1 using PERC3/QC.

*** Bug 52312 has been marked as a duplicate of this bug. ***

This issue still exists with Fairfax RC1 on systems with 2GB and 16GB of RAM.

They have reported that this succeeds on 2.4.7 + ia64 patch + latest megaraid, but fails on our kernel.

This should be fixed in 2.4.7-6.1 or later. Please confirm.

This issue still exists with the recent RC1 (kernels 2.4.9-0.12 and 2.4.9-0.18).

Per Clay Cooper: Still broken with 2.4.9-9.1. I loaded... unloaded... then loaded, and then the system hung.

I reproduced this with 2.4.9-13dell2smp, which is -13 + megaraid 1.18 + kdb patch, and made some progress. Here's a backtrace. It appears to be getting stuck waiting on mbox->status, I think.

    0xa000000000327c20 [megaraid]megaIssueCmd+0x580) args (0xe000000fee7600c0, 0xe000000feead0050, 0x0, 0xe000000fee760650,
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570) args (0xe000000fee7600c0, 0xff, 0xe000000fee777c98, 0xe000000fee777c99,
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570) args (0x2000000, 0xa000000000335054, 0x0, 0xa000000000335f10, 0xa000000
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570) args (0x39, 0x0, 0x3, 0x0, 0x0)
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570 args (0x713, 0xa000000000334a28, 0x101e, 0x1960, 0xa000000)
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570 args (0x1ddd5a, 0xa000000000334ae0, 0x0, 0xe0000001022515a0, 0x0)
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570 args (0xc9e, 0x1, 0xa000000000334a28, 0xa000000000330f90, 0x206)
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570) args (0xe000000fee777e68, 0xa000000000324034,0xa000000000324010, 0x8,
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570) args (0x20000000002bb0b0, 0x20000000002bb0b8, 0x0, 0x2000000000136090,
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570) args (0x0, 0x400000000000a7d0, 0xc000000000000c1e, 0x6000000000023510,
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060
    0xa000000000327c10 [megaraid]megaIssueCmd+0x570) args (0x4000000000013610, 0xc000000000000a98, 0x6000000000023510, 0x600
        megaraid .text 0xa0000000003240c0 0xa0000000003276a0 0xa000000000328060

So, we're looping for a while, then (maybe) getting past it?

    [0]kdb> id 0xa000000000327c10
    0xa000000000327c10 megaIssueCmd+0x570 [MFI] mov r1=r37
    0xa000000000327c16 megaIssueCmd+0x576 nop.f 0x0
    0xa000000000327c1c megaIssueCmd+0x57c mov r15=255
    0xa000000000327c20 megaIssueCmd+0x580 [MMI] ld1.acq r14=[r36];;
    0xa000000000327c26 megaIssueCmd+0x586 cmp4.eq p6,p7=r15,r14
    0xa000000000327c2c megaIssueCmd+0x58c nop.i 0x0
    0xa000000000327c30 megaIssueCmd+0x590 [MFB] nop.m 0x0
    0xa000000000327c36 megaIssueCmd+0x596 nop.f 0x0
    0xa000000000327c3c megaIssueCmd+0x59c (p06) br.cond.dpnt.few

This corresponds to this code in megaraid.s:

    .loc 1 2355 0
    br.call.sptk.many b0 = WRINDOOR#
    ;;
    mov r1 = r37
    .loc 1 2357 0
    addl r15 = 255, r0
    .L3046:
    ld1.acq r14 = [r36]
    ;;
    cmp4.eq p6, p7 = r15, r14
    (p6) br.cond.dpnt .L3046
    .loc 1 2358 0
    mov r15 = r39
    addl r16 = 255, r0
    ;;

which is line 2357 in megaraid.c v1.18:

    mbox->numstatus = 0xFF;
    mbox->status = 0xFF;
    WRINDOOR (megaCfg, phys_mbox | 0x1);
    while (mbox->numstatus == 0xFF) ;
    while (mbox->status == 0xFF) ;
    while (mbox->mraid_poll != 0x77) ;

So, for some reason, either numstatus or status isn't getting updated (the card doesn't respond). Likely it's not getting reset properly when the module is unloaded on IA-64.

>while (mbox->numstatus == 0xFF) ;
>while (mbox->status == 0xFF) ;
>while (mbox->mraid_poll != 0x77) ;
I'm surprised that Intel hasn't had a conniption about these lines
already; Intel says that busy waits like that will damage their
fragile PIV chips...
We have a potential fix (mark those structure members volatile and put in cpu_relax to make Intel happy); it will be in 2.4.9-13.4. When you get it, please test.

2.4.9-13.4 is now available at ftp://ftp.beta.redhat.com/pub/testing/kernel/
Please give it a whirl!

Works with 2.4.9-13.4smp.