Bug 248042
Summary: | unloading fw_ohci causes kernel panic (firewire_ohci in 2.6.22) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Colin <bugzilla> |
Component: | kernel | Assignee: | Jay Fenlason <fenlason> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 7 | CC: | chris.brown, jfeeney, krh, stefan-r-rhbz, zaitcev |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-01-04 00:29:02 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Colin
2007-07-12 19:49:36 UTC
I forgot to note that this might be related to bug #246256, which talks about fw_ohci causing a kernel panic on resuming from a suspended state. There are no stack traces there to indicate the problem, but it does seem to describe a kernel panic.

Hello Colin,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can. There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? The bug you mention was resolved in a recent kernel update, so this could also be the case for your issue. If the problem no longer exists then please close this bug, or I'll do so in a few days if there is no additional information lodged.

Cheers
Chris

Hi Chris,

This continues to be a problem with the latest Fedora 7 kernel RPM (2.6.22.5-76.fc7.x86_64.rpm). In 2.6.22 the modules in question are now called

firewire_sbp2
firewire_ohci
firewire_core

Unloading them in this order causes no problems. Unloading firewire_core while the other two remain causes the expected "module is in use" error and no crash. But if firewire_ohci is unloaded while firewire_sbp2 is loaded then it crashes.

One observation which may or may not help is that it does not crash if you unload firewire_ohci straight after boot. In this case it simply unloads. I therefore tried mounting and accessing the disk, then unmounting it and trying to unload the module. This caused a crash. My guess is that accessing the disk causes something to be loaded which otherwise isn't?

After some head-banging with Hyperterminal, I realised I could use PuTTY to get my kernel panic without it being mangled (for anyone reading this, I'd thoroughly recommend this instead of the mangled nonsense HT outputs), and so here it is.
The first part of this suggests that firewire_ohci is trying to remove firewire_sbp2 as part of its unloading and this is what is messing up. How this works on the command line but not from a LKM calling the same routine I don't know, but this seems to be the problem area:

nike# rmmod firewire_ohci
firewire_sbp2: management write failed, rcode 0xffffffed
sd 8:0:0:0: [sdg] Synchronizing SCSI cache
firewire_sbp2: removed sbp2 unit fw1.0
firewire_ohci: Removed fw-ohci device.
nike# general protection fault: 0000 [1] SMP
last sysfs file: /block/sdg/sdg1/dev
CPU 0
Modules linked in: firewire_sbp2 firewire_core ipv6 nf_conntrack_ftp ipt_owner ipt_LOG xt_limit ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables x_tables cpufreq_ondemand dock crc_itu_t rtc_cmos k8temp hwmon ac97_bus forcedeth snd_timer snd soundcore snd_page_alloc i2c_nforce2 i2c_core sr_mod cdrom joydev sg dm_snapshot dm_zero dm_mirror dm_mod usb_storage pata_amd sata_nv libata sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.22.5-76.fc7 #1
RIP: 0010:[<ffffffff8103b7e0>] [<ffffffff8103b7e0>] run_timer_softirq+0x159/0x1d1
RSP: 0018:ffffffff81476f00 EFLAGS: 00010282
RAX: ffffffff81419fd8 RBX: 313220350000302e RCX: 3177666632785c31
RDX: ffffffff81476f00 RSI: 0000000030203020 RDI: 313220350000302e
RBP: 0000000000000100 R08: ffff810074afb070 R09: 000000000000000a
R10: ffffffff81365640 R11: ffff810077fce910 R12: ffffffff814b9c00
R13: 3177666632785c31 R14: ffffffff81447300 R15: 0000000000000000
FS: 00002aaaab67b710(0000) GS:ffffffff813ae000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaab8860a0 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff81418000, task ffffffff81365640)
Stack:
ffffffff81476f00 ffffffff81476f00 ffff810002355880 0000000000000001
ffffffff813b4110 000000000000000a 0000000000000000 ffffffff81038f33
ffffffff81476f38 0000000000000046 ffffffff81476f78 0000000000000000
Call Trace:
<IRQ> [<ffffffff81038f33>] __do_softirq+0x55/0xc3
[<ffffffff8100acec>] call_softirq+0x1c/0x28
[<ffffffff8100be11>] do_softirq+0x2c/0x85
[<ffffffff81019c0f>] smp_apic_timer_interrupt+0x48/0x5d
[<ffffffff81008d8c>] default_idle+0x0/0x3d
[<ffffffff8100a796>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff81008db5>] default_idle+0x29/0x3d
[<ffffffff81008e55>] cpu_idle+0x8c/0xaf
[<ffffffff81423809>] start_kernel+0x2ca/0x2d6
[<ffffffff81423140>] _sinittext+0x140/0x144
Code: 41 ff d5 65 48 8b 04 25 10 00 00 00 3b a8 44 e0 ff ff 74 1d
RIP [<ffffffff8103b7e0>] run_timer_softirq+0x159/0x1d1
RSP <ffffffff81476f00>
Kernel panic - not syncing: Aiee, killing interrupt handler!
-----------------------------------------------------------

Regards,
Colin.

Okay, thanks for the additional information Colin, that's some good debugging. I'm re-assigning this to the firewire subsystem maintainer and they may be able to shed further light on the problem.

Cheers
Chris

I forgot: Upstream bug is fixed in kernel 2.6.23-rc4.

I wrote in comment #5:
> Upstream bug is fixed in kernel 2.6.23-rc4.

Well, the part of the upstream bug which was described here has been fixed.

Colin wrote:
> if the module is in use the expected results should be that it
> says the module is in use and doesn't unload it.

fw-sbp2 does not use fw-ohci. It only indirectly requires its presence and functioning in order to stay connected with SBP-2 devices.

The old sbp2 driver contains a hack which increases the use count of a card driver module as soon as it logs in to a device behind the respective card (and decreases the use count if it logs out or is otherwise disconnected). I added that hack because the old IEEE1394 driver stack has two drivers, video1394 and dv1394, which use symbols of ohci1394 and hence increase and decrease ohci1394's use count when loaded and unloaded.
So, if somebody had an SBP-2 disk mounted and unloaded dv1394, ohci1394 was unloaded without that hack and the connection to the disk was lost.

We don't need this hack for the new driver stack because there is no driver, and never will be, which uses symbols of fw-ohci.

Of course people can shoot themselves in the foot by unloading fw-ohci while they still have a filesystem on an SBP-2 disk mounted. (Would panic before 2.6.23-rc4, will "only" cause connection loss and thus possible filesystem corruption since 2.6.23-rc4.) But while there may be reasons to unload video1394 or dv1394 while sbp2 is active, there are hardly reasons to unload fw-ohci while fw-sbp2 is active.

It would be best, though, if drivers/scsi/scsi.c::scsi_device_get() and scsi_device_put() were expanded to call into hooks provided by SCSI lowlevel drivers. Then fw-sbp2 could get and put the card driver module when the scsi_device of an SBP-2 device behind the card is being _get() and _put(), e.g. if a filesystem on it is mounted and unmounted.

Colin wrote in comment #3:
> The first part of this suggests that firewire_ohci is trying to remove
> firewire_sbp2 as part of its unloading [...]
> nike# rmmod firewire_ohci
> firewire_sbp2: management write failed, rcode 0xffffffed
> sd 8:0:0:0: [sdg] Synchronizing SCSI cache
> firewire_sbp2: removed sbp2 unit fw1.0
> firewire_ohci: Removed fw-ohci device.

No, firewire-ohci knows nothing of firewire-sbp2. When firewire-ohci is unloaded, it first tells firewire-core to shut down all cards which firewire-ohci services, and firewire-core therefore shuts down all devices on that card. It does a quick shutdown, though, which no longer gives scsi-highlevel and firewire-sbp2 any chance to perform shutdown procedures (synchronize cache, log out).

The panic after that happened because firewire-core forgot to remove a card-related timer before letting firewire-ohci proceed to remove the card's data structure.
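The ordering problem described here (a card-related timer still armed when the card's data structure is freed) can be illustrated in userspace. The following is only a minimal sketch under stated assumptions, not the firewire-core code and not the upstream fix: every name in it (`sim_timer`, `sim_card`, `sim_card_remove` and so on) is hypothetical, and the linked list merely stands in for the kernel's timer machinery. In a real kernel of that era the corresponding primitive would be `del_timer_sync()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a kernel timer. */
struct sim_timer {
    struct sim_timer *next;
    void (*fn)(void *data);   /* callback run when the timer fires */
    void *data;               /* argument handed to the callback */
    int pending;              /* 1 while the timer is on the list */
};

/* Singly linked list of armed timers (stand-in for the timer wheel). */
static struct sim_timer *pending_timers;

static void sim_add_timer(struct sim_timer *t)
{
    assert(!t->pending);
    t->pending = 1;
    t->next = pending_timers;
    pending_timers = t;
}

/* Unlink a timer if it is armed; returns 1 if it was pending, else 0. */
static int sim_del_timer(struct sim_timer *t)
{
    for (struct sim_timer **p = &pending_timers; *p; p = &(*p)->next) {
        if (*p == t) {
            *p = t->next;
            t->next = NULL;
            t->pending = 0;
            return 1;
        }
    }
    return 0;
}

/* Fire every armed timer once; returns the number of callbacks run. */
static int sim_run_timers(void)
{
    int fired = 0;

    while (pending_timers) {
        struct sim_timer *t = pending_timers;

        pending_timers = t->next;
        t->next = NULL;
        t->pending = 0;
        t->fn(t->data);       /* would touch freed memory if the owner is gone */
        fired++;
    }
    return fired;
}

/* Hypothetical card object that embeds its own timer. */
struct sim_card {
    struct sim_timer card_timer;
    int ticks;
};

static void sim_card_tick(void *data)
{
    struct sim_card *card = data;

    card->ticks++;
}

static struct sim_card *sim_card_add(void)
{
    struct sim_card *card = calloc(1, sizeof(*card));

    if (!card)
        return NULL;
    card->card_timer.fn = sim_card_tick;
    card->card_timer.data = card;
    sim_add_timer(&card->card_timer);
    return card;
}

/* Safe teardown: disarm the embedded timer first, only then free the card. */
static void sim_card_remove(struct sim_card *card)
{
    sim_del_timer(&card->card_timer);
    free(card);
}
```

If `sim_card_remove()` freed the card without calling `sim_del_timer()` first, the next `sim_run_timers()` pass would follow a list entry that lives in freed memory, which is the userspace analogue of the fault in `run_timer_softirq` shown in the trace.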
I suppose the fix (upstream commit 8a2d9ed3210464d22fccb9834970629c1c36fa36 "firewire: fix unloading of fw-ohci while devices are attached") made it into one or another Fedora kernel by now. Could a kernel package maintainer or the reporter have a look?

Hi Stefan,

Yes, it's in the current 2.6.23-based kernel. I'm pretty sure it won't be backported as F-7 is also running 2.6.23 and previous Fedora releases are EOL'd.

As we haven't heard anything from the original reporter for three months, I'm closing this INSUFFICIENT_DATA. Please re-open if required...

Cheers
Chris

> I'm pretty sure it won't be backported as F-7 is also running
> 2.6.23 and previous Fedora releases are EOL'd.
Versions before Fedora 7 are not affected.