Bug 235463
Summary: | Unplugging and replugging firewire devices kernel panics system | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jarod Wilson <jarod> |
Component: | kernel | Assignee: | Kristian Høgsberg <krh> |
Status: | CLOSED RAWHIDE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | rawhide | CC: | bjohnson, fenlason, stefan-r-rhbz |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-03-04 17:12:18 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jarod Wilson
2007-04-05 22:02:44 UTC
Hrm... This might not be ppc-specific. I just unplugged my fw hd and cd-rw from an i686 box, plugged them back in and powered them back up. Kernel panic. 007-04-05 18:03:37 fw_sbp2: reconnected to unit fw1.0 (1 retries) 2007-04-05 18:03:39 fw_core: Unsolicited response (source ffc0, tlabel 1f) 2007-04-05 18:03:39 fw_core: Unsolicited response (source ffc0, tlabel 19) 2007-04-05 18:04:32 fw_core: phy config: card 0, new root=ffc2, gap_count=7 2007-04-05 18:04:32 fw_sbp2: management write failed, rcode 0x12 2007-04-05 18:04:32 fw_sbp2: reconnected to unit fw1.0 (1 retries) 2007-04-05 18:04:38 fw_core: phy config: card 0, new root=ffc2, gap_count=7 2007-04-05 18:04:38 fw_sbp2: management write failed, rcode 0x12 2007-04-05 18:04:38 fw_sbp2: reconnected to unit fw1.0 (1 retries) 2007-04-05 18:04:43 fw_core: Unsolicited response (source ffc0, tlabel 14) 2007-04-05 18:04:43 fw_core: Unsolicited response (source ffc0, tlabel 15) 2007-04-05 18:04:43 BUG: unable to handle kernel paging request at virtual address 6b6b6b6b 2007-04-05 18:04:43 printing eip: 2007-04-05 18:04:43 6b6b6b6b 2007-04-05 18:04:43 *pde = 00000000 2007-04-05 18:04:43 Oops: 0000 [#1] 2007-04-05 18:04:43 SMP 2007-04-05 18:04:43 last sysfs file: /block/sr2/size 2007-04-05 18:04:43 Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0 parport_pc lp parport serio_raw 3c59x mii pcspkr i2c_piix4 fw_ohci fw_core i2c_core ata_piix ata_generic libata sr_mod cdrom sg aic7xxx scsi_transport_spi sd_mod scsi_mod raid456 xor raid1 ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd 2007-04-05 18:04:43 CPU: 0 2007-04-05 18:04:43 EIP: 0060:[<6b6b6b6b>] Not tainted VLI 2007-04-05 18:04:43 EFLAGS: 00010283 (2.6.20-1.3045.fc7 #1) 2007-04-05 18:04:43 EIP is at 0x6b6b6b6b 2007-04-05 18:04:43 eax: ea876014 ebx: ea876014 ecx: 00000002 edx: f7f19000 2007-04-05 18:04:43 esi: f7f19000 edi: f6700ca0 ebp: c077efbc esp: c077efb0 2007-04-05 18:04:43 ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 2007-04-05 18:04:43 Process swapper (pid: 0, ti=c077e000 task=c06f24e0 task.ti=c0736000) 2007-04-05 18:04:43 Stack: f8c0e8c6 f6700c80 f7f19dac c077efd0 f8c0e034 f7f19dd4 00000000 00000008 2007-04-05 18:04:43 c077efe0 c042c16c 00000001 c072bb28 c077eff8 c042c06b 00000000 c0736f28 2007-04-05 18:04:43 c0736000 00000046 c0736f48 c0407420 2007-04-05 18:04:43 Call Trace: 2007-04-05 18:04:43 [<c04061e9>] show_trace_log_lvl+0x1a/0x2f 2007-04-05 18:04:43 [<c0406299>] show_stack_log_lvl+0x9b/0xa3 2007-04-05 18:04:43 [<c0406459>] show_registers+0x1b8/0x289 2007-04-05 18:04:43 [<c0406657>] die+0x12d/0x242 2007-04-05 18:04:43 [<c0617666>] do_page_fault+0x3ee/0x4ba 2007-04-05 18:04:43 [<c0615dfc>] error_code+0x7c/0x84 2007-04-05 18:04:43 [<f8c0e034>] context_tasklet+0x34/0x4a [fw_ohci] 2007-04-05 18:04:43 [<c042c16c>] tasklet_action+0x5b/0xc3 2007-04-05 18:04:43 [<c042c06b>] __do_softirq+0x6f/0xe2 2007-04-05 18:04:43 [<c0407420>] do_softirq+0x61/0xd0 2007-04-05 18:04:43 ======================= 2007-04-05 18:04:43 Code: Bad EIP value. 2007-04-05 18:04:43 EIP: [<6b6b6b6b>] 0x6b6b6b6b SS:ESP 0068:c077efb0 2007-04-05 18:04:43 Kernel panic - not syncing: Fatal exception in interrupt Looking at all the "unsolicited response" messages, it looks like you're hitting the bug I just fixed in rawhide. kernel-2.6.20-1.3051.fc7 has the fix, please give it a try when you get the time. I was re-starting DMA when it was already running in some cases, which can explain both the "unsolicted response" messages and the crash. Slightly better with 2.6.20-1.3054.fc7, in that upon replugging the devices and powering them up (cd-rw first), I do briefly see both devices being recognized, but I'm still getting a kernel panic (part of it never made it to the console though): 2007-04-10 13:17:42 fw_core: phy config: card 0, new root=ffc2, gap_count=7 2007-04-10 13:17:46 fw_core: phy config: card 0, new root=ffc2, gap_count=7 2007-04-10 13:17:47 fw_core: phy config: card 0, new root=ffc2, gap_count=7 2007-04-10 13:17:48 fw_core: created new fw device fw1 (3 config rom retries) 2007-04-10 13:17:48 fw_sbp2: logged in to sbp2 unit fw1.0 (0 retries) 2007-04-10 13:17:48 fw_sbp2: - management_agent_address: 0xfffff0030000 2007-04-10 13:17:48 fw_sbp2: - command_block_agent_address: 0xfffff0100000 2007-04-10 13:17:48 fw_sbp2: - status write address: 0x000100000000 2007-04-10 13:17:48 scsi 3:0:0:0: CD-ROM TEAC CD-W540E 1.0B PQ: 0 ANSI: 0 2007-04-10 13:17:50 sr2: scsi3-mmc drive: 48x/48x writer cd/rw xa/form2 cdda tray 2007-04-10 13:17:50 sr 3:0:0:0: Attached scsi generic sg4 type 5 2007-04-10 13:17:52 fw_core: giving up on config rom for node id ffc1 2007-04-10 13:17:59 fw_core: phy config: card 0, new root=ffc2, gap_count=7 2007-04-10 13:17:59 fw_sbp2: orb reply timed out, rcode=0x12 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 8 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 2 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 12 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 3 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 4 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 5 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4 2007-04-10 13:18:00 fw_sbp2: reconnected to unit fw1.0 (1 retries) 2007-04-10 13:18:14 fw_core: Unsolicited response (source ffc0, tlabel 8) 2007-04-10 13:18:14 fw_core: Unsolicited response (source ffc0, tlabel 9) 2007-04-10 13:18:14 BUG: unable to handle kernel paging request at virtual address 6b6b6b6b 2007-04-10 13:18:14 printing eip: 2007-04-10 13:18:14 f8c533f5 2007-04-10 13:18:14 *pde = 00000000 2007-04-10 13:18:14 Oops: 0000 [#1] 2007-04-10 13:18:14 SMP 2007-04-10 13:18:14 last sysfs file: /devices/pci0000:00/0000:00:00.0/irq 2007-04-10 13:18:14 Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0 parport_pc lp parport serio_raw ata_piix ata_generic fw_ohci pcspkr fw_core libata i2c_piix4 Sigh... taking out of modified. So to further clarify the issue... Things behave nicely if the drives aren't powered off when unplugging and replugging the firewire cables, but things go south in a hurry if the drives are powered off, unplugged, plugged back in and then powered back up. For posterity, here's the full trace captured the last go 'round: BUG: unable to handle kernel paging request at virtual address 6b6b6b6b printing eip: f8c863f5 *pde = 00000000 Oops: 0000 [#1] SMP last sysfs file: /devices/pci0000:00/0000:00:09.0/fw1/fw1.0/host23/target23:0:0/23:0:0:0/rev Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0 parport_pc lp parport serio_raw fw_ohci fw_core ata_piix i2c_piix4 pcspkr ata_generic 3c59x i2c_core libata mii sr_mod cdrom sg aic7xxx scsi_transport_spi sd_mod scsi_mod raid456 xor raid1 ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd CPU: 0 EIP: 0060:[<f8c863f5>] Not tainted VLI EFLAGS: 00010002 (2.6.20-1.3054.fc7 #1) EIP is at fw_core_handle_response+0x95/0x108 [fw_core] eax: 0000ffc0 ebx: 6b6b6b5f ecx: ede1000c edx: 6b6b6b6b esi: 0000001e edi: f6333470 ebp: c077ff4c esp: c077ff1c ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 Process swapper (pid: 0, ti=c077f000 task=c06f34e0 task.ti=c0737000) Stack: 00000296 f7e52218 00000000 c077ff58 f6333000 ffc27920 ffc06000 00000286 0000ffc0 f6333000 f5ddda18 f6333d84 c077ffa0 f8c429af f5ddda14 00000002 0000005a ffc27920 ffc06000 00000000 c1e68a90 0000000c f5ddda18 00000000 Call Trace: [<c04061e9>] show_trace_log_lvl+0x1a/0x2f [<c0406299>] show_stack_log_lvl+0x9b/0xa3 [<c0406459>] show_registers+0x1b8/0x289 [<c0406657>] die+0x12d/0x242 [<c0618026>] do_page_fault+0x3ee/0x4ba [<c06167bc>] error_code+0x7c/0x84 [<f8c429af>] handle_ar_packet+0xe0/0xeb [fw_ohci] [<f8c437e4>] ar_context_tasklet+0xb6/0xc4 [fw_ohci] [<c042c0a8>] tasklet_action+0x5b/0xc3 [<c042bfa7>] __do_softirq+0x6f/0xe2 [<c0407420>] do_softirq+0x61/0xd0 ======================= Code: 89 f8 8b 55 ec e8 0b 02 99 c7 8b 45 e4 c1 e8 04 83 e0 0f 83 f8 07 74 36 83 f8 0b 74 31 31 c9 31 d2 83 f8 06 75 32 eb 19 8d 5a f4 <8b> 53 0c 0f 18 02 90 8b 45 e0 8d 4b 0c 83 c0 20 39 c1 75 9e eb EIP: [<f8c863f5>] fw_core_handle_response+0x95/0x108 [fw_core] SS:ESP 0068:c077ff1c Kernel panic - not syncing: Fatal exception in interrupt BUG: warning at arch/i386/kernel/smp.c:549/smp_call_function() (Not tainted) [<c04061e9>] show_trace_log_lvl+0x1a/0x2f [<c04067ad>] show_trace+0x12/0x14 [<c0406831>] dump_stack+0x16/0x18 [<c0417baa>] smp_call_function+0x67/0xcd [<c0417c2e>] smp_send_stop+0x1e/0x31 [<c0427443>] panic+0x54/0x195 [<c0406737>] die+0x20d/0x242 [<c0618026>] do_page_fault+0x3ee/0x4ba [<c06167bc>] error_code+0x7c/0x84 [<f8c429af>] handle_ar_packet+0xe0/0xeb [fw_ohci] [<f8c437e4>] ar_context_tasklet+0xb6/0xc4 [fw_ohci] [<c042c0a8>] tasklet_action+0x5b/0xc3 [<c042bfa7>] __do_softirq+0x6f/0xe2 [<c0407420>] do_softirq+0x61/0xd0 ======================= Okay, I'm no longer seeing the kernel panic with 2.6.20-1.3079.fc7. I still have to unplug and replug the drives at least once after they're powered up to get them usable on the system, but this is definitely progress... Kristian, some (better) SBP-2 devices with extra PSU power their PHY from bus power when the PSU is switched off. That way they still act as a repeater when switched off but plugged in (provided another node powers the bus), and daisy-chained devices can still be accessed. When such an SBP-2 device is switched on, it powers up the link layer controller and spins up the disk, but doesn't make a config ROM available yet. After the disk is up and running and the LLC scanned it, it constructs and exposes the config ROM. (And usually, but not necessarily, it issues a bus reset to announce it to the bus. The SBP-2 spec says it doesn't have to, or actually /should not/, issue a bus reset if the ROM becomes available within 5s after power reset.) Perhaps fw-core's device discovery code ignores such devices for whatever reason. Jarod, is anything of this bug still relevant? (The still unfixed http://bugzilla.kernel.org/show_bug.cgi?id=9617 looks different.) Whoa, I filed this forever ago, forgot all about it... :) No, haven't seen the panic in ages, and teh unplug-replug dance in comment #6 I believe was remedied by the login orb config rom read timeout patch. |