Bug 235463

Summary: Unplugging and replugging firewire devices kernel panics system
Product: [Fedora] Fedora Reporter: Jarod Wilson <jarod>
Component: kernelAssignee: Kristian Høgsberg <krh>
Status: CLOSED RAWHIDE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: bjohnson, fenlason, stefan-r-rhbz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-03-04 17:12:18 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jarod Wilson 2007-04-05 22:02:44 UTC
Description of problem:
For starters, my firewire devices aren't brought up at all by the firewire
stack, using the onboard firewire port on a PowerBook G4. I unplugged the
devices and then plugged them back in to see if they'd maybe do something
different the second time, and I wound up seeing xmon for the first time ever on
my PowerBook. :)

Version-Release number of selected component (if applicable):
kernel-2.6.20-1.2945.fc7.ppc

Comment 1 Jarod Wilson 2007-04-05 22:06:54 UTC
Hrm... This might not be ppc-specific. I just unplugged my fw hd and cd-rw from
an i686 box, plugged them back in and powered them back up. Kernel panic.
007-04-05 18:03:37 fw_sbp2: reconnected to unit fw1.0 (1 retries)
2007-04-05 18:03:39 fw_core: Unsolicited response (source ffc0, tlabel 1f)
2007-04-05 18:03:39 fw_core: Unsolicited response (source ffc0, tlabel 19)
2007-04-05 18:04:32 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-05 18:04:32 fw_sbp2: management write failed, rcode 0x12
2007-04-05 18:04:32 fw_sbp2: reconnected to unit fw1.0 (1 retries)
2007-04-05 18:04:38 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-05 18:04:38 fw_sbp2: management write failed, rcode 0x12
2007-04-05 18:04:38 fw_sbp2: reconnected to unit fw1.0 (1 retries)
2007-04-05 18:04:43 fw_core: Unsolicited response (source ffc0, tlabel 14)
2007-04-05 18:04:43 fw_core: Unsolicited response (source ffc0, tlabel 15)
2007-04-05 18:04:43 BUG: unable to handle kernel paging request at virtual
address 6b6b6b6b
2007-04-05 18:04:43  printing eip:
2007-04-05 18:04:43 6b6b6b6b
2007-04-05 18:04:43 *pde = 00000000
2007-04-05 18:04:43 Oops: 0000 [#1]
2007-04-05 18:04:43 SMP 
2007-04-05 18:04:43 last sysfs file: /block/sr2/size
2007-04-05 18:04:43 Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm
l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4
xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp
ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0
parport_pc lp parport serio_raw 3c59x mii pcspkr i2c_piix4 fw_ohci fw_core
i2c_core ata_piix ata_generic libata sr_mod cdrom sg aic7xxx scsi_transport_spi
sd_mod scsi_mod raid456 xor raid1 ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
2007-04-05 18:04:43 CPU:    0
2007-04-05 18:04:43 EIP:    0060:[<6b6b6b6b>]    Not tainted VLI
2007-04-05 18:04:43 EFLAGS: 00010283   (2.6.20-1.3045.fc7 #1)
2007-04-05 18:04:43 EIP is at 0x6b6b6b6b
2007-04-05 18:04:43 eax: ea876014   ebx: ea876014   ecx: 00000002   edx: f7f19000
2007-04-05 18:04:43 esi: f7f19000   edi: f6700ca0   ebp: c077efbc   esp: c077efb0
2007-04-05 18:04:43 ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
2007-04-05 18:04:43 Process swapper (pid: 0, ti=c077e000 task=c06f24e0
task.ti=c0736000)
2007-04-05 18:04:43 Stack: f8c0e8c6 f6700c80 f7f19dac c077efd0 f8c0e034 f7f19dd4
00000000 00000008 
2007-04-05 18:04:43        c077efe0 c042c16c 00000001 c072bb28 c077eff8 c042c06b
00000000 c0736f28 
2007-04-05 18:04:43        c0736000 00000046 c0736f48 c0407420 
2007-04-05 18:04:43 Call Trace:
2007-04-05 18:04:43  [<c04061e9>] show_trace_log_lvl+0x1a/0x2f
2007-04-05 18:04:43  [<c0406299>] show_stack_log_lvl+0x9b/0xa3
2007-04-05 18:04:43  [<c0406459>] show_registers+0x1b8/0x289
2007-04-05 18:04:43  [<c0406657>] die+0x12d/0x242
2007-04-05 18:04:43  [<c0617666>] do_page_fault+0x3ee/0x4ba
2007-04-05 18:04:43  [<c0615dfc>] error_code+0x7c/0x84
2007-04-05 18:04:43  [<f8c0e034>] context_tasklet+0x34/0x4a [fw_ohci]
2007-04-05 18:04:43  [<c042c16c>] tasklet_action+0x5b/0xc3
2007-04-05 18:04:43  [<c042c06b>] __do_softirq+0x6f/0xe2
2007-04-05 18:04:43  [<c0407420>] do_softirq+0x61/0xd0
2007-04-05 18:04:43  =======================
2007-04-05 18:04:43 Code:  Bad EIP value.
2007-04-05 18:04:43 EIP: [<6b6b6b6b>] 0x6b6b6b6b SS:ESP 0068:c077efb0
2007-04-05 18:04:43 Kernel panic - not syncing: Fatal exception in interrupt


Comment 2 Kristian Høgsberg 2007-04-06 22:56:14 UTC
Looking at all the "unsolicited response" messages, it looks like you're hitting
the bug I just fixed in rawhide.  kernel-2.6.20-1.3051.fc7 has the fix, please
give it a try when you get the time.  I was re-starting DMA when it was already
running in some cases, which can explain both the "unsolicted response" messages
and the crash.


Comment 3 Jarod Wilson 2007-04-10 17:25:53 UTC
Slightly better with 2.6.20-1.3054.fc7, in that upon replugging the devices and
powering them up (cd-rw first), I do briefly see both devices being recognized,
but I'm still getting a kernel panic (part of it never made it to the console
though):

2007-04-10 13:17:42 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-10 13:17:46 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-10 13:17:47 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-10 13:17:48 fw_core: created new fw device fw1 (3 config rom retries)
2007-04-10 13:17:48 fw_sbp2: logged in to sbp2 unit fw1.0 (0 retries)
2007-04-10 13:17:48 fw_sbp2:  - management_agent_address:    0xfffff0030000
2007-04-10 13:17:48 fw_sbp2:  - command_block_agent_address: 0xfffff0100000
2007-04-10 13:17:48 fw_sbp2:  - status write address:        0x000100000000
2007-04-10 13:17:48 scsi 3:0:0:0: CD-ROM            TEAC     CD-W540E        
1.0B PQ: 0 ANSI: 0
2007-04-10 13:17:50 sr2: scsi3-mmc drive: 48x/48x writer cd/rw xa/form2 cdda tray
2007-04-10 13:17:50 sr 3:0:0:0: Attached scsi generic sg4 type 5
2007-04-10 13:17:52 fw_core: giving up on config rom for node id ffc1
2007-04-10 13:17:59 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-10 13:17:59 fw_sbp2: orb reply timed out, rcode=0x12
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 8
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 2
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 12
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 3
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 5
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 fw_sbp2: reconnected to unit fw1.0 (1 retries)
2007-04-10 13:18:14 fw_core: Unsolicited response (source ffc0, tlabel 8)
2007-04-10 13:18:14 fw_core: Unsolicited response (source ffc0, tlabel 9)
2007-04-10 13:18:14 BUG: unable to handle kernel paging request at virtual
address 6b6b6b6b
2007-04-10 13:18:14  printing eip:
2007-04-10 13:18:14 f8c533f5
2007-04-10 13:18:14 *pde = 00000000
2007-04-10 13:18:14 Oops: 0000 [#1]
2007-04-10 13:18:14 SMP 
2007-04-10 13:18:14 last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
2007-04-10 13:18:14 Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm
l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4
xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp
ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0
parport_pc lp parport serio_raw ata_piix ata_generic fw_ohci pcspkr fw_core
libata i2c_piix4


Comment 4 Kristian Høgsberg 2007-04-10 21:30:18 UTC
Sigh... taking out of modified.

Comment 5 Jarod Wilson 2007-04-10 21:50:54 UTC
So to further clarify the issue... Things behave nicely if the drives aren't
powered off when unplugging and replugging the firewire cables, but things go
south in a hurry if the drives are powered off, unplugged, plugged back in and
then powered back up.

For posterity, here's the full trace captured the last go 'round:

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
 printing eip:
f8c863f5
*pde = 00000000
Oops: 0000 [#1]
SMP 
last sysfs file:
/devices/pci0000:00/0000:00:09.0/fw1/fw1.0/host23/target23:0:0/23:0:0:0/rev
Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc
nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack
nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter
ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0 parport_pc lp
parport serio_raw fw_ohci fw_core ata_piix i2c_piix4 pcspkr ata_generic 3c59x
i2c_core libata mii sr_mod cdrom sg aic7xxx scsi_transport_spi sd_mod scsi_mod
raid456 xor raid1 ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<f8c863f5>]    Not tainted VLI
EFLAGS: 00010002   (2.6.20-1.3054.fc7 #1)
EIP is at fw_core_handle_response+0x95/0x108 [fw_core]
eax: 0000ffc0   ebx: 6b6b6b5f   ecx: ede1000c   edx: 6b6b6b6b
esi: 0000001e   edi: f6333470   ebp: c077ff4c   esp: c077ff1c
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process swapper (pid: 0, ti=c077f000 task=c06f34e0 task.ti=c0737000)
Stack: 00000296 f7e52218 00000000 c077ff58 f6333000 ffc27920 ffc06000 00000286 
       0000ffc0 f6333000 f5ddda18 f6333d84 c077ffa0 f8c429af f5ddda14 00000002 
       0000005a ffc27920 ffc06000 00000000 c1e68a90 0000000c f5ddda18 00000000 
Call Trace:
 [<c04061e9>] show_trace_log_lvl+0x1a/0x2f
 [<c0406299>] show_stack_log_lvl+0x9b/0xa3
 [<c0406459>] show_registers+0x1b8/0x289
 [<c0406657>] die+0x12d/0x242
 [<c0618026>] do_page_fault+0x3ee/0x4ba
 [<c06167bc>] error_code+0x7c/0x84
 [<f8c429af>] handle_ar_packet+0xe0/0xeb [fw_ohci]
 [<f8c437e4>] ar_context_tasklet+0xb6/0xc4 [fw_ohci]
 [<c042c0a8>] tasklet_action+0x5b/0xc3
 [<c042bfa7>] __do_softirq+0x6f/0xe2
 [<c0407420>] do_softirq+0x61/0xd0
 =======================
Code: 89 f8 8b 55 ec e8 0b 02 99 c7 8b 45 e4 c1 e8 04 83 e0 0f 83 f8 07 74 36 83
f8 0b 74 31 31 c9 31 d2 83 f8 06 75 32 eb 19 8d 5a f4 <8b> 53 0c 0f 18 02 90 8b
45 e0 8d 4b 0c 83 c0 20 39 c1 75 9e eb 
EIP: [<f8c863f5>] fw_core_handle_response+0x95/0x108 [fw_core] SS:ESP 0068:c077ff1c
Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp.c:549/smp_call_function() (Not tainted)
 [<c04061e9>] show_trace_log_lvl+0x1a/0x2f
 [<c04067ad>] show_trace+0x12/0x14
 [<c0406831>] dump_stack+0x16/0x18
 [<c0417baa>] smp_call_function+0x67/0xcd
 [<c0417c2e>] smp_send_stop+0x1e/0x31
 [<c0427443>] panic+0x54/0x195
 [<c0406737>] die+0x20d/0x242
 [<c0618026>] do_page_fault+0x3ee/0x4ba
 [<c06167bc>] error_code+0x7c/0x84
 [<f8c429af>] handle_ar_packet+0xe0/0xeb [fw_ohci]
 [<f8c437e4>] ar_context_tasklet+0xb6/0xc4 [fw_ohci]
 [<c042c0a8>] tasklet_action+0x5b/0xc3
 [<c042bfa7>] __do_softirq+0x6f/0xe2
 [<c0407420>] do_softirq+0x61/0xd0
 =======================


Comment 6 Jarod Wilson 2007-04-17 17:38:56 UTC
Okay, I'm no longer seeing the kernel panic with 2.6.20-1.3079.fc7. I still have
to unplug and replug the drives at least once after they're powered up to get
them usable on the system, but this is definitely progress...

Comment 7 Stefan Richter 2007-04-28 14:17:43 UTC
Kristian, some (better) SBP-2 devices with extra PSU power their PHY from bus
power when the PSU is switched off.  That way they still act as a repeater when
switched off but plugged in (provided another node powers the bus), and
daisy-chained devices can still be accessed.

When such an SBP-2 device is switched on, it powers up the link layer controller
and spins up the disk, but doesn't make a config ROM available yet.  After the
disk is up and running and the LLC scanned it, it constructs and exposes the
config ROM.  (And usually, but not necessarily, it issues a bus reset to
announce it to the bus.  The SBP-2 spec says it doesn't have to, or actually
/should not/, issue a bus reset if the ROM becomes available within 5s after
power reset.)

Perhaps fw-core's device discovery code ignores such devices for whatever reason.

Comment 8 Stefan Richter 2008-03-04 16:10:04 UTC
Jarod, is anything of this bug still relevant?

(The still unfixed http://bugzilla.kernel.org/show_bug.cgi?id=9617 looks different.)

Comment 9 Jarod Wilson 2008-03-04 17:12:18 UTC
Whoa, I filed this forever ago, forgot all about it... :)

No, haven't seen the panic in ages, and teh unplug-replug dance in comment #6 I
believe was remedied by the login orb config rom read timeout patch.