Description of problem: During boot, following appears in log: Badness in get_phy_reg at drivers/ieee1394/ohci1394.c:235 (Not tainted) [<f8a22a71>] get_phy_reg+0x7a/0xe2 [ohci1394] [<f8a22dfb>] ohci_devctl+0x46/0x57f [ohci1394] [<f8a25aaa>] ohci_irq_handler+0x2ab/0x6be [ohci1394] [<c013a192>] handle_IRQ_event+0x23/0x4c [<c013a235>] __do_IRQ+0x7a/0xcd [<c0104c2e>] do_IRQ+0x5c/0x77 ======================= [<c01035b6>] common_interrupt+0x1a/0x20 [<c02e39b1>] _spin_unlock_irqrestore+0xa/0xc [<f8a22e0f>] ohci_devctl+0x5a/0x57f [ohci1394] [<f8a3f960>] csr1212_generate_csr_image+0x308/0x35d [ieee1394] [<c02e39a5>] _spin_unlock_irq+0x5/0x7 [<c02e2227>] schedule+0x4a1/0x4ff [<f8a3701c>] hpsb_reset_bus+0x1c/0x22 [ieee1394] [<c0128bf4>] run_workqueue+0x7a/0xb5 [<f8a38cf0>] delayed_reset_bus+0x0/0xca [ieee1394] [<c0129006>] worker_thread+0x0/0x10b [<c01290e1>] worker_thread+0xdb/0x10b [<c0117de0>] default_wake_function+0x0/0xc [<c012b676>] kthread+0x91/0xbd [<c012b5e5>] kthread+0x0/0xbd [<c0101005>] kernel_thread_helper+0x5/0xb Badness in set_phy_reg at drivers/ieee1394/ohci1394.c:264 (Not tainted) [<f8a229aa>] set_phy_reg+0x84/0xd1 [ohci1394] [<f8a22e0f>] ohci_devctl+0x5a/0x57f [ohci1394] [<f8a25aaa>] ohci_irq_handler+0x2ab/0x6be [ohci1394] [<c013a192>] handle_IRQ_event+0x23/0x4c [<c013a235>] __do_IRQ+0x7a/0xcd [<c0104c2e>] do_IRQ+0x5c/0x77 ======================= [<c01035b6>] common_interrupt+0x1a/0x20 [<c02e39b1>] _spin_unlock_irqrestore+0xa/0xc [<f8a22e0f>] ohci_devctl+0x5a/0x57f [ohci1394] [<f8a3f960>] csr1212_generate_csr_image+0x308/0x35d [ieee1394] [<c02e39a5>] _spin_unlock_irq+0x5/0x7 [<c02e2227>] schedule+0x4a1/0x4ff [<f8a3701c>] hpsb_reset_bus+0x1c/0x22 [ieee1394] [<c0128bf4>] run_workqueue+0x7a/0xb5 [<f8a38cf0>] delayed_reset_bus+0x0/0xca [ieee1394] [<c0129006>] worker_thread+0x0/0x10b [<c01290e1>] worker_thread+0xdb/0x10b [<c0117de0>] default_wake_function+0x0/0xc [<c012b676>] kthread+0x91/0xbd [<c012b5e5>] kthread+0x0/0xbd [<c0101005>] kernel_thread_helper+0x5/0xb ohci1394: fw-host0: SelfID received outside of bus reset sequence Version-Release number of selected component (if applicable): kernel-2.6.16-1.2133_FC5 How reproducible: Always Related lspci entry: 00:0d.0 FireWire (IEEE 1394): nVidia Corporation nForce2 FireWire (IEEE 1394) Controller (rev a3) (prog-if 10 [OHCI]) Subsystem: ASUSTeK Computer Inc. Unknown device 809a Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 (750ns min, 250ns max) Interrupt: pin A routed to IRQ 20 Region 0: Memory at ed084000 (32-bit, non-prefetchable) [size=2K] Region 1: Memory at ed085000 (32-bit, non-prefetchable) [size=64] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME+
This comes from linux-2.6-debug-sleep-in-irq-warning.patch. The ohci1394 goes to sleep for 1ms when tries to grab the PHY information and cannot. This only happens when the chip is in the bus reset sequence and the driver looped for a 1000 times. Does the FW hardware actually work on that thing once you let it settle? Perhaps it may be easier just blacklist ohci1394 in /etc/modprobe.conf.
The last time I tested (was still FC4), it didn't. I own a external hard drive which has USB and Firewire connectors. USB works fine, Firewire didn't. So it would be only a low prio bug for me, but anyway it should be fixed. I can run tests easily because the related machine would be only used on demand. So if one provide me a new kernel (or even a kernel module), I can test it.
It probably didn't work for the same reason, only now we have a patch which hunts for latency disruptors, so we get a message.
The "badness" was also reported in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144201#c37 and is tracked as an upstream bug at http://bugzilla.kernel.org/show_bug.cgi?id=6070 These reports are appreciated; sorry that we were not yet able to put fixes for the offending code paths together. The nForce2 is a special case because we are so far relying on an evil hack to get its FireWire controller going in the first place. Problems to access a FireWire HDD may or may not be related to this issue in ohci1394. Peter, are there messages besides that trace when you try to access the disk via FireWire?
Following happen after: 1) connecting powerless external HDD to Firewire 2) power on external HDD Jun 21 23:23:23 host kernel: ieee1394: Error parsing configrom for node 0-00:1023 Jun 21 23:23:23 host kernel: ieee1394: Node changed: 0-00:1023 -> 0-01:1023 Jun 21 23:23:33 host kernel: ieee1394: Node added: ID:BUS[0-00:1023] GUID[0050770e501e0108] Jun 21 23:23:33 host kernel: ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1) Jun 21 23:23:33 host kernel: ieee1394: sbp2: Try serialize_io=0 for better performance Jun 21 23:23:33 host kernel: scsi3 : SBP-2 IEEE-1394 Jun 21 23:23:55 host kernel: ieee1394: sbp2: Error logging into SBP-2 device - login timed-out Jun 21 23:23:55 host kernel: sbp2: probe of 0050770e501e0108-0 failed with error -16 Same happen after reboot with powered-on HDD: usb 3-2.4: new low speed USB device using ohci_hcd and address 3 usb 3-2.4: configuration #1 chosen from 1 choice input: Logitech USB Receiver as /class/input/input1 input: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:02.1-2.4 Badness in get_phy_reg at drivers/ieee1394/ohci1394.c:235 (Not tainted) [<f8a58a71>] get_phy_reg+0x7a/0xe2 [ohci1394] [<f8a58dfb>] ohci_devctl+0x46/0x57f [ohci1394] [<f8a5baaa>] ohci_irq_handler+0x2ab/0x6be [ohci1394] [<c013a192>] handle_IRQ_event+0x23/0x4c [<c013a235>] __do_IRQ+0x7a/0xcd [<c0104c2e>] do_IRQ+0x5c/0x77 ======================= [<c01035b6>] common_interrupt+0x1a/0x20 [<c02e39b1>] _spin_unlock_irqrestore+0xa/0xc [<f8a58e0f>] ohci_devctl+0x5a/0x57f [ohci1394] [<f8aa6960>] csr1212_generate_csr_image+0x308/0x35d [ieee1394] [<c02e39a5>] _spin_unlock_irq+0x5/0x7 [<c02e2227>] schedule+0x4a1/0x4ff [<f8a9e01c>] hpsb_reset_bus+0x1c/0x22 [ieee1394] [<c0128bf4>] run_workqueue+0x7a/0xb5 [<f8a9fcf0>] delayed_reset_bus+0x0/0xca [ieee1394] [<c0129006>] worker_thread+0x0/0x10b [<c01290e1>] worker_thread+0xdb/0x10b [<c0117de0>] default_wake_function+0x0/0xc [<c012b676>] kthread+0x91/0xbd [<c012b5e5>] kthread+0x0/0xbd [<c0101005>] kernel_thread_helper+0x5/0xb Badness in set_phy_reg at drivers/ieee1394/ohci1394.c:264 (Not tainted) [<f8a589aa>] set_phy_reg+0x84/0xd1 [ohci1394] [<f8a58e0f>] ohci_devctl+0x5a/0x57f [ohci1394] [<f8a5baaa>] ohci_irq_handler+0x2ab/0x6be [ohci1394] [<c013a192>] handle_IRQ_event+0x23/0x4c [<c013a235>] __do_IRQ+0x7a/0xcd [<c0104c2e>] do_IRQ+0x5c/0x77 ======================= [<c01035b6>] common_interrupt+0x1a/0x20 [<c02e39b1>] _spin_unlock_irqrestore+0xa/0xc [<f8a58e0f>] ohci_devctl+0x5a/0x57f [ohci1394] [<f8aa6960>] csr1212_generate_csr_image+0x308/0x35d [ieee1394] [<c02e39a5>] _spin_unlock_irq+0x5/0x7 [<c02e2227>] schedule+0x4a1/0x4ff [<f8a9e01c>] hpsb_reset_bus+0x1c/0x22 [ieee1394] [<c0128bf4>] run_workqueue+0x7a/0xb5 [<f8a9fcf0>] delayed_reset_bus+0x0/0xca [ieee1394] [<c0129006>] worker_thread+0x0/0x10b [<c01290e1>] worker_thread+0xdb/0x10b [<c0117de0>] default_wake_function+0x0/0xc [<c012b676>] kthread+0x91/0xbd [<c012b5e5>] kthread+0x0/0xbd [<c0101005>] kernel_thread_helper+0x5/0xb ohci1394: fw-host0: SelfID received outside of bus reset sequence Non-volatile memory driver v1.2 Floppy drive(s): fd0 is 1.44M FDC 0 is a post-1991 82077 lp: driver loaded but no devices found ieee1394: Node added: ID:BUS[0-00:1023] GUID[0050770e501e0108] ieee1394: Host added: ID:BUS[0-01:1023] GUID[0011d80000093166] input: PC Speaker as /class/input/input2 ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1) ieee1394: sbp2: Try serialize_io=0 for better performance scsi3 : SBP-2 IEEE-1394 ieee1394: sbp2: Error logging into SBP-2 device - login failed sbp2: probe of 0050770e501e0108-0 failed with error -16 So I can't use the HDD via FireWire, but currently don't know whether it is caused by kernel-only or kernel&drive.
Same happen on kernel 2.6.17-1.2130_FC5
The last fix for login timeout problems went into Linux 2.6.16. Except for your report I haven't heard of such issue in the meantime, and none of the latest ieee1394 updates look like seem related. What really happens during the timeout is difficult to find out; a recompilation of the 1394 drivers with debug parameters and running ohci1394+sbp2 in an experimental mode would be required. But here is one thing you could try if you have the time: - install 1394commander (Get the source tarball if it isn't available as package: http://www.ict.tuwien.ac.at/ieee1394/opensource.html ) - after the login timeout, unload sbp2 (modprobe -r sbp2) - run "echo br short | 1394commander" (The raw1394 driver should automatically load when you do that.) - Wait a few seconds and load sbp2 again (modprobe sbp2). I wonder if the "badness" also happens during 1394commander's "short bus reset" and/ or if sbp2 is able to log in after that.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
The "badness" in ohci1394 is not yet fixed upstream. I don't know the status of the login timeout.
Can confirm the bug still occurs, here related dmesg from 2.6.18-1.2200.fc5 BUG: warning at drivers/ieee1394/ohci1394.c:234/get_phy_reg() (Not tainted) [<c0403f10>] dump_trace+0x69/0x1af [<c040406e>] show_trace_log_lvl+0x18/0x2c [<c04045e9>] show_trace+0xf/0x11 [<c0404673>] dump_stack+0x15/0x17 [<f8a6164e>] get_phy_reg+0x8b/0x105 [ohci1394] [<f8a62a7e>] ohci_devctl+0x49/0x5de [ohci1394] [<f8a64f03>] ohci_irq_handler+0x378/0x786 [ohci1394] [<c0440c82>] handle_IRQ_event+0x23/0x49 [<c0440d2a>] __do_IRQ+0x82/0xde [<c040536d>] do_IRQ+0x9a/0xb8 ======================= BUG: warning at drivers/ieee1394/ohci1394.c:263/set_phy_reg() (Not tainted) [<c0403f10>] dump_trace+0x69/0x1af [<c040406e>] show_trace_log_lvl+0x18/0x2c [<c04045e9>] show_trace+0xf/0x11 [<c0404673>] dump_stack+0x15/0x17 [<f8a61564>] set_phy_reg+0x99/0xf8 [ohci1394] [<f8a62a92>] ohci_devctl+0x5d/0x5de [ohci1394] [<f8a64f03>] ohci_irq_handler+0x378/0x786 [ohci1394] [<c0440c82>] handle_IRQ_event+0x23/0x49 [<c0440d2a>] __do_IRQ+0x82/0xde [<c040536d>] do_IRQ+0x9a/0xb8 ======================= ohci1394: fw-host0: SelfID received outside of bus reset sequence lp: driver loaded but no devices found ieee1394: Host added: ID:BUS[0-00:1023] GUID[0011d80000093166]
Peter, did you get to try Stefans commands in comment #7 ?
# lsmod|grep sbp2 -> empty, module not loaded # echo br short | 1394commander 1394commander 0.1.1 Copyright (C) 2002-2004 by Manfred Weihs <weihs.ac.at> This software comes with absolutely no warranty. No adapter specified! couldn't get handle: No such file or directory This probably means that you don't have raw1394 support in the kernel or that you haven't loaded the raw1394 module. # modprobe raw1394 Oct 22 13:51:36 worker kernel: ieee1394: raw1394: /dev/raw1394 device initialized # echo br short | 1394commander 1394commander 0.1.1 Copyright (C) 2002-2004 by Manfred Weihs <weihs.ac.at> This software comes with absolutely no warranty. No adapter specified! successfully got handle current generation number (driver): 1 1 card(s) found nodes on bus: 1, card name: ohci1394 using adapter 0 found: 1 nodes on bus, local ID is 0, IRM is 0 current generation number (adapter): 1 entering command mode Type 'help' for more information! Command: successfully reset the bus (short bus reset) Command: bus reset occurred. new generation number: 2, 1 nodes on the bus, local ID: 0 reached EOF -> exit dmesg: Oct 22 13:52:03 host kernel: ohci1394: fw-host0: SelfID received outside of bus reset sequence # modprobe sbp2 dmesg: Oct 22 13:53:16 host kernel: ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1) Oct 22 13:53:16 host kernel: ieee1394: sbp2: Try serialize_io=0 for better performance Powering on external hard disk device: Oct 22 13:54:54 host kernel: ieee1394: Error parsing configrom for node 0-00:1023 Oct 22 13:54:54 host kernel: ieee1394: Node changed: 0-00:1023 -> 0-01:1023 Oct 22 13:55:04 host kernel: ieee1394: Node added: ID:BUS[0-00:1023] GUID[0050770e501e0108] Oct 22 13:55:04 host kernel: scsi3 : SBP-2 IEEE-1394 Oct 22 13:55:26 host kernel: ieee1394: sbp2: Error logging into SBP-2 device - login timed-out Oct 22 13:55:26 host kernel: sbp2: probe of 0050770e501e0108-0 failed with error -16 Oct 22 13:55:26 host kernel: scsi4 : SBP-2 IEEE-1394 Oct 22 13:55:48 host kernel: ieee1394: sbp2: Error logging into SBP-2 device - login timed-out Oct 22 13:55:48 host kernel: sbp2: probe of 0050770e501e0108-0 failed with error -16 Powering off: Oct 22 13:56:44 host kernel: ieee1394: Node changed: 0-01:1023 -> 0-00:1023 Oct 22 13:56:44 host kernel: ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0050770e501e0108]
Please also test this sequence: - Connect the disk and switch it on. - Watch the login failure to happen. - Unload sbp2 but keep the disk on and attached. - Issue the bus reset with 1394commander. - Wait a few seconds. - Load sbp2. Thanks for the help.
- Connect the disk and switch it on. - Watch the login failure to happen. Strange, doesn't happen like before after reboot (perhaps a power-cycle issue, also). Disk is recognized, but command timeouts (tried fdisk -l): Oct 22 15:46:18 host kernel: sd 3:0:0:0: Attached scsi generic sg1 type 14 Oct 22 15:46:48 host kernel: ieee1394: sbp2: aborting sbp2 command Oct 22 15:46:48 host kernel: sd 3:0:0:0: Oct 22 15:46:48 host kernel: command: Read(10): 28 00 1d 1c 45 6f 00 00 02 00 Oct 22 15:47:18 host kernel: ieee1394: sbp2: aborting sbp2 command Oct 22 15:47:18 host kernel: sd 3:0:0:0: Oct 22 15:47:18 host kernel: command: Read(10): 28 00 1d 1c 45 6f 00 00 02 00 Oct 22 15:47:48 host kernel: ieee1394: sbp2: aborting sbp2 command Oct 22 15:47:48 host kernel: sd 3:0:0:0: Oct 22 15:47:48 host kernel: command: Read(10): 28 00 1d 1c 45 6f 00 00 02 00 - Unload sbp2 but keep the disk on and attached. Done. - Issue the bus reset with 1394commander. Done. - Wait a few seconds. - Load sbp2. Done, now disk is re-recognized, fdisk -l works.
The FireWire-to-IDE bridge of your disk is from Prolific, as I can see from the vendor OUI part of the GUID. Many of these devices are sold with buggy FireWire firmwares which are also a PITA for Windows and MAC users. Their USB part generally works though. If you have got a Windows PC around, you can upload current FireWire firmware. Their uploader works AFAIK over USB. http://www.prolific.com.tw/eng/downloads.asp?ID=44 I have got a Prolific based DVD-RW too and it sometimes shows problems like in your last two reports. It is totally worthless on a TI 1394b adapter but somewhat works on a VIA 1394a adapter. Somebody with such an enclosure fixed it by firmware update. I will not update my enclosure anytime soon; perhaps I will get an idea someday what to do with those devices. Also, Windows and MacOS probably issue a bus reset for an unrelated bus-management feature which I plan to implement on Linux at some point; this might get those Prolific devices going as a side effect. What other people have to say about Prolific based FireWire devices: http://missig.org/julian/blog/2004/06/10/prolific-pl3507-firewire-device/ I still don't know whether the "badness" is influencing the Prolific device. At least it obviously doesn't occur during the software-triggered bus reset. The "badness" is a hard one to fix, and whatever fix anybody might come up with eventually may have quite a potential for other compatibility regressions. That's one of the reasons why it is not fixed yet...
Thank you for the update hint, I have to use the oldest provided updater, but then upload of newest firmware works. Anyway, the kernel reports after boot (without connecting the device): BUG: warning at drivers/ieee1394/ohci1394.c:234/get_phy_reg() (Not tainted) [<c0403f10>] dump_trace+0x69/0x1af [<c040406e>] show_trace_log_lvl+0x18/0x2c [<c04045e9>] show_trace+0xf/0x11 [<c0404673>] dump_stack+0x15/0x17 [<f8a6464e>] get_phy_reg+0x8b/0x105 [ohci1394] [<f8a65a7e>] ohci_devctl+0x49/0x5de [ohci1394] [<f8a67f03>] ohci_irq_handler+0x378/0x786 [ohci1394] [<c0440c82>] handle_IRQ_event+0x23/0x49 [<c0440d2a>] __do_IRQ+0x82/0xde [<c040536d>] do_IRQ+0x9a/0xb8 ======================= BUG: warning at drivers/ieee1394/ohci1394.c:263/set_phy_reg() (Not tainted) [<c0403f10>] dump_trace+0x69/0x1af [<c040406e>] show_trace_log_lvl+0x18/0x2c [<c04045e9>] show_trace+0xf/0x11 [<c0404673>] dump_stack+0x15/0x17 [<f8a64564>] set_phy_reg+0x99/0xf8 [ohci1394] [<f8a65a92>] ohci_devctl+0x5d/0x5de [ohci1394] [<f8a67f03>] ohci_irq_handler+0x378/0x786 [ohci1394] [<c0440c82>] handle_IRQ_event+0x23/0x49 [<c0440d2a>] __do_IRQ+0x82/0xde [<c040536d>] do_IRQ+0x9a/0xb8 ======================= So this is mainboard related and is unrelated to the disk drive. Afterwards (softbooting from W2K to FC5), the drive is proper recognized. ieee1394: Error parsing configrom for node 0-00:1023 <-- Hmm...happen still with newer firmware. ieee1394: Node changed: 0-00:1023 -> 0-01:1023 ieee1394: Node added: ID:BUS[0-00:1023] GUID[0050770e501e0108] ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1) ieee1394: sbp2: Try serialize_io=0 for better performance scsi4 : SBP-2 IEEE-1394 ieee1394: sbp2: Logged into SBP-2 device ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048] Vendor: WDC WD25 Model: 00JB-00GVA0 Rev: Type: Direct-Access-RBC ANSI SCSI revision: 04 SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB) sdb: Write Protect is off sdb: Mode Sense: 00 13 00 00 SCSI device sdb: drive cache: write back SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB) sdb: Write Protect is off sdb: Mode Sense: 00 13 00 00 SCSI device sdb: drive cache: write back sdb: sdb1 sd 4:0:0:0: Attached scsi disk sdb sd 4:0:0:0: Attached scsi generic sg1 type 14 It also works now after power-cycle of PC and drive. So it looks like this issue is fixed, but the "BUG" should be fixed, too.
Yes, the "BUG: warning" is related to the host controller, although ieee1394 + ohci1394 trip over this with several (or many?) different controllers. However that one is tracked as FC bug 144201 and as upstream bug. So since the drive works now, or at least works most of the time due to firmware update, this bug here can be closed or marked as a duplicate.
*** This bug has been marked as a duplicate of 144201 ***