Bug 228073
Summary: | Uninorth 1394 controller quirks not handled | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Helmut Schlattl <helmut.schlattl> | ||||
Component: | kernel | Assignee: | Jay Fenlason <fenlason> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 12 | CC: | davej, jarod, jfeeney, jrb, stefan-r-rhbz, triage, wtogami, zaitcev | ||||
Target Milestone: | --- | Keywords: | Triaged | ||||
Target Release: | --- | ||||||
Hardware: | powerpc | ||||||
OS: | Linux | ||||||
Whiteboard: | bzcl34nup | ||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2010-12-05 07:17:04 UTC | Type: | --- | ||||
Description
Helmut Schlattl
2007-02-09 21:16:00 UTC
Created attachment 147807 [details]
system log
Could you provide the lspci output that mentions your firewire controller, and the make, model and, if possible, the bridge chipset version of your external hard disk? Also, is this a regression from older Fedora kernels?

For the next version of Fedora, we're switching to a different firewire stack. It's already available in the rawhide kernels, and if it's possible for you to try that out, that would be much appreciated. Be sure to try at least 2.6.20-1.2924.fc7 or later, since that one has a lot of fixes over what we've been carrying the last couple of weeks.

lspci gives the following for the firewire controller:

    0002:24:0e.0 FireWire (IEEE 1394): Apple Computer Inc. UniNorth FireWire (rev 01) (prog-if 10 [OHCI])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 1000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 40
        Region 0: Memory at f5000000 (32-bit, non-prefetchable) [size=4K]

I plugged the firewire drive into another computer and got the following in the system log:

    Feb 13 20:13:33 ##### kernel: ieee1394: Error parsing configrom for node 0-00:1023
    Feb 13 20:13:36 ##### kernel: ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1)
    Feb 13 20:13:36 ##### kernel: ieee1394: sbp2: Try serialize_io=0 for better performance
    Feb 13 20:13:36 ##### kernel: scsi2 : SBP-2 IEEE-1394
    Feb 13 20:13:37 ##### kernel: ieee1394: sbp2: Logged into SBP-2 device
    Feb 13 20:13:37 ##### kernel: Vendor: Initio Model: SAMSUNG MP0402H Rev: 4.61
    Feb 13 20:13:37 ##### kernel: Type: Direct-Access ANSI SCSI revision: 00
    Feb 13 20:13:37 ##### kernel: SCSI device sdf: 78242976 512-byte hdwr sectors (40060 MB)
    Feb 13 20:13:37 ##### kernel: sdf: Write Protect is off
    Feb 13 20:13:37 ##### kernel: sdf: missing header in MODE_SENSE response
    Feb 13 20:13:37 ##### kernel: SCSI device sdf: drive cache: write back
    Feb 13 20:13:37 ##### kernel: SCSI device sdf: 78242976 512-byte hdwr sectors (40060 MB)
    Feb 13 20:13:37 ##### kernel: sdf: Write Protect is off
    Feb 13 20:13:37 ##### kernel: sdf: missing header in MODE_SENSE response
    Feb 13 20:13:37 ##### kernel: SCSI device sdf: drive cache: write back
    Feb 13 20:13:37 ##### kernel: sdf: [mac] sdf1 sdf2 sdf3 sdf4 sdf5 sdf6 sdf7 sdf8 sdf9 sdf10 sdf11
    Feb 13 20:13:37 ##### kernel: sd 2:0:0:0: Attached scsi disk sdf
    Feb 13 20:13:37 ##### kernel: sd 2:0:0:0: Attached scsi generic sg7 type 0

Concerning the new kernel: I tried 2.6.20-1.2925.fc7, but could not boot, as the kernel stopped with:

    Kernel BUG at mm/slab.c:2878

(I did not upgrade mkinitrd to the fc7 version, as this would have required many more packages to be updated. But this should not have caused the kernel bug.)

As far as I remember, the drive did work with one of the previous kernels. Unfortunately I can't remember which kernel version it was. I did not need the drive for a while, so there have certainly been a couple of new kernel releases since it was last connected to my PowerBook.

I tried a new kernel from Fedora Development with the new firewire stack: 2.6.20-1.2960.fc7

I also updated a few other packages from the development branch:

    udev: 105-1
    mkinitrd: 6.0.6-6 (recompiled on FC6)
    module-init-tools: 3.3-0.pre6.1.9.fc7
    iptables: 1.3.7-1.1

Just to be sure, I also manually loaded the module fw_sbp2, so that fw_core, fw_ohci, and fw_sbp2 are all loaded. When I now plug in my firewire drive, I only get the same message twice in the kernel log:

    fw_ohci: recursive bus reset detected, discarding self ids
    fw_ohci: recursive bus reset detected, discarding self ids

That's all. Of course, no devices are created. Any ideas?

Re comment #1:
This is strange. Everything works --- read requests to the device's config ROM, quadlet write to the management agent register, reception of login status into sbp2's status FIFO --- up until the first quadlet write request.

It appears as if that request never goes out to the bus, since the "Packet sent to node 1 tcode=0x0 tLabel=46 ack=..." line is missing. Only the hardware-triggered bus reset wakes everything up again. Even the hpsb_node_write in sbp2_set_busy_timeout does not time out after a SPLIT_TIMEOUT duration. I.e. the driver stack must have hung up internally. Weird.

The information in comment #4 does not fit into this picture either. If there was a bus reset loop, ohci1394 should repeatedly log the respective IntEvent if compiled for verbose debugging like Helmut did.

(In reply to comment #5)
> Re comment #1:
> This is strange. Everything works --- read requests to the device's config ROM,
> quadlet write to the management agent register, reception of login status into
> sbp2's status FIFO --- up until the first quadlet write request.
>
> It appears as if that request never goes out to the bus, since the "Packet sent
> to node 1 tcode=0x0 tLabel=46 ack=..." line is missing. Only the
> hardware-triggered bus reset wakes everything up again. Even the
> hpsb_node_write in sbp2_set_busy_timeout does not timeout after a SPLIT_TIMEOUT
> duration. I.e. the driver stack must have hung up internally. Weird.
>
> The information in comment #4 does not fit into this picture either. If there
> was a bus reset loop, ohci1394 should repeatedly log the respective IntEvent if
> compiled for verbose debuging like Helmut did.
My theory is that the problem in comment #4 could be a byte swapping issue. I've tested on ppc with an uninorth rev 2 chipset, which worked, but I know the old stack has some endian hacks for the uninorth chipset. Seeing that this is a rev 1 chipset, maybe that revision has a problem with byte swapping...

Perhaps we have a compatibility regression with UniNorth once again. From peeking into kernel-2.6.19-1.2895.fc6.src.rpm, it seems that its ohci1394 is the same as in kernel.org's 2.6.19.2. Now it would help a bit to know which kernel worked for you last. However, checking all our ohci1394 updates from 2.6.16 to 2.6.19.2, this seemingly correct patch in 2.6.19.2 is certainly the most intrusive one if we assume a UniNorth-specific bug:

"ieee1394: ohci1394: add PPC_PMAC platform code to driver probe"
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.19.y.git;a=commitdiff_plain;h=459593b95acfee630b5c8a33e674d1a802a5b6c7

Full history for ohci1394 in 2.6.19.y:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.19.y.git;a=history;f=drivers/ieee1394/ohci1394.c

Helmut, could you try with this patch reverted and the machine powered off and on again? And what kind of Mac is it? Do you have access to an add-on FireWire card to test in this Mac?

(In reply to comment #7)
As suggested, I compiled a new kernel based on 2.6.20-1.2925 with the patch reverted: no change. The kernel log and udevmonitor output are the same (at least as far as I can see). The Mac is a Powerbook G3 Pismo with a G4 processor upgrade card. Unfortunately, I have no other firewire card available for the powerbook, so I cannot check whether this is a UniNorth-specific problem.

Re comment #4:
UniNorth rev1 support was added to fw-ohci at the end of last week. It's largely untested though, because my own PowerBook has a defective PHY, so I can only verify the pci_driver probe, suspend, resume, remove, and the proper byte order of the self ID buffer with just the controller's own self ID in it.

Helmut, are you still using Fedora on the PowerBook? And if so, Jarod, do you have a bleeding-edge kernel for him?

PS:
> fw_ohci: recursive bus reset detected, discarding self ids
> fw_ohci: recursive bus reset detected, discarding self ids
At least this is definitely fixed for PowerBook G4 first generation which should
have the same controller as the Pismo. I am not sure of the rest of the
UniNorth support; I wrote that code blindly according to what is in ohci1394
which didn't work properly according to the report against kernel 2.6.19.
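
To make the byte-order concern above concrete, here is a minimal sketch of why a mismatch would break self-ID parsing. It assumes, purely for illustration, that one side writes the self-ID buffer big-endian while the other reads it little-endian; this report does not establish which order UniNorth rev 1 actually uses. The helper names `is_self_id` and `read_quadlet` are hypothetical, and the sample quadlet 0x807f8c52 is the value a later debug log in this report shows for phy 0.

```python
import struct


def is_self_id(quadlet):
    """A self-ID packet carries the bit pattern 10 in its two top bits."""
    return (quadlet >> 30) == 0b10


def read_quadlet(raw, big_endian):
    """Interpret four raw bytes from a (hypothetical) self-ID buffer."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack(fmt, raw)[0]


# Sample quadlet laid out in memory the way a big-endian writer would store it.
raw = struct.pack(">I", 0x807F8C52)

good = read_quadlet(raw, big_endian=True)   # reader agrees with the writer
bad = read_quadlet(raw, big_endian=False)   # reader assumes the other order

print("matching byte order:   %08x self-ID? %s" % (good, is_self_id(good)))
print("mismatched byte order: %08x self-ID? %s" % (bad, is_self_id(bad)))
```

With the wrong order the quadlet no longer looks like a self-ID packet at all, so a driver that validates the buffer would likely discard it rather than build a topology from it.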
The latest Fedora 8 kernel build carries all the PowerPC Mac fixes:

http://koji.fedoraproject.org/packages/kernel/2.6.24.3/17.fc8/

Would definitely be worth giving it a go.

I have finally installed the latest kernel (2.6.24.3-17.fc8) on my Pismo. Unfortunately, there is still no success. Here is the output from dmesg:

    firewire_core: phy config: card 0, new root=ffc1, gap_count=5
    SCSI subsystem initialized
    scsi2 : SBP-2 IEEE-1394
    firewire_core: created device fw1: GUID 0010100300000000, S400, 1 config ROM retries
    firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)
    firewire_sbp2: fw1.0: orb reply timed out, rcode=0x11

It looks like we actually did log in to the drive there, but then some other orb timed out... Can you post a full dmesg dump, from at least the first occurrence of firewire_core in your logs? Hard to tell from the truncated version there what went wrong.

Certainly I would provide more output, if there had been anything more! Much earlier in dmesg, of course, the loading of the module was reported:

    firewire_ohci: Added fw-ohci device 0002:24:0e.0, OHCI version 1.0
    firewire_core: created device fw0: GUID 003065fffe3f9f74, S400

But I guess this does not help a lot. So, if you tell me what to do, I'll post more detailed output. By the way, 'cat /proc/sys/kernel/printk' yields:

    7 4 1 7

Perhaps a manual

    # modprobe -r firewire-sbp2
    # modprobe firewire-sbp2

gets a little bit more out of it. Right now I don't understand though why the fw-sbp2 driver doesn't proceed with some retries.

Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers

We got some additional fixage into later kernels; it would potentially be worth trying out the latest updates kernel (2.6.24.4-64.fc8 right now, iirc). It could be that coherent DMA issues were impacting ppc similar to the way they were affecting some x86_64 systems.

changing version to 'rawhide' to avoid EOL closure

Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

There have been a few driver updates since Jarod's last message. If it still does not work with a recent kernel update (Jarod: Should an F8 or F9 kernel be tried?), unloading firewire-ohci, "modprobe firewire-ohci debug=7", and plugging the disk in should provide some additional diagnostics in dmesg.
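
As a side note on what debug=7 asks for: the value is a bitmask. The sketch below splits it into its parts, assuming the bit meanings given in the mainline firewire-ohci module parameter description (AT/AR events = 1, self-IDs = 2, IRQs = 4, busReset events = 8); please check `modinfo firewire-ohci` on the kernel actually in use, since this report does not spell them out. The names `DEBUG_FLAGS` and `describe_debug` are made up for this example.

```python
# Assumed bit meanings of the firewire-ohci "debug" module parameter;
# confirm against `modinfo firewire-ohci` for the running kernel.
DEBUG_FLAGS = {
    1: "AT/AR events",
    2: "self-IDs",
    4: "IRQs",
    8: "busReset events",
}


def describe_debug(mask):
    """Return the names of the logging categories enabled by a debug mask."""
    return [name for bit, name in DEBUG_FLAGS.items() if mask & bit]


print(describe_debug(7))
```

Under that assumption, debug=7 turns on everything except the separate bus reset event logging.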
For a Fedora 8 system, it might be worth trying the latest 2.6.25.x-based kernel:
http://kojipkgs.fedoraproject.org/packages/kernel/2.6.25.4/14.fc8/

For a Fedora 9 system:
http://kojipkgs.fedoraproject.org/packages/kernel/2.6.25.4/39.fc9/

The two should have more or less identical firewire stacks in them. Also, I now have a Pismo PowerBook in-house myself, though I need to fetch it back from someone else in the office, so I can do some prodding there as well...

Helmut, do you still have the hardware (and Fedora on it)? If so, please test once more with the latest Fedora kernel package. There have been a few firewire driver updates in the meantime which /perhaps/ have a positive effect.

Yes, I still have the hardware, and I created a second, fresh Fedora 10 system on my Pismo, without any changes in the configuration etc. So the kernel version is 2.6.27.12-170.2.5.fc10

The only thing I did manually was:

    rmmod firewire_ohci
    modprobe firewire_ohci debug=7
    modprobe firewire_sbp2

(firewire_core is also loaded)

firewire_ohci gave the following in the kernel log during modprobe:

    Feb 8 19:23:26 *** kernel: firewire_ohci: Added fw-ohci device 0002:24:0e.0, OHCI version 1.0
    Feb 8 19:23:26 *** kernel: firewire_ohci: IRQ 00010010 selfID AR_req
    Feb 8 19:23:26 *** kernel: firewire_ohci: 1 selfIDs, generation 1, local node ID ffc0
    Feb 8 19:23:26 *** kernel: firewire_ohci: selfID 0: 807f8c52, phy 0 [--.] S400 gc=63 -3W Lci
    Feb 8 19:23:26 *** kernel: firewire_ohci: AR evt_bus_reset, generation 1
    Feb 8 19:23:26 *** kernel: firewire_core: created device fw0: GUID 003065fffe3f9f74, S400

When plugging in my external FW drive I repeatedly get from firewire_ohci:

    Feb 8 19:26:04 *** kernel: firewire_ohci: IRQ 00000010 AR_req
    Feb 8 19:26:04 *** kernel: firewire_ohci: IRQ 00010000 selfID
    Feb 8 19:26:04 *** kernel: firewire_ohci: AR evt_bus_reset, generation 2
    Feb 8 19:26:04 *** kernel: firewire_ohci: 2 selfIDs, generation 2, local node ID ffc1
    Feb 8 19:26:04 *** kernel: firewire_ohci: selfID 0: 807f8490, phy 0 [p-.] S400 gc=63 -3W L
    Feb 8 19:26:04 *** kernel: firewire_ohci: selfID 0: 817f8cd2, phy 1 [c-.] S400 gc=63 -3W Lci

and from firewire_core:

    Feb 8 19:26:04 *** kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5
    ...
    Feb 8 19:26:34 *** kernel: firewire_core: giving up on config rom for node id ffc1
    Feb 8 19:26:34 *** kernel: firewire_core: phy config: card 0, new root=ffc0, gap_count=5

The last message (changing sometimes from root=ffc0 to root=ffc1 and back) and the firewire_ohci messages are repeated until I remove the FW device. There is no message from firewire_sbp2.

Does anybody have an idea?

Endless repetition of "self ID complete" events could be caused by firewire-ohci lacking a workaround for Uninorth v1 to break out of bus reset loops (but then I would rather expect endless bus reset events without self ID complete events), or could simply be a sign of dying hardware.

(In reply to comment #25)
> Endless repetition of "self ID complete" events could be caused by
> firewire-ohci lacking a workaround for Uninorth v1 to break out of bus reset
> loops (but then I would rather expect endless bus reset events without self ID
> complete events), or could simply be a sign of dying hardware.
I also suppose that some special treatment is required for the Uninorth v1. I don't think that it's a hardware problem. The same FW hard disk works correctly under Mac OS X (on the same Pismo) and on my second system (x86_64).

Yep, I managed to acquire a Pismo a while back, and was able to observe similar behavior. I loaned the machine to someone else for some X-related stuff and forgot all about it. I need to go get it back from them, and with luck, I'll have some time soon to try to debug this...

Sadly, my Pismo appears to have up and died, so I'm now without a realistic way to debug this. :(

This message is a reminder that Fedora 10 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 10. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 10 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Can the title of this bug be changed to or amended with something like "Uninorth 1394 controller quirks not handled"? (I can't edit the title, can anybody else?)
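
For anyone re-reading the debug=7 output earlier in this report, the self-ID quadlets and node IDs can be decoded by hand. The sketch below does that under my reading of the IEEE 1394a self-ID packet layout and of the abbreviations used in the log lines (for example "[p-.]" for the three port states); it is an illustration, not the driver's code, and the names `decode_self_id` and `decode_node_id` are hypothetical.

```python
SPEED = ["S100", "S200", "S400", "beta"]
POWER = ["+0W", "+15W", "+30W", "+45W", "-3W", " ?W", "-3..-6W", "-3..-10W"]
# Port states: not present, not connected, connected to parent, connected to child.
PORT = [".", "-", "p", "c"]


def decode_self_id(q):
    """Decode the first (non-extended) self-ID quadlet of a node."""
    if q >> 30 != 0b10:
        raise ValueError("not a self-ID packet")
    if q & (1 << 23):
        raise ValueError("extended self-ID packets are not handled here")
    ports = "".join(PORT[(q >> shift) & 3] for shift in (6, 4, 2))
    flags = ""
    flags += "L" if q & (1 << 22) else ""   # link active
    flags += "c" if q & (1 << 11) else ""   # contender
    flags += "i" if q & (1 << 1) else ""    # initiated the bus reset
    return {
        "phy": (q >> 24) & 0x3F,
        "ports": ports,
        "speed": SPEED[(q >> 14) & 3],
        "gap_count": (q >> 16) & 0x3F,
        "power": POWER[(q >> 8) & 7],
        "flags": flags,
    }


def decode_node_id(node_id):
    """Split a 16-bit node ID into (bus number, physical node number)."""
    return node_id >> 6, node_id & 0x3F


for quadlet in (0x807F8C52, 0x807F8490, 0x817F8CD2):
    print("%08x" % quadlet, decode_self_id(quadlet))
print(decode_node_id(0xFFC1))
```

Read this way, the log taken after plugging in the drive shows phy 0 with a port connected to its parent and phy 1, which matches the local node ID ffc1, with a port connected to a child. The self IDs themselves look well-formed, so the decode alone does not explain why the config ROM read is given up on.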
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.