Bug 429598
Summary: | [firewire] unable to use disk (giving up on config rom) | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Nicola <alf.tanner> |
Component: | kernel | Assignee: | Jarod Wilson <jarod> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | low | Docs Contact: | |
Priority: | low | | |
Version: | 8 | CC: | kernel-maint, stefan-r-rhbz, whiteg |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2008-02-25 20:13:36 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Nicola
2008-01-21 21:14:31 UTC
Those dmesg bits aren't valid for the Fedora 8 firewire stack. The ieee1394 stack isn't built or shipped; we ship the stack that prints 'firewire' instead, and fw-sbp2 instead of just sbp2. Please get the appropriate firewire stack running and we can certainly dig into this (it might even already be fixed in the latest koji kernels).

Hrm, in re-reading, perhaps that was supposed to be an example of the working case. To have a chance of getting things fixed, what I need is the non-working case, as described in comment #1.

Hi Jarod. No, it can't be an example of a working case, since it's F8 as I reported. With F8 the firewire interface doesn't work. You asked me to use the appropriate firewire stack. How do I do that in F8 in order to report problems?

Oops! Sorry, you're right, I picked up the old messages file! Here we are:

Jan 23 15:33:23 sputnik kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5
Jan 23 15:33:30 sputnik kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5
Jan 23 15:33:54 sputnik kernel: firewire_core: giving up on config rom for node id ffc0

and that's all. fdisk -l can't see any device, and neither can the automounter or gnome-mount.

Yep, that's the bits I need. Basically, we're failing to read the configuration rom on the drive, which means the firewire stack doesn't have a clue what sort of capabilities the device has -- doesn't know if it's a drive, a camera, or what -- so we never set up the sbp2 layer for storage. As it happens, I'm actually working on this very problem right now with the upstream firewire maintainer, Stefan Richter (cc'ing). I can reproduce this on my laptop, and really want to use some firewire drives with it, so you can be sure I plan to get it working reliably... ;)

Excellent, thanks a lot. So is this a kernel fault only, or is udev involved as well?

It's purely a fault in the kernel drivers. Once we get them fixed, udev and friends will do the right thing.
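For anyone triaging similar reports, the failure line quoted above can be picked out of a log mechanically. The following is only a minimal sketch: the function name `find_config_rom_failures` is hypothetical, and the pattern is based solely on the two message variants that appear in this bug (the plain form above, and a later form with "ROM" capitalised and a "(returned N)" suffix). Other kernels may word the message differently.

```python
import re

# Matches the firewire_core failure line as quoted in this bug report.
# "rom"/"ROM" and the optional "(returned N)" suffix are both accepted.
GIVE_UP = re.compile(
    r"firewire_core: giving up on config rom for node id (?P<node>[0-9a-f]{4})"
    r"(?: \(returned (?P<rcode>\d+)\))?",
    re.IGNORECASE,
)


def find_config_rom_failures(log_text):
    """Return (node_id, rcode_or_None) for each config ROM failure line."""
    failures = []
    for line in log_text.splitlines():
        match = GIVE_UP.search(line)
        if match:
            rcode = match.group("rcode")
            failures.append((match.group("node"), int(rcode) if rcode else None))
    return failures


# Sample lines copied from the comments in this bug.
sample = """\
Jan 23 15:33:23 sputnik kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5
Jan 23 15:33:54 sputnik kernel: firewire_core: giving up on config rom for node id ffc0
firewire_core: giving up on config ROM for node id ffc0 (returned 17)
"""

print(find_config_rom_failures(sample))
```

A non-empty result only says that the core gave up reading the config ROM; as explained above, that also means the sbp2 layer was never set up for the device.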
I've got a patch based on some of Stefan's work and some review comments that is working quite well now on multiple system and drive combinations that were previously hitting the 'giving up on config rom' problem. I've added it to rawhide and, after a touch more testing, will get it into F8 and F7.

In 2.6.23.14-130.fc8 (and later), 2.6.23.14-74.fc7 (and later) and current rawhide kernels. F8 and F7 builds are not actually underway yet, but should happen RSN.

These fixes are available in the latest rawhide, f8 and f7 kernels; closing bug.

What is "latest"? I installed 2.6.24.3-12.fc8 from "testing", but the problem is unchanged. Under FC6 I was using firewire with an early iPod, but it has never worked with F7 or F8. It just goes to the "OK to disconnect" screen. There have been postings that ALi firewire needed DMA disabled in the old sbp2, but I don't see the option with firewire_sbp2 -- does this card need one of the other workarounds?

# lspci -v -s 00:07.4
00:07.4 FireWire (IEEE 1394): ALi Corporation M5253 P1394 OHCI 1.1 Controller (prog-if 10 [OHCI])
        Subsystem: ALi Corporation M5253 P1394 OHCI 1.1 Controller
        Flags: 66MHz, medium devsel, IRQ 16
        Memory at fe124000 (32-bit, non-prefetchable) [size=2K]
        Expansion ROM at fe000000 [disabled] [size=64K]
        Capabilities: [80] Power Management version 2
        Kernel modules: firewire-ohci

$ uname -a
Linux cerberus.cwmannwn.nowhere 2.6.24.3-12.fc8 #1 SMP Tue Feb 26 14:58:29 EST 2008 i686 i686 i386 GNU/Linux

$ dmesg | grep firewire
firewire_ohci: Added fw-ohci device 0000:00:07.4, OHCI version 1.10
firewire_core: created device fw0: GUID 0090e639000005df, S400
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_core: giving up on config rom for node id ffc0

> There have been postings that ALi firewire needed DMA disabled
> in the old sbp2

This is a compile-time option of the sbp2 driver, and it would only do something if a load-time option of ohci1394 was also re-configured.
I suppose that these postings refer to a bug which was fixed in Linux 2.6.16.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=35bdddb83f62978b5fad82a14fbfd78cc3a5a60c
This bug is not present in firewire-{ohci,core,sbp2}. Also, that bug was about the drivers not seeing requests from the device, whereas your problem is about requests from the drivers to the device. These requests fail or don't return what the drivers want. We need to put some diagnostics into the drivers to better understand the failure mode.

When you used the iPod with FC6, did it become operational as a storage device sooner or later than 30 seconds after plugin?

Hmm, comment #11 says
> What is "latest"? I installed 2.6.24.3-12.fc8 from "testing", but the
> problem is unchanged.
and comment #9 says
> In 2.6.23.14-130.fc8 (and later), 2.6.23.14-74.fc7 (and later) and
> current rawhide kernels
about a driver update mentioned in comment #8. Could you try to get your hands on this or any later kernel package? On the other hand, the 2.6.24.3-12.fc8 build date is one month after the patch was posted, so I would expect it to be in there. (It appeared in 2.6.25-rc1 in mainline though.)

Comment #11 asks:
> When you used the iPod with FC6, did it become operational as a
> storage device sooner or later than 30 seconds after plugin?
Hard to be sure, but I'd say a bit less than 30s -- I could plug it in and see the little apple, and see it come online in /var/log/messages or dmesg, then use the info to type some 'mount -t hfsplus ...' without feeling that I was waiting for things to happen. It now takes 10s to get past the little apple to the "OK to disconnect" screen, which seems like about the time it used to come alive.

Comment #14 -- I expected the patches to be there too, but the Murphy of the law is a busy guy.
I have the sources, and there are two files of patches:

-rw-r--r-- 1 gwhite bod 40388 2008-02-15 19:58 linux-2.6-firewire-git-pending.patch
-rw-r--r-- 1 gwhite bod 39433 2008-02-15 19:58 linux-2.6-firewire-git-update.patch

The git-pending-patch includes the "delay inquiry = 0x10" option to the workarounds parameter. I've tried that without success, if that is behind your query about the length of time it takes for the iPod to become mountable.

2.6.24.3-12.fc8 definitely has the config rom read fixups in it, I just triple-checked. FWIW, my own FireWire iPod works just fine w/the new stack, as do a few different models krh has. There are still some controller and device combinations I run into from time to time that simply don't play nice with the new stack yet though. :\

George:
> The git-pending-patch includes the "delay inquiry = 0x10" option to
> the workarounds parameter.
OK, this is indeed very recent and hence includes the "giving up on config rom"
patch (as Jarod confirmed).
BTW, switching the few firewire-sbp2 module options won't work for you. The
firewire-core doesn't come far enough to hand over control to firewire-sbp2; the
problem happens earlier.

One thing that often works when I do occasionally hit 'giving up on config rom' these days is to simply unload and reload the firewire-ohci module with the device already plugged in. Probably worth opening a new bug to track the cases where we still hit 'giving up on config rom'...

I tried a different external firewire device and it also has the problem. I also tried reloading "firewire-ohci module with the device already plugged in" (both devices) and no success. I should get another cable just to be sure that isn't the problem.

Could http://lkml.org/lkml/2008/3/2/80 be related?

That one is only relevant though if two or more config ROM reads happen simultaneously and are spread over two or more workqueue threads (i.e. [events/*] threads).

If so, said patch is included in the latest f8 kernel build in koji:
http://koji.fedoraproject.org/packages/kernel/2.6.24.3/17.fc8/

Still no joy...

2.6.24.3-34.fc8 #1 SMP Wed Mar 12 16:51:49 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

dmesg:
virbr0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
virbr0: starting userspace STP failed, starting kernel STP
ip_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
firewire_core: giving up on config rom for node id ffc1
firewire_core: phy config: card 0, new root=ffc0, gap_count=5

Nicola, how much memory does your system have? This *could* actually be the coherent dma problem that was biting my laptop, which is x86_64 w/4GB of RAM. If the system has memory mapped over the 4GB boundary, we'll sometimes try to use that for dma buffers, which causes problems (on my laptop, regular config rom read failures like you're seeing). This particular fix went into 2.6.24.3-37.fc8, and there's a subsequent 2.6.24.3-38.fc8. Give one of those a spin, please!
http://koji.fedoraproject.org/packages/kernel/2.6.24.3/38.fc8/

Actually my x86_64 has 4G:

free
             total       used       free     shared    buffers     cached
Mem:       4061872    2058908    2002964          0     105800     862696
-/+ buffers/cache:    1090412    2971460
Swap:      4192880          0    4192880

Experimenting with another kernel is however another issue, as I'm using nvidia server, thus I'm supposed to recompile kmod and server. That is a no go for a working machine. Could I instead reboot and tell grub I have, say, 2G of ram in order to try this hypothesis?

Not sure if booting with mem=2G would help or not... However, if you were to simply boot into run-level 3 w/the new kernel and plug in the drive, we would still at least see stuff logged about the disk and be able to verify whether or not this is indeed the fix for your issue. I suspect now that it is.

Oh, it's also possible to simply rebuild the firewire kernel modules from -38.fc8 on top of -34.fc8 and drop them in place of the provided -34.fc8 ones. Not terribly complex to do either:

1) yum install kernel-devel
2) grab kernel-2.6.24.3-38.fc8.src.rpm
3) rpm -ivh kernel-2.6.24.3-38.fc8.src.rpm
4) rpmbuild -bp kernel.spec (will be in /usr/src/redhat/SPECS by default)
5) cd /usr/src/redhat/BUILD/kernel-2.6.24/linux-2.6.24.x86_64/drivers/firewire
6) make -C /usr/src/kernels/2.6.24.3-34.fc8-x86_64/ M=`pwd` modules
7) cp *.ko /lib/modules/2.6.24.3-34.fc8/kernel/drivers/firewire/

Again no luck. Telinit 3, then:

Linux version 2.6.24.3-38.fc8 (mockbuild.redhat.com) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Fri Mar 14 19:26:21 EDT 2008
ip_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_core: giving up on config ROM for node id ffc0 (returned 17)

This time the two lines are inverted; the latter shows up only after trying the mount. I used gnome-mount, which used to work. fdisk -l won't see the new disk.
I would like to stress again that FC6 worked flawlessly on the same hardware. Trouble began with the new firewire stack in the kernel.

Oh crap. There's a number of patches I thought I'd put into the F8 kernel build, including the one that I thought would help your system, which are actually NOT included at the moment. D'oh. I'll get that fixed shortly and let you know when an updated kernel is available. Also, for the record, what sort of controller is this with? Output of "lspci | grep Fire" should be sufficient to tell. We have some issues with a few controllers still, but I still suspect yours are fixed by the patch that isn't actually in the F8 kernels that I mistakenly thought was... :(

Here is the lspci -v output. The mb is an abit IP-35 pro, the firewire is the embedded port.

04:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) (prog-if 10 [OHCI])
        Subsystem: ABIT Computer Corp. Unknown device 1083
        Flags: bus master, medium devsel, latency 68, IRQ 21
        Memory at fddfd000 (32-bit, non-prefetchable) [size=2K]
        Memory at fddf8000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 2
        Kernel driver in use: firewire_ohci
        Kernel modules: firewire-ohci

That controller should definitely be functional. I've got a Fedora 8 kernel building right now that carries the patch I'd meant for ya to test with.
http://koji.fedoraproject.org/koji/taskinfo?taskID=520390

Finally kernel-2.6.24.3-40 did it:

firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_core: created device fw2: GUID 0050770e00000003, S400, 3 config ROM retries
scsi12 : SBP-2 IEEE-1394
firewire_sbp2: fw2.0: logged in to LUN 0000 (0 retries)
scsi 12:0:0:0: Direct-Access-RBC WDC WD30 00JB-00KFA0 PQ: 0 ANSI: 4
sd 12:0:0:0: [sdh] 586072368 512-byte hardware sectors (300069 MB)
sd 12:0:0:0: [sdh] Write Protect is off
sd 12:0:0:0: [sdh] Mode Sense: 11 00 00 00
sd 12:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 12:0:0:0: [sdh] 586072368 512-byte hardware sectors (300069 MB)
sd 12:0:0:0: [sdh] Write Protect is off
sd 12:0:0:0: [sdh] Mode Sense: 11 00 00 00
sd 12:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdh: sdh1
sd 12:0:0:0: [sdh] Attached SCSI disk
sd 12:0:0:0: Attached scsi generic sg9 type 14

fdisk -l saw the disk. I mounted it, made a couple of ls -lR in order to see if it detaches, then performed an rsync backup. I think it's a reasonable test for the interface. What was the cause of all those problems? A big thanks to all the developers involved in the fix!

Excellent, glad to hear we finally got it workin'. :) There was a problem with x86_64 systems with memory mapped over the 4GB mark, which is common even with machines having only 4GB of RAM, as there's a hole somewhere in the 3 to 4GB range that is used for PCI/PCIe devices. To get at the remaining memory, the BIOS remaps memory over the 4GB mark (if you're fortunate -- I have one nForce 4 system w/4GB of physical memory installed that can only use 3.2GB or so, since the BIOS doesn't remap memory). Well, we get into a situation where we're trying to use this memory for our firewire DMA buffers, and it simply doesn't work reliably for DMA for assorted technical reasons.
So now we've made changes to ensure we're not using that memory; we're only using memory we know works reliably for DMA. The specific upstream commit that provides this fix is here:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bde1709aaa98f5004ab1580842c422be18eb4bc3

Thanks for the explanation and the solution.
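
The address arithmetic behind the explanation above (a 4GB machine ending up with RAM over the 4GB mark, and that RAM being unsuitable for the firewire DMA buffers) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the kernel change itself: the hole is assumed to span exactly 3 GiB to 4 GiB, the BIOS is assumed to relocate the hidden RAM to start at the 4 GiB mark, and the names `remapped_ranges` and `fits_below_4gib` are hypothetical.

```python
GIB = 1 << 30
FOUR_GIB = 4 * GIB


def remapped_ranges(installed_bytes, hole_start):
    """Physical RAM ranges, as (start, end) pairs, after BIOS remapping.

    Assumption: the PCI/PCIe hole covers [hole_start, 4 GiB) and any RAM
    that would have been hidden by it is relocated to start at 4 GiB.
    """
    if installed_bytes <= hole_start:
        return [(0, installed_bytes)]
    hidden = installed_bytes - hole_start
    return [(0, hole_start), (FOUR_GIB, FOUR_GIB + hidden)]


def fits_below_4gib(start, length):
    """True if the whole buffer lies below the 4 GiB boundary."""
    return start + length <= FOUR_GIB


# A machine with 4 GiB installed and an assumed hole from 3 GiB to 4 GiB.
ranges = remapped_ranges(4 * GIB, 3 * GIB)
print([(hex(start), hex(end)) for start, end in ranges])

# A page taken from the remapped range is not usable under the assumption
# that only memory below 4 GiB works reliably for these DMA buffers.
high_start = ranges[1][0]
print(fits_below_4gib(high_start, 4096))
print(fits_below_4gib(3 * GIB - 4096, 4096))
```

With these assumed numbers the top gigabyte of RAM lands in the range from 4 GiB to 5 GiB, which is why a system with "only" 4GB installed can still hand out buffers above the boundary.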