Description of problem:
When using the new firewire stack, at boot time I see a message: "firewire_core: giving up on config rom for node ...". No firewire controller appears in the system, and when I plug in my external firewire drives they aren't recognised (dmesg shows absolutely nothing and no device node is created).

Version-Release number of selected component (if applicable):
kernel 2.6.22.4-65.fc7

Additional info:
I installed the "old" firewire kernel module provided by ATRPMS for my kernel. After blacklisting the new modules and rebooting, no error message is shown and the drives work fine. The motherboard is an ASUS P5GD2 Premium, with an onboard firewire controller by Texas Instruments (the hardware browser identifies it as "TI TSB82AA2-1394B link layer controller", using driver ohci1394). I attach below the output of dmesg when the drives are plugged in.
Created attachment 183541: dmesg output when drives are recognised
Still seeing this problem with the latest kernel. This is on a Gigabyte GA-P35-DQ6 motherboard with on-board Firewire connected to two external drives (a hard drive and a DVD burner).

Output from `uname -a`:
Linux strauss 2.6.22.5-76.fc7 #1 SMP Thu Aug 30 13:08:59 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Output from `dmesg | grep firewire` showing connect/reconnect attempts:
firewire_ohci: Added fw-ohci device 0000:05:00.0, OHCI version 1.10
firewire_ohci: Added fw-ohci device 0000:05:06.0, OHCI version 1.10
firewire_core: created new fw device fw0 (0 config rom retries)
firewire_core: created new fw device fw1 (0 config rom retries)
firewire_core: giving up on config rom for node id ffc0
firewire_core: giving up on config rom for node id ffc1
firewire_core: phy config: card 1, new root=ffc0, gap_count=5
firewire_core: giving up on config rom for node id ffc2
firewire_core: phy config: card 0, new root=ffc0, gap_count=63
firewire_core: giving up on config rom for node id ffc1
firewire_core: giving up on config rom for node id ffc2
firewire_core: phy config: card 0, new root=ffc0, gap_count=63
firewire_core: phy config: card 1, new root=ffc2, gap_count=7
firewire_core: giving up on config rom for node id ffc1
firewire_core: giving up on config rom for node id ffc0

My smolt page is http://smolt.fedoraproject.org/show?UUID=c413b36f-7ba0-405c-ad84-98d4ae3bfb52
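The node IDs in these messages follow the IEEE 1394 layout: a 16-bit value whose upper 10 bits are the bus ID (0x3ff means "the local bus") and whose lower 6 bits are the PHY ID. So ffc0, ffc1 and ffc2 are simply PHY IDs 0, 1 and 2 on the local bus of whichever card logged the message. A minimal sketch for tallying the failures per PHY ID from saved dmesg text might look like this; the function names are hypothetical helpers, not part of any existing tool.

```python
import re
from collections import Counter

# Message text as printed by firewire_core in the log above.
GIVING_UP = re.compile(r"giving up on config rom for node id ([0-9a-f]{4})")


def decode_node_id(node_id: int) -> tuple[int, int]:
    """Split a 16-bit IEEE 1394 node ID into (bus ID, PHY ID)."""
    bus_id = (node_id >> 6) & 0x3FF  # upper 10 bits; 0x3ff = local bus
    phy_id = node_id & 0x3F          # lower 6 bits
    return bus_id, phy_id


def config_rom_failures(dmesg_text: str) -> Counter:
    """Count "giving up on config rom" messages per PHY ID.

    The message does not say which card it came from, so with two
    controllers the counts for both buses are merged together.
    """
    failures = Counter()
    for match in GIVING_UP.finditer(dmesg_text):
        _, phy_id = decode_node_id(int(match.group(1), 16))
        failures[phy_id] += 1
    return failures


if __name__ == "__main__":
    sample = """\
firewire_core: giving up on config rom for node id ffc0
firewire_core: giving up on config rom for node id ffc1
firewire_core: phy config: card 1, new root=ffc0, gap_count=5
firewire_core: giving up on config rom for node id ffc2
"""
    print(decode_node_id(0xFFC1))       # (1023, 1)
    print(config_rom_failures(sample))  # Counter({0: 1, 1: 1, 2: 1})
```

Because the card number is missing from these lines, the per-card picture has to come from the "phy config: card N" lines instead.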
Can anyone test with the Fedora8 test2 live CD? This will tell us if the problem is fixed in kernel 2.6.23.
Another option is to throw a kernel from Rawhide on top of FC-7 (with rpm --force if needed: should be ok for testing purposes).
Created attachment 201251: Full dmesg output from Fedora8 test2
Verified the problem exists under Fedora8 test2. uname -a reports:
Linux localhost.localdomain 2.6.23-0.164.rc5.fc8 #1 SMP Tue Sep 4 18:24:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
Please see the prior attachment for the full dmesg output.
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can.

I'm re-assigning to the firewire maintainers, who may wish to review it and add comments.

I have also elevated the priority and severity, as this could potentially prevent a successful install of F8 when using external firewire drives.

Cheers
Chris
Re comment #2 and comment #5:
There appear to be two PCI/OHCI-1394 devices in the log. Is this correct, i.e. are there two controllers in your machine?

Re opening comment:
TSB82AA2 is supported in principle. I'm successfully using a PCIe card with this chip and the new driver stack (with a different distro and various kernel.org prerelease kernels though). Right now I have no idea what could cause the local ROM reads to fail. What happens if you run "modprobe -r firewire-ohci && modprobe firewire-ohci" later, after boot?
Hi Chris, I have your request for add'l info. I'm currently traveling so it will likely take a day or so until I can get the info requested. Regards, Ed
Hi Chris,

There are indeed two controllers on my machine. One is the motherboard's on-board controller (Gigabyte GA-P35-DQ6 mobo) and the other is a PCI card supporting firewire 800.

I ran the commands you requested. These commands locked up my USB keyboard for several minutes, although control finally returned. Oddly, my USB mouse was not affected. I am attaching dmesg output from the modprobe command forward.

After running the command, it looks like I can mount the external hard drive and view its contents, although all accesses (ls on a directory, fdisk -l on the partition table) are extremely slow -- usually over a minute for initial access. In addition, after moving down a few levels into the mounted directory, the directory appears to be corrupted (please see the attachment for that as well).

Regards,
Ed Lally
Created attachment 218561: dmesg output from removing/reloading firewire kernel modules
Created attachment 218571: Output from ls commands showing corrupted directory listings
Ed, you have been bitten by a whole swarm of bugs. Where do I begin?

1.) firewire-core unable to access the cards if the modules are loaded early in the boot sequence

I don't know why this is and what difference it makes to reload firewire-ohci later. I am using Gentoo Linux, and they load firewire-ohci (and/or ohci1394, depending on how I configured the kernel) in one of the init scripts based on module aliases matching the PCI IDs or whatever. Works for me with 1, 2, or 3 cards present (onboard 1394a, PCIe 1394b, CardBus 1394a). So that's still a mystery to me.

2.) firewire-sbp2 blocking keyboard input when trying to add an SBP-2 device

This is fixed in -mm kernels by a patch pending for inclusion into mainline 2.6.24-rc1, "firewire: fw-sbp2: use an own workqueue (fix system responsiveness)".
http://marc.info/?l=linux1394-devel&m=118691816130507
http://me.in-berlin.de/~s5r6/linux1394/updates/2.6.22.y/patches/549-firewire-fw-sbp2-use-an-own-workqueue-fix-system-responsiveness.patch
Besides, not only keyboard input but many other kernel functions, even in firewire-core, are negatively affected by firewire-sbp2's usage of the shared workqueue.

3.) "status write for unknown orb" errors

This is fixed in mainline Linux 2.6.23-rc4 by the patch "firewire: Add ref-counting for sbp2 orbs (fix command abortion)".
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e57d2011a6276d55a87f26653a0395f302ce0d51
These errors probably caused the corruption you saw with ls, the "FAT: Filesystem panic", and perhaps several other errors in your dmesg log.

4.) "scsi scan: 96 byte inquiry failed"

Maybe this too was caused by the "unknown orb" error, or maybe it is a firmware bug. In the latter case, the driver can be instructed to use a different flavor of inquiry: run "modprobe firewire-sbp2 workarounds=2" before firewire-sbp2 is auto-loaded, or simply after a "modprobe -r firewire-sbp2". The workarounds parameter is AFAIK not available in the kernel you are using.
It is available in -mm kernels by a patch scheduled for 2.6.24-rc1: "firewire: fw-sbp2: expose module parameter for workarounds".
http://marc.info/?l=linux1394-devel&m=118691807906588
http://me.in-berlin.de/~s5r6/linux1394/updates/2.6.22.y/patches/548a-firewire-fw-sbp2-expose-module-parameter-for-workarounds.patch
If it is a firmware bug, i.e. only the workarounds parameter suppresses it, then we should add a corresponding entry to firewire-sbp2's built-in device blacklist, based on the dmesg output you would get with the parameter activated.

Jay, it may be appropriate to pull all the firewire-sbp2 fixes which went into mainline during the 2.6.23-rc phase, as well as those scheduled for 2.6.24-rc1, into the FC8 kernel, as far as they aren't already in there. They may even be OK for an FC7 kernel update, as long as you are still releasing those. I didn't push those 2.6.24-rc1 fixes to Linus before 2.6.23 because their line count and age seemed inappropriate to me for the late 2.6.23-rc phase. Maintaining the old ieee1394 drivers made me somewhat cautious about the speed of mainline inclusion of bug fixes.

Have a look at http://me.in-berlin.de/~s5r6/linux1394/updates/ for series of pending patches.
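The workarounds parameter is a bitmask, so several workarounds can be combined into one value; workarounds=2 selects the 36-byte inquiry instead of the standard 96-byte one. The sketch below decodes such a value. The flag values are an assumption taken from the parameter description that `modinfo firewire-sbp2` shows on kernels of roughly this period; later kernels added further bits, so check the description on your own kernel before relying on them. The class and function names are hypothetical.

```python
from enum import IntFlag


class Sbp2Workaround(IntFlag):
    """Bits of the firewire-sbp2 'workarounds' module parameter.

    Assumed values, as listed in the parameter description of kernels
    from around 2.6.24; newer kernels define additional bits.
    """
    MAX_TRANSFER_128K = 0x1
    INQUIRY_36 = 0x2
    SKIP_MODE_PAGE_8 = 0x4
    FIX_CAPACITY = 0x8
    OVERRIDE_BLACKLIST = 0x100


# All bits this sketch knows about (each member is a single distinct bit).
KNOWN_BITS = sum(flag.value for flag in Sbp2Workaround)


def describe(value: int) -> list[str]:
    """Return the names of the workaround bits set in value."""
    unknown = value & ~KNOWN_BITS
    if unknown:
        raise ValueError(f"bits not covered by this sketch: {unknown:#x}")
    return [flag.name for flag in Sbp2Workaround if value & flag.value]


if __name__ == "__main__":
    print(describe(2))      # ['INQUIRY_36']
    print(describe(0x102))  # ['INQUIRY_36', 'OVERRIDE_BLACKLIST']
```

Rejecting unknown bits rather than ignoring them keeps the sketch from silently misreporting a value meant for a newer kernel.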
PS, about "scsi scan: 96 byte inquiry failed": If this error does not occur with the old drivers, then it is not a firmware bug but a mere I/O error, i.e. fallout from "status write for unknown orb".
Hi Chris and Stefan, My apologies for the delayed reply to your notes. Thanks for taking the time to help identify my "swarm" -- I believe the metaphor "stung to death by gnats" may be apropos here ;-) At this point they are an inconvenience but not a showstopper. I am content to hold until the 2.6.24 kernels make it into FC7 or FC8 (as appropriate). Thanks again for all your help. Best regards, Ed
> At this point they are an inconvenience but not a showstopper.
> I am content to hold until the 2.6.24 kernels make it into FC7
> or FC8 (as appropriate).

Well, as mentioned, I recommend to the Fedora kernel package managers that they pull all of the 2.6.24-rc1 changes to the firewire drivers (except 2.6.24 specific kernel API changes of course) over into the 2.6.23 based Fedora kernels. I would have sent almost all of those changes to Linus before his 2.6.23 release if I had anticipated how long the 2.6.23-rc phase would stretch.
The issue persists with F8, kernel 2.6.23.1-49.fc8; I can access my firewire HD only with the modules from ATRPMS.
(In reply to comment #17)
> The issue persist with F8, kernel 2.6.23.1-49.fc8; I can access my firewire HD
> only with the modules from ATRPMS.

2.6.24 is almost upon us and, as Stefan has indicated, contains a raft of updates. Please could you test with it when it arrives (or even with a rawhide kernel if you are able) and report back.

Regards
Chris
Chris, I'm now on Fedora 8 with kernel 2.6.23.9-85.fc8. I will test with 2.6.24 as soon as I can get it. Cheers, Ed
The latest koji F8 kernel is also worth trying, and carries the same firewire updates as rawhide kernels. As of right now, that would be: http://koji.fedoraproject.org/packages/kernel/2.6.23.14/111.fc8/
I have the same types of problems with Fedora 8 ("DVD" release, i386): nothing at boot time, but here are the "dmesg" logs when plugging in my camera:

#When plugging in the camera on my Pinnacle DV500+ (dmesg):
[fedora@fedora ~]$ dmesg
[...]
firewire_ohci: node ID not valid, new bus reset in progress
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: created new fw device fw2 (0 config rom retries, S100)
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 1, new root=ffc0, gap_count=5
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_core: giving up on config rom for node id ffc1

#When plugging in the camera on my Asus Nvidia Nforce2 board (dmesg):
[fedora@fedora ~]$ dmesg
[...]
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
#... and so on, ending with a system freeze!
Now, various tests with both dvgrab and a GStreamer pipeline:

#First test, with dvgrab:
[root@fedora fedora]# dvgrab -i
Found AV/C device with GUID 0x00008500008cec1e
ioctl call failed, retval = -1
ieee1394io.cc:460: In function "virtual bool iec61883Reader::StartReceive()": "iec61883_dv_fb_start( m_iec61883.dv, channel )" evaluated to -1
"Loading Medium" ff:ff:ff:ff "" sec
#and dvgrab can neither capture anything nor quit (I have to close the console...),
#It even freezes the whole system when I unplug the camera (I have to do a RESET!).

#I also tested with GStreamer (after a reset, with the same dmesg messages):
[root@fedora fedora]# gst-launch dv1394src ! decodebin name=d ! queue ! audioconvert ! audioresample ! alsasink d. ! ffmpegcolorspace ! xvimagesink
Setting pipeline to PAUSED ...
ioctl call failed, retval = -1
ERROR: Pipeline doesn't want to pause.
ERROR: from element /pipeline0/dv1394src0: Could not read from resource.
Additional debug info:
gstdv1394src.c(866): gst_dv1394src_start (): /pipeline0/dv1394src0:
can't start 1394 iso receive
Setting pipeline to NULL ...
FREEING pipeline ...
#No exit problems here...

I will test with the new kernel...
Bastien.
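The nForce2 log in the comment above is the same pair of messages repeating: firewire-core fails to get the bus manager lock, makes the local node root, and reconfigures the bus, which leads to yet another bus reset. A rough way to spot such a loop in a saved log is to count the "phy config" messages per card. The Python sketch below does that; the helper names are hypothetical, and the threshold of 5 is an arbitrary assumption, not a limit defined by firewire-core.

```python
import re
from collections import Counter

# Patterns for the two firewire_core messages quoted above.
PHY_CONFIG = re.compile(
    r"phy config: card (\d+), new root=([0-9a-f]{4}), gap_count=(\d+)")
BM_LOCK_FAILED = re.compile(
    r"BM lock failed, making local node \(([0-9a-f]{4})\) root")


def phy_configs_per_card(dmesg_text: str) -> Counter:
    """Count "phy config" messages for each card number."""
    return Counter(int(m.group(1)) for m in PHY_CONFIG.finditer(dmesg_text))


def bm_lock_failures(dmesg_text: str) -> int:
    """Count "BM lock failed" messages."""
    return len(BM_LOCK_FAILED.findall(dmesg_text))


def looks_like_reset_loop(dmesg_text: str, threshold: int = 5) -> bool:
    """Heuristic: True if any card logged at least `threshold` phy configs."""
    return any(n >= threshold
               for n in phy_configs_per_card(dmesg_text).values())


if __name__ == "__main__":
    nforce2 = ("firewire_core: BM lock failed, making local node (ffc0) root.\n"
               "firewire_core: phy config: card 0, new root=ffc0, gap_count=5\n") * 8
    print(looks_like_reset_loop(nforce2))  # True
```

The DV500+ log, with three "phy config" lines on card 1, stays below the threshold, which matches the observation that only the nForce2 board ends in an endless stream and a freeze.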
Hi Bastien (and others!), The 'giving up on config rom' problem should be fixed in rawhide now. I still need to backport the fixes to F8 and F7 though. Not sure what the deal is with the nforce2 system, but it would definitely be worth testing the latest Fedora 8 kernel (or even better, a rawhide kernel), as well as updating misc userspace bits (particularly dvgrab and libraw1394). We've done a lot of work on this front just recently to greatly improve the situation, need to know if we've still got more work to do here or if we've already fixed your problem...
nForce2: bug 244576
Ah yes, I thought I'd seen that before... :)
I've updated to a newer kernel:
Linux strauss 2.6.23.14-107.fc8 #1 SMP Mon Jan 14 22:07:11 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

Three of the errors in comment #13 have been resolved -- "firewire-sbp2 blocking keyboard input when trying to add an SBP-2 device", "status write for unknown orb", and "scsi scan: 96 byte inquiry failed". Per Jarod's suggestion, I tried the koji F8 kernel, but realized that those problems were already resolved with the earlier kernel from fedora-updates, so I went back.

The only issue remaining is the first one -- "firewire-core unable to access the cards if the modules are loaded early in the boot sequence", which affects an external hard drive and an external CD-RW drive. I put the command "modprobe -r firewire-ohci && modprobe firewire-ohci" in /etc/rc.local, to no effect. However, if I run the same command after logging in to GNOME, it works just fine -- the drives are recognized and mounted under /media.

I am attaching outputs from lsmod before and after reloading the firewire modules. I am also attaching dmesg output from startup through accessing the drives. Also, if it helps, my smolt page is at http://www.smolts.org/show?UUID=c413b36f-7ba0-405c-ad84-98d4ae3bfb52

Please let me know if there's anything else I can try.

Thanks!
- Ed
Created attachment 293681: newer dmesg output from removing/reloading firewire kernel modules
Created attachment 293682: lsmod showing loaded modules before/after reloading firewire_ohci
Take back my earlier report... I did some load testing by rsync'ing a directory from another computer and ran into a bunch of buffer IO errors within a few seconds. I've attached dmesg output. I'll try moving back up to the latest koji kernel to see if that fixes the problem.
Created attachment 293720: Buffer IO errors under load
I'm having problems even with the koji kernel "Linux strauss 2.6.23.14-123.fc8 #1 SMP Fri Jan 25 19:54:41 EST 2008 x86_64 x86_64 x86_64 GNU/Linux". I'm testing the drive by rsyncing a directory from "bach" to the server "strauss" (the one that has the firewire drive) over the LAN. The rsync moves along just fine for a while, but then pauses for about 30 seconds with no apparent LAN or disk activity. I get I/O errors followed by the message "kernel: bad page state in process 'swapper'" appearing on the console. Some time later, the computer with the drive will invariably crash (screen, keyboard, and network all go dead) and require a reboot. Also, the drives are still not recognized at boot -- I have to execute "modprobe -r firewire-ohci && modprobe firewire-ohci" to have them detected. Dmesg output is attached.
Created attachment 293810: dmesg output with koji kernel
Hi Ed, From your dmesg output, it looks like the latest rawhide/devel kernel might get your disks working on boot, as you're hitting the 'giving up on config rom' problem, detailed in bug 429598. Please give that a spin and report back, and/or wait until I get the backports to the F8 kernel done...
Re attachment 293810:
> Feb 2 19:25:20 strauss kernel: sd 15:0:0:0: [sde] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
> Feb 2 19:25:20 strauss kernel: end_request: I/O error, dev sde, sector 38971935
> Feb 2 19:25:20 strauss kernel: sd 15:0:0:0: rejecting I/O to offline device
> Feb 2 19:25:20 strauss kernel: sd 15:0:0:0: [sde] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK

DID_BUS_BUSY typically happens when a bus reset occurs. DID_NO_CONNECT happens when the device was unplugged. Well, you apparently did not unplug it, but there might have been noise on the bus which inspired the controller to send a "self ID complete" event to the drivers without a self ID of the disk --- or with firewire-core misinterpreting the self ID buffer.

I saw something similar happen infrequently on my test setup: when I plugged something in to a bus that already had a few nodes present, firewire-core misinterpreted this as an existing device going away, rather than a new one joining the bunch.
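For reference, DID_BUS_BUSY and DID_NO_CONNECT are values of the "host byte" of the Linux SCSI result code, which sits in bits 16 to 23 of the 32-bit result (the low byte carries the SCSI status, bits 8 to 15 the message byte, and bits 24 to 31 the driver byte). The Python sketch below decodes the host byte from a numeric result and tags unwrapped "Result: hostbyte=..." log lines with the interpretation given in the comment above. Only the first few host byte codes from the kernel's SCSI headers are listed, and the helper names are hypothetical.

```python
import re

# Host byte codes (bits 16..23 of the SCSI result), as in the Linux SCSI headers.
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}

# Interpretation from the comment above; other host bytes are not classified.
MEANING = {
    "DID_BUS_BUSY": "bus reset",
    "DID_NO_CONNECT": "device gone (unplugged or seen as unplugged)",
}

# Matches unwrapped sd log lines such as "[sde] Result: hostbyte=DID_BUS_BUSY".
RESULT_LINE = re.compile(r"\[(\w+)\] Result: hostbyte=(\w+)")


def host_byte_name(result: int) -> str:
    """Name of the host byte in a 32-bit SCSI result value."""
    code = (result >> 16) & 0xFF
    return HOST_BYTES.get(code, f"unknown host byte {code:#04x}")


def classify_results(log_text: str) -> list[tuple[str, str]]:
    """Return (disk, interpretation) for each 'Result: hostbyte=...' line."""
    return [(m.group(1), MEANING.get(m.group(2), "other"))
            for m in RESULT_LINE.finditer(log_text)]


if __name__ == "__main__":
    print(host_byte_name(0x00020000))  # DID_BUS_BUSY
```

Reading the two quoted lines this way gives the sequence "bus reset" followed by "device gone" for sde, which is the pattern described above.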
Hi folks, I loaded up the latest rawhide kernel and the drives were detected on boot -- woohoo! Unfortunately the other problems with I/O buffers, etc., are still there. Regarding Stefan's suggestion, both the drives in question (external CD burner and external HD) are on two separate firewire buses, and each is the only device on its bus. The HD is attached to the motherboard's bus; the burner is attached to a TI firewire 800 PCI card. Please let me know if there's anything else I can try to work around or troubleshoot this. Cheers, Ed
Ed, exactly what kernel version was that with? I suspect some additional patches we have queued up for rawhide, which haven't yet been in a build due to some issues with gcc 4.3, might further help your situation.
Jarod -- it's 2.6.24-17.fc9. Architecture is x86_64.
Patches "firewire: fw-sbp2: fix I/O errors during reconnect" and "firewire: fw-sbp2: preemptively block sdev" may be beneficial to Ed's setup. I suspect the ultimate problem is electrically unstable hardware here, but the patches should make things smoother even for unreliable hardware. The issue described in http://marc.info/?l=linux1394-devel&m=120237058319592 needs to be addressed eventually as well. It is hopefully not of immediate importance to Ed's setup though.
Reference for comment #37: http://lkml.org/lkml/2008/2/3/195
(In reply to comment #36) > Jarod -- it's 2.6.24-17.fc9. Architecture is x86_64. -23 is the latest.
Hi everybody!

I tried with kernel 2.6.24-7.fc9:

#at start-up (dmesg):
firewire_ohci: Added fw-ohci device 0000:00:0d.0, OHCI version 1.10
firewire_ohci: Added fw-ohci device 0000:02:0c.0, OHCI version 1.0
firewire_core: created new fw device fw0 (0 config rom retries, S400)
firewire_core: created new fw device fw1 (0 config rom retries, S400)

#when plugging in the camera (motherboard, nForce2), I still have the
#same problem (endless messages, and a final freeze)

#when plugging in the camera (DV500+, after a "reset", dmesg):
firewire_ohci: node ID not valid, new bus reset in progress
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: created new fw device fw2 (0 config rom retries, S100)
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 1, new root=ffc0, gap_count=5
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_core: giving up on config rom for node id ffc1

#and after running dvgrab:
firewire_ohci: context_stop: still active (0x40000411)
dvgrab[3128]: segfault at b0bd6008 eip 0069119e esp bfced110 error 4

#everything seems to work here, but I have problems when exiting the
#capture apps:

#dvgrab:
[root@fedora fedora]# dvgrab -i ./ttt.avi
Found AV/C device with GUID 0x00008500008cec1e
Going interactive. Press '?' for help.
"stdout": buffer underrun near: timecode 00:4875865:-1993832910.00 date ????.??.?? ??:??:??
This error means that the frames could not be written fast enough.
q=quit, p=play, c=capture, Esc=stop, h=reverse, j=backward scan, k=pause
l=forward scan, a=rewind, z=fast forward, 0-9=trickplay, <space>=play/pause
Capture Started" ff:ff:ff:ff "" sec
"./ttt001.avi": 39.40 MiB 277 frames timecode 00:250000000:-1076964892.03 date 2008.02.03 18:15:08
Capture Stopped
Warning: 1 dropped frames.
Segmentation fault:ff:ff "" sec
#except for the "segfault" when leaving, everything seems okay!
#gstreamer:
[root@fedora fedora]# gst-launch dv1394src ! decodebin name=d ! queue ! audioconvert ! audioresample ! alsasink d. ! ffmpegcolorspace ! xvimagesink
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Caught interrupt -- handling interrupt.
Interrupt: Setting pipeline to PAUSED ...
Execution ended after 33318467000 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Caught SIGSEGV accessing address 0xb6886004
#0 0x00110402 in _start () from /lib/ld-linux.so.2
#1 0x0065647b in ?? ()
#2 0x009c1218 in ?? ()
#3 0x00000000 in ?? ()
Spinning. Please run 'gdb gst-launch 3145' to continue debugging, Ctrl-C to quit, or Ctrl-\ to dump core.
#Here everything works well too, until the exit (and another "segfault"!)
#I can't quit by closing the video window: I have to press 'ctrl-C' in the terminal
#I can't do a dump (on a French keyboard, '\' is typed with 'AltGr', which doesn't work here...)
#The video window doesn't even close until I close the terminal!

#Running gdb:
[root@fedora sdb]# gdb gst-launch 3318
GNU gdb Red Hat Linux (6.6-35.fc8rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
warning: Missing the separate debug info file: /usr/lib/debug/.build-id/ec/a38595da00301898debe867d96a6c3b13a0201.debug
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
Attaching to program: /usr/bin/gst-launch, process 3318
warning: Missing the separate debug info file: /usr/lib/debug/.build-id/ac/2eeb206486bb7315d6ac4cd64de0cb50838ff6.debug
(no debugging symbols found)
(no debugging symbols found)
0x00110402 in _start () from /lib/ld-linux.so.2
(gdb) bt
#0 0x00110402 in _start () from /lib/ld-linux.so.2
#1 0x00655d26 in ?? ()
#2 0x00000000 in ?? ()
#With no "debug" version, nothing new... even if the backtrace isn't
#exactly the same ?!?
#But killing gst-launch here closes the video window (as expected...)

I hope this might help... and I'll continue to test with newer versions when I can...
So we have a few different bugs that have ended up in here... Here's what I'd like to do:

1) The original bug reported by Roberto, "giving up on config rom", should be resolved -- this was actually tracked in bug 429598. Roberto, please confirm if you would, though. Ed hit this too, initially, but has confirmed it to be resolved for him. Given that I believe the original problem is fixed, I'm going to close this bug.

2) Ed's additional issues listed in comment #13 have all been resolved, save the I/O buffer problems. I'd like to open a new bug for this issue, if it's still a problem with the latest rawhide kernel.

3) Bastien's non-working nForce 2 controller is already being tracked separately in bug 244576.

4) Bastien's segfault-on-exit of dvgrab seems to be similar to bug 243081; I'd like to track it over there.
I'll try the new kernel as soon as possible and report back if it doesn't work. Thanks, all, for your interest in addressing this issue.