Bug 429598

Summary: [firewire] unable to use disk (giving up on config rom)
Product: [Fedora] Fedora
Reporter: Nicola <alf.tanner>
Component: kernel
Assignee: Jarod Wilson <jarod>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 8
CC: kernel-maint, stefan-r-rhbz, whiteg
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-02-25 20:13:36 UTC

Description Nicola 2008-01-21 21:14:31 UTC
Description of problem:
External firewire disk won't mount anymore

Version-Release number of selected component (if applicable):
F8
uname -a
Linux sputnik.theory.org 2.6.23.9-85.fc8 #1 SMP Fri Dec 7 15:49:36 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

Upgraded to the latest packages as of 21 Jan 2008.

How reproducible:
Always

Steps to Reproduce:
1. Plug the external disk into firewire
2. Switch it on
3. Won't mount anymore
  
Actual results:
External firewire disk won't mount anymore. FC6 used to work fine, same hardware.

dmesg sample:

Dec  3 03:03:02 sputnik kernel: ieee1394: Error parsing configrom for node 0-00:1023
Dec  3 03:03:10 sputnik kernel: scsi12 : SBP-2 IEEE-1394
Dec  3 03:03:11 sputnik kernel: ieee1394: sbp2: Logged into SBP-2 device
Dec  3 03:03:11 sputnik kernel: ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048]

Expected results:
The disk should be detected and mounted, as it was in FC6.

Additional info:
The same unit mounted on usb works (it's a dual firewire/usb external disk)

Comment 1 Jarod Wilson 2008-01-23 05:11:30 UTC
Those dmesg bits aren't valid for the Fedora 8 firewire stack. The ieee1394
stack isn't built or shipped, we ship the stack that prints 'firewire' instead,
and fw-sbp2 instead of just sbp2. Please get the appropriate firewire stack
running and we can certainly dig into this (might even already be fixed in the
latest koji kernels).

Comment 2 Jarod Wilson 2008-01-23 05:13:03 UTC
Hrm, in re-reading, perhaps that was supposed to be an example of the working
case. To have a chance of getting things fixed, what I need is the non-working
case, as described in comment #1.

Comment 3 Nicola 2008-01-23 13:40:25 UTC
Hi Jarod. No, it can't be an example of a working case since it's F8, as I
reported. With F8 the firewire interface doesn't work.
You asked me to use the appropriate firewire stack. How do I do that in F8 in
order to report problems?

Comment 4 Nicola 2008-01-23 14:37:15 UTC
Oops! Sorry, you're right, I've picked up the old messages file!
Here we are:


Jan 23 15:33:23 sputnik kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5
Jan 23 15:33:30 sputnik kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5
Jan 23 15:33:54 sputnik kernel: firewire_core: giving up on config rom for node id ffc0

and that's all. fdisk -l can't see any device, and neither can the automounter or gnome-mount.

Comment 5 Jarod Wilson 2008-01-23 14:48:39 UTC
Yep, that's the bits I need. Basically, we're failing to read the configuration
rom on the drive, which means the firewire stack doesn't have a clue what sort
of capabilities the device has -- doesn't know if it's a drive, a camera, or what
-- so we never set up the sbp2 layer for storage.

As it happens, I'm actually working on this very problem right now with the
upstream firewire maintainer, Stefan Richter (cc'ing). I can reproduce this on
my laptop, and really want to use some firewire drives with it, so you can be
sure I plan to get it working reliably... ;)
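
For anyone checking whether their own system is hitting this failure, a minimal sketch follows. It only counts the "giving up on config rom" message quoted in this report; the function name is made up for illustration, and the match is case-insensitive on the assumption that the exact capitalisation of the message may differ between kernel builds.

```shell
# Hypothetical helper: count "giving up on config rom" lines in kernel
# log text supplied on stdin. Matching is case-insensitive.
count_config_rom_failures() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 even when there are no matches.
    grep -ci 'giving up on config rom' || true
}

# Example use (not run here):
#   dmesg | count_config_rom_failures
```

A non-zero count means firewire_core could not read the device's config ROM, so the storage layer was never set up for that device.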

Comment 6 Nicola 2008-01-23 15:26:58 UTC
Excellent, thanks a lot.

So is this a kernel fault only or udev is involved as well?

Comment 7 Stefan Richter 2008-01-23 15:36:49 UTC
It's purely a fault in the kernel drivers.  Once we get them fixed, udev and
friends will do the right thing.

Comment 8 Jarod Wilson 2008-01-23 21:31:04 UTC
I've got a patch based on some of Stefan's work and some review comments that is
working quite well now on multiple system and drive combinations that were
previously hitting the 'giving up on config rom' problem, which I've added to
rawhide, and after a touch more testing, will get it into F8 and F7.

Comment 9 Jarod Wilson 2008-02-06 00:04:51 UTC
In 2.6.23.14-130.fc8 (and later), 2.6.23.14-74.fc7 (and later) and current
rawhide kernels. F8 and F7 builds not yet actually underway, but should happen RSN.
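
A rough way to check whether a given kernel release is at or beyond one of the builds named above is sketched below. The function name is hypothetical, and it relies on GNU sort's -V version ordering, which is assumed here to agree with RPM's ordering for release strings of this shape; it is not a substitute for RPM's own comparison.

```shell
# Hypothetical helper: exit 0 if kernel release $1 is the same as or
# newer than $2, judged by GNU "sort -V" ordering.
kernel_at_least() {
    # The older of the two releases sorts first.
    oldest=$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$oldest" = "$2" ]
}

# Example use (not run here):
#   kernel_at_least "$(uname -r)" 2.6.23.14-130.fc8 && echo "new enough"
```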

Comment 10 Jarod Wilson 2008-02-25 20:13:36 UTC
These fixes are available in the latest rawhide, f8 and f7 kernels, closing bug.

Comment 11 George N. White III 2008-03-01 15:18:19 UTC
What is "latest"?  I installed 2.6.24.3-12.fc8 from "testing", but the 
problem is unchanged.

Under FC6 I was using firewire with an early ipod, but it has never worked with
F7 or F8.  It just goes to the "OK to disconnect" screen. There have been
postings that ALi firewire needed DMA disabled in the old sbp2, but I don't see
the option with firewire_sbp2 -- does this card need one of the other
workarounds?

# lspci -v -s 00:07.4
00:07.4 FireWire (IEEE 1394): ALi Corporation M5253 P1394 OHCI 1.1 Controller (prog-if 10 [OHCI])
	Subsystem: ALi Corporation M5253 P1394 OHCI 1.1 Controller
	Flags: 66MHz, medium devsel, IRQ 16
	Memory at fe124000 (32-bit, non-prefetchable) [size=2K]
	Expansion ROM at fe000000 [disabled] [size=64K]
	Capabilities: [80] Power Management version 2
	Kernel modules: firewire-ohci

$ uname -a
Linux cerberus.cwmannwn.nowhere 2.6.24.3-12.fc8 #1 SMP Tue Feb 26 14:58:29 EST 2008 i686 i686 i386 GNU/Linux
$ dmesg | grep firewire
firewire_ohci: Added fw-ohci device 0000:00:07.4, OHCI version 1.10
firewire_core: created device fw0: GUID 0090e639000005df, S400
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_core: giving up on config rom for node id ffc0



Comment 12 Stefan Richter 2008-03-01 16:48:02 UTC
> There have been postings that ALi firewire needed DMA disabled
> in the old sbp2

This is a compile-time option of the sbp2 driver and it would only do something
if also a load-time option of ohci1394 was re-configured.  I suppose that these
postings refer to a bug which was fixed in Linux 2.6.16.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=35bdddb83f62978b5fad82a14fbfd78cc3a5a60c

This bug is not present in firewire-{ohci,core,sbp2}.  Also, that bug was about
the drivers not seeing requests from the device, whereas your problem is about
requests from the drivers to the device.  These requests fail or don't return
what the drivers want.  We need to put some diagnostics into the drivers to
better understand the failure mode.

When you used the iPod with FC6, did it become operational as a storage device
sooner or later than 30 seconds after plugin?

Comment 13 Stefan Richter 2008-03-01 16:52:46 UTC
Hmm, comment #11 says

> What is "latest"?  I installed 2.6.24.3-12.fc8 from "testing", but the
> problem is unchanged.

and comment #9 says

> In 2.6.23.14-130.fc8 (and later), 2.6.23.14-74.fc7 (and later) and
> current rawhide kernels

about a driver update mentioned in comment #8.  Could you try to get your hands
on this or any later kernel package?

Comment 14 Stefan Richter 2008-03-01 16:59:11 UTC
On the other hand, the 2.6.24.3-12.fc8 build date is one month after the patch
was posted, so I would expect it to be in there.  (It appeared in 2.6.25-rc1 in
mainline though.)

Comment 15 George N. White III 2008-03-01 18:51:56 UTC
Comment #12 asks:

> When you used the iPod with FC6, did it become operational as a 
> storage device sooner or later than 30 seconds after plugin?

Hard to be sure, but I'd say a bit less than 30s -- I could plug 
it in and see the little apple, and see it come online in 
/var/log/messages or dmesg, then use the info to type some 
'mount -t hfsplus ...' without feeling that I was waiting 
for things to happen.  It now takes 10s to get past the little 
apple to the "OK to disconnect" screen which seems like about
the time it used to come alive.

Comment #14 -- I expected the patches to be there too, but 
the Murphy of the law is a busy guy.  I have the sources,
and there are two files of patches:

  -rw-r--r-- 1 gwhite bod     40388 2008-02-15 19:58 linux-2.6-firewire-git-pending.patch
  -rw-r--r-- 1 gwhite bod     39433 2008-02-15 19:58 linux-2.6-firewire-git-update.patch

The git-pending-patch includes the "delay inquiry = 0x10" option to
the workarounds parameter.  I've tried that without success, if that
is behind your query about the length of time it takes for the iPod 
to become mountable.




Comment 16 Jarod Wilson 2008-03-01 19:01:37 UTC
2.6.24.3-12.fc8 definitely has the config rom read fixups in it, I just
triple-checked. FWIW, my own FireWire iPod works just fine w/the new stack, as
do a few different models krh has. There are still some controller and device
combinations I run into from time to time that simply don't play nice with the
new stack yet though. :\

Comment 17 Stefan Richter 2008-03-01 19:20:29 UTC
George:
> The git-pending-patch includes the "delay inquiry = 0x10" option to
> the workarounds parameter.

OK, this is indeed very recent and hence includes the "giving up on config rom"
patch (as Jarod confirmed).

BTW, switching the few firewire-sbp2 module options won't work for you.  The
firewire-core doesn't come far enough to hand over control to firewire-sbp2; the
problem happens earlier.

Comment 18 Jarod Wilson 2008-03-01 19:46:07 UTC
One thing that often works when I do occasionally hit 'giving up on config rom'
these days is to simply unload and reload the firewire-ohci module with the
device already plugged in.
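
As a sketch, the unload/reload described above could be wrapped as follows. The function name and the RUN variable are made up for illustration; by default the commands are only printed, and actually running them needs root and assumes nothing else is holding the module in use.

```shell
# Dry-run by default: RUN=echo prints the commands instead of running
# them. Set RUN="" (as root) to really unload and reload the module.
RUN="${RUN-echo}"

# Hypothetical helper: unload firewire-ohci, then load it again, with
# the FireWire device left plugged in.
reload_firewire_ohci() {
    $RUN modprobe -r firewire-ohci &&
    $RUN modprobe firewire-ohci
}

# Example use (not run here):
#   RUN="" reload_firewire_ohci && dmesg | grep firewire
```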

Probably worth opening a new bug to track the cases where we still hit 'giving
up on config rom'...

Comment 19 George N. White III 2008-03-01 19:57:13 UTC
I tried a different external firewire device and it also has the problem. I also
tried reloading "firewire-ohci module with the device already plugged in"
(both devices) and no success.  I should get another cable just to be sure
that isn't the problem.

Comment 20 Stefan Richter 2008-03-03 01:58:20 UTC
Could http://lkml.org/lkml/2008/3/2/80 be related?  That one is only relevant
though if two or more config ROM reads happen simultaneously and are spread over
two or more workqueue threads (i.e. [events/*] threads).

Comment 21 Jarod Wilson 2008-03-05 03:36:22 UTC
If so, said patch is included in the latest f8 kernel build in koji:

http://koji.fedoraproject.org/packages/kernel/2.6.24.3/17.fc8/

Comment 22 Nicola 2008-03-17 13:10:57 UTC
Still no joy...

2.6.24.3-34.fc8 #1 SMP Wed Mar 12 16:51:49 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

dmesg:
virbr0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
virbr0: starting userspace STP failed, starting kernel STP
ip_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
firewire_core: giving up on config rom for node id ffc1
firewire_core: phy config: card 0, new root=ffc0, gap_count=5


Comment 23 Jarod Wilson 2008-03-17 13:44:32 UTC
Nicola, how much memory does your system have? This *could* actually be the
coherent dma problem that was biting my laptop, which is x86_64 w/4GB of RAM. If
the system has memory mapped over the 4GB boundary, we'll sometimes try to use
that for dma buffers, which causes problems (on my laptop, regular config rom
read failures like you're seeing). This particular fix went into
2.6.24.3-37.fc8, and there's a subsequent 2.6.24.3-38.fc8. Give one of those a
spin, please!

http://koji.fedoraproject.org/packages/kernel/2.6.24.3/38.fc8/

Comment 24 Nicola 2008-03-17 13:55:35 UTC
Actually my x86_64 has 4G:

free
             total       used       free     shared    buffers     cached
Mem:       4061872    2058908    2002964          0     105800     862696
-/+ buffers/cache:    1090412    2971460
Swap:      4192880          0    4192880

Experimenting with another kernel is however another issue, as I'm using nvidia
server, thus I'm supposed to recompile kmod and server. That is a no go for a
working machine.

Could I instead reboot and tell grub I have, say, 2G of ram in order to try this
hypothesis?

Comment 25 Jarod Wilson 2008-03-17 14:11:09 UTC
Not sure if booting with mem=2G would help or not... However, if you were to
simply boot into run-level 3 w/the new kernel and plug in the drive, we would
still at least see stuff logged about the disk and be able to verify whether or
not this is indeed the fix for your issue. I suspect now that it is.

Comment 26 Jarod Wilson 2008-03-17 14:18:45 UTC
Oh, it's also possible to simply rebuild the firewire kernel modules from -38.fc8
on top of -34.fc8 and drop them in place of the provided -34.fc8 ones. Not
terribly complex to do either:

1) yum install kernel-devel
2) grab kernel-2.6.24.3-38.fc8.src.rpm
3) rpm -ivh kernel-2.6.24.3-38.fc8.src.rpm
4) rpmbuild -bp kernel.spec (will be in /usr/src/redhat/SPECS by default)
5) cd /usr/src/redhat/BUILD/kernel-2.6.24/linux-2.6.24.x86_64/drivers/firewire
6) make -C /usr/src/kernels/2.6.24.3-34.fc8-x86_64/ M=`pwd` modules
7) cp *.ko /lib/modules/2.6.24.3-34.fc8/kernel/drivers/firewire/


Comment 27 Nicola 2008-03-17 19:47:31 UTC
Again no luck. Telinit 3, then:

Linux version 2.6.24.3-38.fc8 (mockbuild.redhat.com) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Fri Mar 14 19:26:21 EDT 2008

ip_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_core: giving up on config ROM for node id ffc0 (returned 17)

This time the two lines are in the opposite order; the latter shows up only
after trying to mount. I used gnome-mount, which used to work.

fdisk -l won't see the new disk.


I would like to stress again that FC6 worked flawlessly on the same
hardware. Trouble began with the new firewire stack in the kernel.

Comment 28 Jarod Wilson 2008-03-17 20:00:21 UTC
Oh crap. There's a number of patches I thought I'd put into the F8 kernel build,
including the one that I thought would help your system, which are actually NOT
included at the moment. D'oh. I'll get that fixed shortly and let you know when
an updated kernel is available.

Also, for the record, what sort of controller is this with? Output of "lspci |
grep Fire" should be sufficient to tell. We have some issues with a few
controllers still, but I still suspect yours are fixed by the patch that isn't
actually in the F8 kernels that I mistakenly thought was... :(



Comment 29 Nicola 2008-03-17 20:23:31 UTC
Here is the lspci -v output. 
The mb is an abit IP-35 pro, the firewire is the embedded port.

04:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) (prog-if 10 [OHCI])
        Subsystem: ABIT Computer Corp. Unknown device 1083
        Flags: bus master, medium devsel, latency 68, IRQ 21
        Memory at fddfd000 (32-bit, non-prefetchable) [size=2K]
        Memory at fddf8000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 2
        Kernel driver in use: firewire_ohci
        Kernel modules: firewire-ohci


Comment 30 Jarod Wilson 2008-03-18 03:45:05 UTC
That controller should definitely be functional. I've got a Fedora 8 kernel
building right now that carries the patch I'd meant for ya to test with.

http://koji.fedoraproject.org/koji/taskinfo?taskID=520390



Comment 31 Nicola 2008-03-18 15:31:55 UTC
Finally kernel-2.6.24.3-40 did it:

firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_core: created device fw2: GUID 0050770e00000003, S400, 3 config ROM retries
scsi12 : SBP-2 IEEE-1394
firewire_sbp2: fw2.0: logged in to LUN 0000 (0 retries)
scsi 12:0:0:0: Direct-Access-RBC WDC WD30 00JB-00KFA0           PQ: 0 ANSI: 4
sd 12:0:0:0: [sdh] 586072368 512-byte hardware sectors (300069 MB)
sd 12:0:0:0: [sdh] Write Protect is off
sd 12:0:0:0: [sdh] Mode Sense: 11 00 00 00
sd 12:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 12:0:0:0: [sdh] 586072368 512-byte hardware sectors (300069 MB)
sd 12:0:0:0: [sdh] Write Protect is off
sd 12:0:0:0: [sdh] Mode Sense: 11 00 00 00
sd 12:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdh: sdh1
sd 12:0:0:0: [sdh] Attached SCSI disk
sd 12:0:0:0: Attached scsi generic sg9 type 14

fdisk -l saw the disk. I mounted it, made a couple of ls -lR in order
to see if it detaches, then performed an rsync backup. I think
it's a reasonable test for the interface.

What was the cause of all those problems?

A big thanks to all the developers involved in the fix!

Comment 32 Jarod Wilson 2008-03-18 15:50:27 UTC
Excellent, glad to hear we finally got it workin'. :)

There was a problem with x86_64 systems with memory mapped over the 4GB mark,
which is common even with machines having only 4GB of RAM, as there's a hole
somewhere in the 3 to 4GB range that is used for PCI/PCIe devices. To get at the
remaining memory, the BIOS remaps memory over the 4GB mark (if you're fortunate
-- I have one nForce 4 system w/4GB of physical memory installed that can only
use 3.2GB or so, since the BIOS doesn't remap memory).

Well, we get into a situation where we're trying to use this memory for our
firewire DMA buffers, and it simply doesn't work reliably for DMA for assorted
technical reasons. So now we've made changes to ensure we're not using that
memory, we're only using memory we know works reliably for DMA. The specific
upstream commit that provides this fix is here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bde1709aaa98f5004ab1580842c422be18eb4bc3
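
To see whether a machine has RAM remapped above the 4GB mark as described, one rough check is to look for "System RAM" ranges in /proc/iomem that start at or above 0x100000000. The sketch below is an illustration under that assumption; the function name is hypothetical, and /proc/iomem may show zeroed addresses when read without root.

```shell
# Hypothetical helper: reads /proc/iomem-style text on stdin and exits 0
# if any "System RAM" range starts at or above 4GB (0x100000000).
ram_above_4g() {
    grep 'System RAM' |
    # Keep only the start address, without indentation or leading zeros.
    sed -e 's/^[[:space:]]*//' -e 's/-.*//' -e 's/^0*//' |
    # More than 8 hex digits means the address is at least 2^32.
    awk 'length($0) > 8 { found = 1 } END { exit !found }'
}

# Example use (not run here):
#   ram_above_4g < /proc/iomem && echo "RAM is mapped above 4GB"
```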


Comment 33 Nicola 2008-03-18 18:08:42 UTC
Thanks for the explanation and the solution.