Description of problem:
For the enjoyment of Jarod and Stefan... I know, I'm a pain in the neck...
OK, the SBP2 device is always the same, the Datafab MD2-FW2. Accessing the /dev/sdX device causes read errors and takes the device offline.

Version-Release number of selected component (if applicable):
kernel-2.6.24.3-17.fc8

How reproducible:
Always...

Steps to Reproduce:
1. Connect the SBP2 bay and check which device is assigned.
2. Let's say the SBP2 device is /dev/sdb; then type:
   dd if=/dev/sdb of=/dev/null bs=1024k count=1k
   or
   hdparm -tT /dev/sdb

Actual results:
Read errors are returned.

Expected results:
Data is read correctly and the disk performance is reported...

Additional info:
A device reset (power off-on cycle) makes it work again.
Note that the device, formatted as NTFS and accessed using ntfs-3g (fuse), works fine for file access...
This could be the same issue as bug #429430.
/var/log/messages says:
...
Mar 10 22:24:19 lazy kernel: end_request: I/O error, dev sdb, sector 0
Mar 10 22:24:19 lazy kernel: printk: 62 messages suppressed.
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 0
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 1
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 2
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 3
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 4
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 5
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 6
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 7
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 8
Mar 10 22:24:19 lazy kernel: Buffer I/O error on device sdb, logical block 9
Mar 10 22:24:19 lazy kernel: end_request: I/O error, dev sdb, sector 0
...
Would
# echo 1 > /sys/module/firewire_sbp2/parameters/workarounds
# <plug device in>
improve it? This reduces the maximum data size per SCSI request and is said to be necessary for some bridges from Symbios (bought by LSI). The Datafab enclosures contain a Symbios bridge AFAIK.
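Here is a rough sense of what the limit does. This Python sketch only does the arithmetic and rests on three assumptions:
- the workarounds=0x1 bit caps each request at 128 KiB (the figure mentioned later in this thread);
- the disk uses 512-byte sectors;
- reads are split purely at the cap, with no readahead, merging or alignment effects.

The helper names are hypothetical.

```python
# Arithmetic sketch: how a per-request size cap splits large reads.
# Assumptions: 512-byte sectors and a 128 KiB cap for workarounds=0x1;
# reads are split purely at the cap (no readahead, merging or alignment).

SECTOR_SIZE = 512            # bytes per logical block
CAP_128K = 128 * 1024        # assumed cap enabled by workarounds=0x1, in bytes


def max_sectors(cap_bytes: int, sector_size: int = SECTOR_SIZE) -> int:
    """Largest request, in whole sectors, that fits under the byte cap."""
    return cap_bytes // sector_size


def requests_needed(total_bytes: int, cap_bytes: int) -> int:
    """Number of requests needed to move total_bytes in chunks of at most cap_bytes."""
    return -(-total_bytes // cap_bytes)  # ceiling division without floats


if __name__ == "__main__":
    one_mib = 1024 * 1024
    print(max_sectors(CAP_128K))                      # sectors per capped request
    print(requests_needed(one_mib, CAP_128K))         # one bs=1024k block
    print(requests_needed(1024 * one_mib, CAP_128K))  # the full dd run (count=1k)
```

Under these assumptions, each 1 MiB block from the dd command becomes 8 requests of 256 sectors. The whole count=1k run becomes 8192 requests.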
(In reply to comment #1)
> Would
> # echo 1 > /sys/module/firewire_sbp2/parameters/workarounds
> # <plug device in>
> improve it? This reduces the maximum data size per SCSI request and is said to

Uhm, no, same as before.

> be necessary for some bridges from Symbios (bought by LSI). The Datafab
> enclosures contain a Symbios bridge AFAIK.

The chipset has LSI on it, so it could be Symbios.
OK, I tried enabling SCSI debugging with:
echo 9216 > /sys/module/scsi_mod/parameters/scsi_logging_level
Apart from flooding the log file, in this condition the device seems to work, with or without the workaround, and it keeps working after SCSI debug logging is disabled. BTW, there seem to be only "Read(10)" commands in the log file.
If I cycle the power off and on, the problem reappears on the next test.
Does this help?
pg
This smells like a duplicate of bug 434830, which should be resolved by patches added to rawhide and f8 kernel 2.6.24.3-23.fc8, which just finished building in koji earlier today. Give that a spin, if you would... http://koji.fedoraproject.org/packages/kernel/2.6.24.3/23.fc8/
(In reply to comment #3)
> This smells like a duplicate of bug 434830, which should be resolved by patches
> added to rawhide and f8 kernel 2.6.24.3-23.fc8, which just finished building in
> koji earlier today. Give that a spin, if you would...
>
> http://koji.fedoraproject.org/packages/kernel/2.6.24.3/23.fc8/

Uhm, I installed -28, which should have the same patches (I also need the WiFi updates).
A first test with "hdparm" and "dd" showed no visible improvement.
One thing I forgot to mention is that removing and reloading the firewire-sbp.ko module also brings the device back to life.
Other ideas?
pg
(In reply to comment #1)
> Would
> # echo 1 > /sys/module/firewire_sbp2/parameters/workarounds
> # <plug device in>
> improve it? This reduces the maximum data size per SCSI request and is said to
> be necessary for some bridges from Symbios (bought by LSI). The Datafab
> enclosures contain a Symbios bridge AFAIK.

OK, I repeated the suggested operation, this time starting with the device off (I was reading too fast and missed the <plug device in> step, sorry).
A first test with the new kernel seems successful: "hdparm" and "dd" worked fine without causing errors.
So it seems this is the real issue.
Should we close this one or wait a bit?
Thanks a lot!
pg
(In reply to comment #1)
> Would
> # echo 1 > /sys/module/firewire_sbp2/parameters/workarounds
> # <plug device in>
> improve it? This reduces the maximum data size per SCSI request and is said to
> be necessary for some bridges from Symbios (bought by LSI). The Datafab
> enclosures contain a Symbios bridge AFAIK.

I forgot to thank you, so: thank you very much!
pg
Re comment #5: If you are reasonably sure that the workaround fixes the heavy transfers, then check dmesg for a message from firewire_sbp2 about the activated workaround + firmware revision + model ID of the drive. We can then use these values to permanently enable the request size limit for this device, so that it will work out of the box. (BTW, I have two devices which appear to be LSI based too, but these are both CD-RWs and they are sealed devices so that I can't swap in HDDs for test purposes. I occasionally use them for tests like CDDA extraction or CD burning, but they have never shown a problem like this. Apparently those applications never reach the dangerous request sizes or the devices have other revisions of the bridge or of the firmware.)
(In reply to comment #7)
> If you are reasonably sure that the workaround fixes the heavy transfers, then
> check dmesg for a message from firewire_sbp2 about the activated workaround +
> firmware revision + model ID of the drive. We can then use these values to
> permanently enable the request size limit for this device, so that it will work
> out of the box.

I guess these are the lines you need:

firewire_sbp2: Workarounds for fw1.0: 0x1 (firmware_revision 0x002600, model_id 0x000000)
firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)
scsi 17:0:0:0: Direct-Access LSILogic SYM13FW500-Disk 1.00 PQ: 0 ANSI: 0

I'm going to keep using this device, so if anything new happens I'll report it. I guess this bug could be closed in a few days.

> (BTW, I have two devices which appear to be LSI based too, but these are both
> CD-RWs and they are sealed devices so that I can't swap in HDDs for test
> purposes. I occasionally use them for tests like CDDA extraction or CD burning,
> but they have never shown a problem like this. Apparently those applications
> never reach the dangerous request sizes or the devices have other revisions of
> the bridge or of the firmware.)

For the casual observer: why doesn't the old stack show the problem (I mean the udev one, bug #429430)? Does it automatically limit the transfer size?
pg
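A small helper can pull these values out of a long log. The following Python sketch is written against the exact line format quoted above. The regex and function name are hypothetical, and other kernel versions may word the message differently.

```python
# Sketch: extract the identifiers from a firewire_sbp2 "Workarounds for ..." line.
# The pattern matches the message format quoted in this report only.
import re
from typing import Optional, Tuple

WORKAROUNDS_RE = re.compile(
    r"firewire_sbp2: Workarounds for (?P<dev>\S+): (?P<bits>0x[0-9a-fA-F]+) "
    r"\(firmware_revision (?P<fw>0x[0-9a-fA-F]+), model_id (?P<model>0x[0-9a-fA-F]+)\)"
)


def parse_workarounds_line(line: str) -> Optional[Tuple[str, int, int, int]]:
    """Return (device, workaround bits, firmware_revision, model_id), or None."""
    m = WORKAROUNDS_RE.search(line)  # search() tolerates syslog prefixes
    if m is None:
        return None
    return (
        m.group("dev"),
        int(m.group("bits"), 16),   # int(..., 16) accepts the "0x" prefix
        int(m.group("fw"), 16),
        int(m.group("model"), 16),
    )


if __name__ == "__main__":
    line = ("firewire_sbp2: Workarounds for fw1.0: 0x1 "
            "(firmware_revision 0x002600, model_id 0x000000)")
    dev, bits, fw, model = parse_workarounds_line(line)
    print(dev, hex(bits), f"0x{fw:06x}", f"0x{model:06x}")
```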
> firewire_sbp2: Workarounds for fw1.0: 0x1 (firmware_revision 0x002600,
> model_id 0x000000)
> firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)
> scsi 17:0:0:0: Direct-Access LSILogic SYM13FW500-Disk 1.00 PQ: 0 ANSI: 0

Thanks, we will add these markers to fw-sbp2's internal quirks list.

> For the casual observer: why doesn't the old stack show the problem
> (I mean the udev one, bug #429430)? Does it automatically limit the
> transfer size?

The old sbp2 driver always had the limit set to the low level of that workaround. Therefore we never noticed whether there may be more devices out there which need the transfer size limitation. However, I sent a patch in to Linux 2.6.25-rc1 which adjusts sbp2 to use the same default limit as firewire-sbp2 (i.e. the Linux SCSI stack's defaults). So, if it weren't for your report, all users with HDDs behind the SYM13FW500 bridge would have encountered this problem from Linux 2.6.25-rc1 onwards with the old drivers too.

(Note to self: My two presumably LSI based CD-RWs have firmware_revision 0x000038, model_id 0x000000, and firmware_revision 0x000035, model_id 0x000000 respectively.)
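Here is what "adding these markers to the quirks list" could look like, as a hypothetical Python model of a lookup keyed on firmware revision and model ID. It is not the fw-sbp2 source. The table layout, the names and the exact-match rule are assumptions, and bit 0x1 is taken to mean the 128k transfer limit that the workarounds parameter enables. The CD-RW values from the note above serve as non-matching inputs.

```python
# Hypothetical model of a per-device quirks table; not the fw-sbp2 source.
# Entries are keyed on the firmware_revision and model_id printed in dmesg.
from dataclasses import dataclass
from typing import Tuple

WORKAROUND_128K_MAX_TRANS = 0x1  # assumed meaning of bit 0x1 (128k request limit)


@dataclass(frozen=True)
class Quirk:
    firmware_revision: int
    model_id: int
    workarounds: int
    note: str


QUIRKS: Tuple[Quirk, ...] = (
    # Values taken from the dmesg output in this report.
    Quirk(0x002600, 0x000000, WORKAROUND_128K_MAX_TRANS,
          "Datafab MD2-FW2 with LSILogic SYM13FW500 bridge"),
)


def lookup_workarounds(firmware_revision: int, model_id: int) -> int:
    """Return the workaround bits of the first exactly matching entry, else 0."""
    for quirk in QUIRKS:
        if quirk.firmware_revision == firmware_revision and quirk.model_id == model_id:
            return quirk.workarounds
    return 0


if __name__ == "__main__":
    print(hex(lookup_workarounds(0x002600, 0x000000)))  # the Datafab bridge
    print(hex(lookup_workarounds(0x000038, 0x000000)))  # one of the LSI CD-RWs
```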
Patch posted: http://lkml.org/lkml/2008/3/11/372
Side note: LSILogic (Symbios) bridges with the 128k request size limit should all have the firmware revision 0x0a27?? according to this document: http://ftp2.de.freebsd.org/pub/misc/specs/symbios/symchips/1394/UPDATED_configuration_ROMs/ReadMe.doc (But some don't, as we learned now.)
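A pattern like 0x0a27?? amounts to a masked comparison in which the low byte is ignored. The Python sketch below assumes 24-bit revisions, matching how they are printed in this report; the names are hypothetical.

```python
# Sketch: match a firmware revision against a pattern such as 0x0a27??,
# where "??" means "any low byte". The 24-bit width and mask are assumptions
# based on how revisions are printed in this report.
SYMBIOS_PATTERN = 0x0A2700
LOW_BYTE_WILDCARD_MASK = 0xFFFF00  # keep the upper 16 of 24 bits, ignore "??"


def matches_revision_pattern(firmware_revision: int,
                             pattern: int = SYMBIOS_PATTERN,
                             mask: int = LOW_BYTE_WILDCARD_MASK) -> bool:
    """True if the revision equals the pattern in every bit the mask keeps."""
    return (firmware_revision & mask) == (pattern & mask)


if __name__ == "__main__":
    print(matches_revision_pattern(0x0A2701))  # fits the documented pattern
    print(matches_revision_pattern(0x002600))  # the Datafab bridge from this bug
```

The Datafab bridge's 0x002600 fails this check. A rule based only on the documented pattern would therefore miss it, which is why the device needs an entry of its own.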
Patch merged into rawhide, will merge into F8 shortly as well. Should show up in rawhide kernel-2.6.25-0.119.rc5.git3.fc9 and f8 kernel-2.6.24.3-37.fc8 and later, respectively.
Hi Jarod,
I'm just trying kernel-2.6.26.3-38, and I see that, without explicitly using the option "workarounds=0x1", the problem still seems to be there.
Any ideas?
Thanks a lot!
pg
> kernel-2.6.26.3-38

2.6.24.3-38?
Is there a message from firewire_sbp2 that it automatically switched the workaround on for this device?
(In reply to comment #14)
> > kernel-2.6.26.3-38
>
> 2.6.24.3-38?

Yeah, a typo... It would be amazing (and sad) to have a .26 kernel with the problem still open... :-)

> Is there a message from firewire_sbp2 that it automatically switched the
> workaround on for this device?

Without the explicit workaround I see this in the logs:
...
firewire_core: phy config: card 0, new root=ffc2, gap_count=7
firewire_core: created device fw1: GUID 0030ffa046010076, S400
scsi11 : SBP-2 IEEE-1394
firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)
scsi 11:0:0:0: Direct-Access LSILogic SYM13FW500-Disk 1.00 PQ: 0 ANSI: 0
sd 11:0:0:0: [sdb] 117210240 512-byte hardware sectors (60012 MB)
...
With the workaround:
...
firewire_core: phy config: card 0, new root=ffc2, gap_count=7
firewire_core: created device fw1: GUID 0030ffa046010076, S400
scsi12 : SBP-2 IEEE-1394
firewire_sbp2: Please notify linux1394-devel.net if you need the workarounds parameter for fw1.0
firewire_sbp2: Workarounds for fw1.0: 0x1 (firmware_revision 0x002600, model_id 0x000000)
firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)
scsi 12:0:0:0: Direct-Access LSILogic SYM13FW500-Disk 1.00 PQ: 0 ANSI: 0
sd 12:0:0:0: [sdb] 117210240 512-byte hardware sectors (60012 MB)
...
This is the same situation as with the previous kernel, I think.
pg
> firewire_sbp2: Workarounds for fw1.0: 0x1 (firmware_revision 0x002600,
> model_id 0x000000)

This message should automatically appear (without specifying a module parameter) if the kernel contains the patch from comment #10.
(In reply to comment #16)
> This message should automatically appear (without specifying a module parameter)
> if the kernel contains the patch from comment #10.

Well, it does not, so I see the following possibilities:
1) I got the wrong kernel.
2) The patch is missing or not applied properly.
3) The patch is wrong.
4) The problem is somewhere else: for example, the firmware revision is not propagated properly, and what we see in the logs is not what the workaround code receives.

I can double-check 1) :-)...
In the rpm changelog I see the following (note the -37; I have -38, which I hope includes the previous one):

* Fri Mar 14 2008 Jarod Wilson <jwilson> 2.6.24.3-37
- Resync firewire patches w/linux1394-2.6.git
- Add firewire selfID/AT/AR debug support via optional module parameters
- firewire: fix DMA coherence on x86_64 systems w/memory mapped over the 4GB boundry (#434830)

One more thing: in order to re-read the "workarounds" option, it is not enough to detach/attach the sbp2 bay; I also have to remove the firewire-sbp module. If I remember correctly, previously it was enough to reset the 1394 bus (power-cycle the sbp2 bay or detach/attach it).

Is there any way I can check/confirm 2) and/or 3)? Some debugging option or similar?
I think 4) is beyond my abilities.
Thanks.
pg
D'oh, my mistake. I didn't check closely enough. The patch with the workaround for your device didn't actually make it into the f8 kernel yet. It's definitely there for rawhide, and I'll get it into F8 properly soonish.
> One more thing: in order to re-read the "workarounds" option,
> it is not enough to detach/attach the sbp2 bay; I also have to
> remove the firewire-sbp module.

The echo command from comment #1 should take effect at runtime for each plugged-in (or re-plugged) device. If you set the module parameter via /etc/modprobe.d/some_file or /etc/modprobe.conf, then the module does indeed need to be reloaded to pick up the changed parameter.
(In reply to comment #18)
> D'oh, my mistake. I didn't check closely enough. The patch with the workaround
> for your device didn't actually make it into the f8 kernel yet. It's definitely
> there for rawhide, and I'll get it into F8 properly soonish.

Oh, OK, no problem.
As soon as it pops up I'll give it a try and report here, so you can close this bug (or can I close it?).
pg
Got the patch into 2.6.24.3-39.fc8, but I haven't fired up a build (waiting for other kernel guys to add more stuff before a new build). I think that as the original bug reporter, you do have permissions to close the bug. I'm somewhat inclined to just close it anyhow, we're about 99.999% sure this fixes the problem, given that manually specifying the workaround fixes it.
(In reply to comment #21)
> Got the patch into 2.6.24.3-39.fc8, but I haven't fired up a build (waiting for
> other kernel guys to add more stuff before a new build). I think that as the

Thank you very much! I'll keep a close eye on the koji page :-)

> original bug reporter, you do have permissions to close the bug. I'm somewhat

OK, good.

> inclined to just close it anyhow, we're about 99.999% sure this fixes the
> problem, given that manually specifying the workaround fixes it.

Are you in a hurry? :-)
Actually, I like to close bugs, too... ;-)

Anyway, if it is OK with you, I would prefer to give the upcoming kernel at least one try, then I'll close this bug and the other one, #429430.

Thanks again for your support!
If you need help testing something else in the firewire stack (or elsewhere), just let me know.
pg
(In reply to comment #22)
> > inclined to just close it anyhow, we're about 99.999% sure this fixes the
> > problem, given that manually specifying the workaround fixes it.
>
> Are you in a hurry? :-)

Nah, I just like to keep the bug list I'm looking at as free as possible of stuff that I'm pretty sure is already fixed, or I get easily distracted. :)

> Actually, I like to close bugs, too... ;-)
>
> Anyway, if it is OK with you, I would prefer to give the upcoming kernel at least
> one try, then I'll close this bug and the other one, #429430.

Works for me. I'll just put the bug into NEEDINFO for now (then it's not closed, but it's also off my main bug tracking view :).

> Thanks again for your support!
> If you need help testing something else in the firewire stack (or elsewhere),
> just let me know.

Absolutely. I think we're in pretty damned good shape on the storage side now, might be getting back over to the dv side of the house soon... :)
Okay, I've got 2.6.24.3-40.fc8 building right now, definitely *does* have the patch. http://koji.fedoraproject.org/koji/taskinfo?taskID=520390
OK, I quickly tested the new kernel, 2.6.24.3-40.fc8, and it seems to work fine: hdparm & dd did not cause any errors, and dmesg reports that the workaround was activated (of course, without the modprobe.conf entry).
I'll close the bug, BUT one thing I noticed is that:
cat /sys/bus/firewire/drivers/sbp2/module/parameters/workarounds
returns 0. I hope this is correct.
pg
The 0 is correct there, because there are no globally enabled workarounds, they're only being activated for the one device. Glad we finally got it taken care of! :)
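The Python sketch below shows one way a global parameter of 0 can coexist with a workaround that is active for one device. It assumes the effective bits are the module parameter OR'ed with the device's table entry. That assumption is consistent with the logs in this report, where both messages appear when the parameter is set, but the real driver may differ (for example, it could support an override flag). The table and names are hypothetical.

```python
# Sketch: combining a global module parameter with per-device quirks.
# Assumption: the effective bits are the parameter OR'ed with the device's
# table entry; the real driver may differ (e.g. it could support overrides).
from typing import Dict, Tuple

WORKAROUND_128K_MAX_TRANS = 0x1

# Hypothetical table keyed on (firmware_revision, model_id).
DEVICE_QUIRKS: Dict[Tuple[int, int], int] = {
    (0x002600, 0x000000): WORKAROUND_128K_MAX_TRANS,
}


def effective_workarounds(module_param: int, firmware_revision: int, model_id: int) -> int:
    """Bits applied to one device; module_param itself is left unchanged."""
    return module_param | DEVICE_QUIRKS.get((firmware_revision, model_id), 0)


if __name__ == "__main__":
    module_param = 0  # what the sysfs parameters/workarounds file reports
    print(effective_workarounds(module_param, 0x002600, 0x000000))  # Datafab drive
    print(effective_workarounds(module_param, 0x000038, 0x000000))  # other device
    print(module_param)
```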
(In reply to comment #26)
> The 0 is correct there, because there are no globally enabled workarounds,
> they're only being activated for the one device. Glad we finally got it taken
> care of! :)

Yep, now you'll have to fix the DV thing! :-)
Anyhow, thank you very much for your support, well done!
pg
kernel-2.6.24.3-50.fc8 has been submitted as an update for Fedora 8
kernel-2.6.24.3-50.fc8 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.