Bug 434830
Summary: | [firewire] disk can't be used due to buffer I/O errors | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jarod Wilson <jarod> |
Component: | kernel | Assignee: | Jarod Wilson <jarod> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | rawhide | CC: | cebbert, davej, fedora, stefan-r-rhbz |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 2.6.24.3-50.fc8 | Doc Type: | Bug Fix |
Last Closed: | 2008-03-23 03:18:14 UTC | Type: | --- |
Description
Jarod Wilson
2008-02-25 18:31:45 UTC
Nb: these I/O buffer problems look identical to what I'm frequently seeing with a drive in a case with a Prolific PL-3507 (rev c) bridge chip... Might be interesting to know what the bridge chip is in Ed's case.

Okay, so I think we've figured out the root cause of my I/O problems on the PL-3507, and I posted patches to fix 'em to the linux1394-devel list just a bit ago[*]. I've added them to rawhide, so kernel-2.6.25-0.94.rc4.fc9 and later should carry 'em. I'm guessing Ed's I/O issues will also be resolved...
[*] http://sourceforge.net/mailarchive/forum.php?thread_name=200803060015.40357.jwilson%40redhat.com&forum_name=linux1394-devel

Also added to F8, in kernel 2.6.24.3-23.fc8, building in koji right now.
http://koji.fedoraproject.org/koji/taskinfo?taskID=508508
Ed, please give this f8 build (or a later one) or a rawhide kernel a try, I think you should be all set.

Thanks -- will give it a try and report back.

No luck with koji kernel 2.6.24.3-23.fc8. No step forward, and unfortunately it actually seems to be a step backward. The drive is no longer automatically detected on startup. Once I boot, I can execute "modprobe -r firewire-ohci && modprobe firewire-ohci" to force a scan for them. The drive is then detected, but it takes about 30 seconds to do so (see attached messages log snippet).
Once the drive was recognized, I again tried the rsync from a remote system. It ran for about 1 minute, paused for another 30 seconds or so, then started pushing out the usual warnings to the messages log (also attached).
Regarding Jarod's question on the bridge chip, do I need to crack the case open for that, or is there a s/w utility that will tell me?
Please let me know if there's anything else I can try to debug this. I'm not much of a coder, but I'll do whatever I can to help source the problem.
Thanks,
Ed

Created attachment 297562 [details]
message log from mounting firewire drive
Created attachment 297563 [details]
message log showing errors
message log showing errors when drive stops responding, followed by messages
when unmounting the drive
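A messages log like the ones attached here can be skimmed for the two kernel messages that this report quotes by name, "sbp2_scsi_abort" and "giving up on config rom". The following is only a rough sketch: the function name `count_markers` is made up for illustration, and the sample log lines in the usage note are hypothetical, not taken from the attachments.

```python
# Count how many log lines mention each marker string.
# The two markers are the ones quoted in this bug report; a real log
# may contain other relevant messages that this sketch does not look for.
MARKERS = ("sbp2_scsi_abort", "giving up on config rom")


def count_markers(log_text, markers=MARKERS):
    """Return a dict mapping each marker to the number of lines containing it."""
    counts = {marker: 0 for marker in markers}
    for line in log_text.splitlines():
        lowered = line.lower()
        for marker in markers:
            # Case-insensitive substring match, one count per line.
            if marker.lower() in lowered:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = (
        "kernel: firewire_sbp2: fw1.0: sbp2_scsi_abort\n"
        "kernel: some unrelated message\n"
    )
    print(count_markers(sample))
```

To run it against a real log, read the file into a string first, for example `count_markers(open("/var/log/messages").read())`; the path is an assumption and reading it typically needs root.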
Ed, do you by chance use long cables, excessively bent cables, front panel or back panel breakout connectors, or unventilated enclosures?
Could you install the old ieee1394 kernel modules from ATRPMs and see how they work with the very same hardware configuration? (Load ohci1394 and sbp2 instead of firewire-ohci and firewire-sbp2.)
Jarod, the selfID complete event logging patch would be nice to have here to check whether there are unexpected bus resets going on.
> Regarding Jarod's question on the bridge chip, do I need to crack the
> case open for that, or is there a s/w utility that will tell me?
You could attach /sys/bus/firewire/devices/fwX/config_rom here so we could hazard a guess. (Insert the correct device name for "fwX"; it has to be one for which also an fwX.Y exists to which firewire-sbp2 is bound. In your last log, this was fw1.) The config_rom is built up by firmware though and hence may lack or even provide false information about the hardware.
OxSemi chips have further firmware identifiers and also hardware identifiers outside of the config_rom:
http://marc.info/?l=linux1394-user&m=114485393227904
A few not too difficult ways exist to access these from userspace, but it would take some time to explain how. :-)

Damn, I was hoping that build was going to fix things...
Looks like a LaCie hard disk drive (vendor oui 00d04b == LaCie). I believe they typically use Oxford bridges -- at least one of the LaCie drives I have here that I just poked at is an OXFW911+ bridge.
I'll work on getting the selfID logging patch added to a 2.6.24 f8 build sometime this week, but there should be a version of it available in rawhide even sooner...

> Looks like a LaCie hard disk drive
Ah, I missed that. From what I read on the internet (and it can only be true
then :-), Europeans are usually rather fond of their LaCie disks while there
seem to be many Americans having complaints about LaCie disks. So it would be
nice if Ed, who I assume is American, could do some stress tests with the old
drivers from ATRPMs to check the extent of guilt of the new drivers.
Hey, I'm American, and I have no complaints with either of the LaCie disks I have here! Actually quite fond of both of 'em -- one is designed to sit perfectly under a Mac Mini, the other is a nice little 2.5" drive in a case that can be used and powered over either USB or FireWire... :)

Hi Jarod and Stefan,
I'm American and can even say "y'all" if you need me to ;-)
This is indeed a LaCie disk -- it's their 120GB Porsche but it's a couple of years old -- it only has the FW 400 connector (no USB). The disk did work fine under Fedora 6. I will try the old ieee1394 modules from ATRPMS and report back, along with the config rom dump.
To Stefan's earlier questions:
* I'm using short cables (1 meter or so) and there are no sharp turns. The cable attaches to a soldered connector on a Gigabyte P965 mainboard (model GA-965P-DQ6).
* I believe the drive enclosure is ventilated, but will check for sure.

I've pulled a copy of the config_rom and attached it here. This was done under koji kernel 2.6.24.3-23.fc8 with the newer firewire_ohci drivers -- I will downgrade to the older drivers in the next day or so.
I tried using gscanbus to get some more info on the drive, but it looks like I had a kernel panic or oops. I will give that another try too.

Created attachment 297712 [details]
config_rom output for LaCie 120GB Porsche external drive
Created attachment 297715 [details]
config_rom output for LaCie 120GB Porsche external drive
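The "vendor oui 00d04b == LaCie" guess can be reproduced from a config_rom dump with a few lines of Python. This is a minimal sketch under stated assumptions: it takes the OUI from the top 24 bits of the EUI-64 in the IEEE 1394 bus info block (quadlets 3 and 4 of the ROM), it guesses the byte order of the dump by looking for the ASCII "1394" marker in quadlet 1, and the names `vendor_oui` and `KNOWN_OUIS` are made up for illustration. The only OUI-to-vendor pairing in the table is the one stated in this thread. As noted earlier, the ROM is built by firmware, so the result is a hint about the hardware, not proof.

```python
import struct

# Only pairing stated in this bug report; not a complete OUI registry.
KNOWN_OUIS = {0x00D04B: "LaCie"}

BUS_NAME_1394 = 0x31333934  # ASCII "1394", second quadlet of the bus info block


def vendor_oui(rom: bytes) -> int:
    """Return the 24-bit vendor OUI from the EUI-64 in a 1394 config ROM dump."""
    if len(rom) < 20:
        raise ValueError("dump too short to hold a 1394 bus info block")
    # Assumption: the dump may be big-endian (bus order) or little-endian
    # (host order on x86), so try both and keep the one with a valid marker.
    for fmt in (">5I", "<5I"):
        quadlets = struct.unpack_from(fmt, rom, 0)
        if quadlets[1] == BUS_NAME_1394:
            # Quadlet 3 holds the upper half of the EUI-64; its top 24 bits
            # are the vendor OUI.
            return quadlets[3] >> 8
    raise ValueError("no '1394' bus name found; not a 1394 config ROM?")


if __name__ == "__main__":
    # Synthetic ROM header for illustration; not Ed's actual dump.
    rom = struct.pack(">5I", 0x04040000, BUS_NAME_1394, 0, 0x00D04B12, 0x34567890)
    oui = vendor_oui(rom)
    print(f"{oui:06x}", KNOWN_OUIS.get(oui, "unknown vendor"))
```

For a real device, the bytes would come from the sysfs file mentioned above, for example `open("/sys/bus/firewire/devices/fw1/config_rom", "rb").read()`, with `fw1` replaced by the right device name.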
I was able to get gscanbus working on the drive. Output is attached. Based on the link Stefan provided to http://marc.info/?l=linux1394-user&m=114485393227904, I was able to determine the chip version.
Quadlet Read from 0xFFFF F0050000 (firmware ID) = 0x88000738 which indicates OXFW911.
Quadlet Read from 0xFFFF F0090020 (hardware ID) = 0x159E96FD which didn't match any of the hardware IDs listed.

Created attachment 297718 [details]
Gscanbus output
Ed, last night I remembered a firewire-ohci bug which became known at the beginning of this year. firewire-ohci is broken on machines with physical memory addresses above the 4GB mark. If I read the first few lines of your dmesg from https://bugzilla.redhat.com/show_bug.cgi?id=271801#c5 correctly, your system is an affected machine. Jarod is working on the issue.
Whether this bug actually causes your I/O errors is not clear. However, it is at least possible. Your errors start with a SCSI request timeout (indicated by "firewire_sbp2: fw1.0: sbp2_scsi_abort"). Perhaps the device properly completed the request and wrote status in firewire-sbp2's status FIFO, but firewire-ohci failed to properly process the AR DMA event which results from the status write.
PS: Yes, all these firmware markers tell that the bridge chip is indeed an OXFW911.

Just posted the fix for my own problems on x86_64 w/>= 4GB of RAM a bit ago:
http://lkml.org/lkml/2008/3/12/356
Patch also added to rawhide, should be a kernel started building soonish...

Also added to an F8 kernel build now:
http://koji.fedoraproject.org/packages/kernel/2.6.24.3/37.fc8/
Ed, please give that a spin and see if we don't finally have things playing nice for you...

Gah. I screwed up, and the patch is NOT in the -37 kernel. It's in the currently building -40 kernel though. Should be ready by morning...
http://koji.fedoraproject.org/koji/taskinfo?taskID=520390

Thanks -- I should be able to try it out later this week.

kernel-2.6.24.3-50.fc8 has been submitted as an update for Fedora 8

kernel-2.6.24.3-50.fc8 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update kernel'.
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-2630

Jarod et al.,
Looks like the last batch of fixes in kernel-2.6.24.3-50.fc8 has solved everything. The drive and all partitions are recognized at startup; I'm NOT getting the 'giving up on config rom' errors; some major stress-testing of the drive failed to turn up any problems. In short, I think we're good to close out this report.
Thank you all for your help in resolving this problem!!!
Regards,
Ed

Excellent, glad to hear we finally got this one licked!

kernel-2.6.24.3-50.fc8 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.
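For anyone wondering whether their own machine falls into the "physical memory addresses above the 4GB mark" category mentioned earlier in this report, one rough check is to look for "System RAM" ranges in /proc/iomem that end at or above 2^32. The sketch below is an illustration, not the method used in this thread (which read the dmesg output): the function name `ram_above_4gib` is made up, and the sample memory map is hypothetical. It also assumes the usual "start-end : label" hex layout of /proc/iomem, and that the file is read as root, since unprivileged reads may show zeroed addresses on newer kernels.

```python
FOUR_GIB = 1 << 32


def ram_above_4gib(iomem_text: str) -> bool:
    """Return True if any 'System RAM' range ends at or above the 4 GiB mark."""
    for line in iomem_text.splitlines():
        # Expected layout: "<start>-<end> : <label>", addresses in hex.
        span, sep, label = line.partition(" : ")
        if not sep or label.strip() != "System RAM":
            continue
        _start, dash, end = span.strip().partition("-")
        if not dash:
            continue
        try:
            end_addr = int(end, 16)
        except ValueError:
            continue  # skip lines that do not parse as hex ranges
        if end_addr >= FOUR_GIB:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical memory map for illustration only.
    sample = (
        "00001000-0009fbff : System RAM\n"
        "00100000-bfffffff : System RAM\n"
        "100000000-13fffffff : System RAM\n"
    )
    print(ram_above_4gib(sample))
```

On a live system the input would be `open("/proc/iomem").read()`. A positive result only says the machine has RAM mapped above 4 GiB; whether that triggers the firewire-ohci problem depends on the kernel build in use.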