Description of problem:
When utilizing the external hard drive for a c++ compilation (make -j2), the compilation stops and all IO to the FW disk ceases to function for around 30 seconds. The kernel then throws up the message "status write for unknown orb" and sbp2_scsi_abort. At this point the IO can continue.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Plug in drive
2. Use drive

Actual results:
Kernel prints messages:
fw_sbp2: status write for unknown orb
fw_sbp2: sbp2_scsi_abort
..and drive is unusable for about 30 seconds, or until I stop the IO (by pressing control C during the make) and even then I have to wait a while before I can read or write from the drive.

Expected results:
Should just work.

Additional info:
System is SMP, dual opteron:
AMD Opteron(tm) Processor 246

lspci of the systems firewire devices
01:08.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)

the disk is a Lacie 80GB RUGGED HD U2&FW&FW8 5400RPM 8MB

dmesgs relating to fw_sbp2 leading up to an event
fw_core: created new fw device fw1 (1 config rom retries)
fw_sbp2: logged in to sbp2 unit fw1.0 (0 retries)
fw_sbp2: - management_agent_address: 0xfffff0030000
fw_sbp2: - command_block_agent_address: 0xfffff0100000
fw_sbp2: - status write address: 0x000100000000
scsi7 : SBP-2 IEEE-1394
scsi 7:0:0:0: Direct-Access-RBC WDC WD80 0VE-08HDT0 10.0 PQ: 0 ANSI: 4
SCSI device sdc: 156301488 512-byte hdwr sectors (80026 MB)
sdc: Write Protect is off
sdc: Mode Sense: 11 00 00 00
SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sdc: 156301488 512-byte hdwr sectors (80026 MB)
sdc: Write Protect is off
sdc: Mode Sense: 11 00 00 00
SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: sdc1
sd 7:0:0:0: Attached scsi disk sdc
sd 7:0:0:0: Attached scsi generic sg3 type 14
fw_core: phy config: card 0, new root=ffc0, gap_count=63
fw_sbp2: management write failed, rcode 0x12
fw_sbp2: reconnected to unit fw1.0 (1 retries)
fw_sbp2: status write for unknown orb
fw_sbp2: sbp2_scsi_abort
fw_sbp2: status write for unknown orb
fw_sbp2: sbp2_scsi_abort
fw_sbp2: status write for unknown orb
fw_sbp2: sbp2_scsi_abort
fw_sbp2: status write for unknown orb
fw_sbp2: sbp2_scsi_abort
fw_sbp2: status write for unknown orb
fw_sbp2: sbp2_scsi_abort
In my observation, fw_sbp2 (and the lower layers) generally reacts even less gracefully to occasional bus resets than the old sbp2 (and its lower layers). Bugs in bus generation tracking? And/or with filtered physical DMA? To make matters more interesting, this may come in unfortunate combination with firmware flaws.
As additional information, I think that this is a bug somewhere in the new fw* code for Fedora 7 as opposed to a firmware problem. I have used this disk on F7 USB, FC5 FW, FC6 FW, Windows XP FW, and Mac OS 10.3 & 10.4 FW. I haven't had a problem with the disk except for F7 on FireWire.
If you have only this disk on the bus (as your dmesg output indicates), then there shouldn't be the later bus reset event which is indicated by the lines

fw_core: phy config: card 0, new root=ffc0, gap_count=63
fw_sbp2: management write failed, rcode 0x12
fw_sbp2: reconnected to unit fw1.0 (1 retries)

It may even be a small series of bus reset events in fast succession. Be it one event or several, they shouldn't be there according to the protocols involved. This means: The bus is probably electrically unstable.

Nonetheless, the drivers are supposed to survive occasional bus reset events, so there is clearly a flaw in the drivers. I have seen such problems with fw-sbp2 myself, notably with a 2.5" disk connected to a front panel connector which is internally connected to a motherboard header via a "jumper" cable. Such hardware tends to be electrically noisy, and bus resets may happen out of the blue. Due to whatever bug in fw-sbp2, the risk of command timeouts or more serious transport failures after such events is higher than with the old driver. (To make matters worse with that 2.5" disk of mine, it is HFS+ formatted and the hfsplus filesystem is quick to oops after a transport error.)
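For anyone who wants to check their own logs for the same pattern, here is a minimal sketch of a dmesg scan. It only counts the message strings quoted in this report; the function name `summarize` and the two pattern groups are made up for illustration and are not part of any driver or tool. Treating the "phy config", "management write failed" and "reconnected" lines as bus reset hints is an assumption based on the reasoning above, not a definitive test.

```python
import re

# Message fragments copied from the dmesg output quoted in this report.
# Grouping them like this is an assumption made for illustration only.
BUS_RESET_HINTS = (
    re.compile(r"fw_core: phy config:"),
    re.compile(r"fw_sbp2: management write failed"),
    re.compile(r"fw_sbp2: reconnected to unit"),
)
FAILURE_MESSAGES = (
    re.compile(r"fw_sbp2: status write for unknown orb"),
    re.compile(r"fw_sbp2: sbp2_scsi_abort"),
)


def summarize(log_text):
    """Count bus reset hints and failure messages in a dmesg dump."""
    counts = {"bus_reset_hints": 0, "failures": 0}
    for line in log_text.splitlines():
        if any(p.search(line) for p in BUS_RESET_HINTS):
            counts["bus_reset_hints"] += 1
        if any(p.search(line) for p in FAILURE_MESSAGES):
            counts["failures"] += 1
    return counts
```

Feeding it the saved output of dmesg as a string gives a quick indication of whether the aborts are preceded by bus reset hints (as in the original report) or occur without them.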
I found a way to occasionally reproduce the

fw_sbp2: status write for unknown orb
fw_sbp2: sbp2_scsi_abort

sequence (but without any 'reconnected...' in between) and will start debugging this. (Don't expect fast progress, I'm doing this in spare time.)
Kristian, does the kernel build mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=242254#c20 also contain your 'status write for unknown orb' fix?
Yes, I meant to update bugzilla when I did it, but it was down for maintenance over the weekend. In the meantime, davej rebased to a more recent git snapshot that has the latest fixes from your tree, including the "unknown orb" fix. David, if you can give the latest rawhide kernel or the kernel from this build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=131769

a try, that'd be great. Thanks.
Kristian,

Is it too late for me to try this? I'm getting married next week and have been really busy at work, so this is the first chance I have to be able to build a kernel. Better yet, is this update in the kernel I just got from up2date?

Linux 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

If I can still test, where is the best place for me to get the kernel source?
According to the link in comment #6, 2.6.23* builds from Fri Aug 24 2007 and later contain the 'unknown orb' fix. I'm not a Fedora user myself but I guess the first thing to try would be the latest or close to latest from http://koji.fedoraproject.org/koji/packageinfo?packageID=8 . Also, the changelog of the latest 2.6.22.* package there indicates that Kristian's fw-sbp2 patch is not in the 2.6.22.* Fedora kernels.
I tested this out using kernel-2.6.23-0.164.rc5.fc8.x86_64 from that page. My problem appears to be fixed in that build. I was able to copy my entire source tree twice without the 'unknown orb' message or any hangs. Sorry it took so long for me to test. Thank you all for your hard work. - Dave
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

This bug appears to be resolved and I am therefore closing it. If I have erred, please forgive me and re-open with any additional information you are able to give. I will then try to assist you if I can.

Cheers
Chris