Bug 193168

Summary:	kernel triggers Oxford 911 firewire problem
Product:	[Fedora] Fedora	Reporter:	Robert Story <rs>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED INSUFFICIENT_DATA	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	5	CC:	jonstanley, stefan-r-rhbz, wtogami
Target Milestone:	---
Target Release:	---
Hardware:	powerpc
OS:	Linux
Whiteboard:	MassClosed
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2008-01-20 04:37:39 UTC	Type:	---

Description Robert Story 2006-05-25 20:15:25 UTC

Description of problem:
In late 2003 there was a big stink about an update to Mac OS X 10.3 that had a
bad interaction w/firewire drives with the Oxford 911 chipset. It seems that
sometime in the recent past, the kernel has run into this same issue. When a
drive with the old/buggy firmware is inserted, syslog logs:

May 23 11:44:30 spx kernel: ieee1394: Error parsing configrom for node 0-00:1023
May 23 11:44:30 spx kernel: ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1)
May 23 11:44:30 spx kernel: ieee1394: sbp2: Try serialize_io=0 for better performance
May 23 11:44:30 spx kernel: scsi0 : SBP-2 IEEE-1394
May 23 11:44:31 spx kernel: ieee1394: sbp2: Logged into SBP-2 device
May 23 11:44:31 spx kernel:   Vendor: IC25N080  Model: ATMR04-0          Rev:
May 23 11:44:31 spx kernel:   Type:   Direct-Access-RBC                  ANSI SCSI revision: 04
May 23 11:44:31 spx kernel:  0:0:0:0: Attached scsi generic sg0 type 14
May 23 11:44:31 spx kernel: SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
May 23 11:44:31 spx kernel: sda: Write Protect is off
May 23 11:44:31 spx kernel: SCSI device sda: drive cache: write back
May 23 11:44:31 spx kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
May 23 11:44:31 spx kernel: sda: Current: sense key: Aborted Command
May 23 11:44:31 spx kernel:     Additional sense: Logical block address out of range
May 23 11:44:31 spx kernel: end_request: I/O error, dev sda, sector 156301480
May 23 11:44:31 spx kernel: Buffer I/O error on device sda, logical block 19537685

which repeats a dozen or so times. Using the disk will eventually result in:

May  4 19:26:44 dhcp202 kernel: ieee1394: sbp2: aborting sbp2 command
May  4 19:26:44 dhcp202 kernel: sd 0:0:0:0:
May  4 19:26:44 dhcp202 kernel:         command: Read (10): 28 00 03 30 9f c1 00 00 08 00
May  4 19:26:54 dhcp202 kernel: ieee1394: sbp2: aborting sbp2 command
May  4 19:26:54 dhcp202 kernel: sd 0:0:0:0:
May  4 19:26:54 dhcp202 kernel:         command: Test Unit Ready: 00 00 00 00 00 00
May  4 19:26:54 dhcp202 kernel: ieee1394: sbp2: reset requested
May  4 19:26:54 dhcp202 kernel: ieee1394: sbp2: Generating sbp2 fetch agent reset
May  4 19:27:04 dhcp202 kernel: ieee1394: sbp2: aborting sbp2 command
May  4 19:27:04 dhcp202 kernel: sd 0:0:0:0:
May  4 19:27:04 dhcp202 kernel:         command: Test Unit Ready: 00 00 00 00 00 00
May  4 19:27:04 dhcp202 kernel: sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
May  4 19:27:04 dhcp202 kernel: sd 0:0:0:0: SCSI error: return code = 0x50000
May  4 19:27:04 dhcp202 kernel: end_request: I/O error, dev sda, sector 53518273
May  4 19:27:04 dhcp202 kernel: printk: 50 messages suppressed.
May  4 19:27:04 dhcp202 kernel: Buffer I/O error on device sda8, logical block 900856
May  4 19:27:04 dhcp202 kernel: sd 0:0:0:0: rejecting I/O to offline device
May  4 19:27:04 dhcp202 kernel: Buffer I/O error on device sda8, logical block 900856

which goes on and on till device access stops.
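
The figures in the two log excerpts above can be cross-checked with a little
arithmetic. The following is only an illustrative sketch: it assumes 512-byte
sectors (as the log states) and a 4096-byte buffer block size (an assumption
that happens to match the logged block number), and the helper name
decode_read10 is hypothetical. Under those assumptions, the first error looks
like a read of the very last 4 KiB of the disk, and the aborted Read (10)
command addresses the same sector that the later end_request line reports.

```python
import struct

SECTOR_SIZE = 512          # bytes, from "512-byte hdwr sectors" in the log
BLOCK_SIZE = 4096          # bytes, ASSUMED buffer block size (not in the log)

total_sectors = 156301488  # from "SCSI device sda: 156301488 ..."
failed_sector = 156301480  # from "end_request: I/O error, dev sda, sector ..."

sectors_per_block = BLOCK_SIZE // SECTOR_SIZE
capacity_mb = total_sectors * SECTOR_SIZE // 10**6      # decimal megabytes
failed_block = failed_sector // sectors_per_block       # block holding the failed sector
last_block = total_sectors // sectors_per_block - 1     # index of the final block


def decode_read10(cdb_hex):
    """Split a 10-byte READ (10) CDB into (opcode, LBA, transfer length).

    Hypothetical helper for illustration; fields are big-endian.
    """
    cdb = bytes.fromhex(cdb_hex)
    opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    return opcode, lba, length


if __name__ == "__main__":
    print("capacity (MB):", capacity_mb)
    print("failed block:", failed_block, "last block:", last_block)
    opcode, lba, length = decode_read10("28 00 03 30 9f c1 00 00 08 00")
    print("opcode 0x%02x, LBA %d, %d blocks" % (opcode, lba, length))
```

If the block-size assumption holds, the failing request in the first excerpt
is the last block of the device, which would be consistent with a bridge that
mishandles accesses near the end of the reported capacity.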


After the device firmware is upgraded, the device is accessible with no problems.

Given that this can cause data loss, if the bogus firmware version can be
detected and blacklisted, that would probably be a good thing...

Version-Release number of selected component (if applicable):
Unfortunately, the FC4 system I was using this drive on died and had to be
re-installed. My earliest log tells me that the issue was present in the
2069_FC4 kernel, and it persists in the current kernel. I do know that the same
drive, plugged into an up-to-date RHEL i386 system, does not exhibit this problem.

How reproducible:
Always.

Steps to Reproduce:
1. connect firewire drive w/bad firmware
  
Actual results:
error message, errors accessing drive

Expected results:
warning message about firmware, maybe require some option to force mount.

Additional info:
Page on updating firmware, from Mac site:
http://eshop.macsales.com/Reviews/Framework.cfm?page=/hardwareandnews/oxford/oxfordandpanther.html

Comment 1 Stefan Richter 2006-05-31 19:52:51 UTC

As someone from Oxford Semiconductor kindly posted at linux1394-user, the way to
properly detect chip and firmware of OxSemi based devices is to read at a
certain offset from the configuration ROM. This can be done for example

- with gscanbus:
Click on the device icon to see its "Physical ID".
Use the menu "Transactions/ Read Quadlet".
In the dialogue, enter the ID as destination and 0xFFFFF0050000 as memory offset
and hit OK. A result should appear in the third text box.

- with 1394commander:
Enter the command
: i
to get some basic information about the bus. Guess the disk's physical ID from
it or from syslog.
Enter the command
: r . # 0xFFFFF0050000 4
with # replaced by the disk's physical ID (e.g. 0 if there are only two nodes
and the local node has ID 1). A success message and 4 read bytes should appear.
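
For the "guess the physical ID from syslog" step, a small script can pull the
ID out of an ieee1394 log line. This is a sketch under an assumption: it takes
the "node 0-00:1023" token to be host number, physical ID and bus ID in that
order, which is how the old ieee1394 stack appears to print node addresses.
The function name parse_node is made up for this example.

```python
import re

# ASSUMED layout of the token: <host>-<physical ID>:<bus ID>
NODE_RE = re.compile(r"node (\d+)-(\d+):(\d+)")


def parse_node(line):
    """Return host, physical ID and bus ID from an ieee1394 log line, or None."""
    match = NODE_RE.search(line)
    if match is None:
        return None
    host, physical_id, bus_id = (int(group) for group in match.groups())
    return {"host": host, "physical_id": physical_id, "bus_id": bus_id}


if __name__ == "__main__":
    sample = ("May 23 11:44:30 spx kernel: ieee1394: "
              "Error parsing configrom for node 0-00:1023")
    print(parse_node(sample))
```

The physical_id value is what would replace # in the 1394commander read
command, provided the bus has not been reset since the line was logged.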

We would need the value obtained this way from affected firmwares, and ideally
also from unaffected firmwares to cross-check. Then it is possible to add some
code to sbp2 to warn about these devices or perhaps even activate a workaround
to avoid the "SCSI error... Logical block address out of range", in case there
is such a workaround.

I have one enclosure with OXFW911 which does not show the signs you described.
Its magic number is 0x88000731. The last byte, 31, is firmware revision
information, all other bytes denote the chip type OXFW911.
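
To make the layout of that value concrete, here is a rough sketch of how the
quadlet could be split and checked against a list of bad revisions. It only
encodes what is stated above (low byte is the firmware revision, the upper
three bytes identify the chip). The names are invented for illustration, and
the set of affected revisions is left empty because no value from an affected
firmware has been reported yet.

```python
from typing import NamedTuple


class OxfordId(NamedTuple):
    chip: int       # upper three bytes of the quadlet
    firmware: int   # lowest byte of the quadlet


def quadlet_from_bytes(data):
    """Combine the 4 bytes read from the bus (big-endian) into one integer."""
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(data, "big")


def split_magic(quadlet):
    """Split the magic number into chip ID and firmware revision."""
    return OxfordId(chip=quadlet >> 8, firmware=quadlet & 0xFF)


def needs_warning(quadlet, chip, bad_firmware):
    """True if the quadlet names `chip` with a revision listed in `bad_firmware`."""
    ident = split_magic(quadlet)
    return ident.chip == chip and ident.firmware in bad_firmware


# Chip ID taken from the one unaffected enclosure described above.
OXFW911 = 0x880007
# Placeholder: affected revisions are NOT known from this report.
AFFECTED_REVISIONS = frozenset()

if __name__ == "__main__":
    magic = quadlet_from_bytes(bytes([0x88, 0x00, 0x07, 0x31]))
    ident = split_magic(magic)
    print("chip 0x%06x, firmware 0x%02x" % (ident.chip, ident.firmware))
    print("warn:", needs_warning(magic, OXFW911, AFFECTED_REVISIONS))
```

A real check in sbp2 would of course be written in C; the point here is only
the bit layout and the need for reported values before any list can be filled.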

That said, I would rather somebody wrote a Linux utility for firmware
uploads than add these workarounds to the kernel driver. That would of course
require additional information from Oxford Semiconductor (and from any other
SBP-2 bridge manufacturer whose chips we wanted to support).

BTW, the problem with Oxford chips under OS X 10.3 was about OXUF922 (FireWire
800 bridge), not the OXFW911.

Furthermore, the "sbp2: aborting sbp2 command" during later disk access may be
unrelated to the initial "SCSI error... Logical block address out of range" and
may be a driver bug instead of a firmware bug. There are conceptual problems in
sbp2 which I hope to resolve eventually. (Don't hold your breath, I am already
half a year behind my plans with sbp2 due to lack of time.)

Comment 2 Dave Jones 2006-10-16 20:24:45 UTC

A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 3 Jon Stanley 2008-01-20 04:37:39 UTC

(this is a mass-close to kernel bugs in NEEDINFO state)

As indicated previously, there has been no update on the progress of this bug,
so I am closing it as INSUFFICIENT_DATA. Please re-open if the issue
still occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.

If you believe that this bug was closed in error, please feel free to reopen
this bug.