Bug 137861 - SCSI sym53c8xx domain validation causes boot failure
Summary: SCSI sym53c8xx domain validation causes boot failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-11-02 10:11 UTC by Hal Hansen
Modified: 2015-01-04 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-07-15 23:23:05 UTC
Type: ---
Embargoed:



Description Hal Hansen 2004-11-02 10:11:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040207 Firefox/0.8

Description of problem:
When attempting to boot, the sym53c8xx SCSI module is loaded.
During initialization, domain validation fails. The sym53c8xx
module is then uninitialized and the boot fails with a kernel
panic because the root partition cannot be found. It appears
similar to Bug 122572.

See below for patch to driver which provides a fix for this problem.

Equipment:

Host:
Chip sym53c825, device id 0x3, revision id 0x2
At PCI address 0000:01:08.0, IRQ 11
Min. period factor 25, Wide SCSI BUS
Max. started commands 448, max. commands per LUN 64

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: SEAGATE  Model: SX423451W        Rev: 9E21
Type:   Direct-Access                    ANSI SCSI revision: 02


Version-Release number of selected component (if applicable):
kernel-2.6.9-1.469

How reproducible:
Always

Steps to Reproduce:
Attempt to boot the kernel on a system with a sym53c825 controller
(LSI 4.18 SCSI BIOS, no NVRAM) and a 23 gigabyte Seagate
SX423451W drive. The initrd loads the following modules
(in this order) during boot:
   
scsi_mod.ko
sd_mod.ko  
scsi_transport_spi.ko
sym53c8xx.ko
jbd.ko
ext3.ko

Actual Results:  Domain validation errors flood the console
screen and eventually stop. The jbd and ext3 modules are then
loaded from the initrd, but the kernel panics when it cannot mount
the root partition.

Expected Results:  Fedora boots and produces login prompt.

Additional info:

"Domain Validation" is the probing of the scsi bus to verify
that it can handle its design speed. SCSI specs only define DV
for Ultra3, Ultra160, and Ultra160+ SCSI. DV is undefined for
earlier versions of SCSI such as SCSI 1 and SCSI 2, and may
produce unexpected results in those cases (such as mine).

In truth, DV provides no benefit to correctly engineered working
systems. The only benefit of DV is for high-speed SCSI systems
with poor bus termination or poor cabling. In such cases, DV
will reduce bus speed in order to establish communication.
In effect, DV is a sort of kludge workaround for hardware
problems. It is no substitute for correct engineering practices.
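
The speed-reduction behavior described above can be sketched roughly as follows. This is a user-space illustration only, not the kernel's actual DV code: the names `dv_negotiate`, `dv_probe_fn`, and `probe_min_period`, and the list of candidate periods, are all assumptions made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical probe callback: returns true if a test pattern sent to the
 * target at the given synchronous period (in ns) comes back intact.
 */
typedef bool (*dv_probe_fn)(unsigned period_ns, void *ctx);

/* Candidate transfer periods, fastest first. Illustrative values only. */
static const unsigned dv_periods_ns[] = { 25, 50, 100, 200 };

/*
 * Try each period from fastest to slowest and return the first one that
 * passes the probe. Returns 0 when no synchronous period works, meaning
 * "fall back to asynchronous transfers".
 */
unsigned dv_negotiate(dv_probe_fn probe, void *ctx)
{
    size_t n = sizeof(dv_periods_ns) / sizeof(dv_periods_ns[0]);

    for (size_t i = 0; i < n; i++) {
        if (probe(dv_periods_ns[i], ctx))
            return dv_periods_ns[i];
    }
    return 0;
}

/*
 * Example probe: models a bus that is only reliable at or above a minimum
 * period (that is, at or below some speed). ctx points to that minimum.
 */
bool probe_min_period(unsigned period_ns, void *ctx)
{
    unsigned min_ok_ns = *(const unsigned *)ctx;

    return period_ns >= min_ok_ns;
}
```

With a simulated bus that is only reliable at 100 ns or slower, `dv_negotiate(probe_min_period, &min)` settles on 100 ns. On a healthy bus it simply returns the fastest candidate, which is the sense in which DV adds nothing to a working system.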

The Linux sym53c8xx driver has a long history of reliable
operation (which I can personally vouch for). All the NCR/Symbios 
8xx drivers are now unified under the 2.6.x kernels. This 
one driver is supposed to handle the whole gamut of sym53c8xx 
controller chips from the earliest to the latest. 

Unfortunately, enabling domain validation automatically (instead
of making it a special feature selected with a boot-time command
line option) defeats the purpose of a general-purpose unified device
driver. And as most literature on the subject correctly points
out, DV is not useful for single-ended SCSI or for systems with
bus speeds of 20 MHz or less.
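
As a rough cross-check of the speed figures, the "Min. period factor 25" reported for the host above can be converted to a bus clock. The sketch below assumes the usual SCSI period-factor encoding (a few special-cased small factors, otherwise factor times 4 ns); the function names are made up for this example and are not part of the driver.

```c
/*
 * Convert a SCSI synchronous transfer period factor to a period in
 * picoseconds. Factors 0x08..0x0c are special-cased; factors from 0x0d
 * up to 0xff mean factor * 4 ns. Returns 0 for factors not covered here.
 */
unsigned long period_factor_to_ps(unsigned factor)
{
    switch (factor) {
    case 0x08: return 6250UL;   /* 6.25 ns */
    case 0x09: return 12500UL;  /* 12.5 ns */
    case 0x0a: return 25000UL;  /* 25 ns   */
    case 0x0b: return 30300UL;  /* 30.3 ns */
    case 0x0c: return 50000UL;  /* 50 ns   */
    default:
        break;
    }
    if (factor >= 0x0d && factor <= 0xff)
        return factor * 4000UL;
    return 0;
}

/* Convert a period in picoseconds to a transfer clock in kHz (0 if ps is 0). */
unsigned long period_ps_to_khz(unsigned long ps)
{
    return ps ? 1000000000UL / ps : 0;
}
```

Under that assumption, factor 25 is a 100 ns period, or 10 MHz, which puts this controller well inside the range where DV is said not to be useful. Factor 0x0c (50 ns) corresponds to the 20 MHz boundary.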

Until a boot-time option is provided, I suggest the following
patch, which fixes the problem. It causes no change in behavior
other than disabling domain validation for the sym53c8xx driver.
Since the sym53c8xx driver only calls DV during initialization
anyway, disabling it will in no way affect post-initialization
driver behavior.

Those who own broken high speed hardware and wish to throttle
back SCSI bus speed can still manually choose their SCSI bus
speed with the existing sym53c8xx "sync:" command line option
at boot time. 

---------------------- cut here 8< -----------------------------

diff -udwr linux-2.6.9-original/drivers/scsi/sym53c8xx_2/sym_glue.c kernel-2.6.9/linux-2.6.9/drivers/scsi/sym53c8xx_2/>
--- linux-2.6.9-original/drivers/scsi/sym53c8xx_2/sym_glue.c   2004-11-02 02:50:50.447399288 -0500
+++ kernel-2.6.9/linux-2.6.9/drivers/scsi/sym53c8xx_2/sym_glue.c   2004-11-02 01:23:24.000000000 -0500
@@ -1118,8 +1118,15 @@
    lp->s.scdev_depth = depth_to_use;
    sym_tune_dev_queuing(np, device->id, device->lun, reqtags);

+/*
+ * Uncomment this if you have Ultra160 or Ultra3 SCSI
+ * and you would like to enable domain validation.
+ * Domain validation is not defined for earlier versions
+ * of SCSI and may or may not work. You have been warned.
+ */
+#if 0
    spi_dv_device(device);
-
+#endif
    return 0;
 }
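
---------------------- cut here 8< -----------------------------

The patch above disables DV at compile time. A run-time switch of the kind suggested earlier might look roughly like the following. This is a user-space sketch with stand-in types, not driver code: `sym_dv_enable`, `sym_set_dv_enable`, `sym_maybe_validate`, `struct fake_scsi_device`, and `fake_spi_dv_device` are all hypothetical, and no such option is known to exist in the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's per-device state. */
struct fake_scsi_device {
    unsigned min_period_factor; /* larger factor = slower bus */
    bool dv_done;               /* set when "DV" has been run */
};

/* Hypothetical switch, imagined as a boot-time option. Off by default. */
static bool sym_dv_enable = false;

void sym_set_dv_enable(bool on)
{
    sym_dv_enable = on;
}

/* Stub standing in for the real spi_dv_device() call. */
static void fake_spi_dv_device(struct fake_scsi_device *dev)
{
    dev->dv_done = true;
}

/*
 * Run DV only when it has been explicitly enabled AND the device is fast
 * enough to benefit, i.e. its period factor is at or below max_factor.
 */
void sym_maybe_validate(struct fake_scsi_device *dev, unsigned max_factor)
{
    if (!sym_dv_enable)
        return;
    if (dev->min_period_factor > max_factor)
        return; /* slow bus: skip DV */
    fake_spi_dv_device(dev);
}
```

With this shape, a period-factor-25 device like the one in this report would never be probed, even with the switch on, provided `max_factor` is set below 25 (for example 0x0b, to cover only buses faster than 20 MHz).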

Comment 1 Jeff Patterson 2005-02-11 19:04:31 UTC
Please fix this bug. Many scsi drives (my Atlas Quantum 18 for
example) do not support DV. This caused me no end of grief in
attempting to upgrade to FC2 from RH9.

Comment 2 Hal Hansen 2005-07-08 05:03:47 UTC
(In reply to comment #1)
> Please fix this bug. Many scsi drives (my Atlas Quantum 18 for
> example) do not support DV. 

SYM53C8XX UPDATE: 

I'm finally able to boot FC3 without any special hacks to the kernel.

Since the release of kernel 2.6.11-1.35 for FC3, I've been able
to boot FC3 without any problems. On FC4, kernels 2.6.11-1.1369
and 2.6.12-1.1387 work fine as well.

PERSONAL SUMMARY:

 FC3:  2.6.10-1.741    FAILURE
       2.6.11-1.27     ??? (untested)
       2.6.11-1.35     SUCCESS

 FC4:  2.6.11-1.1369   SUCCESS
       2.6.12-1.1387   SUCCESS

I hope others have the same newfound success. Whatever the bug was,
for me it first appeared late in the FC2 cycle and remained through
FC3 until the release of kernel 2.6.11-1.35.

- Hal

Comment 3 Dave Jones 2005-07-15 18:59:31 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 4 Dave Jones 2005-07-15 23:23:05 UTC
Ignore the previous comment; it was posted as part of a mass update.

I'll close this based on your previous comment.

Thanks for testing.


