Bug 139307 - Hal causes SCSI errors on a sym53c8xx card
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: athlon Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
Reported: 2004-11-14 20:49 EST by Mathieu Chouquet-Stringer
Modified: 2015-01-04 17:12 EST (History)

Doc Type: Bug Fix
Last Closed: 2005-08-30 21:44:41 EDT


Attachments
Log file for haldaemon (50.80 KB, text/plain)
2004-11-16 19:58 EST, Mathieu Chouquet-Stringer
output of "hald --daemon=no --verbose=yes" (87.75 KB, text/plain)
2005-03-10 07:53 EST, Joachim Selke
output of "lshal" (49.00 KB, text/plain)
2005-03-10 07:55 EST, Joachim Selke

Description Mathieu Chouquet-Stringer 2004-11-14 20:49:38 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3)
Gecko/20041027 Galeon/1.3.18

Description of problem:
When haldaemon starts, I get the following:
sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0
sym0:0: ERROR (81:0) (8-0-0) (1f/9f/0) @ (scripta 38:f31c0004).
sym0: script cmd = e21c0004
sym0: regdump: da 00 00 9f 47 1f 00 02 00 08 80 00 80 00 0f 02 ff ff
ff 00 02 ff ff ff.
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.
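
For anyone triaging logs like the one above, here is a minimal Python sketch that pulls the register fields out of the parity-error line. The regular expression and the name `parse_parity_error` are assumptions inferred from the message format quoted in this report, not taken from the sym53c8xx driver source. The values are kept as strings because the report does not say which radix the driver prints them in.

```python
import re

# Hypothetical pattern, inferred from the log line quoted in this report.
# \s+ between fields tolerates a log line that was wrapped when pasted.
PARITY_RE = re.compile(
    r"(?P<host>sym\d+): SCSI parity error detected:\s+"
    r"SCR1=(?P<SCR1>\w+)\s+DBC=(?P<DBC>\w+)\s+SBCL=(?P<SBCL>\w+)"
)


def parse_parity_error(text):
    """Return the host name and raw register fields, or None if no match."""
    match = PARITY_RE.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0"
    print(parse_parity_error(sample))
```

Lines that do not carry the `symN:` prefix or the three register fields simply return `None`.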

This is 100% reproducible, but the OS survives the problem. However, if
I try to cycle the service, the SCSI card gets into abort/reset loops;
at this stage the system is more or less useless (i.e., no more I/O to disk).

Here's my version of the SCSI driver:
sym0: <895> rev 0x1 at pci 0000:00:0b.0 irq 10
sym0: Tekram NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.1.18k

I run kernel 2.6.10-rc1 (latest bk as of 11/14/2004).

If you have any questions or if you want me to try patches, let me know.

- Mathieu

Version-Release number of selected component (if applicable):
hal-0.4.0-10

How reproducible:
Always

Steps to Reproduce:
1. boot the system
2. watch a scsi bus reset
3. the boot process continues
4. try to restart hald
5. system dies
    

Additional info:
Comment 1 David Zeuthen 2004-11-15 14:28:26 EST
Please attach the output of running 'hald --daemon=no --verbose=yes' -
it looks like a kernel bug but that output might tell us more. Thanks.
Comment 2 Mathieu Chouquet-Stringer 2004-11-16 19:58:22 EST
Created attachment 106861 [details]
Log file for haldaemon

Here is the log file.
Interestingly enough, if I start hald at the command line, I only get this:
Nov 16 19:45:47 mcs kernel: sym0: SCSI parity error detected: SCR1=132
DBC=50000000 SBCL=0
instead of the whole thing.

/sys/devices/pci0000:00/0000:00:0b.0 is my sym based SCSI card:
00:0b.0 SCSI storage controller: LSI Logic / Symbios Logic 53c895 (rev 01)

- Mathieu
Comment 3 David Zeuthen 2004-11-16 21:38:32 EST
ok, HAL is known to work with other SCSI host adapters so I'm
reassigning this to the kernel.
Comment 4 Lonni J Friedman 2005-01-13 20:47:05 EST
This bug is awfully painful.  It's still present in 2.6.10-1.737_FC3.
Is there any hope of getting this fixed any time soon?  I've got a box
completely incapacitated by this.
Comment 5 Don Scales 2005-01-31 05:52:43 EST
I seem to have the same problem.
I have a system using a Gigabyte 7VT600-RZ motherboard and an
AMD K7 2800 MHz processor.
The disks are all SCSI and attached via an LSI SYM8951U 32-bit PCI
host bus adapter.
The system is multi boot and Windows 2K and RH 9 both boot and
run without any problems.
I have just installed Fedora Core 3 from CDs (2.6.9-1.667), and
this system takes a very long time to boot (timeouts?)
and then reports
SCSI parity error detected
SCR1=3 DBC=5000000 SBCL=0
SCSI phase error fixup:CCB already dequeued
SCSI BUS reset detected
SCSI BUS has been reset
SCSI BUS mode changed from SE to SE
SCSI BUS has been reset

I have also seen the longer-form error messages
sym0:0: Error (81:0)(8-0-0)(f/bf/0) @ (scripta 50:f81c0400
sym0: script cmd=90080000
etc etc

The system will run simple commands, but attempts to run anything other
than the command line cause SCSI resets and hangs.

This is not the same driver as used in RH9 but is the sym53c8xx_2
driver. It is possible that this driver has not had much exposure
to the large selection of older LSI SCSI HBAs.
I have an additional problem in that this is the same driver that is
available in RHEL4, and I need this to work with SCSI tape drives.

Many Regards
Don
Comment 6 Joachim Selke 2005-03-10 07:32:45 EST
Same problem here.

OS: Fedora Core 3

Mainboard: Tyan Thunder K8W
CPUs: 2 x AMD Opteron processor 246, 2 GHz
SCSI host adapter: Tekram DC-390U2B
Tape drive: Sony SDX-450V

The tape drive is the only device attached to the host adapter. SCSI termination
is correct.


The error messages:

Kernel 2.6.9-1.667smp x86_64
============================
sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0
sym0: ERROR (81:0) (8-0-0) (f/bf/0) @ (scripta 48:f3100004).
sym0: script cmd=f31c0004
sym0: regdump [... some hex codes ...]
sym0: SCSI BUS reset detected
sym0: SCSI BUS has been reset

Kernel 2.6.10-1.770_FC3smp x86_64
==================================
sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0

(With this kernel I can do "mt -f /dev/st0 status" and "mt -f /dev/st0 eject"
without error, but I haven't tried reading or writing yet.)


With some kernel release between 2.6.9-1.667smp and 2.6.10-1.770_FC3smp (I don't
remember which one it was) the system froze while booting or immediately
after booting. Perhaps this helps in locating the bug.
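
The difference between the two kernels above (full error and bus reset versus parity error only) can be checked mechanically when comparing logs from several kernel builds. The following is only a sketch: the message fragments are copied from the logs in this report, while the category names and the function `summarize_sym_log` are made up for illustration.

```python
# Message fragments copied from the logs in this report; the category
# names on the left are hypothetical labels for this sketch.
MARKERS = {
    "parity_error": "SCSI parity error detected",
    "bus_reset": "SCSI BUS has been reset",
}


def summarize_sym_log(log_text):
    """Count how many lines of a kernel log contain each known fragment."""
    counts = {name: 0 for name in MARKERS}
    for line in log_text.splitlines():
        for name, fragment in MARKERS.items():
            if fragment in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    log_2_6_10 = "sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0"
    print(summarize_sym_log(log_2_6_10))
```

A log with a non-zero `parity_error` count but a zero `bus_reset` count would correspond to the milder behaviour seen with 2.6.10-1.770_FC3smp.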
Comment 7 Joachim Selke 2005-03-10 07:53:13 EST
Created attachment 111850 [details]
output of "hald --daemon=no --verbose=yes"
Comment 8 Joachim Selke 2005-03-10 07:55:45 EST
Created attachment 111851 [details]
output of "lshal"
Comment 9 Dave Jones 2005-07-15 14:00:43 EDT
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.
Comment 10 Joachim Selke 2005-07-18 11:51:45 EDT
The problem still remains with kernel-2.6.12-1.1372_FC3.

I am going to update the system to FC4 in the next few weeks. I will report
back then.
Comment 11 Lonni J Friedman 2005-07-18 11:56:01 EDT
This problem still exists in FC4.  

In fact, I can't even install FC4 on a system that has a sym53c8xx supported
card in it.  I get an Oops very early in the installation process.  If I remove
that card, the Oops never happens, and I can complete the installation.  This is
rather frustrating as nearly all of my SCSI controllers are sym53c8xx, and I
can't use them, or the SCSI disks.
Comment 12 Joachim Selke 2005-08-30 18:24:14 EDT
I updated my system to Fedora Core 4 (fresh install, no "real" update) today,
and the problem is gone.

I tested the kernel Fedora Core 4 ships with (2.6.11-1.1369_FC4smp x86_64) and
the current one (2.6.12-1.1447_FC4smp x86_64). Both work, there is no "SCSI
parity error detected" message.
Comment 13 Dave Jones 2005-08-30 21:44:41 EDT
This should be fixed in the latest kernel errata for both FC3 and FC4.

