Bug 197470 - sym53c8xx causes endless sym0:0:0:M_REJECT to send for : 1-2-3-1 messages
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Assignee: Tom Coughlan
QA Contact: Brian Brock
Reported: 2006-07-02 16:18 UTC by Peter Bieringer
Modified: 2012-06-20 16:08 UTC

Doc Type: Bug Fix
Last Closed: 2012-06-20 16:08:18 UTC

Description Peter Bieringer 2006-07-02 16:18:52 UTC
+++ This bug was initially created as a clone of Bug #156158 +++


Description of problem:
Apr 27 18:41:15 perelandra kernel: sym0: detaching ...
Apr 27 18:41:15 perelandra kernel: sym0: resetting chip
Apr 27 18:41:21 perelandra kernel: SCSI subsystem initialized
Apr 27 18:41:22 perelandra kernel: PCI: Found IRQ 11 for device 0000:00:12.0
Apr 27 18:41:22 perelandra kernel: PCI: Sharing IRQ 11 with 0000:01:00.0
Apr 27 18:41:22 perelandra kernel: sym0: <810a> rev 0x23 at pci 0000:00:12.0 irq 11
Apr 27 18:41:22 perelandra kernel: sym0: No NVRAM, ID 7, Fast-10, SE, parity checking
Apr 27 18:41:22 perelandra kernel: sym0: SCSI BUS has been reset.
Apr 27 18:41:22 perelandra kernel: scsi0 : sym-2.1.18n
Apr 27 18:41:25 perelandra kernel:   Vendor: COMPAQ    Model: BD03663622        Rev: BDC4
Apr 27 18:41:25 perelandra kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
Apr 27 18:41:25 perelandra kernel: sym0:0:0: tagged command queuing enabled, command queue depth 16.
Apr 27 18:41:25 perelandra kernel:  target0:0:0: Beginning Domain Validation
Apr 27 18:41:25 perelandra last message repeated 12 times
Apr 27 18:41:25 perelandra kernel:  target0:0:0: Ending Domain Validation
Apr 27 18:41:25 perelandra scsi.agent[5695]: disk at /devices/pci0000:00/0000:00:12.0/host0/target0:0:0/0:0:0:0
Apr 27 18:41:25 perelandra kernel: SCSI device sda: 71132000 512-byte hdwr sectors (36420 MB)
Apr 27 18:41:26 perelandra kernel: SCSI device sda: drive cache: write through
Apr 27 18:41:26 perelandra kernel: SCSI device sda: 71132000 512-byte hdwr sectors (36420 MB)
Apr 27 18:41:26 perelandra kernel: SCSI device sda: drive cache: write through
Apr 27 18:41:26 perelandra kernel:  sda1
Apr 27 18:41:26 perelandra kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
sym0:0:0:M_REJECT to send for : 1-2-3-1.
sym0:0:0:M_REJECT to send for : 1-2-3-1.
sym0:0:0:M_REJECT to send for : 1-2-3-1.
sym0:0:0:M_REJECT to send for : 1-2-3-1.
sym0:0:0:M_REJECT to send for : 1-2-3-1.
Apr 27 18:52:37 perelandra last message repeated 68 times
Apr 27 18:53:39 perelandra last message repeated 65 times
Apr 27 18:54:40 perelandra last message repeated 73 times
Apr 27 18:55:41 perelandra last message repeated 68 times
Apr 27 18:56:42 perelandra last message repeated 69 times
Apr 27 18:57:43 perelandra last message repeated 69 times
Apr 27 18:58:45 perelandra last message repeated 70 times
Apr 27 18:59:46 perelandra last message repeated 74 times


Version-Release number of selected component (if applicable):
2.6.11-1.14_FC3

+++ clone end ++++
Version-Release number of selected component (if applicable):
2.6.9-34.0.1.EL

This must definitely be a bug in the SCSI module, not in the drive.

Drive: ST34572WC, which has no narrow/wide jumper, connected to the controller
via SCA->narrow SCSI.

After searching for that problem, I found an interesting posting:
http://kerneltrap.org/node/3518

So I tried newer kernels from Fedora Core on RHEL4:
kernel-2.6.12-1.1381_FC3 -> same problem
kernel-2.6.17-1.2139_FC4 -> no problem

So between 2.6.12 and 2.6.17, this bug was fixed.

Comment 2 Ryan Powers 2006-07-24 22:04:57 UTC
This patch (http://marc.theaimsgroup.com/?l=linux-scsi&m=114122900200987&w=2)
that went into 2.6.16 may solve your problem. It addresses a missed
initialization for cards that don't have NVRAM (and I see that yours doesn't);
the missed initialization may result in improper negotiation.

If you can't build and test this potential fix, let me know exactly which
kernel version and type you need and I can build a module for you to test.

--Ryan

Comment 3 Peter Bieringer 2006-07-24 22:29:17 UTC
Yes, this is indeed a rather old card without NVRAM (but supported by the
BIOS; call it hardware recycling: why throw away old SCSI hardware that is
working well 24x7x365?).

Building a special module would be a good idea for tests, but there are
currently 2 issues:
1) The server is running in production some distance away; I would be able to
test it in around 2 months or so.

2) The server is currently running an i586 kernel from FC4
(kernel-2.6.17-1.2139_FC4); the problem occurs on kernel-2.6.9-34.0.1.EL
(provided by CentOS for i586). I still don't understand why i586 instead of
i686 is required, because the CPU is a Celeron 366 (which I thought was i686
compatible).

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : Celeron (Mendocino)
stepping        : 0
cpu MHz         : 367.561
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr

Perhaps "rpm" has a problem detecting the proper arch and relies more on
information about the installed kernels than on what the CPU is capable of.

The original bug https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156158
occurred on FC4 on a local machine here, which has since been upgraded to FC5.
But if you provide a module for kernel-2.6.9-34.0.2.EL (current RHEL4 kernel), I
would set up a fresh installation here and run some tests within two to four
weeks, because I think you won't be willing to provide a module for i586, an
unsupported RHEL platform...

Comment 4 Ryan Powers 2006-08-21 22:02:00 UTC
I did some more research on this issue. The message that is being rejected is a
WDTR (Wide Data Transfer Request). It makes sense that it is being rejected,
given your configuration. Kernel 2.6.12 introduced the 2.2.0 version of this
driver, with 2.2.3 being the current upstream version. This issue must have been
addressed somewhere in 2.2.1 through 2.2.3 inclusive. It looks like 2.2.2, which
was introduced in 2.6.15, had the biggest impact on negotiation.
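
For reference, the "1-2-3-1" in the log line can be decoded by hand as a SCSI
extended message: 0x01 (EXTENDED MESSAGE), length 2, code 0x03 (WDTR), and a
transfer width exponent of 1 (16 bits). The Python sketch below illustrates
that reading. It assumes the driver prints the message bytes in hex separated
by dashes; the function and table names are made up for illustration and are
not part of the driver.

```python
# Sketch only: decode the bytes shown in "M_REJECT to send for : 1-2-3-1".
# Assumption: the bytes are printed in hex and separated by dashes.

EXTENDED_MESSAGE = 0x01  # first byte of a SCSI extended message

# Extended message codes relevant to transfer negotiation.
EXTENDED_CODES = {
    0x01: "SDTR (Synchronous Data Transfer Request)",
    0x03: "WDTR (Wide Data Transfer Request)",
}


def decode_message(text):
    """Return a human-readable description of a dash-separated message."""
    cleaned = text.strip().rstrip(".")
    data = [int(part, 16) for part in cleaned.split("-")]
    if len(data) < 3 or data[0] != EXTENDED_MESSAGE:
        return "not an extended message: %s" % cleaned
    length, code, args = data[1], data[2], data[3:]
    # The length byte counts the code byte plus its arguments.
    if length != len(data) - 2:
        return "malformed extended message: %s" % cleaned
    name = EXTENDED_CODES.get(code, "unknown extended code 0x%02x" % code)
    if code == 0x03 and len(args) == 1:
        # WDTR carries a width exponent: width = 8 * 2**exponent bits.
        return "%s, width %d bits" % (name, 8 * 2 ** args[0])
    return name
```

Calling decode_message("1-2-3-1") describes a WDTR asking for a 16-bit wide
transfer, which the 8-bit 53c810 in this setup has to reject.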

I looked back through both bugs, and perhaps I missed it, but does this issue
prevent access to the disk?


Comment 5 Peter Bieringer 2006-08-31 14:10:55 UTC
Yes, after a few seconds, access to the disk is gone.

Comment 6 Ryan Powers 2007-01-19 18:46:04 UTC
I may have come across a fix for this issue. Would you be able to test it,
either via a RHEL4 kernel or with a patch that I supply?

Comment 7 Peter Bieringer 2007-01-20 08:26:57 UTC
Currently, I have no RHEL4 build system, so please provide a patched kernel and
I will run tests.

Comment 8 Ryan Powers 2007-01-22 18:54:47 UTC
It looks as though the patch was put into 2.6.9-42. The official information
regarding this kernel can be found here:
http://rhn.redhat.com/errata/RHSA-2006-0575.html
That kernel has since been superseded by 2.6.9-42.0.3, which can be found here:
http://rhn.redhat.com/errata/RHSA-2006-0689.html

The patch that was applied was for a slightly different error message
(reportedly "23-1", rather than "1-2-3-1"), but it appears to have a similar
cause. Hopefully this resolves the issue.

Comment 9 Peter Bieringer 2007-01-27 14:45:52 UTC
I've set up a new test box (using a different SCSI disk, but the same cable
scenario) and the error still occurs:

sym0:0:0:M_REJECT to send for : 1-2-3-1
(on each access, e.g. fdisk -l /dev/sda)

kernel-2.6.9-42.0.3.EL

# cat /proc/scsi/scsi 
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST32171W         Rev: 0484
  Type:   Direct-Access                    ANSI SCSI revision: 02

00:0d.0 SCSI storage controller: LSI Logic / Symbios Logic 53c810 (rev 12)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (2000ns min, 16000ns max), Cache Line Size 08
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at e800 [size=256]
        Region 1: Memory at febffc00 (32-bit, non-prefetchable) [size=256]



Comment 10 Tom Coughlan 2007-02-08 18:27:29 UTC
This thread looks relevant:

http://www.mail-archive.com/debian-alpha@lists.debian.org/msg22571.html

Try setting the board to talk narrow mode to the SCSI target that is giving
trouble.  In the RHEL 4 kernel source tree, take a look at:

Documentation/scsi/sym53c8xx_2.txt

in particular: 

8.2 Set wide size

    setwide <target> <size>

    target:    target number
    size:      0=8 bits, 1=16bits

So for target 0:

echo setwide 0 0 >/proc/scsi/sym53c8xx/0

Let us know if this prevents the problem. 
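
If the setting helps, one way to make it persistent would be to reapply it
from a startup script once the driver is loaded. The Python sketch below only
illustrates that idea: it writes the same "setwide <target> <size>" command to
the driver's /proc file. The helper names are hypothetical, and running it at
boot (for example from rc.local) is an assumption, not something the driver
documentation prescribes.

```python
import os

# Sketch only: reapply the "setwide" workaround for a list of targets.
# Assumption: the control file is /proc/scsi/sym53c8xx/<host>, as in the
# echo command above; the function names here are hypothetical.


def setwide_command(target, wide=False):
    """Build the control command; size 0 = 8 bits, 1 = 16 bits."""
    return "setwide %d %d" % (target, 1 if wide else 0)


def apply_setwide(targets, host=0, proc_root="/proc/scsi/sym53c8xx"):
    """Force narrow (8-bit) transfers for each target on the given host."""
    path = os.path.join(proc_root, str(host))
    for target in targets:
        # One write per command, like a separate "echo ... > file".
        with open(path, "w") as control:
            control.write(setwide_command(target) + "\n")
```

Calling apply_setwide([0]) as root has the same effect as the echo command
for target 0 on host 0.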

Comment 11 Peter Bieringer 2007-02-11 12:53:39 UTC
Yes, this prevents the kernel messages. How can I set this up on the kernel
command line before booting? Or will this be included in a new kernel?

Comment 12 Red Hat Bugzilla 2007-10-23 15:30:35 UTC
User rpowers's account has been closed

Comment 13 Jiri Pallich 2012-06-20 16:08:18 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.

