+++ This bug was initially created as a clone of Bug #156158 +++

Description of problem:

Apr 27 18:41:15 perelandra kernel: sym0: detaching ...
Apr 27 18:41:15 perelandra kernel: sym0: resetting chip
Apr 27 18:41:21 perelandra kernel: SCSI subsystem initialized
Apr 27 18:41:22 perelandra kernel: PCI: Found IRQ 11 for device 0000:00:12.0
Apr 27 18:41:22 perelandra kernel: PCI: Sharing IRQ 11 with 0000:01:00.0
Apr 27 18:41:22 perelandra kernel: sym0: <810a> rev 0x23 at pci 0000:00:12.0 irq 11
Apr 27 18:41:22 perelandra kernel: sym0: No NVRAM, ID 7, Fast-10, SE, parity checking
Apr 27 18:41:22 perelandra kernel: sym0: SCSI BUS has been reset.
Apr 27 18:41:22 perelandra kernel: scsi0 : sym-2.1.18n
Apr 27 18:41:25 perelandra kernel: Vendor: COMPAQ Model: BD03663622 Rev: BDC4
Apr 27 18:41:25 perelandra kernel: Type: Direct-Access ANSI SCSI revision: 02
Apr 27 18:41:25 perelandra kernel: sym0:0:0: tagged command queuing enabled, command queue depth 16.
Apr 27 18:41:25 perelandra kernel: target0:0:0: Beginning Domain Validation
Apr 27 18:41:25 perelandra last message repeated 12 times
Apr 27 18:41:25 perelandra kernel: target0:0:0: Ending Domain Validation
Apr 27 18:41:25 perelandra scsi.agent[5695]: disk at /devices/pci0000:00/0000:00:12.0/host0/target0:0:0/0:0:0:0
Apr 27 18:41:25 perelandra kernel: SCSI device sda: 71132000 512-byte hdwr sectors (36420 MB)
Apr 27 18:41:26 perelandra kernel: SCSI device sda: drive cache: write through
Apr 27 18:41:26 perelandra kernel: SCSI device sda: 71132000 512-byte hdwr sectors (36420 MB)
Apr 27 18:41:26 perelandra kernel: SCSI device sda: drive cache: write through
Apr 27 18:41:26 perelandra kernel: sda1
Apr 27 18:41:26 perelandra kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
sym0:0:0:M_REJECT to send for : 1-2-3-1.
sym0:0:0:M_REJECT to send for : 1-2-3-1.
sym0:0:0:M_REJECT to send for : 1-2-3-1.
sym0:0:0:M_REJECT to send for : 1-2-3-1.
sym0:0:0:M_REJECT to send for : 1-2-3-1.
Apr 27 18:52:37 perelandra last message repeated 68 times
Apr 27 18:53:39 perelandra last message repeated 65 times
Apr 27 18:54:40 perelandra last message repeated 73 times
Apr 27 18:55:41 perelandra last message repeated 68 times
Apr 27 18:56:42 perelandra last message repeated 69 times
Apr 27 18:57:43 perelandra last message repeated 69 times
Apr 27 18:58:45 perelandra last message repeated 70 times
Apr 27 18:59:46 perelandra last message repeated 74 times

Version-Release number of selected component (if applicable):
2.6.11-1.14_FC3

+++ clone end ++++

Version-Release number of selected component (if applicable):
2.6.9-34.0.1.EL

This is definitely a bug in the SCSI module, not in the drive.

Drive: ST34572WC, which has no narrow/wide jumper, connected to the controller via SCA->narrow SCSI.

While searching for this problem, I found an interesting posting: http://kerneltrap.org/node/3518

So I tried newer kernels from Fedora Core on RHEL4:
kernel-2.6.12-1.1381_FC3 -> same problem
kernel-2.6.17-1.2139_FC4 -> no problem

So this bug was fixed somewhere between 2.6.12 and 2.6.17.
This patch (http://marc.theaimsgroup.com/?l=linux-scsi&m=114122900200987&w=2), which went into 2.6.16, may solve your problem. It addresses a missed initialization for cards that don't have NVRAM (and I see that yours doesn't), which may result in improper negotiation. If you can't build and test this potential fix, let me know specifically which kernel and type you need and I can build a module for you to test. --Ryan
Yes, this is indeed an older card without NVRAM (but supported by the BIOS... some kind of hardware recycling... why throw away old SCSI hardware when it's working well 24x7x365?).

Building a special module would be a good idea for testing, but there are currently 2 issues:

1) The server is running in production some distance away; I would be able to test on it in around 2 months or so.
2) The server is currently running an i586 kernel from FC4 (kernel-2.6.17-1.2139_FC4); the problem occurs on kernel-2.6.9-34.0.1.EL (provided by CentOS for i586).

I still don't understand why i586 instead of i686 is required, because the CPU is a Celeron 366 (which I thought was i686 compatible):

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 6
model name : Celeron (Mendocino)
stepping : 0
cpu MHz : 367.561
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr

Perhaps "rpm" has a problem detecting the proper arch and relies more on information about installed kernels than on capable ones.

The original bug https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156158 occurred on FC4 on a local machine here, which has since been upgraded to FC5. But if you provide a module for kernel-2.6.9-34.0.2.EL (current RHEL4 kernel), I would set up a fresh installation here and run some tests within two to four weeks, because I think you won't be willing to provide a module for i586, an unsupported RHEL platform...
I did some more research on this issue. The message that is being rejected is a WDTR (Wide Data Transfer Request). It makes sense that it's being rejected, given your configuration. Kernel 2.6.12 introduced the 2.2.0 version of this driver, with 2.2.3 being the current upstream version. This issue must have been addressed somewhere in 2.2.1 through 2.2.3. It looks like 2.2.2, which was introduced in 2.6.15, had the biggest impact on negotiation. I looked back through both bugs, and perhaps I missed it, but does this issue prevent access to the disk?
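For reference, the "1-2-3-1" in the log can be read as a SCSI extended message: 0x01 (extended message), length 2, code 0x03 (WDTR), width exponent 1 (16-bit). The sketch below decodes such a string; it assumes the driver prints the message bytes in hexadecimal separated by dashes, and `decode_ext_msg` is a hypothetical helper name, not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: decode a dash-separated SCSI extended message.
# Assumption: bytes are printed in hex, in bus order, e.g. "1-2-3-1".
decode_ext_msg() {
    old_ifs=$IFS
    IFS=-
    set -- $1            # split the string into $1, $2, ...
    IFS=$old_ifs
    # Byte 0 must be 0x01 (EXTENDED MESSAGE); byte 1 is the length,
    # byte 2 the extended message code.
    if [ "$#" -lt 3 ] || [ "$((0x$1))" -ne 1 ]; then
        echo "not an extended message"
        return 1
    fi
    case "$((0x$3))" in
        1) echo "SDTR (synchronous data transfer request)" ;;
        3) if [ "$#" -ge 4 ]; then
               # Width exponent m means transfers of 8 * 2^m bits.
               echo "WDTR, width exponent $((0x$4)) ($((8 << 0x$4))-bit transfers)"
           else
               echo "WDTR (width byte missing)"
           fi ;;
        *) echo "extended message code 0x$3" ;;
    esac
}

decode_ext_msg "1-2-3-1"
```

Read this way, the log shows a 16-bit WDTR being rejected, which fits a wide-capable drive attached to a narrow 53c810 bus.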
Yes, after a few seconds, access to the disk is gone.
I may have come across a fix for this issue. Would it be possible to get you to test this either via a RHEL4 kernel or if I supply you with the patch?
Currently, I have no RHEL4 build system, so please provide a patched kernel and I will run tests.
It looks as though the patch was put into 2.6.9-42. The official information regarding this kernel can be found here: http://rhn.redhat.com/errata/RHSA-2006-0575.html

That kernel has since been superseded by 2.6.9-42.0.3, which can be found here: http://rhn.redhat.com/errata/RHSA-2006-0689.html

The patch that was applied was for a slightly different error message (reportedly "23-1", rather than "1-2-3-1"), but it appears to have a similar cause. Hopefully this resolves the issue.
I've set up a new test box (using a different SCSI disk, but the same cable scenario) and the error still occurs:

sym0:0:0:M_REJECT to send for : 1-2-3-1

(on each access, e.g. fdisk -l /dev/sda)

kernel-2.6.9-42.0.3.EL

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE Model: ST32171W Rev: 0484
  Type: Direct-Access ANSI SCSI revision: 02

00:0d.0 SCSI storage controller: LSI Logic / Symbios Logic 53c810 (rev 12)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 64 (2000ns min, 16000ns max), Cache Line Size 08
    Interrupt: pin A routed to IRQ 11
    Region 0: I/O ports at e800 [size=256]
    Region 1: Memory at febffc00 (32-bit, non-prefetchable) [size=256]
This thread looks relevant: http://www.mail-archive.com/debian-alpha@lists.debian.org/msg22571.html

Try setting the board to talk narrow mode to the SCSI target that is giving trouble. In the RHEL 4 kernel source tree, take a look at Documentation/scsi/sym53c8xx_2.txt, in particular:

8.2 Set wide size

    setwide <target> <size>

    target: target number
    size: 0=8 bits, 1=16bits

So for target 0:

    echo setwide 0 0 >/proc/scsi/sym53c8xx/0

Let us know if this prevents the problem.
Yes, this prevents the kernel messages. How can I set this up on the command line before booting? Or will this be included in a new kernel?
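Since a write to /proc only lasts until the next reboot, one possible stopgap is to reissue the setwide command from a late init script such as /etc/rc.d/rc.local. The following is a minimal, untested sketch along those lines; `sym_set_narrow` is a hypothetical helper name, and the default control file assumes the controller is host 0 as in the comment above.

```shell
#!/bin/sh
# Hypothetical helper: force narrow (8-bit) transfers for one SCSI target.
# $1 = target ID
# $2 = driver control file (assumed default: host 0 of sym53c8xx)
sym_set_narrow() {
    target=$1
    ctl=${2:-/proc/scsi/sym53c8xx/0}
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (driver not loaded, or not root?)" >&2
        return 1
    fi
    # "setwide <target> <size>", where size 0 = 8 bits
    # (per Documentation/scsi/sym53c8xx_2.txt, section 8.2)
    echo "setwide $target 0" > "$ctl"
}
```

Calling `sym_set_narrow 0` from rc.local would reapply the setting for target 0 at each boot. Because rc.local runs late in the boot sequence, it would not prevent any messages logged before that point, so this is a workaround rather than a fix.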
User rpowers's account has been closed
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.