Description of problem:
Installing rhel4 onto pseries (p630) works fine. Booting normally fails with:

Kernel panic: Attempted to kill init!

Notable failure messages for each scsi device:

sym0:0:0: HOST RESET operation timed-out.
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.528.2.5.ppc64

How reproducible:
Always

Steps to Reproduce:
1. Install onto pseries
2. Reboot

Actual results:
Fails

Expected results:
Boots

Additional info:
Adding maxcpus=1 to the boot command line enables boot to proceed normally.

Working dmesg fragment:

sym0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
  Vendor: IBM Model: IC35L073UCDY10-0 Rev: S28C
  Type: Direct-Access ANSI SCSI revision: 03
sym0:8:0: tagged command queuing enabled, command queue depth 16.
scsi(0:0:8:0): Beginning Domain Validation
sym0:8: asynchronous.
sym0:8: wide asynchronous.
sym0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
scsi(0:0:8:0): Ending Domain Validation
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3 sda4 < sda5 >
Created attachment 103044 [details] Here is the diff for the sym2 driver in CVS head for the last two months.
There's nothing useful in that diff -- it's mostly the pointless s/0/NULL/ which sparse addicts seem to like -- and one local variable moving from .data to .bss by removing its explicit initialisation to zero. It shouldn't have any effect on behaviour.
Subject: [Bug 10800] New: - RH130783-SMP boot fails with sym53c8xx
Date: Wed, 25 Aug 2004 11:18:55 -0400 (EDT)

https://bugzilla.linux.ibm.com/show_bug.cgi?id=10800

Summary: RH130783-SMP boot fails with sym53c8xx
Vendor: Red Hat
Linux Version: RHEL4_Alpha4
Platform: pSeries
Architecture: PPC-64
Submitting Project: LTC
Change Team Customer Priority: --
Owning Team: LTC
OSC Acceptance: N/S
Customer Status: N/S
Required Date: 0000-00-00 00:00:00
Target Date: 2000-00-00 00:00:00
Make External: NO
Status: ASSIGNED
Technical Severity: high
Engineer Priority: P2
Component: Kernel
Owner: gjlynx.com
SubmittedBy: spwoods.com
QAContact: thinh.com

Additional Information:
From RedHat bug 130783: Opened by Paul Nasrat (pnasrat) on 2004-08-24 13:40

------- Additional Comment #6 From Tom Coughlan (coughlan) on 2004-08-24 18:47 -------
Here is the diff for the sym2 driver in CVS head for the last two months.

------- Additional Comment #7 From David Woodhouse (dwmw2) on 2004-08-24 18:57 -------
There's nothing useful in that diff -- it's mostly the pointless s/0/NULL/ which sparse addicts seem to like -- and one local variable moving from .data to .bss by removing its explicit initialisation to zero. It shouldn't have any effect on behaviour.
----- Additional Comments From mike.anderson.com(prefers email via andmike.com) 2004-08-25 12:04 ------- Can you provide more of the working dmesg as the failure message is for id 0 lun 0? Your working dmesg fragment does not contain an inquiry string for this device.
Created attachment 103084 [details] console output from sym53c8xx blowing up I've attached an example of console output when the sym53c8xx blows up.
----- Additional Comments From mike.anderson.com(prefers email via andmike.com) 2004-08-25 15:07 ------- ok, I just grabbed the kernel-2.6.8-1.528.2.5, compiled it, and booted it on my p630. I am not seeing issues, but my device config is not exactly the same as yours. You could try adding inq_timeout=30 as a scsi_mod module option to see if increasing the scan timeout makes a difference. This is a long shot, as this should only be needed for cdrom type devices on the other side of an acard chip. Do you want me to make a patch of the sym_2 update that James did against the 2.6.8-1.528.2.5 kernel and attach it to the bug?
It sure looks like interrupts are not being delivered. The driver is sending Inquiries to probe each SCSI ID and it is never getting a response from the adapter. This used to work and now it doesn't, and there are no apparent changes to the driver, so the problem is likely to be elsewhere. James' patch couldn't hurt, but it does not look like a likely fix.
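One way to check the "interrupts are not being delivered" theory, on a boot that reaches a shell (for example with maxcpus=1), is to compare the adapter's counters in /proc/interrupts before and after some disk I/O. The sketch below is only an illustration of that idea: the helper name irq_count_for is made up here, and it assumes a line layout of "<irq>: <cpu0> <cpu1> ... <controller> <name>" with a controller column that does not start with a digit.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts-style line.
 * Hypothetical helper, not part of any driver or tool in this bug.
 * Assumed layout: "<irq>: <cpu0> <cpu1> ... <controller> <name>".
 * Returns -1 if the line does not mention `name` or has no ':'.
 */
long irq_count_for(const char *line, const char *name)
{
    const char *p;
    long total = 0;

    if (strstr(line, name) == NULL)
        return -1;

    p = strchr(line, ':');
    if (p == NULL)
        return -1;
    p++;                        /* step past the "IRQ:" label */

    for (;;) {
        char *end;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;              /* reached the controller/name columns */
        total += strtol(p, &end, 10);
        p = end;
    }
    return total;
}
```

Feeding it each line read from /proc/interrupts with the name "sym53c8xx" gives a single total per matching line; a total that stays flat while Inquiries are outstanding would be consistent with the theory above, though it would not by itself say where the interrupts are being lost.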
----- Additional Comments From khoa.com 2004-08-27 14:40 ------- Mike - thanks for looking into this!
----- Additional Comments From mike.anderson.com(prefers email via andmike.com) 2004-08-27 15:08 ------- Tan, I have attached a patch that contains the updates James Bottomley made for the sym2 driver, ported to the RedHat kernel. I will be out next week on vacation. I will send email to eserverio to get someone to take over.
Created attachment 103177 [details] p00001_rh_sym2_update.patch
----- Additional Comments From mike.anderson.com(prefers email via andmike.com) 2004-08-27 15:09 ------- sym2 updates ported to rh kernel Here is a patch that applies James's sym2 updates to a rh kernel.
----- Additional Comments From bjking1.com(prefers email via brking.com) 2004-08-31 12:37 ------- Can you #define DEBUG_FLAGS 0xffff in sym_hipd.h and boot the kernel? This should give us a better idea what is going on, if interrupts are occurring at all, etc. What kernel version did this work on?
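For readers unfamiliar with this style of debug switch: a compile-time mask such as DEBUG_FLAGS is usually tested bit by bit, so 0xffff switches on every category in the low 16 bits at once. The sketch below shows the general pattern only. The EX_ names and bit values are hypothetical and are not copied from sym_hipd.h; the driver defines its own categories.

```c
/*
 * Hypothetical category bits for illustration; the real driver
 * defines its own names and values.
 */
#define EX_DEBUG_ALLOC  0x0001u
#define EX_DEBUG_PHASE  0x0002u
#define EX_DEBUG_QUEUE  0x0008u

/* Nonzero when messages of category `bit` are enabled in `mask`. */
int ex_debug_enabled(unsigned int mask, unsigned int bit)
{
    return (mask & bit) != 0;
}

/* Count how many of the low 16 categories a mask turns on. */
int ex_debug_categories(unsigned int mask)
{
    int n = 0;
    unsigned int bit;

    for (bit = 1u; bit <= 0x8000u; bit <<= 1)
        if (mask & bit)
            n++;
    return n;
}
```

With a mask of 0xffff every one of the 16 categories tests as enabled, which is why the resulting console log is expected to be very verbose.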
The attached patch seems to be in the upstream kernels, but does not actually let me boot my p630 box without maxcpus=1. Should the patch be reversed before application? David
----- Additional Comments From mike.anderson.com(prefers email via andmike.com) 2004-09-15 12:29 EDT ------- Yes, I would not apply the patch anymore. Did you run with any of the flags suggested in previous comments? Output with the DEBUG flags set would give us more info about what happens during the failure.
Created attachment 103876 [details] Console log when debugging with DEBUG_FLAGS set to 0xffff This is the entire kernel console log. After this it just hung as far as I can tell. I still see processors 1-3 being stuck and it only bringing up one CPU - I don't know whether this is relevant. David
----- Additional Comments From mike.anderson.com(prefers email via andmike.com) 2004-09-16 14:34 EDT ------- I checked into the processor stuck message. It should not happen, but I did not get a response. I cannot say if this is affecting the problem. I have a system here that is generating the same processor stuck message and the sym2 driver appears to be working ok. I am still investigating how to get all processors to come online.
----- Additional Comments From mike.anderson.com(prefers email via andmike.com) 2004-09-16 14:56 EDT ------- ok, I guess I should have waited before creating the response. Anton just sent a patch to linuxppc64-dev and arjanv titled "ppc64 dont use state == SYSTEM_BOOTING". Once this patch is included, the processor stuck messages should go away, but the effect on the sym driver is still unknown.
Yay! I built Arjan's kernel, and now I see:

| Processor 1 found.
| Processor 2 found.
| Processor 3 found.
| Brought up 4 CPUs
| ...
| Loading sym53c8xx.ko module
| sym0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
|   Vendor: IBM Model: IC35L073UCDY10-0 Rev: S28C
|   Type: Direct-Access ANSI SCSI revision: 03
| ...
| SCSI device sda: drive cache: write through
|  sda: sda1 sda2 sda3 sda4 < sda5 >
| ...
| Kernel 2.6.8-1.581.dhowells on an ppc64
|
| pseries.cambridge.redhat.com login:

I can see four CPUs in /proc/cpuinfo, and the computer booted off discs through the sym2 driver (it used to hang before).

David
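The /proc/cpuinfo check can be scripted if it needs repeating across test kernels. A minimal sketch, assuming each online CPU contributes one line that begins with the word "processor" (the function name count_processors is invented for this example):

```c
#include <string.h>

/*
 * Count lines that start with "processor" in a /proc/cpuinfo-style
 * text buffer. Assumes one such line per online CPU.
 */
int count_processors(const char *text)
{
    int n = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "processor", 9) == 0)
            n++;
        line = strchr(line, '\n');  /* find the end of this line */
        if (line != NULL)
            line++;                 /* move to the start of the next */
    }
    return n;
}
```

Reading /proc/cpuinfo into a buffer and passing it in should give 4 on a box that reports "Brought up 4 CPUs", and 1 when booted with maxcpus=1, provided the one-line-per-CPU assumption holds on the platform.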
The RPMs I've tried this with are:

http://people.redhat.com/~dhowells/.pickup/ibm/kernel-2.6.8-1.581.dhowells.src.rpm
http://people.redhat.com/~dhowells/.pickup/ibm/kernel-2.6.8-1.581.dhowells.ppc64.rpm

David
----- Additional Comments From khoa.com 2004-09-19 11:38 EDT ------- Following our new bugzilla process here, I'd like to put this bug into SUBMITTED state because:
- A patch has been submitted to Red Hat
- Red Hat has built a private kernel RPM containing the patch
- Red Hat (David Howells) has tested the patch OK
So the next step is to wait for confirmation from Red Hat on when this fix will make an official RHEL4 build - maybe beta2? Thanks.
----- Additional Comments From khoa.com 2004-10-11 01:35 EDT ------- David (Red Hat) - please confirm if this will be fixed in beta2. Thanks.
----- Additional Comments From khoa.com 2004-10-18 02:24 EDT ------- Red Hat has marked the bug as MODIFIED on their side, which indicates that the fix has been accepted into RHEL4 CVS (according to Bob Johnson at Red Hat). So I'd like to go ahead and mark this bug as ACCEPTED. We still need to verify when beta2 comes out before closing the bug report. Thanks.
According to dhowells, this fix is included in RHEL4-Beta2-RC-re1027.0. Closing this issue as FIXED CURRENTRELEASE. khoa: Please re-open this issue if problems surface
----- Additional Comments From AVenkat.com 2004-11-12 18:16 EDT ------- On a p650 with RHEL4.0-beta2 there is no problem rebooting (though some minor errors were displayed on the console) and sym53c8xx gets loaded (see bug#10763). On a p630, rebooting after installing RHEL4.0-beta2 on a SCSI drive also appears to work without problems (see bug#10540).