Bug 253538
Summary: | Can't boot 2.6.9-56-smp with Vmware | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Marcus Alves Grando <marcus> |
Component: | kernel | Assignee: | Chip Coldwell <coldwell> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.5 | CC: | clalance, coldwell, coughlan, cww, divyanshu.verma, emcnabb, eric.moore, eva, jbaron, jon.stanley, larry.stephens, sathya.prakash |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2007-0791 | Doc Type: | Bug Fix |
Last Closed: | 2007-11-15 16:31:42 UTC | Type: | --- |
Bug Blocks: | 279571 | ||
Attachments: |
Description
Marcus Alves Grando
2007-08-20 14:24:33 UTC
# lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:0f.0 VGA compatible controller: VMware Inc [VMware SVGA II] PCI Display Adapter
00:10.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)
00:11.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 10)
00:12.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 10)

# tune2fs -l /dev/sda1
tune2fs 1.35 (28-Feb-2004)
Filesystem volume name: /boot
Last mounted on: <not available>
Filesystem UUID: d4ad6eb2-4174-44ac-af3e-053c2c75155a
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 16064
Block count: 64228
Reserved block count: 3211
Free blocks: 47701
Free inodes: 16022
First block: 1
Block size: 1024
Fragment size: 1024
Reserved GDT blocks: 250
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 2008
Inode blocks per group: 251
Filesystem created: Fri Mar 2 11:36:20 2007
Last mount time: Mon Aug 20 11:16:54 2007
Last write time: Mon Aug 20 11:16:54 2007
Mount count: 35
Maximum mount count: -1
Last checked: Fri Mar 2 11:36:20 2007
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 2820ce24-b251-463e-b163-14ce7e03177d
Journal backup: inode blocks

# tune2fs -l /dev/sda3
tune2fs 1.35 (28-Feb-2004)
Filesystem volume name: /
Last mounted on: <not available>
Filesystem UUID: b9c41902-fbef-4978-a58e-a94a3cbc363f
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1048576
Block count: 2094474
Reserved block count: 104723
Free blocks: 1414838
Free inodes: 957609
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 511
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Filesystem created: Fri Mar 2 11:36:17 2007
Last mount time: Mon Aug 20 11:16:53 2007
Last write time: Mon Aug 20 11:16:53 2007
Mount count: 35
Maximum mount count: -1
Last checked: Fri Mar 2 11:36:17 2007
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 101a0a14-cce3-48b2-aadb-060377c91f4b
Journal backup: inode blocks

(In reply to comment #0)
> Description of problem:
>
> I install jbarton 2.6.9-56-smp to test and i can't boot in vmware 3.0.2 ESX.
> 2.6.9-55.0.2-smp works fine.

s/jbarton/Jason Baron (jbaron)/

This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

If possible, please provide a crash dump, or at least a stack trace.

I'll attach the serial console output (which is missing the userspace LVM errors, but shows the kernel messages).
The relevant part here seems to be this:

Fusion MPT base driver 3.02.99.00rh
Copyright (c) 1999-2007 LSI Logic Corporation
Fusion MPT SPI Host driver 3.02.99.00rh
ACPI: PCI Interrupt 0000:00:10.0[A] -> GSI 17 (level, low) -> IRQ 169
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=169
Fusion MPT SAS Host driver 3.02.99.00rh
device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel
cdrom: open failed.
cdrom: open failed.
Kernel panic - not syncing: Attempted to kill init!

Note the section for scsi0; in the working kernel, this looks like:

scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=169
  Vendor: VMware, Model: VMware Virtual S Rev: 1.0
  Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 16777216 512-byte hdwr sectors (8590 MB)
sda: cache data unavailable
sda: assuming drive cache: write through
SCSI device sda: 16777216 512-byte hdwr sectors (8590 MB)
sda: cache data unavailable
sda: assuming drive cache: write through
 sda: sda1 sda2
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0

So it looks like it is not finishing finding the drive, scanning the partitions, and attaching it.

Chris Lalancette

Created attachment 189101
Serial console output for failing boot
(In reply to comment #6)
>
> So it looks like it is not finishing finding the drive, scanning the partitions,
> and attaching it.

Do you get the same "attempting to kill init" message if you try to boot the system with no attached storage (using the same initrd)?

Chip

Created attachment 191831
upstream patch (rejected by Christoph Hellwig).
This patch works around a bug in the VMWare emulated 1030 mptspi adapter.
*** Bug 282411 has been marked as a duplicate of this bug. ***

Created attachment 196621
screenshot of vmware server console showing stack trace.
Not sure if this should be considered a separate bug or not - however an install of 4.6 beta fails due to not finding disks. After the install fails, the system panics, stack trace provided in attached screenshot. If this needs to be a separate bug, let me know - however it seems to be a similar profile, and since no stack trace has yet been produced on this one, figured this would be something new to add.

(In reply to comment #16)
> Created an attachment (id=196621)
> screenshot of vmware server console showing stack trace.
>

Looks like the module was loaded at address 0xe0925000 and the EIP on the panic was 0xe0929b1f which was this chunk of assembly (addresses are offsets from the start of mptscsih_synchronize_cache)

    4b0e: 8b 44 96 60          mov 0x60(%esi,%edx,4),%eax
    4b12: 8b 3c a8             mov (%eax,%ebp,4),%edi
    4b15: 0f 84 92 00 00 00    je 4bad <mptscsih_synchronize_cache+0x1e5>
    4b1b: 85 ff                test %edi,%edi
    4b1d: 74 0c                je 4b2b <mptscsih_synchronize_cache+0x163>
    4b1f: 80 7f 0c 00          cmpb $0x0,0xc(%edi)

I've figured out that the corresponding bit of source code is line 4744 of mptscsi.c,

    while (bus < ioc->NumberOfBuses) {
        iocmd.bus = bus;
        iocmd.id = id;
        pMptTarget = ioc->Target_List[bus];
        pTarget = pMptTarget->Target[id];
        if (doConfig) {
            /* Set the negotiation flags */
            if (pTarget && !pTarget->raidVolume) {    <===== panic here
                flags = pTarget->negoFlags;
            } else {

It looks like dereferencing pTarget is the problem. Judging from the registers in the panic message, that pointer was holding the value 0x00000010 at the time, so it was both non-NULL and also an invalid address. Still digging.

Chip

committed in stream U6 build 60. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

(In reply to comment #20)
> committed in stream U6 build 60. A test kernel with this patch is available from
> http://people.redhat.com/~jbaron/rhel4/
>

I take latest 2.6.9-59 kernel and not found yet. I think that you apply wrong fix.
Based on your src.rpm your fix is:

--
@@ -2771,6 +2913,20 @@ GetPortFacts(MPT_ADAPTER *ioc, int portnum, int sleepFlag)
     pfacts->IOCLogInfo = le32_to_cpu(pfacts->IOCLogInfo);
     pfacts->MaxDevices = le16_to_cpu(pfacts->MaxDevices);
     pfacts->PortSCSIID = le16_to_cpu(pfacts->PortSCSIID);
+
+    max_id = (ioc->bus_type == SAS) ? pfacts->PortSCSIID :
+        pfacts->MaxDevices;
+    ioc->DevicesPerBus = (max_id > 255) ? 256 : max_id;
+    ioc->NumberOfBuses = (ioc->DevicesPerBus < 256) ? 1 : max_id/256;
+    if ( ioc->NumberOfBuses > MPT_MAX_BUSES ) {
+        dinitprintk((MYIOC_s_WARN_FMT "NumberOfBuses=%d > MPT_MAX_BUSES=%d\n",
+            ioc->name, ioc->NumberOfBuses, MPT_MAX_BUSES));
+        ioc->NumberOfBuses = MPT_MAX_BUSES;
+    }
+
+    dinitprintk((MYIOC_s_WARN_FMT "Buses=%d MaxDevices=%d DevicesPerBus=%d\n",
+        ioc->name, ioc->NumberOfBuses, max_id, ioc->DevicesPerBus));
+
     pfacts->ProtocolFlags = le16_to_cpu(pfacts->ProtocolFlags);
     pfacts->MaxPostedCmdBuffers = le16_to_cpu(pfacts->MaxPostedCmdBuffers);
     pfacts->MaxPersistentIDs = le16_to_cpu(pfacts->MaxPersistentIDs);
--

And still does not work. Proposed patch in attachment works fine. I'll attach boot screen.

Regards

I'll attach boot screen as soon as bugzilla works. Login page does not work.

Regards

Created attachment 207501
Problem persist in kernel 2.6.9-59 (jbarton)
In jbarton kernel 2.6.9-59 the related bug still persist. Maybe wrong patch.
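For readers following the hunk quoted in comment #21, the arithmetic it adds to GetPortFacts() can be tried in isolation. The following is only a sketch of that arithmetic, not driver code: `bus_layout`, `compute_bus_layout`, the `bus_kind` enum and `EXAMPLE_MAX_BUSES` are hypothetical stand-ins, and the value 8 is an assumption chosen for illustration (the real MPT_MAX_BUSES is not given in this report).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's own types and constants. */
enum bus_kind { BUS_SPI, BUS_FC, BUS_SAS };
#define EXAMPLE_MAX_BUSES 8u /* assumed value, for illustration only */

struct bus_layout {
    unsigned int devices_per_bus;
    unsigned int number_of_buses;
};

/*
 * Mirrors the arithmetic in the quoted hunk: choose max_id from the port
 * facts, cap the devices per bus at 256, derive the bus count, then clamp
 * the bus count to the configured maximum.
 */
static struct bus_layout compute_bus_layout(enum bus_kind kind,
                                            uint16_t port_scsi_id,
                                            uint16_t max_devices)
{
    struct bus_layout out;
    unsigned int max_id = (kind == BUS_SAS) ? port_scsi_id : max_devices;

    out.devices_per_bus = (max_id > 255u) ? 256u : max_id;
    out.number_of_buses = (out.devices_per_bus < 256u) ? 1u : max_id / 256u;
    if (out.number_of_buses > EXAMPLE_MAX_BUSES)
        out.number_of_buses = EXAMPLE_MAX_BUSES;

    /* Every path above yields at least one bus. */
    assert(out.number_of_buses >= 1u);
    return out;
}
```

With this sketch, a non-SAS port reporting MaxDevices=16 gives one bus of 16 devices, and MaxDevices=1024 gives four buses of 256. A reported MaxDevices of 0 would give one bus with zero devices. Which values the VMware-emulated adapter actually reports is not stated anywhere in this report.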
(In reply to comment #23)
> Created an attachment (id=207501)
> Problem persist in kernel 2.6.9-59 (jbarton)
>
> In jbarton kernel 2.6.9-59 the related bug still persist. Maybe wrong patch.

Comment #20 says this patch is in build 60, so you should not expect to find it in 2.6.9-59. Please test 2.6.9-60 when it becomes available.

Thank-you,

Chip

(In reply to comment #24)
> (In reply to comment #23)
> > Created an attachment (id=207501)
> > Problem persist in kernel 2.6.9-59 (jbarton)
> >
> > In jbarton kernel 2.6.9-59 the related bug still persist. Maybe wrong patch.
>
> Comment #20 says this patch is in build 60, so you should not expect to find it
> in 2.6.9-59. Please test 2.6.9-60 when it becomes available.
>
> Thank-you,
>
> Chip

I know. I'm not crazy yet. But in ~jbarton does not have 2.6.9-60 and i see in date/md5_with_old_kernel_src that 2.5.9-59 are modified. Then i think that update 2.5.9-59 and not bump kernel version. I'm wrong?

Regards

(In reply to comment #25)
>
> I know. I'm not crazy yet. But in ~jbarton does not have 2.6.9-60 and i see in
> date/md5_with_old_kernel_src that 2.5.9-59 are modified. Then i think that
> update 2.5.9-59 and not bump kernel version. I'm wrong?

Yes. Please test 2.6.9-60 when it becomes available.

Chip

(In reply to comment #26)
> (In reply to comment #25)
> >
> > I know. I'm not crazy yet. But in ~jbarton does not have 2.6.9-60 and i see in
> > date/md5_with_old_kernel_src that 2.5.9-59 are modified. Then i think that
> > update 2.5.9-59 and not bump kernel version. I'm wrong?
>
> Yes. Please test 2.6.9-60 when it becomes available.

2.6.9-60 is now available at

http://people.redhat.com/~jbaron/rhel4/RPMS.kernel/

Chip

(In reply to comment #29)
> (In reply to comment #26)
> > (In reply to comment #25)
> > >
> > > I know. I'm not crazy yet. But in ~jbarton does not have 2.6.9-60 and i see in
> > > date/md5_with_old_kernel_src that 2.5.9-59 are modified. Then i think that
> > > update 2.5.9-59 and not bump kernel version. I'm wrong?
> >
> > Yes. Please test 2.6.9-60 when it becomes available.
>
> 2.6.9-60 is now available at
>
> http://people.redhat.com/~jbaron/rhel4/RPMS.kernel/
>
> Chip
>

Works fine. Thanks all.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html
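Note on the panic analysis in the reply to comment #16: the mapping from the panic EIP to a line in the disassembly is plain address arithmetic, sketched below. The helper names `module_offset` and `symbol_start` are hypothetical; the numbers come from that comment. The branch-target annotations in the listing (for example `4bad <mptscsih_synchronize_cache+0x1e5>`) suggest the listed addresses are offsets from the start of the module, with the function itself beginning at 0x49c8, though this is an inference from the listing rather than something the comment states.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a faulting EIP inside a module loaded at module_base. */
static uint32_t module_offset(uint32_t eip, uint32_t module_base)
{
    assert(eip >= module_base);
    return eip - module_base;
}

/*
 * Recover where a symbol starts from an annotated branch target such as
 * "4bad <sym+0x1e5>": start = target_offset - offset_in_symbol.
 */
static uint32_t symbol_start(uint32_t target_offset, uint32_t offset_in_symbol)
{
    assert(target_offset >= offset_in_symbol);
    return target_offset - offset_in_symbol;
}
```

Using the values from the comment, `module_offset(0xe0929b1f, 0xe0925000)` is 0x4b1f, which is the `cmpb $0x0,0xc(%edi)` line. Both annotated targets give `symbol_start` = 0x49c8, so under that reading the faulting instruction sits at mptscsih_synchronize_cache+0x157.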