470202 – Kernel Panic at pci_scan_bus_parented+0xa/0x1f with "acpi=off" or "acpi=ht" options

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Prarit Bhargava
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	480914 494697
Depends On:
Blocks:

Reported:	2008-11-06 09:11 UTC by Qian Cai
Modified:	2018-10-20 02:29 UTC (History)
CC List:	15 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-09-02 08:52:11 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
dmesg from xw8600 in RHTS (21.66 KB, application/octet-stream) 2008-11-10 16:53 UTC, Prarit Bhargava	no flags
IA-32 bare metal Kernel was panicking. (14.89 KB, text/plain) 2008-11-11 09:16 UTC, Qian Cai	no flags
x86-64 Xen Domain 0 was panicking. (10.59 KB, text/plain) 2008-11-11 09:16 UTC, Qian Cai	no flags
IA-32 -123.el5 bare metal Kernel with acpi=off is panicking. (14.87 KB, text/plain) 2008-11-11 09:32 UTC, Qian Cai	no flags
IA-32 -123.el5 bare metal Kernel without acpi=off is working. (19.81 KB, text/plain) 2008-11-11 09:32 UTC, Qian Cai	no flags
IA-32 -123.el5 bare metal Kdump is working. (15.51 KB, text/plain) 2008-11-11 09:55 UTC, Qian Cai	no flags
xw9400 is panicking with non-PAE Kernel. (8.88 KB, text/plain) 2008-11-11 15:46 UTC, Qian Cai	no flags
xw9400 is panicking with PAE Kernel. (8.82 KB, text/plain) 2008-11-11 15:47 UTC, Qian Cai	no flags
RHEL5 fix for this issue (2.03 KB, patch) 2008-11-11 19:37 UTC, Prarit Bhargava	no flags
RHEL5 fix for this issue (2.53 KB, patch) 2008-11-12 19:31 UTC, Prarit Bhargava	no flags
proposed patch (3.16 KB, patch) 2009-04-02 10:49 UTC, Veaceslav Falico	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1243	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update	2009-09-01 08:53:34 UTC

Description Qian Cai 2008-11-06 09:11:39 UTC

Description of problem:
On HP xw8600 and xw9400, the kernel refuses to boot on both IA-32 and x86-64 with "acpi=off".

kernel /vmlinuz-2.6.18-121.el5PAE ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 acpi=off
   [Linux-bzImage, setup=0x1e00, size=0x1befb4]
initrd /initrd-2.6.18-121.el5PAE.img
   [Linux-initrd @ 0x37cd8000, 0x317f55 bytes]

Linux version 2.6.18-121.el5PAE (brewbuilder.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Mon Oct 27 22:03:07 EDT 2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009d000 (usable)
 BIOS-e820: 000000000009d000 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e8e00 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dffc7000 (usable)
 BIOS-e820: 00000000dffc7000 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000f0000000 - 00000000f8000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
3712MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe700
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
disabling kdump
NX (Execute Disable) protection: active
DMI 2.5 present.
Using APIC driver default
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: HP       Product ID: Workstation  APIC at: 0xFEE00000
Processor #0 15:1 APIC version 16
Processor #2 15:1 APIC version 16
Processor #1 15:1 APIC version 16
Processor #3 15:1 APIC version 16
I/O APIC #8 Version 17 at 0xFEC00000.
I/O APIC #9 Version 17 at 0xFA400000.
Enabling APIC mode:  Flat.  Using 2 I/O APICs
Processors: 4
Allocating PCI resources starting at e1000000 (gap: e0000000:10000000)
Detected 2600.075 MHz processor.
Built 1 zonelists.  Total pages: 1179648
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 acpi=off
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0759000 soft=c0739000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 4147548k/4718592k available (2134k kernel code, 45112k reserved, 892k data, 228k init, 3276572k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5202.38 BogoMIPS (lpj=2601193)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(2) -> Core 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
CPU0: AMD Dual-Core AMD Opteron(tm) Processor 2218 stepping 02
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c075a000 soft=c073a000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5199.28 BogoMIPS (lpj=2599644)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(2) -> Core 1
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: AMD Dual-Core AMD Opteron(tm) Processor 2218 stepping 02
SMP alternatives: switching to SMP code
Booting processor 2/2 eip 3000
CPU 2 irqstacks, hard=c075b000 soft=c073b000
Initializing CPU#2
Calibrating delay using timer specific routine.. 5199.30 BogoMIPS (lpj=2599650)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2(2) -> Core 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#2.
CPU2: AMD Dual-Core AMD Opteron(tm) Processor 2218 stepping 02
SMP alternatives: switching to SMP code
Booting processor 3/3 eip 3000
CPU 3 irqstacks, hard=c075c000 soft=c073c000
Initializing CPU#3
Calibrating delay using timer specific routine.. 5199.28 BogoMIPS (lpj=2599642)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3(2) -> Core 1
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#3.
CPU3: AMD Dual-Core AMD Opteron(tm) Processor 2218 stepping 02
Total of 4 processors activated (20800.25 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
checking TSC synchronization across 4 CPUs: 
CPU#0 had -131 usecs TSC skew, fixed it up.
CPU#1 had -131 usecs TSC skew, fixed it up.
CPU#2 had 131 usecs TSC skew, fixed it up.
CPU#3 had 131 usecs TSC skew, fixed it up.
Brought up 4 CPUs
migration_cost=506
checking if image is initramfs... it is
Freeing initrd memory: 3167k freed
NET: Registered protocol family 16
ACPI Exception (utmutex-0262): AE_BAD_PARAMETER, Thread C3518AA0 could not acquire Mutex [2] [20060707]
No dock devices found.
ACPI Exception (utmutex-0262): AE_BAD_PARAMETER, Thread C3518AA0 could not acquire Mutex [2] [20060707]
PCI: PCI BIOS revision 2.20 entry at 0xef3da, last bus=107
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
xen_mem: Initialising balloon driver.
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
HP xw9400 Workstation detected: disabling PCI segments
PCI: Transparent bridge - 0000:00:06.0
PCI: Discovered peer bus 40
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c04eda77
*pde = 00734001
Oops: 0000 [#1]
SMP 
last sysfs file: 
Modules linked in:
CPU:    0
EIP:    0060:[<c04eda77>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-121.el5PAE #1) 
EIP is at pci_create_bus+0x47/0x19a
eax: 00000000   ebx: f7f9e600   ecx: 00000000   edx: 00000040
esi: f7f9e400   edi: c06aa030   ebp: 00000040   esp: c3517f68
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 1, ti=c3517000 task=c3518aa0 task.ti=c3517000)
Stack: 00000000 00000000 c06a9e04 00000040 00000000 00000000 c04ee864 00000000 
       c06a9e04 c0717675 00000000 c065d9b6 00000040 000010de 00000000 c0729a24 
       00000000 c06fa5a8 c06f5fd8 c0404e06 00000202 c06fa42b 00000000 00000000 
Call Trace:
 [<c04ee864>] pci_scan_bus_parented+0xa/0x1f
 [<c0717675>] pci_legacy_init+0xb6/0xdf
 [<c06fa5a8>] init+0x17d/0x24a
 [<c0404e06>] ret_from_fork+0x6/0x1c
 [<c06fa42b>] init+0x0/0x24a
 [<c06fa42b>] init+0x0/0x24a
 [<c0405c53>] kernel_thread_helper+0x7/0x10
 =======================
Code: 00 00 a1 b4 a1 68 c0 ba d0 00 00 00 e8 53 fe f7 ff 85 c0 89 c6 0f 84 51 01 00 00 8b 44 24 1c 89 ea 89 7b 40 89 43 44 8b 4c 24 1c <8b> 01 e8 00 41 00 00 85 c0 89 04 24 0f 85 28 01 00 00 b8 28 22 
EIP: [<c04eda77>] pci_create_bus+0x47/0x19a SS:ESP 0068:c3517f68
 <0>Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
kernel-2.6.18-121.el5

How reproducible:
always

Additional info:
It may be worth comparing this with bug 463418 - [5.3] Kdump Kernel Panic at pci_create_bus+0x59/0x1f3.

Comment 1 Prarit Bhargava 2008-11-10 16:17:58 UTC

Cai, this seems to work on the xw9400 in my cube.  Which xw9400 did you test on?

P.

Comment 2 Prarit Bhargava 2008-11-10 16:51:27 UTC

Cai, this WORKSFORME with 122.el5 on the xw9400 in my cube, and hp-xw8600-01.rhts.bos.redhat.com in rhts.

I'll attach a dmesg from the xw8600.

P.

Comment 3 Prarit Bhargava 2008-11-10 16:53:49 UTC

Created attachment 323094 [details]
dmesg from xw8600 in RHTS

Comment 4 Qian Cai 2008-11-11 07:11:21 UTC

I have seen it on hp-xw9400-02.rhts.bos.redhat.com, although that was on a -121 kernel. I would like to try the -122 kernel on it, but the machine is unavailable at the moment.

Comment 5 Qian Cai 2008-11-11 09:14:15 UTC

Prarit, the problem is still there. Yes, it looks like the x86-64 bare metal kernel is working, but on both hp-xw9400-02.rhts.bos.redhat.com and hp-xw8600-01.rhts.bos.redhat.com, neither the IA-32 bare metal kernel nor the x86-64 Xen Domain 0 kernel works (the IA-32 Xen Domain 0 kernel has not been tested). Please see the attachments for boot logs.

Comment 6 Qian Cai 2008-11-11 09:16:09 UTC

Created attachment 323158 [details]
IA-32 bare metal Kernel was panicking.

Comment 7 Qian Cai 2008-11-11 09:16:54 UTC

Created attachment 323159 [details]
x86-64 Xen Domain 0 was panicking.

Comment 8 Qian Cai 2008-11-11 09:29:27 UTC

I have also tried the -123.el5 kernel on IA-32 bare metal, and it has the same problem: it works without "acpi=off" and panics with it. Both boot logs have also been attached.

Comment 9 Qian Cai 2008-11-11 09:32:18 UTC

Created attachment 323162 [details]
IA-32 -123.el5 bare metal Kernel with acpi=off is panicking.

Comment 10 Qian Cai 2008-11-11 09:32:59 UTC

Created attachment 323163 [details]
IA-32 -123.el5 bare metal Kernel without acpi=off is working.

Comment 11 Qian Cai 2008-11-11 09:54:52 UTC

For your information, Kdump is working on hp-xw8600-01.rhts.bos.redhat.com with -123.el5 IA-32 Kernel now.

- # readelf -a /var/crash/127.0.0.1-2008-11-11-04:47:37/vmcore
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         5
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no sections in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000158 0x0000000000000000 0x0000000000000000
                 0x00000000000004b4 0x00000000000004b4         0
  LOAD           0x000000000000060c 0x00000000c0000000 0x0000000000000000
                 0x00000000000a0000 0x00000000000a0000  RWE    0
  LOAD           0x00000000000a060c 0x00000000c0100000 0x0000000000100000
                 0x0000000000f00000 0x0000000000f00000  RWE    0
  LOAD           0x0000000000fa060c 0x00000000c9000000 0x0000000009000000
                 0x000000002f000000 0x000000002f000000  RWE    0
  LOAD           0x000000002ffa060c 0xffffffffffffffff 0x0000000038000000
                 0x0000000047fc2840 0x0000000047fc2840  RWE    0

There is no dynamic section in this file.

There are no relocations in this file.

There are no unwind sections in this file.

No version information found in this file.

Notes at offset 0x00000158 with length 0x000004b4:
  Owner         Data size       Description
  CORE          0x00000090      NT_PRSTATUS (prstatus structure)
  CORE          0x00000090      NT_PRSTATUS (prstatus structure)
  VMCOREINFO            0x00000354      Unknown note type: (0x00000000)

Comment 12 Qian Cai 2008-11-11 09:55:43 UTC

Created attachment 323164 [details]
IA-32 -123.el5 bare metal Kdump is working.

Comment 13 Prarit Bhargava 2008-11-11 11:32:23 UTC

Thanks Cai -- I'll get this back on my list.

P.

Comment 14 Prarit Bhargava 2008-11-11 13:39:46 UTC

(In reply to comment #6)
> Created an attachment (id=323158) [details]
> IA-32 bare metal Kernel was panicking.

Cai, this isn't panicking like the description says.  It looks like some other issue has caused the system install to fail...

P.

Comment 15 Prarit Bhargava 2008-11-11 14:44:58 UTC

(In reply to comment #7)
> Created an attachment (id=323159) [details]
> x86-64 Xen Domain 0 was panicking.

Cai,

Ostensibly this is happening because the fix for BZ 463418 has not been applied to the xen-specific code.

I'll get a system up-and-running and see if I can reproduce this.

P.

Comment 16 Qian Cai 2008-11-11 15:42:12 UTC

(In reply to comment #14)
> (In reply to comment #6)
> > Created an attachment (id=323158) [details] [details]
> > IA-32 bare metal Kernel was panicking.
> 
> Cai, this isn't panicking like the description says.  It looks like some other
> issue has caused the system install to fail...
> 
> P.

Yes, this is a different form of panic, but it also only happens with acpi=off. It looks like it causes some problems for SATA devices.

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1: failed to recover some devices, retrying in 5 secs
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1: failed to recover some devices, retrying in 5 secs
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1: failed to recover some devices, retrying in 5 secs

Comment 17 Qian Cai 2008-11-11 15:45:43 UTC

I have reserved hp-xw9400-01.rhts.bos.redhat.com, and found that both the PAE and non-PAE kernels panic as the description says. I'll attach their boot logs here.

Comment 18 Qian Cai 2008-11-11 15:46:51 UTC

Created attachment 323184 [details]
xw9400 is panicking with non-PAE Kernel.

Comment 19 Qian Cai 2008-11-11 15:47:22 UTC

Created attachment 323185 [details]
xw9400 is panicking with PAE Kernel.

Comment 21 Prarit Bhargava 2008-11-11 19:37:42 UTC

Created attachment 323223 [details]
RHEL5 fix for this issue

Comment 22 Anton Arapov 2008-11-11 21:06:35 UTC

Just FYI, this might be a related crash:
https://www.redhat.com/archives/rhelv5-list/2008-November/msg00033.html

Comment 23 Prarit Bhargava 2008-11-12 00:36:33 UTC

(In reply to comment #22)
> just fyi. might be related crash:
> https://www.redhat.com/archives/rhelv5-list/2008-November/msg00033.html

Yeah Anton -- that is the same issue.  I'm putting this back into ASSIGNED for the moment.  I'm going to rework the patch to come up with a more comprehensive solution.

P.

Comment 24 Prarit Bhargava 2008-11-12 19:31:28 UTC

This is a regression from previous behavior.  This also breaks kdump on some systems.

P.

Comment 25 Prarit Bhargava 2008-11-12 19:31:59 UTC

Created attachment 323378 [details]
RHEL5 fix for this issue

Comment 26 RHEL Program Management 2008-11-12 21:10:43 UTC

This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 28 RHEL Program Management 2009-01-27 20:38:41 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 29 Prarit Bhargava 2009-02-10 13:38:52 UTC

*** Bug 480914 has been marked as a duplicate of this bug. ***

Comment 30 RHEL Program Management 2009-02-16 15:26:38 UTC

Updating PM score.

Comment 32 Veaceslav Falico 2009-03-27 14:01:50 UTC

Patch from #25 doesn't fix the issue.


Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:  
[<ffffffff803433bd>] pci_create_bus+0x59/0x1f3
PGD 0  
Oops: 0000 [1] SMP  
last sysfs file:  
CPU 0  
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-128.el5.it274645xen #1
RIP: e030:[<ffffffff803433bd>]  [<ffffffff803433bd>] pci_create_bus+0x59/0x1f3
RSP: e02b:ffff880006141d50  EFLAGS: 00010286
RAX: ffff88002ff8a000 RBX: ffff88002ff93200 RCX: 0000000000000000
RDX: ffffffffff578000 RSI: 0000000000000005 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff88002ff93400 R09: 0000000000000000
R10: ffff880006141da0 R11: 0000000000000100 R12: ffff88002ff8a000
R13: 0000000000000005 R14: ffffffff80543a70 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 1, threadinfo ffff880006140000, task ffff8800000297a0)
Stack:  0000000000000005  0000000000000005  0000000000000004  0000000000000000  
0000000000000000  0000000000000000  0000000000000000  ffffffff8034428d  
0000000000000005  ffffffff8065031d  
Call Trace:
[<ffffffff8034428d>] pci_scan_bus_parented+0x6/0x21
[<ffffffff8065031d>] pcibios_irq_init+0x177/0x491
[<ffffffff806347e5>] init+0x1f9/0x2fe
[<ffffffff8025fb2c>] child_rip+0xa/0x12
[<ffffffff806345ec>] init+0x0/0x2fe
[<ffffffff8025fb22>] child_rip+0x0/0x12


Code: 8b 7d 00 e8 e2 43 00 00 48 85 c0 0f 85 68 01 00 00 48 c7 c7  
RIP  [<ffffffff803433bd>] pci_create_bus+0x59/0x1f3
RSP <ffff880006141d50>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Fatal exception
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Comment 33 Veaceslav Falico 2009-04-02 10:49:02 UTC

Created attachment 337771 [details]
proposed patch



The customer tested the patch and confirmed that it works. It is the same patch as yours, based on your idea, except that it also modifies the mach-xen pci.h files.

Could you also say when the patch will be applied? 

Thank you!

Comment 34 Prarit Bhargava 2009-04-02 12:37:33 UTC

(In reply to comment #33)
> Created an attachment (id=337771) [details]
> proposed patch
> 
> 
> 
> Customer tested and confirmed the patch works. It's the same patch as yours,
> based on your idea, only it's adding modifications to mach-xen pci.h files.
> 

Patch looks good.  I didn't even think about virt kernels.

> Could you also say when the patch will be applied? 

Whenever dzickus gets around to applying it :)

P.


> 
> Thank you!

Comment 35 Chris Lalancette 2009-04-06 15:35:37 UTC

*** Bug 494114 has been marked as a duplicate of this bug. ***

Comment 36 Don Zickus 2009-04-06 21:16:30 UTC

in kernel-2.6.18-138.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 38 Hiroto Shibuya 2009-04-13 20:47:48 UTC

My system started crashing during boot after installing 5.3, with a similar
stack trace, and I verified that kernel-2.6.18-138.el5 fixes the problem.

Comment 39 Prarit Bhargava 2009-04-15 14:31:23 UTC

*** Bug 494697 has been marked as a duplicate of this bug. ***

Comment 40 Shad L. Lords 2009-04-15 15:51:26 UTC

Bug 494697, which I opened, was about a boot failure with acpi=ht rather than acpi=off.  Not sure if this makes a difference but wanted to let you know.

Haven't had a chance to test the patch.

Comment 41 Prarit Bhargava 2009-04-15 15:54:00 UTC

(In reply to comment #40)
> Bug 494697 that I opened wasn't booting with acpi=off but acpi=ht.  Not sure if
> this makes a difference but wanted to let you know.
> 
> Haven't had a chance to test the patch.  

Hi Shad,

The actual problem is an issue with multiple PCI domains, not ACPI.  The change in ACPI causes the system to go from single to multiple PCI domains.

I have a good feeling that this patch will fix your problem :)

P.

Comment 42 Hiroto Shibuya 2009-04-15 16:01:10 UTC

I was also crashing while booting with acpi=ht and this patch fixed it.

You should remove 'with "acpi=off"' from the bug summary.

Comment 43 Chris Ward 2009-07-03 18:12:25 UTC

~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 45 Juha Tuomala 2009-07-13 17:48:01 UTC

I got bitten by this with an old Tyan PIII motherboard. The 5.2 boot.iso appears to boot fine. I also tried removing the Adaptec SCSI card, and the system then booted with the 5.3 installation image too.

Comment 47 errata-xmlrpc 2009-09-02 08:52:11 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Comment 50 david ahern 2009-12-21 17:24:56 UTC

The patch (linux-2.6-x86-fix-calls-to-pci_scan_bus.patch) shows:
+               struct pci_sysdata *sd;
+
+               sd = kzalloc(sizeof(&sd), GFP_KERNEL);
+               if (!sd)
+                       panic("Cannot allocate PCI domain sysdata");


Should the sizeof(&sd) be sizeof(*sd)? That is, should it allocate space for a struct pci_sysdata rather than space for a pointer?
