Bug 481500
Summary: | RHEL5.3 fail to boot with Xen kernel with kernel panic on pci_create_bus | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | jpc <100p100> |
Component: | xen | Assignee: | Don Dutile (Red Hat) <ddutile> |
Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | urgent | Docs Contact: | |
Priority: | low | ||
Version: | 5.3 | CC: | 100p100, clalance, rhelbugzilla, riel, xen-maint |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-01-29 20:24:07 UTC | Type: | --- |
Description
jpc
2009-01-25 18:18:09 UTC
Thanks for the report, sorry it does not work for you. Did you just update or do a fresh install (wondering if the bare metal kernel works on this hardware)?

We just updated. The system was installed on 5.2. Hardware is a Supermicro X5DP8 with 2 GB RAM, 2x Xeon 2.8 GHz, and an Adaptec 2010S zero-channel RAID controller. I don't understand "(wondering if the bare metal kernel works on this hardware)". The non-Xen kernel works.

FYI, here's the lspci command output, to give an idea of the hardware:

00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
00:00.1 Class ff00: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01)
00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01)
00:03.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface C PCI-to-PCI Bridge (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB Controller #3 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02)
01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
03:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
03:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
04:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
04:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
04:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
04:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
06:01.0 RAID bus controller: Adaptec (formerly DPT) SmartRAID V Controller (rev 01)
07:01.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

By bare metal I meant the non-Xen kernel. Thanks for the info on the hardware. We will look into this.

Do you want any additional information? (dmidecode, complete list of installed packages, lsmod for the working non-Xen kernel, initrd content...)

Any news?

I am sorry, but bugzilla is not a support mechanism. Bugs in here are generally prioritized by customer support ranking. If there is urgency in getting this bug fixed, please open a ticket with customer support. Otherwise it will have to wait its turn with the other engineering TODO items.

jpc: odd.... I had posted a reply for needinfo, but it seems like bugzilla didn't take it, or I messed up the commit. So, here's what I requested:

(a) Please send the additional info you mentioned above: (i) installed pkgs, (ii) dmidecode, (iii) lsmod, as well as (iv) the boot log of the bare-metal/non-xen kernel.

(b) The machine appears to be exporting 5 ioapics but the kernel is being booted with noacpi; I'm not sure how well 5 ioapics can be supported w/o acpi tables. So, trying a boot w/o noacpi set would be a good test.

(c) The boot log shows:
Using x86 segment limits to approximate NX protection
which implies that NX/NoExec (data segment as code segment) is not turned on in the BIOS; if so, try turning it on and booting the i686 kernel-xen kernel. I recall a problem with i686 kernel-xen with the NX bit not set, but I cannot find a BZ with that in it, or a log of it in upstream xen. Thus the reason to try/check this BIOS setting.

(d) Also, ensure your system has the most up-to-date BIOS code on it; a number of kernel-xen problems have been cured with a proper BIOS (acpi tables), and the boot log also shows:
ACPI Exception (utmutex-0262): AE_BAD_PARAMETER, Thread C0E7DAA0 could not acquire Mutex [2] [20060707]
No dock devices found.
ACPI Exception (utmutex-0262): AE_BAD_PARAMETER, Thread C0E7DAA0 could not acquire Mutex [2] [20060707]

Dear Don, Following your suggestions, I checked the BIOS, and ACPI was not enabled. My grub config was not disabling ACPI. I enabled it in the BIOS, and the system now boots fine with a Xen kernel. The BIOS was up to date, with the latest ACPI tables. I'm just surprised it was booting with the previous kernel (I checked again this morning; it just hangs when starting DOMU, but I think that's normal since a lot of libraries have been updated in the meantime...). It seems that on kernel-xen-2.6.18-128.el5, ACPI is MANDATORY.... Thanks a lot for your help.

jpc: Thanks for the feedback. It's interesting that there's a diff in this area. I will close the BZ, but I'll be monitoring xen-maint bz's for any other instances of it. btw -- did you check whether your BIOS lets you turn NX on/off, and if it does, what state it is in now (and maybe, what state it was in before)? Oh, and I wouldn't expect a library update to cause a previously booting DOMU on rhel5.3 to stop booting. As long as the virt tools and libraries were all updated, they should be in sync, and a previous DOMU should boot on a previous dom0. - Don

Don: I didn't check the NX status in the BIOS, and I don't remember if there was a setting for it in the BIOS... Unfortunately, the system is in a data center. I'll try to check it on my next trip there. On the other point, DOMU was fully updated, and the previous DOM0 was trying to boot the latest DOMU... I've also seen that in the latest update there were dependencies between the latest kernel and Glib, libvirt, etc.
That's why I suspected that booting a fully updated system with only the previous kernel on DOM0 was not a good idea ;). JPC

I can confirm that I'm also seeing this panic behavior with ONLY the .128 kernel revisions. What's unique is that this happens with either the stock or Xen-enabled kernels. All previous CentOS 5.x kernels (5.0, 5.1, 5.2) work fine. The affected system is an old 2x 1 GHz P3, and though power management is enabled, there aren't any options to enable or disable ACPI. Are there any recommendations/changes to recompile the kernel to get things working with this new kernel? It's actually rather important that I get this system working as it serves an important role. --David

root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /xen.gz-2.6.18-128.1.6.el5 com1=9600,8n1
[Multiboot-elf, <0x100000:0xae4c4:0x51b3c>, shtab=0x200078, entry=0x100000]
module /vmlinuz-2.6.18-128.1.6.el5xen ro root=/dev/VolGroup00/LogVol00 console=xvc console=tty xencons=xvc
[Multiboot-module @ 0x201000, 0x47e0a0 bytes]
module /initrd-2.6.18-128.1.6.el5xen.img
[Multiboot-module @ 0x680000, 0x6c7800 bytes]
__ __ _____ _ ____ _ ____ ___ _ __ _ ____
\ \/ /___ _ __ |___ / / | |___ \ / |___ \( _ ) / | / /_ ___| | ___|
\ // _ \ '_ \ |_ \ | | __) |__| | __) / _ \ | || '_ \ / _ \ |___ \
/ \ __/ | | | ___) || |_ / __/|__| |/ __/ (_) || || (_) | __/ |___) |
/_/\_\___|_| |_| |____(_)_(_)_____| |_|_____\___(_)_(_)___(_)___|_|____/
http://www.cl.cam.ac.uk/netos/xen
University of Cambridge Computer Laboratory
Xen version 3.1.2-128.1.6.el5 (mockbuild) (gcc version 4.1.2 2008079
Latest ChangeSet: unavailable
(XEN) Command line: com1=9600,8n1
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: none; EDID transfer time: 2 seconds
(XEN) EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN) Found 1 MBR signatures
(XEN) Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009fc00 (usable)
(XEN) 000000000009fc00 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000060000000 (usable)
(XEN) 00000000fec00000 - 00000000fec02000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000fff80000 - 0000000100000000 (reserved)
(XEN) System RAM: 1535MB (1572476kB)
(XEN) ACPI: Unable to locate RSDP
(XEN) Xen heap: 9MB (10184kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) PAE enabled, limit: 16 GB
(XEN) Processor #0 6:8 APIC version 17
(XEN) Processor #1 6:8 APIC version 17
(XEN) Enabling APIC mode: Flat. Using 2 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 996.884 MHz processor.
(XEN) CPU0: Intel Pentium III (Coppermine) stepping 0a
(XEN) Booting processor 1/1 eip 90000
(XEN) CPU1: Intel Pentium III (Coppermine) stepping 0a
(XEN) Total of 2 processors activated.
(XEN) ExtINT not setup in hardware but reported by MP table
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ..... (found pin 0) ...works.
(XEN) Platform timer overflows in 2 jiffies.
(XEN) Platform timer is 1.193MHz PIT
(XEN) Brought up 2 CPUs
(XEN) ACPI is disabled, notifying Domain 0 (acpi=off)
(XEN) *** LOADING DOMAIN 0 ***
(XEN) elf_parse_binary: phdr: paddr=0xc0400000 memsz=0x2756cc
(XEN) elf_parse_binary: phdr: paddr=0xc0676000 memsz=0x161000
(XEN) elf_parse_binary: memory: 0xc0400000 -> 0xc07d7000
(XEN) elf_xen_parse_note: GUEST_OS = "linux"
(XEN) elf_xen_parse_note: GUEST_VERSION = "2.6"
(XEN) elf_xen_parse_note: XEN_VERSION = "xen-3.0"
(XEN) elf_xen_parse_note: VIRT_BASE = 0xc0000000
(XEN) elf_xen_parse_note: PADDR_OFFSET = 0xc0000000
(XEN) elf_xen_parse_note: ENTRY = 0xc0400000
(XEN) elf_xen_parse_note: HYPERCALL_PAGE = 0xc0401000
(XEN) elf_xen_parse_note: FEATURES = "writable_page_tables|writable_descriptor_"
(XEN) elf_xen_parse_note: PAE_MODE = "yes"
(XEN) elf_xen_parse_note: LOADER = "generic"
(XEN) elf_xen_addr_calc_check: addresses:
(XEN) virt_base = 0xc0000000
(XEN) elf_paddr_offset = 0xc0000000
(XEN) virt_offset = 0x0
(XEN) virt_kstart = 0xc0400000
(XEN) virt_kend = 0xc07d7000
(XEN) virt_entry = 0xc0400000
(XEN) Xen kernel: 32-bit, PAE, lsb
(XEN) Dom0 kernel: 32-bit, PAE, lsb, paddr 0xc0400000 -> 0xc07d7000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 000000003c000000->000000003e000000 (330720 pages to be al)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: c0400000->c07d7000
(XEN) Init. ramdisk: c07d7000->c0e9e800
(XEN) Phys-Mach map: c0e9f000->c0fe9f80
(XEN) Start info: c0fea000->c0fea46c
(XEN) Page tables: c0feb000->c0ffa000
(XEN) Boot stack: c0ffa000->c0ffb000
(XEN) TOTAL: c0000000->c1400000
(XEN) ENTRY ADDRESS: c0400000
(XEN) Dom0 has maximum 2 VCPUs
(XEN) elf_load_binary: phdr 0 at 0xc0400000 -> 0xc06756cc
(XEN) elf_load_binary: phdr 1 at 0xc0676000 -> 0xc071fc64
(XEN) Initrd len 0x6c7800, start at 0xc07d7000
(XEN) Scrubbing Free RAM: ..done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xe.
(XEN) Freed 96kB init memory.
Linux version 2.6.18-128.1.6.el5xen (mockbuild.org) (gcc versi9
BIOS-provided physical RAM map:
Xen: 0000000000000000 - 00000000533e0000 (usable)
603MB HIGHMEM available.
727MB LOWMEM available.
Using x86 segment limits to approximate NX protection
found SMP MP-table at 000ff780
DMI 2.3 present.
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: AMI Product ID: CNB30LE APIC at: 0xFEE00000
I/O APIC #4 Version 17 at 0xFEC00000.
I/O APIC #5 Version 17 at 0xFEC01000.
Enabling APIC mode: Flat. Using 2 I/O APICs
Processors: 2
Built 1 zonelists. Total pages: 340960
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=xvc console=tty xf
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0744000 soft=c0724000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Xen reported: 1390.116 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Software IO TLB enabled:
Aperture: 2 megabytes
Kernel range: 0x00000000c1b52000 - 0x00000000c1d52000
vmalloc area: ee000000-f4ffe000, maxmem 2d7fe000
Memory: 1327284k/1363840k available (2125k kernel code, 27400k reserved, 876k d)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 3493.73 BogoMIPS (lpj=6987477)
Security Framework v1.0.0 initialized
SELinux: Initializing.
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000119 from 00000000:9e9db4b7 to.
CPU serial number disabled.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
CPU 1 irqstacks, hard=c0745000 soft=c0725000
ExtINT not setup in hardware but reported by MP table
ENABLING IO-APIC IRQs
(XEN) ioapic_guest_write: apic=0, pin=2, old_irq=-1, new_irq=0
(XEN) ioapic_guest_write: old_entry=00010000, new_entry=000009f0
(XEN) ioapic_guest_write: Attempt to add IO-APIC pin for in-use IRQ!
SMP alternatives: switching to SMP code
Initializing CPU#1
Brought up 2 CPUs
Warning Timer ISR/0: Time went backwards: delta=-25438772 delta_cpu=10561228 sh5
0: 1612003935
1: 1648003935
Warning Timer ISR/0: Time went backwards: delta=-27316815 delta_cpu=12683185 sh5
0: 1620003935
1: 1660003935
Warning Timer ISR/0: Time went backwards: delta=-25319235 delta_cpu=10680765 sh5
0: 1632003935
1: 1668003935
Warning Timer ISR/0: Time went backwards: delta=-27157333 delta_cpu=12842667 sh5
0: 1640003935
1: 1680003935
Warning Timer ISR/0: Time went backwards: delta=-29082799 delta_cpu=10917201 sh5
0: 1652003935
1: 1692003935
Warning Timer ISR/0: Time went backwards: delta=-38600364 delta_cpu=13399636 sh5
0: 1660003935
1: 1712003935
Warning Timer ISR/0: Time went backwards: delta=-40058188 delta_cpu=11941812 sh5
0: 1672003935
1: 1724003935
Warning Timer ISR/0: Time went backwards: delta=-37776661 delta_cpu=14223339 sh5
0: 1680003935
1: 1732003935
Warning Timer ISR/0: Time went backwards: delta=-39626137 delta_cpu=12373863 sh5
0: 1692003935
1: 1744003935
Warning Timer ISR/0: Time went backwards: delta=-37519617 delta_cpu=10480383 sh5
0: 1704003935
1: 1752003935
migration_cost=896
checking if image is initramfs... it is
Freeing initrd memory: 6942k freed
Grant table initialized
NET: Registered protocol family 16
ACPI Exception (utmutex-0262): AE_BAD_PARAMETER, Thread C0FE9AA0 could not acqu]
No dock devices found.
ACPI Exception (utmutex-0262): AE_BAD_PARAMETER, Thread C0FE9AA0 could not acqu]
PCI: Using configuration type 1
Setting up standard PCI resources
Allocating PCI resources starting at 70000000 (gap: 60000000:9ec00000)
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
xen_mem: Initialising balloon driver.
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Firmware left 0000:00:04.0 e100 interrupts enabled, disabling
PCI: Firmware left 0000:00:05.0 e100 interrupts enabled, disabling
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000
printing eip:
c04ecdb3
00feb000 -> *pde = 00000000:3cfec001
00fec000 -> *pme = 00000000:00000000
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU: 0
EIP: 0061:[<c04ecdb3>] Not tainted VLI
EFLAGS: 00010286 (2.6.18-128.1.6.el5xen #1)
EIP is at pci_create_bus+0x47/0x19a
eax: 00000000 ebx: c0e38000 ecx: 00000000 edx: 00000001
esi: c0e0ce00 edi: c06a0cd0 ebp: 00000001 esp: c0fe8e70
ds: 007b es: 007b ss: e021
Process swapper (pid: 1, ti=c0fe8000 task=c0fe9aa0 task.ti=c0fe8000)
Stack: c060e575 00000000 00000001 000000c0 c0fe8f9f f57fe000 c04edb93 00000000
00000001 c070dd4e 00000000 01689d80 00000001 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c060e575>] wait_for_completion+0x8f/0x97
[<c04edb93>] pci_scan_bus_parented+0xa/0x1f
[<c070dd4e>] pcibios_irq_init+0x153/0x432
[<c06f45a8>] init+0x17d/0x250
[<c04052ce>] ret_from_fork+0x6/0x1c
[<c06f442b>] init+0x0/0x250
[<c06f442b>] init+0x0/0x250
[<c0403005>] kernel_thread_helper+0x5/0xb
=======================
Code: 00 00 a1 94 24 68 c0 ba d0 00 00 00 e8 f5 e4 f7 ff 85 c0 89 c6 0f 84 51 0
EIP: [<c04ecdb3>] pci_create_bus+0x47/0x19a SS:ESP e021:c0fe8e70
<0>Kernel panic - not syncing: Fatal exception
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

(In reply to comment #12)
> I can confirm that I'm also seeing this panic behavior with ONLY the .128
> kernel revisions. What's unique is that this happens with either the stock or
> Xen-enabled kernels. All previous CentOS 5.x kernels (5.0, 5.1, 5.2) work
> fine. The affected system is an old 2x 1 GHz P3, and though power management is
> enabled, there aren't any options to enable or disable ACPI. Are there any
> recommendations/changes to recompile the kernel to get things working with this
> new kernel? It's actually rather important that I get this system working as
> it serves an important role.

This was actually prematurely closed as NOTABUG. See https://bugzilla.redhat.com/show_bug.cgi?id=494114 for further discussion of this issue, plus the patch that will be going into 5.4 to fix the issue. For now, you can use the virttest kernels at http://people.redhat.com/clalance/virttest (which have the patch applied), although they are not supported in any way.

Chris Lalancette
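
For anyone triaging a similar report, the two symptoms discussed in this thread (ACPI reported as disabled, and an oops in pci_create_bus) can be checked mechanically in a captured boot log. The following is only a rough sketch: the function name `summarize_boot_log` and the marker list are hypothetical helpers written for illustration, and the sample lines are copied from the log in comment #12. It does not diagnose the underlying kernel bug.

```python
import re

# Log lines quoted in this report that indicate ACPI is off.
# (Illustrative list only; other kernels may word these differently.)
ACPI_OFF_MARKERS = (
    "ACPI: Unable to locate RSDP",
    "ACPI is disabled, notifying Domain 0",
    "ACPI: Interpreter disabled.",
)

# Matches e.g. "EIP is at pci_create_bus+0x47/0x19a" and captures the symbol.
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_boot_log(text):
    """Return (ACPI-off markers found, faulting function name or None)."""
    found = [marker for marker in ACPI_OFF_MARKERS if marker in text]
    match = EIP_RE.search(text)
    return found, (match.group(1) if match else None)


# A few lines taken from the boot log posted in this bug.
SAMPLE = """\
(XEN) ACPI: Unable to locate RSDP
(XEN) ACPI is disabled, notifying Domain 0 (acpi=off)
ACPI: Interpreter disabled.
EIP is at pci_create_bus+0x47/0x19a
"""

if __name__ == "__main__":
    markers, func = summarize_boot_log(SAMPLE)
    print("ACPI-off markers:", markers)
    print("Faulting function:", func)
```

If the log shows both the ACPI-off markers and pci_create_bus as the faulting function, the report probably matches this bug and the follow-up in bug 494114; enabling ACPI in the BIOS, where the option exists, is the workaround that helped the original reporter.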