Description of problem:
System Panic'd while booting

Version-Release number of selected component (if applicable):
kernel-xen 2.6.18-262.el5
Xen-3.1.2-262.el5

How reproducible:
I have only seen this Panic once

Actual results:
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.

GNU GRUB version 0.97 (635K lower / 2592272K upper memory)

+-------------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, 'e' to edit the
commands before booting, 'a' to modify the kernel arguments
before booting, or 'c' for a command-line.

Red Hat Enterprise Linux Server (2.6.18-262.el5xen)
Red Hat Enterprise Linux Server (2.6.18-261.el5xen)
Red Hat Enterprise Linux Server-base

The highlighted entry will be booted automatically in 5 seconds.
The highlighted entry will be booted automatically in 4 seconds.
The highlighted entry will be booted automatically in 3 seconds.
The highlighted entry will be booted automatically in 2 seconds.
The highlighted entry will be booted automatically in 1 seconds.

Booting 'Red Hat Enterprise Linux Server (2.6.18-262.el5xen)'

root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /xen.gz-2.6.18-262.el5 com1=115200
   [Multiboot-elf, <0x100000:0xee5c8:0x164a38>, shtab=0x353078, entry=0x100000]
module /vmlinuz-2.6.18-262.el5xen ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 rhgb quiet crashkernel=128M@16M
   [Multiboot-module @ 0x354000, 0xa8d560 bytes]
module /initrd-2.6.18-262.el5xen.img
   [Multiboot-module @ 0xde2000, 0x8e3e00 bytes]

[Xen ASCII-art banner]

 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 3.1.2-262.el5 (mockbuild) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) Mon May 16 17:45:12 EDT 2011
 Latest ChangeSet: unavailable

(XEN) Command line: com1=115200
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: none; EDID transfer time: 2 seconds
(XEN) EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009ec00 (usable)
(XEN) 000000000009ec00 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 000000009e484000 (usable)
(XEN) 000000009e484000 - 000000009e53d000 (ACPI NVS)
(XEN) 000000009e53d000 - 000000009fa42000 (usable)
(XEN) 000000009fa42000 - 000000009fa9a000 (reserved)
(XEN) 000000009fa9a000 - 000000009fad0000 (usable)
(XEN) 000000009fad0000 - 000000009fb1a000 (ACPI NVS)
(XEN) 000000009fb1a000 - 000000009fb2c000 (usable)
(XEN) 000000009fb2c000 - 000000009fb3a000 (ACPI data)
(XEN) 000000009fb3a000 - 000000009fc00000 (usable)
(XEN) 000000009fc00000 - 00000000b0000000 (reserved)
(XEN) 00000000ffe00000 - 00000000ffe0c000 (reserved)
(XEN) 0000000100000000 - 0000000460000000 (usable)
(XEN) System RAM: 16378MB (16771284kB)
(XEN) Xen heap: 13MB (13440kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) Processor #0 6:15 APIC version 20
(XEN) Processor #6 6:15 APIC version 20
(XEN) Processor #1 6:15 APIC version 20
(XEN) Processor #7 6:15 APIC version 20
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
(XEN) IOAPIC[2]: apic_id 10, version 32, address 0xfec84000, GSI 48-71
(XEN) IOAPIC[3]: apic_id 11, version 32, address 0xfec84400, GSI 72-95
(XEN) Enabling APIC mode: Flat. Using 4 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2660.056 MHz processor.
(XEN) HVM: VMX enabled
(XEN) VMX: MSR intercept bitmap enabled
(XEN) I/O virtualisation disabled
(XEN) CPU0: Intel Genuine Intel(R) CPU @ 2.66GHz stepping 04
(XEN) Booting processor 1/6 eip 90000
(XEN) Not responding.
(XEN) Inquiring remote APIC #6...
(XEN) ... APIC #6 ID: failed
(XEN) ... APIC #6 VERSION: failed
(XEN) ... APIC #6 SPIV: failed
(XEN) CPU #6 not responding - cannot use it.
(XEN) Booting processor 1/1 eip 90000
(XEN) CPU1: Intel Genuine Intel(R) CPU @ 2.66GHz stepping 04
(XEN) Booting processor 2/7 eip 90000
(XEN) CPU2: Intel Genuine Intel(R) CPU @ 2.66GHz stepping 04
(XEN) Total of 3 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) CPU#2 already initialized!
(XEN) huh, phys CPU#7, CPU#2 already present??
(XEN) Xen BUG at smpboot.c:334
(XEN) ----[ Xen-3.1.2-262.el5 x86_64 debug=n Not tainted ]----
(XEN) CPU: 2
(XEN) RIP: e008:[<ffff828c801e308f>] smp_callin+0x5f/0x1d0
(XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff83009ea0ffff rcx: 0000000000000cca
(XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff828c803164a9
(XEN) rbp: 0000000000000002 rsp: ffff83009ea0fe80 r8: ffff8300000b8000
(XEN) r9: 0000000000000000 r10: 00000000fffffffc r11: ffff828c80121bb0
(XEN) r12: 0000000000000002 r13: 0000000000000002 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026b0
(XEN) cr3: 000000009ecfb000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff83009ea0fe80:
(XEN) 83044e7e4000efff ffff83009ea0ffff 0000000000000002 ffff83009ea0ff28
(XEN) 0000000000000002 0000000000000000 0000000000000000 ffff828c801e3344
(XEN) 5555555555555555 5555555555555555 5555555555555555 5555555555555555
(XEN) 5555555555555555 5555555555555555 5555555555555555 0000000000000001
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 5555555555555555 5555555555555555 5555555555555555
(XEN) 5555555555555555 5555555555555555 5555555555555555 5555555555555555
(XEN) 5555555555555555 5555555555555555 5555555555555555 5555555555555555
(XEN) 5555555555555555 5555555555555555 5555555555555555 5555555555555555
(XEN) 5555555555555555 5555555555555555 5555555555555555 5555555555555555
(XEN) 5555555555555555 5555555555555555 5555555555555555 5555555555555555
(XEN) 5555555555555555 5555555555555555 5555555500000002 ffff83009f9fa080
(XEN) Xen call trace:
(XEN) [<ffff828c801e308f>] smp_callin+0x5f/0x1d0
(XEN) [<ffff828c801e3344>]Platform timer overflows in 14998 jiffies.
(XEN) start_secondary+0xa4/0x460
(XEN)
(XEN) Platform timer is 14.318MHz HPET
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at smpboot.c:334
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

Expected results:
System should boot.

Additional info:
Booting the baremetal kernel I do not see the CPU error messages.

Potential BIOS problem -- Jeff, can you please confirm whether the machine has the most recent BIOS installed?

We've seen a different bug (with possibly different background) that has the exact same symptoms -- failure to bring up a PCPU in a timely manner, then unexpectedly finding it online later.

Let's say that on these "problematic" systems any given PCPU has the same probability "p" of coming up correctly, 0 << p < 1, and suppose the PCPUs are independent. We have a problem if at least one PCPU fails to come up.

P(we've got a problem) = P(at least one PCPU fails to come up)
                       = P(not(all PCPU's okay))
                       = 1 - P(all PCPU's okay)
                       = 1 - p ** numcpus

Let's assume p = 0.99:

numcpus  P(problem)
-------  ----------
      1   0.01
      4  ~0.04  (this machine has 4 PCPUs)
     32  ~0.28  (that other machine)

I'm afraid that if we keep rebooting long enough, we'll see more panics on this machine.

Do we have any hint whether this threatens to be a regression?

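For anyone who wants to redo the arithmetic with other values, here is a minimal Python sketch of the estimate above. The function name p_problem is made up for illustration, and p = 0.99 is only the assumed per-PCPU success probability from this comment, not a measured value.

```python
def p_problem(p: float, numcpus: int) -> float:
    """Probability that at least one of numcpus independent PCPUs
    fails to come up, given per-PCPU success probability p."""
    return 1.0 - p ** numcpus


# p = 0.99 is an assumption, not a measurement.
for n in (1, 4, 32):
    print(f"{n:>7}  {p_problem(0.99, n):.2f}")
```

Under the same independence assumption, a larger PCPU count or a smaller p pushes the failure probability up quickly.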
Hi Jeff, can you please check if the machine has the most recent BIOS installed? Also, have you seen earlier Xen builds boot on the machine? Thanks!
(In reply to comment #2)
> P(we've got a problem) = P(at least one PCPU fails to come up)

(We have seen cases where several PCPUs could not be brought up, but the boot succeeded otherwise. I think this still qualifies as "we've got a problem" -- domains can't use the "missing" processors.)