Bug 706097

Summary: [RHEL5.7] [kernel-xen] (XEN) Xen BUG at smpboot.c:334
Product: Red Hat Enterprise Linux 5
Reporter: Jeff Burke <jburke>
Component: kernel-xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 5.7
CC: jstancek, jwilson, jzheng, leiwang, lersek, mrezanin, pbunyan, pcao, qwan, xen-maint, yuzhang, yuzhou
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-08-29 09:14:17 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 514490    

Description Jeff Burke 2011-05-19 13:36:40 UTC
Description of problem:
 The system panicked while booting.

Version-Release number of selected component (if applicable):
kernel-xen 2.6.18-262.el5
Xen-3.1.2-262.el5

How reproducible:
 I have seen this panic only once.
  
Actual results:

 Press any key to continue. 
 Press any key to continue. 
 Press any key to continue. 
 Press any key to continue. 
 Press any key to continue. 
 
    GNU GRUB  version 0.97  (635K lower / 2592272K upper memory) 
 
+-------------------------------------------------------------------------+
      Use the ^ and v keys to select which entry is highlighted. 
      Press enter to boot the selected OS, 'e' to edit the 
      commands before booting, 'a' to modify the kernel arguments 
      before booting, or 'c' for a command-line. 
 Red Hat Enterprise Linux Server (2.6.18-262.el5xen)
 Red Hat Enterprise Linux Server (2.6.18-261.el5xen)
 Red Hat Enterprise Linux Server-base 

The highlighted entry will be booted automatically in 5 seconds.
The highlighted entry will be booted automatically in 4 seconds.
The highlighted entry will be booted automatically in 3 seconds.
The highlighted entry will be booted automatically in 2 seconds.
The highlighted entry will be booted automatically in 1 seconds.

  Booting 'Red Hat Enterprise Linux Server (2.6.18-262.el5xen)' 
 
root (hd0,0) 
 Filesystem type is ext2fs, partition type 0x83 
kernel /xen.gz-2.6.18-262.el5 com1=115200 
   [Multiboot-elf, <0x100000:0xee5c8:0x164a38>, shtab=0x353078, entry=0x100000] 
module /vmlinuz-2.6.18-262.el5xen ro root=/dev/VolGroup00/LogVol00 console=ttyS 
0,115200 rhgb quiet crashkernel=128M@16M 
   [Multiboot-module @ 0x354000, 0xa8d560 bytes] 
module /initrd-2.6.18-262.el5xen.img 
   [Multiboot-module @ 0xde2000, 0x8e3e00 bytes] 
 
 __  __            _____  _   ____    ____   __  ____        _ ____   
 \ \/ /___ _ __   |___ / / | |___ \  |___ \ / /_|___ \   ___| | ___|  
  \  // _ \ '_ \    |_ \ | |   __) |__ __) | '_ \ __) | / _ \ |___ \  
  /  \  __/ | | |  ___) || |_ / __/|__/ __/| (_) / __/ |  __/ |___) | 
 /_/\_\___|_| |_| |____(_)_(_)_____| |_____|\___/_____(_)___|_|____/  
                                                                      
 http://www.cl.cam.ac.uk/netos/xen 
 University of Cambridge Computer Laboratory 
 
 Xen version 3.1.2-262.el5 (mockbuild) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) Mon May 16 17:45:12 EDT 2011 
 Latest ChangeSet: unavailable 
 
(XEN) Command line: com1=115200 
(XEN) Video information: 
(XEN)  VGA is text mode 80x25, font 8x16 
(XEN)  VBE/DDC methods: none; EDID transfer time: 2 seconds 
(XEN)  EDID info not retrieved because no DDC retrieval method detected 
(XEN) Disc information: 
(XEN)  Found 2 MBR signatures 
(XEN)  Found 2 EDD information structures 
(XEN) Xen-e820 RAM map: 
(XEN)  0000000000000000 - 000000000009ec00 (usable) 
(XEN)  000000000009ec00 - 0000000000100000 (reserved) 
(XEN)  0000000000100000 - 000000009e484000 (usable) 
(XEN)  000000009e484000 - 000000009e53d000 (ACPI NVS) 
(XEN)  000000009e53d000 - 000000009fa42000 (usable) 
(XEN)  000000009fa42000 - 000000009fa9a000 (reserved) 
(XEN)  000000009fa9a000 - 000000009fad0000 (usable) 
(XEN)  000000009fad0000 - 000000009fb1a000 (ACPI NVS) 
(XEN)  000000009fb1a000 - 000000009fb2c000 (usable) 
(XEN)  000000009fb2c000 - 000000009fb3a000 (ACPI data) 
(XEN)  000000009fb3a000 - 000000009fc00000 (usable) 
(XEN)  000000009fc00000 - 00000000b0000000 (reserved) 
(XEN)  00000000ffe00000 - 00000000ffe0c000 (reserved) 
(XEN)  0000000100000000 - 0000000460000000 (usable) 
(XEN) System RAM: 16378MB (16771284kB) 
(XEN) Xen heap: 13MB (13440kB) 
(XEN) Domain heap initialised: DMA width 32 bits 
(XEN) Processor #0 6:15 APIC version 20 
(XEN) Processor #6 6:15 APIC version 20 
(XEN) Processor #1 6:15 APIC version 20 
(XEN) Processor #7 6:15 APIC version 20 
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47 
(XEN) IOAPIC[2]: apic_id 10, version 32, address 0xfec84000, GSI 48-71 
(XEN) IOAPIC[3]: apic_id 11, version 32, address 0xfec84400, GSI 72-95 
(XEN) Enabling APIC mode:  Flat.  Using 4 I/O APICs 
(XEN) Using scheduler: SMP Credit Scheduler (credit) 
(XEN) Detected 2660.056 MHz processor. 
(XEN) HVM: VMX enabled 
(XEN) VMX: MSR intercept bitmap enabled 
(XEN) I/O virtualisation disabled 
(XEN) CPU0: Intel Genuine Intel(R) CPU                  @ 2.66GHz stepping 04 
(XEN) Booting processor 1/6 eip 90000 
(XEN) Not responding. 
(XEN) Inquiring remote APIC #6... 
(XEN) ... APIC #6 ID: failed 
(XEN) ... APIC #6 VERSION: failed 
(XEN) ... APIC #6 SPIV: failed 
(XEN) CPU #6 not responding - cannot use it. 
(XEN) Booting processor 1/1 eip 90000 
(XEN) CPU1: Intel Genuine Intel(R) CPU                  @ 2.66GHz stepping 04 
(XEN) Booting processor 2/7 eip 90000 
(XEN) CPU2: Intel Genuine Intel(R) CPU                  @ 2.66GHz stepping 04 
(XEN) Total of 3 processors activated. 
(XEN) ENABLING IO-APIC IRQs 
(XEN)  -> Using new ACK method 
(XEN) CPU#2 already initialized! 
(XEN) huh, phys CPU#7, CPU#2 already present?? 
(XEN) Xen BUG at smpboot.c:334 
(XEN) ----[ Xen-3.1.2-262.el5  x86_64  debug=n  Not tainted ]---- 
(XEN) CPU:    2 
(XEN) RIP:    e008:[<ffff828c801e308f>] smp_callin+0x5f/0x1d0 
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor 
(XEN) rax: 0000000000000000   rbx: ffff83009ea0ffff   rcx: 0000000000000cca 
(XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff828c803164a9 
(XEN) rbp: 0000000000000002   rsp: ffff83009ea0fe80   r8:  ffff8300000b8000 
(XEN) r9:  0000000000000000   r10: 00000000fffffffc   r11: ffff828c80121bb0 
(XEN) r12: 0000000000000002   r13: 0000000000000002   r14: 0000000000000000 
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026b0 
(XEN) cr3: 000000009ecfb000   cr2: 0000000000000000 
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008 
(XEN) Xen stack trace from rsp=ffff83009ea0fe80: 
(XEN)    83044e7e4000efff ffff83009ea0ffff 0000000000000002 ffff83009ea0ff28 
(XEN)    0000000000000002 0000000000000000 0000000000000000 ffff828c801e3344 
(XEN)    5555555555555555 5555555555555555 5555555555555555 5555555555555555 
(XEN)    5555555555555555 5555555555555555 5555555555555555 0000000000000001 
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000 
(XEN)    0000000000000000 5555555555555555 5555555555555555 5555555555555555 
(XEN)    5555555555555555 5555555555555555 5555555555555555 5555555555555555 
(XEN)    5555555555555555 5555555555555555 5555555555555555 5555555555555555 
(XEN)    5555555555555555 5555555555555555 5555555555555555 5555555555555555 
(XEN)    5555555555555555 5555555555555555 5555555555555555 5555555555555555 
(XEN)    5555555555555555 5555555555555555 5555555555555555 5555555555555555 
(XEN)    5555555555555555 5555555555555555 5555555500000002 ffff83009f9fa080 
(XEN) Xen call trace: 
(XEN)    [<ffff828c801e308f>] smp_callin+0x5f/0x1d0 
(XEN)    [<ffff828c801e3344>]Platform timer overflows in 14998 jiffies. 
(XEN)  start_secondary+0xa4/0x460 
(XEN)     
(XEN) Platform timer is 14.318MHz HPET 
(XEN)  
(XEN) **************************************** 
(XEN) Panic on CPU 2: 
(XEN) Xen BUG at smpboot.c:334 
(XEN) **************************************** 
(XEN)  
(XEN) Reboot in five seconds... 

Expected results:
 System should boot.

Additional info:
 When booting the bare-metal kernel, I do not see the CPU error messages.

Comment 2 Laszlo Ersek 2011-05-19 15:08:52 UTC
Potential BIOS problem -- Jeff, can you please confirm if the machine has the most recent BIOS installed?

We've seen a different bug (possibly with a different background) that has the exact same symptoms -- failure to bring up a PCPU in a timely manner, then unexpectedly finding it online later.
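
The symptom pair described here (a PCPU that does not respond in time, followed by a later report that it is already present) can be searched for in a captured console log. The following is only a minimal sketch: the two message formats are copied from the boot log quoted in the description, while the function name scan_boot_log and the sample list are made up for illustration and are not part of Xen or any Red Hat tool.

```python
import re

# Message formats taken from the console output quoted in the description.
NOT_RESPONDING = re.compile(r"CPU #(\d+) not responding")
ALREADY_PRESENT = re.compile(r"phys CPU#(\d+), CPU#(\d+) already present")


def scan_boot_log(lines):
    """Return (failed, duplicates) found in an iterable of log lines.

    failed     -- APIC IDs reported as "not responding"
    duplicates -- (physical APIC ID, logical CPU number) pairs reported
                  as "already present"
    """
    failed, duplicates = [], []
    for line in lines:
        m = NOT_RESPONDING.search(line)
        if m:
            failed.append(int(m.group(1)))
        m = ALREADY_PRESENT.search(line)
        if m:
            duplicates.append((int(m.group(1)), int(m.group(2))))
    return failed, duplicates


# Hypothetical usage with two lines from the log above.
sample = [
    "(XEN) CPU #6 not responding - cannot use it.",
    "(XEN) huh, phys CPU#7, CPU#2 already present??",
]
print(scan_boot_log(sample))
```

A log in which both lists are non-empty shows the pattern discussed in this comment; the sketch says nothing about the underlying cause.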

Let's say that, on these "problematic" systems, each PCPU comes up correctly with the same probability "p", where 0 << p < 1 (well above 0 but below 1), and suppose the PCPUs are independent. We have a problem if at least one PCPU fails to come up.

P(we've got a problem) = P(at least one PCPU fails to come up)
                       = P(not(all PCPU's okay))
                       = 1 - P(all PCPU's okay)
                       = 1 - p ** numcpus

Let's assume p = 0.99:

numcpus  P(problem)
-------  ----------
      1        0.01
      4       ~0.04 (this machine has 4 PCPUs)
     32       ~0.28 (that other machine)
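
The table can be recomputed with a few lines of Python. This is only a sketch of the formula above under the same assumptions (independent PCPUs, identical p); p = 0.99 is the illustrative value used in this comment, not a measured one, and the function name p_problem is made up here.

```python
def p_problem(p, numcpus):
    """Probability that at least one of numcpus independent PCPUs fails
    to come up, when each comes up correctly with probability p."""
    return 1 - p ** numcpus


# Recompute the table rows for the assumed p = 0.99.
print("numcpus  P(problem)")
for n in (1, 4, 32):
    print(f"{n:7d}  {p_problem(0.99, n):10.2f}")
```

Changing p or the list of CPU counts shows how quickly the failure probability grows with the number of PCPUs.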

I'm afraid that if we reboot this machine often enough, we'll see more panics on it.

Do we have any hint whether this threatens to be a regression?

Comment 3 Laszlo Ersek 2011-05-24 12:17:26 UTC
Hi Jeff,

can you please check if the machine has the most recent BIOS installed?

Also, have you seen earlier Xen builds boot on the machine?

Thanks!

Comment 4 Laszlo Ersek 2011-05-24 12:22:48 UTC
(In reply to comment #2)
> P(we've got a problem) = P(at least one PCPU fails to come up)

(We have seen cases where several PCPUs could not be brought up, but the boot succeeded otherwise. I think this still qualifies as "we've got a problem" -- domains can't use the "missing" processors.)