I tried to turn my RHEL4 xen guest into a 4-way SMP image today for some testing. When I brought up the RHEL4 kernel, I noticed that I only had 1 CPU in /proc/cpuinfo. Looking back through dmesg, I saw this:

Booting processor 1/2 rip 6000 rsp 1003f125f58
Not responding.
Inquiring remote APIC #2...
... APIC #2 ID: failed
... APIC #2 VERSION: failed
... APIC #2 SPIV: failed
Booting processor 1/4 rip 6000 rsp 1003f129f58
Not responding.
Inquiring remote APIC #4...
... APIC #4 ID: failed
... APIC #4 VERSION: failed
... APIC #4 SPIV: failed
Booting processor 1/6 rip 6000 rsp 1003f12bf58
Not responding.
Inquiring remote APIC #6...
... APIC #6 ID: failed
... APIC #6 VERSION: failed
... APIC #6 SPIV: failed
Only one processor found.
activating NMI Watchdog ... done.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (0)!

...on the dom0 console these messages popped up:

(XEN) instrlen.c:252:d1 Cannot read from address b8000 (eip b8000, mode 2)
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) instrlen.c:252:d1 Cannot read from address b8000 (eip b8000, mode 2)
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) instrlen.c:252:d1 Cannot read from address b8000 (eip b8000, mode 2)
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3

...this is from a -84.el5 kernel. domU is running 2.6.9-68.16.EL.jtltest.31smp, but -55.0.16 fails the same way. When I boot the dom0 to -53.1.14, everything works fine.

I can provide other info or access to the box if needed.
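For anyone trying to confirm the same symptom on another guest, the failure pattern in the dmesg output above can be picked out mechanically. The following is only a rough sketch in Python: the function name `failed_aps` and the trimmed `SAMPLE` text are made up for illustration, and it assumes the second number in each "Booting processor N/M" line is the APIC ID, which matches the "Inquiring remote APIC #M" lines in the log above.

```python
import re

# Matches lines such as "Booting processor 1/2 rip 6000 rsp 1003f125f58".
# Assumption: the second number is the APIC ID of the CPU being started.
BOOT_RE = re.compile(r"Booting processor (\d+)/(\d+)")


def failed_aps(dmesg_text):
    """Return APIC IDs whose boot attempt was followed by "Not responding"."""
    failed = []
    current = None  # APIC ID of the most recent boot attempt, if any
    for line in dmesg_text.splitlines():
        match = BOOT_RE.search(line)
        if match:
            current = int(match.group(2))
            continue
        if current is not None and "Not responding" in line:
            failed.append(current)
            current = None
    return failed


# Trimmed copy of the guest dmesg output from this report.
SAMPLE = """\
Booting processor 1/2 rip 6000 rsp 1003f125f58
Not responding.
Inquiring remote APIC #2...
Booting processor 1/4 rip 6000 rsp 1003f129f58
Not responding.
Inquiring remote APIC #4...
Booting processor 1/6 rip 6000 rsp 1003f12bf58
Not responding.
Inquiring remote APIC #6...
Only one processor found.
"""

if __name__ == "__main__":
    print(failed_aps(SAMPLE))
```

In the guest, the input could be the saved output of dmesg instead of the sample text. An empty list would suggest that all secondary CPUs came up.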
Jeff,

Out of curiosity, what is the platform you are using (AMD or Intel)? Also, what does "service cpuspeed status" tell you? If it says something about "using ondemand governor", try "service cpuspeed stop", and see if that makes a difference with the -83 kernel.

Thanks,
Chris Lalancette
It's a 2x dual core Intel box (8-way when you count HT). Here's cpuinfo from the first CPU:

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Genuine Intel(R) CPU 3.46GHz
stepping : 2
cpu MHz : 3458.024
cache size : 2048 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est cid cx16 xtpr lahf_lm
bogomips : 8650.26
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

I presume you wanted me to check cpuspeed on dom0. I get an error from cpuspeed regardless of the kernel rev:

# service cpuspeed status
Frequency scaling not supported under xen kernels
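Since the AMD versus Intel question matters for this bug, here is a small Python sketch of how the vendor and the virtualization flag could be read out of /proc/cpuinfo-style text. The helper names `parse_cpuinfo` and `summarize` are hypothetical, and the sample is a shortened copy of the output above; `vmx` and `svm` are the usual flag names for Intel and AMD hardware virtualization.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor block."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def summarize(cpus):
    """Report CPU count, vendor, and hardware virtualization flags."""
    first = cpus[0] if cpus else {}
    flags = first.get("flags", "").split()
    return {
        "count": len(cpus),
        "vendor": first.get("vendor_id", "unknown"),
        "vmx": "vmx" in flags,  # Intel VT-x
        "svm": "svm" in flags,  # AMD-V
    }


# Shortened copy of the cpuinfo output from this comment.
SAMPLE = """\
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 6
flags : fpu tsc msr pae mce cx8 apic ht tm syscall nx lm vmx est cx16
power management:
"""

if __name__ == "__main__":
    print(summarize(parse_cpuinfo(SAMPLE)))
```

The same parser run inside the guest would also show the symptom from the original report, since the count there comes back as 1 instead of 4.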
This is also the case for RHEL5 SMP FV guests...
This problem came in as part of the hypervisor rebase, so it came in with kernel 2.6.18-59.el5. Digging deeper, it actually came in with c/s 15161 upstream, which subtly changed the interface between userland and the hypervisor. In particular, this affects the vmxassist part of the userland/HV interface, so we only see the problem on Intel.

The -86 HV currently has one side of the change, but the userland tools do *not* have the second part of the change. So, I think I'm going to just post the second part (userland) of the patch. It seems to be the safest thing to do at this point; I'll attach it here.

Chris Lalancette
Created attachment 298573
Patch to fix FV SMP boot on Intel

This is the userland portion of the upstream xen-3.1-testing.hg c/s 15161, and it fixes fully virtualized SMP guests on Intel.
Set dev ack for Chris.
Fixed in xen-3.0.3-60.el5:

* Thu Mar 27 2008 Chris Lalancette <clalance> - 3.0.3-60.el5
- Pull in the userland side of upstream c/s 15161 to fix fully virt SMP boot on Intel machines (rhbz #435744)
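To check whether a given dom0 already carries the userland side of the fix, the installed xen version-release string can be compared against 3.0.3-60.el5. The Python below is a simplified sketch, not the real RPM comparison algorithm: it only looks at the leading numeric parts of the version and release, and the helper names and the example strings other than 3.0.3-60.el5 are hypothetical.

```python
# Version-release named in this report as containing the userland fix.
FIXED = "3.0.3-60.el5"


def _numeric_prefix(part):
    """Return the leading dot-separated integers, e.g. "60.el5" -> (60,)."""
    numbers = []
    for segment in part.split("."):
        if not segment.isdigit():
            break
        numbers.append(int(segment))
    return tuple(numbers)


def parse_vr(version_release):
    """Split "3.0.3-60.el5" into ((3, 0, 3), (60,)).

    Simplification: this ignores epochs and non-numeric segments, so it is
    not a substitute for a real RPM version comparison.
    """
    version, _, release = version_release.partition("-")
    return _numeric_prefix(version), _numeric_prefix(release)


def has_fix(installed, fixed=FIXED):
    """True if the installed version-release is at or beyond the fixed one."""
    return parse_vr(installed) >= parse_vr(fixed)


if __name__ == "__main__":
    # Hypothetical installed versions, for illustration only.
    for candidate in ("3.0.3-41.el5", "3.0.3-60.el5", "3.0.3-64.el5"):
        print(candidate, has_fix(candidate))
```

The installed string could come from something like "rpm -q --qf '%{VERSION}-%{RELEASE}' xen" on the dom0.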
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0305.html