Bug 435744 - SMP FV xen guests failing to start more than 1 CPU
Summary: SMP FV xen guests failing to start more than 1 CPU
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Chris Lalancette
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-03-03 17:33 UTC by Jeff Layton
Modified: 2018-10-19 22:06 UTC (History)

Fixed In Version: RHBA-2008-0305
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:21:38 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to fix FV SMP boot on intel (2.42 KB, patch)
2008-03-19 19:07 UTC, Chris Lalancette


Links
System: Red Hat Product Errata
ID: RHBA-2008:0305
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: xen bug fix and enhancement update
Last Updated: 2008-05-20 18:04:30 UTC

Description Jeff Layton 2008-03-03 17:33:54 UTC
I tried to turn my RHEL4 xen guest into a 4-way SMP image today for some
testing. When I brought up the RHEL4 kernel, I noticed that I only had 1 CPU in
/proc/cpuinfo. Looking back through dmesg, I saw this:

Booting processor 1/2 rip 6000 rsp 1003f125f58
Not responding.
Inquiring remote APIC #2...
... APIC #2 ID: failed
... APIC #2 VERSION: failed
... APIC #2 SPIV: failed
Booting processor 1/4 rip 6000 rsp 1003f129f58
Not responding.
Inquiring remote APIC #4...
... APIC #4 ID: failed
... APIC #4 VERSION: failed
... APIC #4 SPIV: failed
Booting processor 1/6 rip 6000 rsp 1003f12bf58
Not responding.
Inquiring remote APIC #6...
... APIC #6 ID: failed
... APIC #6 VERSION: failed
... APIC #6 SPIV: failed
Only one processor found.
activating NMI Watchdog ... done.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (0)!

...on the dom0 console these messages popped up:

(XEN) instrlen.c:252:d1 Cannot read from address b8000 (eip b8000, mode 2)
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) instrlen.c:252:d1 Cannot read from address b8000 (eip b8000, mode 2)
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) instrlen.c:252:d1 Cannot read from address b8000 (eip b8000, mode 2)
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3
(XEN) vlapic.c:288:d1 Ignoring delivery mode 3

...this is with the dom0 on a -84.el5 kernel. The domU is running
2.6.9-68.16.EL.jtltest.31smp, but a -55.0.16 domU kernel fails the same way.

When I boot the dom0 to -53.1.14, everything works fine.

I can provide other info or access to the box if needed.
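
For anyone who wants to check a guest for the same symptom, here is a rough
sketch (not part of the original report) that counts the processors listed in
/proc/cpuinfo-style text and picks out the secondary CPUs that dmesg reports as
"Not responding." The function names are made up for illustration, and the
second number in "Booting processor 1/2" is assumed to be the APIC ID, since it
matches the "Inquiring remote APIC #2..." lines in the log above.

```python
import re

# Matches lines such as "Booting processor 1/2 rip 6000 rsp 1003f125f58".
# Assumption: the number after the slash is the APIC ID of the CPU being started.
BOOT_RE = re.compile(r"^Booting processor \d+/(\d+) ")


def count_cpus(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text."""
    return sum(1 for line in cpuinfo_text.splitlines()
               if re.match(r"^processor\s*:", line))


def failed_apic_ids(dmesg_text):
    """Return APIC IDs whose boot attempt was followed by 'Not responding.'."""
    failed = []
    pending = None  # APIC ID of the most recent boot attempt, if any
    for raw in dmesg_text.splitlines():
        line = raw.strip()
        match = BOOT_RE.match(line)
        if match:
            pending = int(match.group(1))
        elif line == "Not responding." and pending is not None:
            failed.append(pending)
            pending = None
    return failed


if __name__ == "__main__":
    # Reading /proc/cpuinfo only makes sense inside the Linux guest itself.
    with open("/proc/cpuinfo") as f:
        print("CPUs visible:", count_cpus(f.read()))
```

Fed the dmesg excerpt above, failed_apic_ids() would report APIC IDs 2, 4 and 6,
and count_cpus() on the guest's /proc/cpuinfo would return 1.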

Comment 1 Chris Lalancette 2008-03-03 23:23:40 UTC
Jeff,
     Out of curiosity, what platform are you using (AMD or Intel)?
Also, what does "service cpuspeed status" tell you?  If it says something about
"using ondemand governor", try "service cpuspeed stop", and see if that makes a
difference with the -83 kernel.

Thanks,
Chris Lalancette

Comment 2 Jeff Layton 2008-03-04 00:25:38 UTC
It's a 2x dual core Intel box (8-way when you count HT). Here's cpuinfo from the
first CPU:

$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 6
model name      :                    Genuine Intel(R) CPU 3.46GHz
stepping        : 2
cpu MHz         : 3458.024
cache size      : 2048 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor
ds_cpl vmx est cid cx16 xtpr lahf_lm
bogomips        : 8650.26
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

I presume you wanted me to check cpuspeed on dom0. I get an error from cpuspeed
regardless of the kernel rev:

# service cpuspeed status
Frequency scaling not supported under xen kernels
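
As an aside, the AMD-or-Intel question can also be answered from the cpuinfo
text itself. Below is a small sketch, assuming the usual "key : value" layout
and that the hardware virtualization extension shows up in the flags line as
"vmx" (Intel VT-x) or "svm" (AMD-V). The helper names are hypothetical. Lines
without a colon are treated as wrapped continuations of the previous value,
because the flags line is wrapped in the paste above.

```python
def parse_first_cpu(cpuinfo_text):
    """Parse the first processor block of cpuinfo text into a dict."""
    fields = {}
    last_key = None
    for raw in cpuinfo_text.splitlines():
        line = raw.strip()
        if not line:
            if fields:
                break  # a blank line ends the first processor block
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            # No colon: assume a wrapped continuation of the previous value.
            fields[last_key] = (fields[last_key] + " " + line).strip()
    return fields


def hvm_platform(cpuinfo_text):
    """Return (vendor_id, extension), where extension is 'vmx', 'svm' or None."""
    fields = parse_first_cpu(cpuinfo_text)
    flags = set(fields.get("flags", "").split())
    if "vmx" in flags:
        extension = "vmx"
    elif "svm" in flags:
        extension = "svm"
    else:
        extension = None
    return fields.get("vendor_id"), extension
```

For the cpuinfo pasted in this comment, this would yield GenuineIntel with vmx.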



Comment 3 Jeff Layton 2008-03-04 12:23:17 UTC
This is also the case for RHEL5 SMP FV guests...


Comment 5 Chris Lalancette 2008-03-19 19:06:24 UTC
This problem was introduced by the hypervisor rebase, so it first appeared in
kernel 2.6.18-59.el5.  Digging deeper, the cause is upstream changeset (c/s)
15161, which subtly changed the interface between userland and the hypervisor.
In particular, it affects the vmxassist part of the userland/HV interface, so we
only see the problem on Intel.  The -86 HV currently has one side of the change,
but the userland tools do *not* have the second part of the change.

So, I think I'm going to just post the second part (userland) of the patch.  It
seems to be the safest thing to do at this point; I'll attach it here.

Chris Lalancette

Comment 6 Chris Lalancette 2008-03-19 19:07:31 UTC
Created attachment 298573
Patch to fix FV SMP boot on intel

This is the userland portion of the upstream xen-3.1-testing.hg c/s 15161, and
it fixes fully virtualized SMP guests on Intel.

Comment 7 Bill Burns 2008-03-19 19:28:18 UTC
Set dev ack for Chris.


Comment 10 Chris Lalancette 2008-03-27 13:46:35 UTC
Fixed in xen-3.0.3-60.el5:

* Thu Mar 27 2008 Chris Lalancette <clalance> - 3.0.3-60.el5
- Pull in the userland side of upstream c/s 15161 to fix fully virt SMP
  boot on Intel machines (rhbz #435744)
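
To tell whether a given host is likely in the affected window, here is a rough
sketch built only from the numbers in this report: the problem first appeared in
dom0 kernel 2.6.18-59.el5, it only shows up on Intel, and the userland fix is in
xen-3.0.3-60.el5. The function names are hypothetical, and only the first
number after the dash is compared, so this is a heuristic and would not notice
a fix backported to an older package stream.

```python
import re


def build_number(release):
    """Extract the build number from e.g. '2.6.18-84.el5xen' or '3.0.3-60.el5'."""
    match = re.search(r"-(\d+)", release)
    if match is None:
        raise ValueError("no build number in %r" % release)
    return int(match.group(1))


def likely_affected(dom0_kernel, xen_userland, vendor_id):
    """Heuristic: rebased hypervisor (>= -59) with pre-fix userland (< -60) on Intel."""
    return (vendor_id == "GenuineIntel"
            and build_number(dom0_kernel) >= 59
            and build_number(xen_userland) < 60)
```

A dom0 on -53.1.14 comes out as not affected, which matches the observation in
the description that booting the dom0 to that kernel works.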


Comment 14 errata-xmlrpc 2008-05-21 15:21:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0305.html


