Bug 838921

Summary: some cpu features will cause guest kernel panic
Product: Red Hat Enterprise Linux 5 Reporter: EricLee <bili>
Component: kvm Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED WONTFIX QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.9 CC: bsarathy, chayang, dallan, dyuan, juzhang, mkenneth, mzhan, rhod, rwu, virt-maint, weizhan, whuang, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Qemu-kvm does not check if a given CPU flag is really supported by the KVM kernel module. Attempting to enable the "acpi" flag can lead to a kernel panic on guest machines. To work around this problem, do not enable the "acpi" CPU flag in the configuration of a virtual machine.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-29 06:28:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description EricLee 2012-07-10 12:47:43 UTC
Description of problem:
some cpu features will cause guest kernel panic

Version-Release number of selected component (if applicable):
# rpm -qa libvirt kvm kernel
libvirt-0.8.2-27.el5
kernel-2.6.18-322.el5
kvm-83-254.el5

How reproducible:
always

Steps to Reproduce:
# virsh capabilities
Get cpu info of the host.

Fill the <cpu> info from virsh capabilities to a xml named baseline.xml.
# cat baseline.xml
<cpu>
       <arch>x86_64</arch>
       <model>core2duo</model>
       <topology sockets='1' cores='4' threads='1'/>
       <feature name='lahf_lm'/>
       <feature name='rdtscp'/>
       <feature name='popcnt'/>
       <feature name='x2apic'/>
       <feature name='sse4.2'/>
       <feature name='sse4.1'/>
       <feature name='xtpr'/>
       <feature name='cx16'/>
       <feature name='tm2'/>
       <feature name='est'/>
       <feature name='vmx'/>
       <feature name='ds_cpl'/>
       <feature name='pbe'/>
       <feature name='tm'/>
       <feature name='ht'/>
       <feature name='ss'/>
       <feature name='acpi'/>
       <feature name='ds'/>
</cpu>

# virsh cpu-baseline baseline.xml
<cpu match='exact'>
   <model>core2duo</model>
   <feature policy='require' name='lahf_lm'/>
   <feature policy='require' name='rdtscp'/>
   <feature policy='require' name='popcnt'/>
   <feature policy='require' name='x2apic'/>
   <feature policy='require' name='sse4.2'/>
   <feature policy='require' name='sse4.1'/>
   <feature policy='require' name='xtpr'/>
   <feature policy='require' name='cx16'/>
   <feature policy='require' name='tm2'/>
   <feature policy='require' name='est'/>
   <feature policy='require' name='vmx'/>
   <feature policy='require' name='ds_cpl'/>
   <feature policy='require' name='pbe'/>
   <feature policy='require' name='tm'/>
   <feature policy='require' name='ht'/>
   <feature policy='require' name='ss'/>
   <feature policy='require' name='acpi'/>
   <feature policy='require' name='ds'/>
</cpu>

Adding all of the above features to the guest XML causes a guest kernel panic.
However, the guest starts normally when only the following two features are added to guest.xml:
   <cpu match='exact'>
     <model>core2duo</model>
     <feature policy='require' name='lahf_lm'/>
     <feature policy='require' name='sse4.1'/>
   </cpu>
I am not sure which features cause the guest kernel panic.
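
One way to narrow this down is to add each remaining feature, one at a time, to the known-good pair and note which additions stop the guest from booting. The following is only a sketch of that loop: `find_culprits` and `fake_boots` are hypothetical names, and `fake_boots` merely stands in for a real boot test by pretending that "acpi" is the only problematic feature, which is an assumption made for illustration.

```python
from typing import Callable, List, Sequence


def find_culprits(base: Sequence[str],
                  candidates: Sequence[str],
                  boots: Callable[[List[str]], bool]) -> List[str]:
    """Return the candidates that break boot when added alone to a known-good base."""
    culprits = []
    for feature in candidates:
        # Test the known-good set plus exactly one extra feature.
        if not boots(list(base) + [feature]):
            culprits.append(feature)
    return culprits


def fake_boots(features: List[str]) -> bool:
    # Hypothetical stand-in for "define the guest with these features and
    # check whether it boots". Assumes, for illustration, that only "acpi" fails.
    return "acpi" not in features


base = ["lahf_lm", "sse4.1"]
candidates = ["rdtscp", "popcnt", "x2apic", "sse4.2", "xtpr", "cx16", "tm2",
              "est", "vmx", "ds_cpl", "pbe", "tm", "ht", "ss", "acpi", "ds"]
culprits = find_culprits(base, candidates, fake_boots)
print(culprits)
```

In practice `boots` would have to edit the guest XML, start the guest, and watch the console for a panic, so each call is slow; testing one feature at a time keeps the result easy to interpret even if more than one feature is at fault.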

With any of the features from virsh cpu-baseline, changing <cpu match='exact'> to <cpu match='minimum'> also causes a panic.

Actual results:
As steps

Expected results:
All features from # virsh cpu-baseline should not cause guest kernel panic.
And also for <cpu match='minimum'>.

Additional info:
RHEL6 works well.

Comment 1 EricLee 2012-07-10 12:51:20 UTC
It also can be reproduced with versions:
# rpm -qa kernel libvirt kvm
libvirt-0.8.2-27.el5
kernel-2.6.18-324.el5
kvm-83-256.el5

Comment 2 Dave Allan 2012-07-10 13:22:39 UTC
Eric, there's nothing libvirt can do about the guest kernel panicking.  I'm changing the component to kernel, and you will need to provide information about what guest OS you're running.

Comment 3 EricLee 2012-07-11 01:56:39 UTC
(In reply to comment #2)
> Eric, there's nothing libvirt can do about the guest kernel panicking.  I'm
> changing the component to kernel, and you will need to provide information
> about what guest OS you're running.

Hi Dave,

My guest is running with released rhel5u8.
I suspect the guest kernel panic may be caused by libvirt failing to validate some CPU features before passing them to qemu-kvm. Is that plausible?

Thanks.
EricLee

Comment 4 juzhang 2012-07-11 02:42:20 UTC
Hi, Ericlee

Would you please answer the following two questions?

1. Would you please provide qemu-kvm commandline when you hit this issue?
you can get the qemu-kvm command via "ps -aux | grep qemu-kvm" in your host

2. Would you please tell your host cpu info?
You can cat /proc/cpuinfo on your host. Thanks.

Comment 5 EricLee 2012-07-11 04:43:27 UTC
(In reply to comment #4)
> Hi, Ericlee
> 
> Would you please answer the following two questions?
> 
> 1. Would you please provide qemu-kvm commandline when you hit this issue?
> you can get the qemu-kvm command via "ps -aux | grep qemu-kvm" in your host
> 

# ps aux | grep qemu-kvm
root      3697 99.9  0.5 1247512 45140 ?       Sl   Jul10 967:43 /usr/libexec/qemu-kvm -S -M rhel5.4.0 -cpu qemu64,+lahf_lm,+sse4.1,+xtpr,+cx16,+ssse3,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme,-svm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -name rhel5u8 -uuid 9ff3fac3-3764-9a9a-9c88-586c28e08286 -monitor unix:/var/lib/libvirt/qemu/rhel5u8.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/var/lib/libvirt/images/kvm-rhel5u8-x86_64,if=virtio,boot=on,format=raw,cache=none -net none -serial none -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio

> 2. Would you please tell your host cpu info?
> You can cat /proc/cpuinfo on your host. Thanks.

# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping	: 10
cpu MHz		: 2992.131
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips	: 5984.26
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping	: 10
cpu MHz		: 2992.131
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips	: 5984.24
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Comment 6 Dave Allan 2012-07-11 13:31:46 UTC
(In reply to comment #3)
> My guest is running with released rhel5u8.
> I suspect the guest kernel panic may be caused by libvirt failing to validate
> some CPU features before passing them to qemu-kvm. Is that plausible?

Let's see why the guest is panicking before we speculate about a cause.

Comment 7 FuXiangChun 2012-07-12 07:39:20 UTC
I tested this issue with the host and command line from comment 5.
Test results:
1. If the guest is booted with +acpi in the -cpu option, the guest kernel panics.
2. If the acpi flag is removed from the command line, the guest boots successfully, and the flag is still present in the guest after boot (the acpi flag is exposed to the guest automatically, without +acpi).

So the acpi CPU flag causes the guest kernel panic.
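
For anyone repeating this test, the -cpu value from comment 5 can be rewritten without the acpi flag mechanically. This is a small sketch only: `drop_cpu_flag` is a hypothetical helper, and it assumes the value has the form "model,+flag,...,-flag" as seen in that command line.

```python
def drop_cpu_flag(cpu_arg: str, flag: str) -> str:
    """Remove '+flag' from a qemu-kvm -cpu value such as 'qemu64,+a,+b,-c'."""
    model, *mods = cpu_arg.split(",")
    # Compare whole entries so that dropping "ss" does not touch "ssse3".
    kept = [m for m in mods if m != "+" + flag]
    return ",".join([model] + kept)


# The -cpu value taken from the command line in comment 5.
cpu_arg = ("qemu64,+lahf_lm,+sse4.1,+xtpr,+cx16,+ssse3,+tm2,+est,+vmx,"
           "+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme,-svm")
stripped = drop_cpu_flag(cpu_arg, "acpi")
print(stripped)
```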

Comment 8 Eduardo Habkost 2012-07-26 20:11:31 UTC
I don't know exactly what causes the kernel panic, but the "acpi" flag is not supported by the KVM kernel module and is supposed to be filtered out by qemu-kvm (it is probably not being filtered out because of a qemu-kvm bug). An easy workaround is to never request the "acpi" CPU flag for a virtual machine.
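
When the <cpu> element comes from virsh cpu-baseline output, the workaround amounts to deleting the one <feature> entry before the element is pasted into the guest XML. The sketch below shows that with Python's standard xml.etree.ElementTree module; `drop_feature` is a hypothetical helper, and the sample XML is shortened from the output in the description.

```python
import xml.etree.ElementTree as ET


def drop_feature(cpu_xml: str, name: str) -> str:
    """Return the <cpu> element with every <feature name=...> of that name removed."""
    cpu = ET.fromstring(cpu_xml)
    # findall() returns a list, so removing children while looping is safe.
    for feature in cpu.findall("feature"):
        if feature.get("name") == name:
            cpu.remove(feature)
    return ET.tostring(cpu, encoding="unicode")


# Shortened sample of the cpu-baseline output from the description.
cpu_xml = """<cpu match='exact'>
  <model>core2duo</model>
  <feature policy='require' name='lahf_lm'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ds'/>
</cpu>"""
cleaned = drop_feature(cpu_xml, "acpi")
print(cleaned)
```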

Comment 10 Eduardo Habkost 2012-07-26 20:25:50 UTC
That's interesting: RHEL-5 qemu-kvm doesn't even seem to use GET_SUPPORTED_CPUID, and will enable any flag the user asked for, even if the KVM module doesn't support it.
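
The missing check can be pictured as a simple filter: only flags that the kernel module reports as supported are enabled, and the rest are dropped or reported. The sketch below is not qemu-kvm code. `filter_flags` is a hypothetical function, and the `supported` set is hand-written for illustration; a real implementation would derive it from the KVM_GET_SUPPORTED_CPUID ioctl, which returns CPUID leaves rather than flag names.

```python
from typing import Iterable, List, Set, Tuple


def filter_flags(requested: Iterable[str],
                 supported: Set[str]) -> Tuple[List[str], List[str]]:
    """Split requested CPU flags into (enabled, dropped) based on a supported set."""
    enabled, dropped = [], []
    for flag in requested:
        (enabled if flag in supported else dropped).append(flag)
    return enabled, dropped


# Some of the flags requested in comment 5.
requested = ["lahf_lm", "sse4.1", "cx16", "ssse3", "acpi", "ds"]
# Assumption for illustration only: everything except "acpi" is supported.
supported = {"lahf_lm", "sse4.1", "cx16", "ssse3", "ds"}

enabled, dropped = filter_flags(requested, supported)
print("enabled:", enabled)
print("dropped:", dropped)
```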

Comment 11 Eduardo Habkost 2012-07-26 20:25:50 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
  qemu-kvm doesn't check if a given CPU flag is really supported by the KVM kernel module.

Consequence:
  if the user asks the "acpi" flag to be enabled, Linux guests may panic.

Workaround:
  never enabling the "acpi" CPU flag on virtual machine configuration.

Result:
  Without the "acpi" flag, the guest won't panic.

Comment 12 RHEL Program Management 2012-07-26 20:38:41 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 13 Ronen Hod 2012-07-29 06:28:17 UTC
Closing.
The workaround is simple (not using the "acpi" flag), which should be enough for RHEL5 existing deployments.

Thanks, Ronen.

Comment 14 Eduardo Habkost 2012-07-30 15:12:28 UTC
Bugzilla changed the component by itself. Moving back to the right component.

Comment 15 Eliska Slobodova 2012-08-27 15:30:16 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,11 +1 @@
-Cause:
+Qemu-kvm does not check if a given CPU flag is really supported by the KVM kernel module. Attempting to enable the "acpi" flag can lead to a kernel panic on guest machines. To work around this problem, do not enable the "acpi" CPU flag in the configuration of a virtual machine.
-  qemu-kvm doesn't check if a given CPU flag is really supported by the KVM kernel module.
-
-Consequence:
-  if the user asks the "acpi" flag to be enabled, Linux guests may panic.
-
-Workaround:
-  never enabling the "acpi" CPU flag on virtual machine configuration.
-
-Result:
-  Without the "acpi" flag, the guest won't panic.