Bug 509809 - Host panic when try to run kvm guest on a host which restored from suspend.
Summary: Host panic when try to run kvm guest on a host which restored from suspend.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 510814
Blocks: 510812 5.4, TechnicalNotes
 
Reported: 2009-07-06 11:50 UTC by Mark Xie
Modified: 2013-01-09 21:46 UTC (History)

Fixed In Version: kernel-2.6.18-170.el5
Doc Type: Bug Fix
Doc Text:
Currently, KVM cannot disable virtualization extensions on a CPU while it is being taken down. Consequently, suspending a host running KVM-based virtual machines may cause the host to crash.
Clone Of:
Environment:
Last Closed: 2010-03-30 07:44:21 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Mark Xie 2009-07-06 11:50:20 UTC
Description of problem:
The host panics when trying to run a KVM guest on a host that has been restored from suspend.

Version-Release number of selected component (if applicable):
Host RHEL5u4 Server x86_64 20090701.0

# uname -a
Linux dhcp-66-70-3.nay.redhat.com 2.6.18-156.el5 #1 SMP Mon Jun 29 18:16:54 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.4 Beta (Tikanga)

# rpm -q kvm
kvm-83-82.el5

# rpm -q kernel
kernel-2.6.18-156.el5

# rpm -qa |grep kvm
etherboot-roms-kvm-5.4.4-10.el5
kvm-tools-83-82.el5
etherboot-zroms-kvm-5.4.4-10.el5
kvm-qemu-img-83-82.el5
kmod-kvm-83-82.el5
kvm-83-82.el5



How reproducible:
100%

Steps to Reproduce:
1. Start the RHEL 5.4 x86_64 host and make sure the ksm.ko module is not loaded. (There is a known KSM bug: [Bug 505440] Panic on suspend with KSM module loaded.)
2. Suspend the system with "echo disk>/sys/power/state".
3. Restore the host.
4. Start the KVM guest with qemu-kvm or virsh.
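
The precondition in step 1 can be checked before suspending. The following is only a minimal sketch, not part of the original report: the function names are hypothetical, and the extra check that the kvm module is loaded is an assumption added for convenience. It parses text in the format of /proc/modules, where the first column of each line is the module name.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def module_loaded(proc_modules_text, name):
    """Return True if the module `name` is listed in the given text."""
    return name in loaded_modules(proc_modules_text)


def preconditions_met(proc_modules_text):
    """Check step 1 of the reproducer; return (ok, list_of_problems)."""
    problems = []
    # Step 1: ksm must not be loaded, to avoid hitting Bug 505440 instead.
    if module_loaded(proc_modules_text, "ksm"):
        problems.append("ksm is loaded (see Bug 505440); unload it first")
    # Assumption, not stated in the steps: kvm should already be loaded.
    if not module_loaded(proc_modules_text, "kvm"):
        problems.append("kvm is not loaded")
    return (not problems, problems)


def check_local_host(path="/proc/modules"):
    """Run the check against the running host (Linux only)."""
    with open(path) as f:
        return preconditions_met(f.read())
```

On the host itself, calling check_local_host() returns a pair: a boolean and a list of human-readable problems, which is empty when it is safe to continue with step 2.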
  
Actual results:
1. When starting the guest directly from the command line:

# /usr/libexec/qemu-kvm -drive file=RHEL-Server-5.4-64.qcow2,media=disk,if=ide,cache=off,index=0 -net nic,macaddr=20:20:20:00:16:79,model=e1000,script=/etc/qemu-ifup -rtc-td-hack -no-hpet -usbdevice tablet -cpu qemu64,+sse2 -smp 1 -m 1G

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/list_debug.c:26
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 2
Modules linked in: nls_utf8 radeon drm ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 hidp rfcomm l2cap bluetooth dm_log_clustered(U) lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i iw_cxgb3 ib_core cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp kvm_intel(U) kvm(U) snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer snd_page_alloc snd_hwdep sr_mod snd parport_pc i2c_core cdrom soundcore parport shpchp serio_raw sg e1000e pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4597, comm: qemu-kvm Tainted: G      2.6.18-156.el5 #1
RIP: 0010:[<ffffffff801524f3>]  [<ffffffff801524f3>] __list_add+0x24/0x68
RSP: 0018:ffff810124301d78  EFLAGS: 00010082
RAX: 0000000000000058 RBX: ffff810104bba7a0 RCX: ffffffff80309c28
RDX: ffffffff80309c28 RSI: 0000000000000000 RDI: ffffffff80309c20
RBP: ffff8100010181d0 R08: ffffffff80309c28 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000080 R12: ffff810123dea7e0
R13: ffff810123de8080 R14: 000000000000000c R15: 0000000000001000
FS:  0000000043255940(0063) GS:ffff8101041e4e40(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000118df000 CR3: 00000001249e0000 CR4: 00000000000006a0
Process qemu-kvm (pid: 4597, threadinfo ffff810124300000, task ffff810127e157e0)
Stack:  ffff810123de8080 0000000000000002 ffff810114ee6000 ffffffff881e8074
 0000000000000202 ffffffff80090884 ffff810123de8080 0000000108741000
 0000000000000202 ffff810123de8080 ffff810123de8080 ffffffff883adef5
Call Trace:








2. When starting the guest with virsh:

# ps ax |grep qemu-kvm
 4410 ?        Sl     0:53 /usr/libexec/qemu-kvm -S -M pc -m 2048 -smp 2 -name test_ksm -uuid ce91d75d-25ac-1360-4eed-f7f1968bcc5d -monitor pty -pidfile /var/run/libvirt/qemu//test_ksm.pid -boot c -drive file=/var/lib/libvirt/images/kvm-rhel5.3-i386.img,if=ide,index=0,boot=on -drive file=,if=ide,media=cdrom,index=2 -net nic,macaddr=54:52:00:22:fe:1d,vlan=0 -net tap,fd=16,script=,vlan=0,ifname=vnet0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -soundhw es1370


----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...-83-maint-snapshot-20090205/kernel-/x86/kvm_main.c:2444
invalid opcode: 0000 [1] SMP
last sysfs file: /class/net/lo/type
CPU 1
Modules linked in: tun radeon drm ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 hidp rfcomm l2cap bluetooth dm_log_clustered(U) lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i iw_cxgb3 ib_core cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp kvm_intel(U) kvm(U) snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sr_mod cdrom snd_pcm i2c_i801 parport_pc snd_timer snd_page_alloc snd_hwdep i2c_core snd parport shpchp e1000e soundcore sg serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4245, comm: qemu-kvm Tainted: G      2.6.18-156.el5 #1
RIP: 0010:[<ffffffff883a9489>]  [<ffffffff883a9489>] :kvm:kvm_handle_fault_on_reboot+0xb/0x16
RSP: 0018:ffff81011f651cb0  EFLAGS: 00010246
RAX: ffff81011f651cc8 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff81010c4b8000 RSI: ffff810001000108 RDI: ffff81010c4b8000
RBP: ffff8101278b0040 R08: ffff81011a239e0e R09: ffff810126c6c000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810126c6c000
R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000001000
FS:  0000000040ad3940(0063) GS:ffff8101041d27c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000040ad2fc8 CR3: 00000001234b8000 CR4: 00000000000006a0
Process qemu-kvm (pid: 4245, threadinfo ffff81011f650000, task ffff8101278c27a0)
Stack:  ffffffff881bab94 0000000000001000 ffffffff881bc215 000000010c4b8000
 ffffffff881bcd6f 0484030400000001 0000004408bfebd0 ffff810000019c10
 0000000200000000 0000000100000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff881bab94>] :kvm_intel:vmcs_clear+0x1c/0x42
 [<ffffffff881bc215>] :kvm_intel:alloc_vmcs_cpu+0x3a/0xb7
 [<ffffffff881bcd6f>] :kvm_intel:vmx_create_vcpu+0x110/0x77c
 [<ffffffff883aaceb>] :kvm:kvm_vm_ioctl+0x10d/0xad0
 [<ffffffff8000f966>] __alloc_pages+0x65/0x2ce
 [<ffffffff8000966a>] __handle_mm_fault+0x823/0xf98
 [<ffffffff800225a6>] __up_read+0x19/0x7f
 [<ffffffff80067b58>] do_page_fault+0x4fe/0x830
 [<ffffffff800426ca>] do_ioctl+0x21/0x6b
 [<ffffffff8003090e>] vfs_ioctl+0x457/0x4b9
 [<ffffffff800b70fc>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff8004cd71>] sys_ioctl+0x59/0x78
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

Code: 0f 0b 68 22 36 3c 88 c2 8c 09 c3 55 48 89 fd 53 31 db 48 83
RIP  [<ffffffff883a9489>] :kvm:kvm_handle_fault_on_reboot+0xb/0x16
 RSP <ffff81011f651cb0>
 <0>Kernel panic - not syncing: Fatal exception




Expected results:
The guest should start successfully.


Additional info:


Host CPU:
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
stepping        : 10
cpu MHz         : 2826.235
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips        : 5652.50
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Comment 2 Eduardo Habkost 2009-07-10 22:04:21 UTC
To properly fix suspend of the host with KVM, we would need the following patch set on the RHEL5 kernel: http://lkml.org/lkml/2007/5/24/108

Right now, KVM has no way to disable virtualization extensions on a CPU while that CPU is being taken down, at the point where no process can be scheduled on it any more.
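
To illustrate the reasoning above, here is a toy userspace model, not kernel code and not the actual fix. All class and method names are hypothetical. It assumes that a CPU loses its VMX-enabled state across suspend to disk, and it relies on the fact that VMCLEAR executed outside VMX operation raises an invalid-opcode fault, which is consistent with the "invalid opcode" reported under vmcs_clear in the second trace. The "hook" stands in for whatever mechanism lets KVM turn the extensions off on a CPU going down and back on when it returns.

```python
class InvalidOpcode(Exception):
    """Stands in for the #UD fault of a VMX instruction outside VMX operation."""


class ToyCpu:
    """A CPU that only tracks whether it is in VMX operation."""

    def __init__(self):
        self.vmx_on = False

    def vmxon(self):
        self.vmx_on = True

    def vmxoff(self):
        self.vmx_on = False

    def power_cycle(self):
        # Assumption: VMX state does not survive suspend to disk.
        self.vmx_on = False

    def vmclear(self):
        if not self.vmx_on:
            raise InvalidOpcode("VMCLEAR outside VMX operation")


class ToyKvm:
    """Enables VMX on every CPU at load time, like a loaded KVM module."""

    def __init__(self, cpus, has_cpu_down_hook):
        self.cpus = cpus
        self.has_cpu_down_hook = has_cpu_down_hook
        for cpu in cpus:
            cpu.vmxon()

    def host_suspend_resume(self):
        for cpu in self.cpus:
            if self.has_cpu_down_hook:
                cpu.vmxoff()      # clean disable while the CPU goes down
            cpu.power_cycle()
            if self.has_cpu_down_hook:
                cpu.vmxon()       # re-enable when the CPU comes back

    def create_vcpu(self, cpu_index):
        # Creating a VCPU clears a fresh VMCS on the chosen CPU.
        self.cpus[cpu_index].vmclear()
        return "vcpu created"


def try_guest_after_resume(has_cpu_down_hook):
    """Return the outcome of starting a guest after one suspend/resume cycle."""
    kvm = ToyKvm([ToyCpu() for _ in range(4)], has_cpu_down_hook)
    kvm.host_suspend_resume()
    try:
        return kvm.create_vcpu(0)
    except InvalidOpcode as exc:
        return "fault: %s" % exc
```

In this model, try_guest_after_resume(False) ends in the fault path, while try_guest_after_resume(True) creates the VCPU. The real kernel paths involve more state than a single flag, so this is only a way to visualise why a per-CPU disable and re-enable step is needed.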

Comment 3 Dor Laor 2009-07-14 14:51:08 UTC
Postponing for next release

Comment 4 Dor Laor 2009-07-14 14:51:08 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Suspending a host running kvm VMs might crash the host since there is no way for KVM to disable virtualization extensions on the
CPU while it is being taken down.

Comment 8 Ryan Lerch 2009-08-18 23:12:45 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1 @@
-Suspending a host running kvm VMs might crash the host since there is no way for KVM to disable virtualization extensions on the
-CPU while it is being taken down.
+Currently, there is no way for KVM to disable virtualization extensions on a CPU while it is being taken down. Consequently, suspending a host running KVM-based virtual machines may cause the host to crash.

Comment 10 Ryan Lerch 2009-08-18 23:14:35 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-Currently, there is no way for KVM to disable virtualization extensions on a CPU while it is being taken down. Consequently, suspending a host running KVM-based virtual machines may cause the host to crash.
+Currently, KVM cannot disable virtualization extensions on a CPU while it is being taken down. Consequently, suspending a host running KVM-based virtual machines may cause the host to crash.

Comment 14 Eduardo Habkost 2009-10-15 19:16:16 UTC
Should be fixed by the fix for bug #510814. Moving to POST to reflect the fix status.

Comment 15 Eduardo Habkost 2009-10-21 19:30:50 UTC
The fix for bug #510814 (that should solve this bug) is in kernel-2.6.18-170.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Testing is welcome.

Comment 17 Mark Xie 2009-10-23 02:20:26 UTC
Tested kernel-2.6.18-170.el5.x86_64.rpm.

Guests boot successfully after the host resumes from suspend. Both Windows and Linux guests run OK, whether booted from the console or from a GUI command line.

Conditions covered:
1) Intel/AMD host
2) Windows guest / RHEL guest
3) Guest run from the console / guest run from a GUI pseudo terminal

All covered conditions passed.

Comment 19 lihuang 2010-01-31 13:00:51 UTC
Retested on an Intel host:
the guest works fine after resume from suspend (the host auto-resumes due to 550014).

Comment 21 errata-xmlrpc 2010-03-30 07:44:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

