Description of problem:
The host kernel panics when a KVM guest is started on a host that has been restored from suspend.

Version-Release number of selected component (if applicable):
Host: RHEL5u4 Server x86_64 20090701.0

# uname -a
Linux dhcp-66-70-3.nay.redhat.com 2.6.18-156.el5 #1 SMP Mon Jun 29 18:16:54 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 Beta (Tikanga)
# rpm -q kvm
kvm-83-82.el5
# rpm -q kernel
kernel-2.6.18-156.el5
# rpm -qa | grep kvm
etherboot-roms-kvm-5.4.4-10.el5
kvm-tools-83-82.el5
etherboot-zroms-kvm-5.4.4-10.el5
kvm-qemu-img-83-82.el5
kmod-kvm-83-82.el5
kvm-83-82.el5

How reproducible:
100%

Steps to Reproduce:
1. Start the RHEL 5.4 x86_64 host and make sure the ksm.ko module is not loaded (a separate KSM bug already exists: [Bug 505440] Panic on suspend with KSM module loaded).
2. Suspend the system with "echo disk > /sys/power/state".
3. Restore the host.
4. Start a KVM guest with qemu-kvm or virsh.
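The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, assuming a POSIX shell run as root on a disposable test host and a typical swsusp setup where the writing process continues after the image is restored. The helper names `ksm_loaded` and `reproduce` and the `--run` guard are hypothetical, and the guest command is a trimmed version of the qemu-kvm command used in this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module named "ksm" appears in the given
# module list (default /proc/modules, whose first field is the module name).
ksm_loaded() {
    awk '$1 == "ksm" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

# Hypothetical helper: steps 1, 2 and 4 of the report; step 3 (restoring the
# host) happens outside the script.
reproduce() {
    # Step 1: refuse to continue if ksm.ko is loaded (see Bug 505440).
    if ksm_loaded; then
        echo "ksm module is loaded; unload it before testing" >&2
        return 1
    fi
    # Step 2: suspend to disk. Assumption: with swsusp this write returns
    # only after the host has been restored from the saved image (step 3).
    echo disk > /sys/power/state || return 1
    # Step 4: start a guest; on an affected kernel the host panics here.
    # Adjust the image path for your environment.
    /usr/libexec/qemu-kvm \
        -drive file=RHEL-Server-5.4-64.qcow2,media=disk,if=ide,cache=off,index=0 \
        -smp 1 -m 1G
}

# Do nothing unless explicitly asked, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    reproduce
fi
```

Keeping the suspend behind an explicit `--run` argument means the file can be sourced to reuse `ksm_loaded` without accidentally suspending the machine.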
Actual results:
1. When starting the guest directly from the command line:
# /usr/libexec/qemu-kvm -drive file=RHEL-Server-5.4-64.qcow2,media=disk,if=ide,cache=off,index=0 -net nic,macaddr=20:20:20:00:16:79,model=e1000,script=/etc/qemu-ifup -rtc-td-hack -no-hpet -usbdevice tablet -cpu qemu64,+sse2 -smp 1 -m 1G

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/list_debug.c:26
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 2
Modules linked in: nls_utf8 radeon drm ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 hidp rfcomm l2cap bluetooth dm_log_clustered(U) lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i iw_cxgb3 ib_core cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp kvm_intel(U) kvm(U) snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer snd_page_alloc snd_hwdep sr_mod snd parport_pc i2c_core cdrom soundcore parport shpchp serio_raw sg e1000e pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4597, comm: qemu-kvm Tainted: G 2.6.18-156.el5 #1
RIP: 0010:[<ffffffff801524f3>] [<ffffffff801524f3>] __list_add+0x24/0x68
RSP: 0018:ffff810124301d78 EFLAGS: 00010082
RAX: 0000000000000058 RBX: ffff810104bba7a0 RCX: ffffffff80309c28
RDX: ffffffff80309c28 RSI: 0000000000000000 RDI: ffffffff80309c20
RBP: ffff8100010181d0 R08: ffffffff80309c28 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000080 R12: ffff810123dea7e0
R13: ffff810123de8080 R14: 000000000000000c R15: 0000000000001000
FS: 0000000043255940(0063) GS:ffff8101041e4e40(0000) knlGS:0000000000000000
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000118df000 CR3: 00000001249e0000 CR4: 00000000000006a0
Process qemu-kvm (pid: 4597, threadinfo ffff810124300000, task ffff810127e157e0)
Stack: ffff810123de8080 0000000000000002 ffff810114ee6000 ffffffff881e8074
 0000000000000202 ffffffff80090884 ffff810123de8080 0000000108741000
 0000000000000202 ffff810123de8080 ffff810123de8080 ffffffff883adef5
Call Trace:

2. When starting the guest with virsh:
# ps ax | grep qemu-kvm
4410 ? Sl 0:53 /usr/libexec/qemu-kvm -S -M pc -m 2048 -smp 2 -name test_ksm -uuid ce91d75d-25ac-1360-4eed-f7f1968bcc5d -monitor pty -pidfile /var/run/libvirt/qemu//test_ksm.pid -boot c -drive file=/var/lib/libvirt/images/kvm-rhel5.3-i386.img,if=ide,index=0,boot=on -drive file=,if=ide,media=cdrom,index=2 -net nic,macaddr=54:52:00:22:fe:1d,vlan=0 -net tap,fd=16,script=,vlan=0,ifname=vnet0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -soundhw es1370

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...-83-maint-snapshot-20090205/kernel-/x86/kvm_main.c:2444
invalid opcode: 0000 [1] SMP
last sysfs file: /class/net/lo/type
CPU 1
Modules linked in: tun radeon drm ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 hidp rfcomm l2cap bluetooth dm_log_clustered(U) lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i iw_cxgb3 ib_core cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp kvm_intel(U) kvm(U) snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sr_mod cdrom snd_pcm i2c_i801 parport_pc snd_timer snd_page_alloc snd_hwdep i2c_core snd parport shpchp e1000e soundcore sg serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4245, comm: qemu-kvm Tainted: G 2.6.18-156.el5 #1
RIP: 0010:[<ffffffff883a9489>] [<ffffffff883a9489>] :kvm:kvm_handle_fault_on_reboot+0xb/0x16
RSP: 0018:ffff81011f651cb0 EFLAGS: 00010246
RAX: ffff81011f651cc8 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff81010c4b8000 RSI: ffff810001000108 RDI: ffff81010c4b8000
RBP: ffff8101278b0040 R08: ffff81011a239e0e R09: ffff810126c6c000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810126c6c000
R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000001000
FS: 0000000040ad3940(0063) GS:ffff8101041d27c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000040ad2fc8 CR3: 00000001234b8000 CR4: 00000000000006a0
Process qemu-kvm (pid: 4245, threadinfo ffff81011f650000, task ffff8101278c27a0)
Stack: ffffffff881bab94 0000000000001000 ffffffff881bc215 000000010c4b8000
 ffffffff881bcd6f 0484030400000001 0000004408bfebd0 ffff810000019c10
 0000000200000000 0000000100000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff881bab94>] :kvm_intel:vmcs_clear+0x1c/0x42
 [<ffffffff881bc215>] :kvm_intel:alloc_vmcs_cpu+0x3a/0xb7
 [<ffffffff881bcd6f>] :kvm_intel:vmx_create_vcpu+0x110/0x77c
 [<ffffffff883aaceb>] :kvm:kvm_vm_ioctl+0x10d/0xad0
 [<ffffffff8000f966>] __alloc_pages+0x65/0x2ce
 [<ffffffff8000966a>] __handle_mm_fault+0x823/0xf98
 [<ffffffff800225a6>] __up_read+0x19/0x7f
 [<ffffffff80067b58>] do_page_fault+0x4fe/0x830
 [<ffffffff800426ca>] do_ioctl+0x21/0x6b
 [<ffffffff8003090e>] vfs_ioctl+0x457/0x4b9
 [<ffffffff800b70fc>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff8004cd71>] sys_ioctl+0x59/0x78
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

Code: 0f 0b 68 22 36 3c 88 c2 8c 09 c3 55 48 89 fd 53 31 db 48 83
RIP [<ffffffff883a9489>] :kvm:kvm_handle_fault_on_reboot+0xb/0x16
 RSP <ffff81011f651cb0>
<0>Kernel panic - not syncing: Fatal exception

Expected results:
The guest should start successfully.

Additional info:
Host CPU:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
stepping : 10
cpu MHz : 2826.235
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips : 5652.50
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
To properly fix host suspend with KVM, the RHEL5 kernel would need the following patch set: http://lkml.org/lkml/2007/5/24/108

Right now, KVM has no way to disable the virtualization extensions on a CPU while that CPU is being taken down, that is, at the point where no process can be scheduled on it anymore.
Postponing to the next release.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Suspending a host running kvm VMs might crash the host since there is no way for KVM to disable virtualization extensions on the CPU while it is being taken down.
Release note updated.

Diffed Contents:
@@ -1,2 +1 @@
-Suspending a host running kvm VMs might crash the host since there is no way for KVM to disable virtualization extensions on the
-CPU while it is being taken down.
+Currently, there is no way for KVM to disable virtualization extensions on a CPU while it is being taken down. Consequently, suspending a host running KVM-based virtual machines may cause the host to crash.
Release note updated.

Diffed Contents:
@@ -1 +1 @@
-Currently, there is no way for KVM to disable virtualization extensions on a CPU while it is being taken down. Consequently, suspending a host running KVM-based virtual machines may cause the host to crash.
+Currently, KVM cannot disable virtualization extensions on a CPU while it is being taken down. Consequently, suspending a host running KVM-based virtual machines may cause the host to crash.
This should be fixed by the fix for bug #510814. Moving to POST to reflect the fix status.
The fix for bug #510814 (which should also solve this bug) is in kernel-2.6.18-170.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5. Testing is welcome.
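To check whether a host is running a kernel at or above the test build mentioned above, a shell function along these lines could be used. It is a sketch under stated assumptions: `has_fix` is a hypothetical name, it only understands RHEL5 `2.6.18-N.el5...` release strings, and it assumes every later 2.6.18-*.el5 build keeps the fix rather than verifying the patch itself.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a RHEL5 kernel release string
# (default: the running kernel) is at least 2.6.18-170.el5, the first build
# reported here to contain the fix for bug #510814.
# Returns 0 if it is, 1 if it is older, and 2 if the string is not a
# 2.6.18-*.el5 release it can judge.
# Assumption: every later 2.6.18-*.el5 build keeps the fix.
has_fix() {
    release=${1:-$(uname -r)}
    case $release in
        2.6.18-*.el5*) ;;
        *) return 2 ;;
    esac
    build=${release#2.6.18-}   # e.g. "2.6.18-164.2.1.el5" -> "164.2.1.el5"
    build=${build%%.*}         # keep only the leading build number: "164"
    case $build in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$build" -ge 170 ]
}
```

For example, on the reporter's original 2.6.18-156.el5 host `has_fix` would return 1, while on kernel-2.6.18-170.el5 it would return 0.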
Tested kernel-2.6.18-170.el5.x86_64.rpm. Guests boot successfully after the host resumes from suspend. Both Windows and Linux guests run OK, whether started from the console or from a command line in the GUI.

Conditions covered:
1) Intel/AMD host
2) Windows guest / RHEL guest
3) Guest started from the console / guest started from a GUI pseudo-terminal

All covered conditions passed.
Retested on an Intel host: the guest works fine after resume from suspend (the host auto-resumed due to 550014).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html