Bug 659645

Summary: Host crash during suspending to memory while kvm guest running
Product: Red Hat Enterprise Linux 5 Reporter: Xiaoli Tian <xtian>
Component: kvm Assignee: Andrea Arcangeli <aarcange>
Status: CLOSED WONTFIX QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: medium    
Version: 5.6 CC: amit.shah, gcosta, michen, mkenneth, mtosatti, rhod, riel, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-09-12 17:23:41 UTC Type: ---
Bug Blocks: 580949    

Description Xiaoli Tian 2010-12-03 10:23:05 UTC
Description of problem:
The host crashes when it is suspended to memory while a KVM guest is running.

Version-Release number of selected component (if applicable):
kernel-2.6.18-233.el5
kvm-83-219.el5


How reproducible:
98%

Steps to Reproduce:
1. Start a guest with the command:
/usr/libexec/qemu-kvm -rtc-td-hack -usbdevice tablet -no-hpet -drive file=/root/win2003_x64/win2k3_64_virtio.raw,if=virtio,boot=on,werror=stop,cache=none,format=raw,media=disk -cpu qemu64,+sse2 -smp 4 -m 8G -net nic,macaddr=00:32:34:5f:d6:2e,model=virtio,vlan=0 -net tap,script=/etc/qemu-ifup,vlan=0 -fda /root/virtio-drivers-1.0.0-45801.vfd -uuid `uuidgen` -vnc :1  -boot c -balloon none -monitor stdio

2. Suspend the host to memory with the command: echo mem > /sys/power/state
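
Step 2 can be guarded with a small check before writing to /sys/power/state. The following is a minimal sketch and not part of the original report. The function name supports_mem_suspend is hypothetical, and it assumes the state file holds a space-separated list of supported sleep states (for example "standby mem disk").

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): returns 0 if the
# given state file lists "mem" as a supported sleep state, 1 otherwise.
supports_mem_suspend() {
    state_file="${1:-/sys/power/state}"
    # Bail out if the file is missing or unreadable.
    [ -r "$state_file" ] || return 1
    # The file is assumed to hold a space-separated list of states.
    for s in $(cat "$state_file"); do
        [ "$s" = "mem" ] && return 0
    done
    return 1
}

# Usage (as root, with the guest from step 1 already running):
#   supports_mem_suspend && echo mem > /sys/power/state
```

The check takes the file path as an optional argument so that it can be exercised against a test file without actually suspending the machine.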

  
Actual results:

The host panics (see the log under Additional info).


Expected results:


Additional info:

The detailed log follows:

Disabling non-boot CPUs ...
Breaking affinity for irq 4
CPU 1 is now offline
CPU1 is down
BUG: soft lockup - CPU#3 stuck for 60s! [qemu-kvm:3628]
CPU 3:
Modules linked in: tun radeon drm autofs4 hidp rfcomm l2cap bluetooth lockd sund
Pid: 3628, comm: qemu-kvm Tainted: G      2.6.18-233.el5 #1
RIP: 0010:[<ffffffff80077371>]  [<ffffffff80077371>] __smp_call_function_many+0c
RSP: 0018:ffff810202ea5b78  EFLAGS: 00000297
RAX: 0000000000000002 RBX: 0000000000000003 RCX: 0000000000000282
RDX: 00000000000008fc RSI: ffff810202ea5c18 RDI: 00000000000000fc
RBP: 0000000100000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000002 R12: 00000002000280d2
R13: ffff81000001dc10 R14: 0000004400000000 R15: 0000000000000000
FS:  0000000043dd3940(0063) GS:ffff81021fc1c640(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aae573aaaf0 CR3: 000000021cb0f000 CR4: 00000000000006e0

Call Trace:
  [<ffffffff884170f8>] :kvm:ack_flush+0x0/0x1
  [<ffffffff884170f8>] :kvm:ack_flush+0x0/0x1
  [<ffffffff80077473>] smp_call_function_many+0x38/0x4c
  [<ffffffff884187fe>] :kvm:make_all_cpus_request+0x8f/0xa4
  [<ffffffff88418828>] :kvm:kvm_flush_remote_tlbs+0xb/0x17
  [<ffffffff884221be>] :kvm:kvm_mmu_zap_page+0x202/0x3ca
  [<ffffffff88423373>] :kvm:mmu_set_spte+0x255/0x3ca
  [<ffffffff88423a5e>] :kvm:direct_map_entry+0x5a/0xf6
  [<ffffffff8842134b>] :kvm:walk_shadow+0x96/0xc3
  [<ffffffff884213af>] :kvm:__direct_map+0x36/0x43
  [<ffffffff88423a04>] :kvm:direct_map_entry+0x0/0xf6
  [<ffffffff8842520a>] :kvm:tdp_page_fault+0xdf/0x11f
  [<ffffffff88422449>] :kvm:mmu_free_roots+0x8a/0x152
  [<ffffffff8841e6f3>] :kvm:kvm_arch_vcpu_ioctl_run+0x1d3/0x61e
  [<ffffffff88419e57>] :kvm:kvm_vcpu_ioctl+0xf2/0x448
  [<ffffffff80022242>] __up_read+0x19/0x7f
  [<ffffffff8006723e>] do_page_fault+0x4fe/0x874
  [<ffffffff80042268>] do_ioctl+0x21/0x6b
  [<ffffffff80030262>] vfs_ioctl+0x457/0x4b9
  [<ffffffff8004c737>] sys_ioctl+0x59/0x78
  [<ffffffff8005d28d>] tracesys+0xd5/0xe0

BUG: soft lockup - CPU#0 stuck for 60s! [events/0:14]
CPU 0:
Modules linked in: tun radeon drm autofs4 hidp rfcomm l2cap bluetooth lockd sund
Pid: 14, comm: events/0 Tainted: G      2.6.18-233.el5 #1
RIP: 0010:[<ffffffff80064bbc>]  [<ffffffff80064bbc>] .text.lock.spinlock+0x2/0x0
RSP: 0018:ffff81021fa0fd88  EFLAGS: 00000286
RAX: 0000000000000000 RBX: ffffffff80313a08 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffff8007382d RDI: ffffffff80314728
RBP: 0000000000000000 R08: 0000000000000001 R09: ffff81021fa0fdc0
R10: ffff81021b5ffa00 R11: 0000000000000206 R12: ffff81021fce4000
R13: 0000000000000000 R14: ffff8100090058a0 R15: 00000000072651d8
FS:  0000000041222940(0000) GS:ffffffff80424000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaacd5000 CR3: 000000020dc26000 CR4: 00000000000006e0

Call Trace:
  [<ffffffff8007745f>] smp_call_function_many+0x24/0x4c
  [<ffffffff8007382d>] mcheck_check_cpu+0x0/0x30
  [<ffffffff80077564>] smp_call_function+0x4e/0x5e
  [<ffffffff8007382d>] mcheck_check_cpu+0x0/0x30
  [<ffffffff80072af2>] mcheck_timer+0x0/0x6c
  [<ffffffff80095b37>] on_each_cpu+0x10/0x22
  [<ffffffff80072b0e>] mcheck_timer+0x1c/0x6c
  [<ffffffff8004d7aa>] run_workqueue+0x99/0xf6
  [<ffffffff80049ff2>] worker_thread+0x0/0x122
  [<ffffffff8004a0e2>] worker_thread+0xf0/0x122
  [<ffffffff8008e41e>] default_wake_function+0x0/0xe
  [<ffffffff80032968>] kthread+0xfe/0x132
  [<ffffffff8005dfb1>] child_rip+0xa/0x11
  [<ffffffff8003286a>] kthread+0x0/0x132
  [<ffffffff8005dfa7>] child_rip+0x0/0x11

Comment 2 Ronen Hod 2011-09-12 17:23:41 UTC
This seems like a serious bug, but in a scenario that probably never happens in real life. Since we have not encountered it again, it is probably not hurting anybody (or has been fixed), so I am closing it for RHEL 5.8.
I suspend my laptop (FC14) with a running VM, so it has probably been fixed since.