Bug 511031

Summary: qemu-kvm hang during migration when stress test is running in the guest
Product: Red Hat Enterprise Linux 5 Reporter: jason wang <jasowang>
Component: kvm Assignee: Glauber Costa <gcosta>
Status: CLOSED ERRATA QA Contact: Lawrence Lim <llim>
Severity: medium Docs Contact:
Priority: low    
Version: 5.4 CC: cpelland, ehabkost, sghosh, shuang, tburke, tools-bugs, virt-maint, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kvm-83-93.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-09-02 09:34:04 UTC Type: ---
Bug Depends On:    
Bug Blocks: 495630    
Attachments:
Description                      Flags
strace result on the src host    none
strace result on the src host    none
strace result on the dst host    none
Stress test                      none

Description jason wang 2009-07-13 11:42:34 UTC
Description of problem:
When a VM is migrated while a stress test is running in the guest, qemu-kvm hangs during the migration.

Version-Release number of selected component (if applicable):
Host OS version:
Linux amd-8750-4-2 2.6.18-157.el5 #1 SMP Mon Jul 6 18:12:07 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
Also reproducible on rhev-hypervisor-5.4-2.0.99.10.3.el5rhev
Host KVM version:
etherboot-zroms-kvm-5.4.4-10.el5
kvm-debuginfo-83-87.el5
kmod-kvm-83-87.el5
kvm-83-87.el5
kvm-qemu-img-83-87.el5
kvm-tools-83-87.el5
Guest OS version:
RHEL-5.3-Server x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the VM.
2. Run the stress test in the guest with the command line: stress -c N -i N -d N -m N, where N is 2 * the number of vCPUs.
3. Migrate the VM.
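
Step 2 can be sketched as a small shell helper. This is only an illustration of the "N is 2 * vcpu number" rule; the function name stress_cmd is hypothetical and not part of the stress tool or of this report.

```shell
#!/bin/sh
# Hypothetical helper: print the stress command line for a guest
# with the given number of vCPUs. N is twice the vCPU count.
stress_cmd() {
    vcpus=$1
    n=$((2 * vcpus))
    echo "stress -c $n -i $n -d $n -m $n"
}

# Inside the guest, the vCPU count can be read from /proc/cpuinfo, e.g.:
#   eval "$(stress_cmd "$(grep -c '^processor' /proc/cpuinfo)")"
```

For the 2-vCPU guest used in this report (-smp 2), the helper prints a command line with N = 4.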
  
Actual results:
1. qemu-kvm hangs during migration.

Expected results:
1. Migration should finish successfully.

Additional info:
1. qemu-kvm cmdline:
src:
qemu-kvm -drive file=RHEL-Server-5.3-64.0.qcow2,if=ide,cache=off,index=0 -net nic,vlan=0,model=e1000,macaddr=00:33:44:55:11:22 -net tap,vlan=0 -vnc :10 -m 2048 -smp 2 -no-hpet -rtc-td-hack -cpu qemu64,+sse2 -vnc :10 -monitor stdio
dst:
qemu-kvm -drive file=RHEL-Server-5.3-64.0.qcow2,if=ide,cache=off,index=0 -net nic,vlan=0,model=e1000,macaddr=00:33:44:55:11:22 -net tap,vlan=0 -vnc :10 -m 2048 -smp 2 -no-hpet -rtc-td-hack -cpu qemu64,+sse2 -vnc :10 -monitor stdio -incoming tcp:0:4444
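
The report does not state the monitor command used to start the migration. A minimal sketch, assuming the standard QEMU monitor commands "migrate" and "info migrate" were typed at the source's (qemu) prompt (both instances use -monitor stdio), and assuming the -d (detach) option is available in this build; drop it otherwise. The helper name migrate_cmds and the host name dst-host are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the monitor commands to type at the
# source's "(qemu)" prompt.
# $1: destination host name or address (placeholder)
# $2: TCP port passed to -incoming on the destination
migrate_cmds() {
    dst_host=$1
    port=$2
    echo "migrate -d tcp:$dst_host:$port"   # start migration in the background
    echo "info migrate"                     # check migration status
}

# Example; 4444 matches "-incoming tcp:0:4444" in the dst command line:
#   migrate_cmds dst-host 4444
```

The destination must already be running with -incoming before the migrate command is issued on the source.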

Comment 1 jason wang 2009-07-13 11:48:04 UTC
Created attachment 351462 [details]
strace result on the src host

Comment 2 jason wang 2009-07-13 11:51:45 UTC
Created attachment 351463 [details]
strace result on the src host

Comment 3 jason wang 2009-07-13 11:57:52 UTC
Created attachment 351464 [details]
strace result on the dst host

Comment 4 jason wang 2009-07-14 06:14:59 UTC
Also reproducible with kvm-83-81.el5 and kvm-83-71.el5.

Comment 5 Glauber Costa 2009-07-14 15:04:46 UTC
It's probably a duplicate of bug 511199, due to the amount of EAGAINs in the source, and the stalling happening on recvfrom in the destination.

This report is, however, much more complete. I'll try to reproduce it.
But meanwhile, can you try it with the patch Dor posted on that BZ?

thanks!
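
The symptom pattern described in this comment (many EAGAINs on the source, a stall in recvfrom on the destination) can be checked against saved strace output with a sketch like the following. The helper name count_token and the log file names are hypothetical; the attached strace results are not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in an strace log that contain
# a given token (for example EAGAIN or recvfrom).
# $1: token to search for
# $2: path to the strace log
count_token() {
    token=$1
    log=$2
    grep -c -- "$token" "$log"
}

# Example, assuming the attachments were saved under these names:
#   count_token EAGAIN   strace-src.log
#   count_token recvfrom strace-dst.log
```

A high EAGAIN count on the source alone does not prove the two bugs are the same; it is only the heuristic this comment relies on.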

Comment 6 Glauber Costa 2009-07-14 15:09:31 UTC
btw, can you point me to this "stress" thing?

Comment 7 jason wang 2009-07-14 15:47:51 UTC
Created attachment 351618 [details]
Stress test

Comment 14 jason wang 2009-07-22 08:16:09 UTC
I've tested this case with kvm-83-93.el5; the hang could not be reproduced.

Comment 18 Suqin Huang 2009-07-23 09:51:17 UTC
Tested on kvm-83-94.el5:

Tested with 2 vCPUs and 4 vCPUs (host has 4 CPUs).
Commands used:

source:
/usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -smp 2 -m 2G -name vm1 -drive file=/mnt/RHEL-Server-5.4-32.qcow2,if=ide,cache=off,index=0 -uuid d073cee9-8836-47da-b4c1-f5583f2dc747 -net nic,macaddr=00:26:9B:DE:C8:58,model=e1000 -net tap,vlan=0,script=/etc/qemu-ifup-switch -usbdevice tablet -vnc :5 -boot c -monitor stdio

Run the stress test in the VM:
# stress -c N -i N -d N -m N


destination:
/usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -smp 2 -m 2G -name vm1 -drive file=/mnt/RHEL-Server-5.4-32.qcow2,if=ide,cache=off,index=0 -uuid d073cee9-8836-47da-b4c1-f5583f2dc747 -net nic,macaddr=00:26:9B:DE:C8:58,model=e1000 -net tap,vlan=0,script=/etc/qemu-ifup-switch -usbdevice tablet -vnc :5 -boot c -monitor stdio -incoming tcp:0:6000


Tried five times; could not reproduce.
Changing the status to *VERIFIED*; please reopen it if the issue is reproduced.

Comment 20 errata-xmlrpc 2009-09-02 09:34:04 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1272.html