Bug 511031 - qemu-kvm hang during migration when stress test is running in the guest
Summary: qemu-kvm hang during migration when stress test is running in the guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Glauber Costa
QA Contact: Lawrence Lim
URL:
Whiteboard:
Depends On:
Blocks: LiveMigration
Reported: 2009-07-13 11:42 UTC by jason wang
Modified: 2014-03-26 00:58 UTC (History)

Fixed In Version: kvm-83-93.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 09:34:04 UTC
Target Upstream Version:
Embargoed:


Attachments
strace result on the src host (115 bytes, application/x-bzip-compressed-tar)
2009-07-13 11:48 UTC, jason wang
strace result on the src host (1.38 MB, application/x-bzip-compressed-tar)
2009-07-13 11:51 UTC, jason wang
strace result on the dst host (1.98 MB, application/x-bzip-compressed-tar)
2009-07-13 11:57 UTC, jason wang
Stress test (159.88 KB, application/x-gzip)
2009-07-14 15:47 UTC, jason wang


Links
System: Red Hat Product Errata
ID: RHEA-2009:1272
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: New package: kvm
Last Updated: 2009-09-01 09:34:32 UTC

Description jason wang 2009-07-13 11:42:34 UTC
Description of problem:
When a VM is migrated while a stress test is running in the guest, qemu-kvm hangs during the migration.

Version-Release number of selected component (if applicable):
Host OS version:
Linux amd-8750-4-2 2.6.18-157.el5 #1 SMP Mon Jul 6 18:12:07 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
Also reproducible on rhev-hypervisor-5.4-2.0.99.10.3.el5rhev.
Host KVM version:
etherboot-zroms-kvm-5.4.4-10.el5
kvm-debuginfo-83-87.el5
kmod-kvm-83-87.el5
kvm-83-87.el5
kvm-qemu-img-83-87.el5
kvm-tools-83-87.el5
Guest OS version:
RHEL-5.3-Server x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the VM.
2. Run the stress test in the guest with the command line: stress -c N -i N -d N -m N, where N is 2 * the number of vCPUs.
3. Migrate the VM to the destination host.
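
For step 2, a minimal sketch of how N could be derived inside the guest. This is an illustration rather than the script used in the original test: the helper name stress_cmd is hypothetical, and it assumes getconf is available in the guest. It only prints the command line; it does not start stress.

```shell
#!/bin/sh
# Hypothetical helper: print the stress command line for a given vCPU count,
# using N = 2 * vCPUs for every worker type (-c, -i, -d, -m).
stress_cmd() {
    vcpus="$1"
    n=$((vcpus * 2))
    echo "stress -c $n -i $n -d $n -m $n"
}

# Number of online CPUs as seen by the guest (assumes getconf is present).
vcpus=$(getconf _NPROCESSORS_ONLN)

# Print the command so it can be reviewed before running it by hand.
stress_cmd "$vcpus"
```

For the -smp 2 guest in this report, N works out to 4.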
  
Actual results:
1. qemu-kvm hangs during migration.

Expected results:
1. migration should finish successfully.

Additional info:
1. qemu-kvm cmdline:
src:
qemu-kvm -drive file=RHEL-Server-5.3-64.0.qcow2,if=ide,cache=off,index=0 -net nic,vlan=0,model=e1000,macaddr=00:33:44:55:11:22 -net tap,vlan=0 -vnc :10 -m 2048 -smp 2 -no-hpet -rtc-td-hack -cpu qemu64,+sse2 -vnc :10 -monitor stdio
dst:
qemu-kvm -drive file=RHEL-Server-5.3-64.0.qcow2,if=ide,cache=off,index=0 -net nic,vlan=0,model=e1000,macaddr=00:33:44:55:11:22 -net tap,vlan=0 -vnc :10 -m 2048 -smp 2 -no-hpet -rtc-td-hack -cpu qemu64,+sse2 -vnc :10 -monitor stdio -incoming tcp:0:4444

Comment 1 jason wang 2009-07-13 11:48:04 UTC
Created attachment 351462 [details]
strace result on the src host

Comment 2 jason wang 2009-07-13 11:51:45 UTC
Created attachment 351463 [details]
strace result on the src host

Comment 3 jason wang 2009-07-13 11:57:52 UTC
Created attachment 351464 [details]
strace result on the dst host

Comment 4 jason wang 2009-07-14 06:14:59 UTC
Can also be reproduced with kvm-83-81.el5 and kvm-83-71.el5.

Comment 5 Glauber Costa 2009-07-14 15:04:46 UTC
It's probably a duplicate of bug 511199, given the number of EAGAINs in the source and the stall on recvfrom in the destination.

This report is, however, much more feature complete. I'll try to reproduce it.
But meanwhile, can you try it with the patch dor posted on that BZ?

thanks!

Comment 6 Glauber Costa 2009-07-14 15:09:31 UTC
btw, can you point me to this "stress" thing?

Comment 7 jason wang 2009-07-14 15:47:51 UTC
Created attachment 351618 [details]
Stress test

Comment 14 jason wang 2009-07-22 08:16:09 UTC
I've tested this case with kvm-83-93.el5; the hang could not be reproduced.

Comment 18 Suqin Huang 2009-07-23 09:51:17 UTC
Tested on kvm-83-94.el5:

Tested with 2 vCPUs and 4 vCPUs (the host has 4 CPUs).
Commands used:

source:
/usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -smp 2 -m 2G -name vm1 -drive file=/mnt/RHEL-Server-5.4-32.qcow2,if=ide,cache=off,index=0 -uuid d073cee9-8836-47da-b4c1-f5583f2dc747 -net nic,macaddr=00:26:9B:DE:C8:58,model=e1000 -net tap,vlan=0,script=/etc/qemu-ifup-switch -usbdevice tablet -vnc :5 -boot c -monitor stdio

Run the stress test in the guest:
# stress -c N -i N -d N -m N


destination:
/usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -smp 2 -m 2G -name vm1 -drive file=/mnt/RHEL-Server-5.4-32.qcow2,if=ide,cache=off,index=0 -uuid d073cee9-8836-47da-b4c1-f5583f2dc747 -net nic,macaddr=00:26:9B:DE:C8:58,model=e1000 -net tap,vlan=0,script=/etc/qemu-ifup-switch -usbdevice tablet -vnc :5 -boot c -monitor stdio -incoming tcp:0:6000


Tried five times; cannot reproduce.
Changing the status to *VERIFIED*; please reopen if the issue is reproduced.

Comment 20 errata-xmlrpc 2009-09-02 09:34:04 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1272.html

