Bug 1086168

Summary: qemu-kvm can not cancel migration in src host when network of dst host failed
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm
Version: 7.0
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Jun Li <juli>
Assignee: Dr. David Alan Gilbert <dgilbert>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dgilbert, hhuang, huding, jen, juzhang, lmiksik, meyang, michen, qzhang, rbalakri, virt-maint, xfu
Fixed In Version: qemu-kvm-1.5.3-87.el7
Doc Type: Bug Fix
Doc Text:
A failure in the destination host or network during a migration could lead to a long (~15 min) TCP timeout before migrate_cancel took effect. migrate_cancel now uses the shutdown(2) system call to force the socket to be closed quickly.
Last Closed: 2015-11-19 04:52:23 UTC
Type: Bug
Bug Blocks: 1167197, 1168790

Description Jun Li 2014-04-10 09:43:15 UTC
Description of problem:
qemu-kvm cannot cancel a migration on the src host when the network of the dst host fails.
The network of the dst host can fail in ways such as:
Scenario 1. The network cable of the dst host is unplugged;
Scenario 2. iptables is used to drop the data coming from the src host.

The following uses iptables (scenario 2) to describe this issue.

Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-60.el7.x86_64
3.10.0-121.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with the following command line on the src host and the dst host.
src:
gdb --args /usr/libexec/qemu-kvm -M pc -m 4G -cpu SandyBridge,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time -smp 8,maxcpus=8 -qmp tcp::8888,server,nowait -vnc :1 -monitor stdio -boot menu=on -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -netdev tap,id=tap0,vhost=on,script=/etc/qemu-ifup,queues=4 -device virtio-net-pci,netdev=tap0,id=net0,mq=on,mac=24:be:05:15:11:11 -drive file=/dev/sdc,if=none,id=img,rerror=stop,werror=stop,format=raw -device virtio-blk-pci,drive=img,id=sys-img,scsi=off,addr=0x4 -drive file=/mnt/ISO/en_windows_server_2012_r2_x64_dvd_2707946.iso,if=none,id=cdrom0,rerror=stop,werror=stop -device ide-cd,drive=cdrom0,bus=ide.0,unit=0,id=disk-cdrom0,bootindex=1 -drive file=/mnt/virtio-win-1.7.0.iso,if=none,id=cdrom1,rerror=stop,werror=stop -device ide-cd,drive=cdrom1,bus=ide.1,unit=0,id=disk-cdrom1 -S
--
dst:
gdb --args /usr/libexec/qemu-kvm -M pc -m 4G -cpu SandyBridge,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time -smp 8,maxcpus=8 -qmp tcp::8888,server,nowait -vnc :1 -monitor stdio -boot menu=on -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -netdev tap,id=tap0,vhost=on,script=/etc/qemu-ifup,queues=4 -device virtio-net-pci,netdev=tap0,id=net0,mq=on,mac=24:be:05:15:11:11 -drive file=/dev/sdc,if=none,id=img,rerror=stop,werror=stop,format=raw -device virtio-blk-pci,drive=img,id=sys-img,scsi=off,addr=0x4 -drive file=/mnt/ISO/en_windows_server_2012_r2_x64_dvd_2707946.iso,if=none,id=cdrom0,rerror=stop,werror=stop -device ide-cd,drive=cdrom0,bus=ide.0,unit=0,id=disk-cdrom0,bootindex=1 -drive file=/mnt/virtio-win-1.7.0.iso,if=none,id=cdrom1,rerror=stop,werror=stop -device ide-cd,drive=cdrom1,bus=ide.1,unit=0,id=disk-cdrom1 -S \
-incoming tcp::5800,server,nowait 
2. Start the migration from the src host to the dst host.
(qemu) migrate -d tcp:10.66.4.247:5800
3. While the migration is in progress, add a firewall rule on the destination host.
# iptables -A INPUT -p tcp -d 10.66.4.247 --dport 5800 -j DROP
4. Cancel the migration on the src host (it can no longer finish).
(qemu) migrate_cancel
5. Check on the src host whether the migration was cancelled.
(qemu) info migrate
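
Steps 4 and 5 can also be driven over the QMP socket that the command line in step 1 opens with "-qmp tcp::8888,server,nowait". The following is only a sketch: the helper names and the default host are assumptions for illustration, and it uses the QMP commands qmp_capabilities, migrate_cancel and query-migrate.

```python
import json
import socket


def qmp_execute(stream, command, arguments=None):
    """Send one QMP command and return its reply, skipping asynchronous events."""
    request = {"execute": command}
    if arguments:
        request["arguments"] = arguments
    stream.write(json.dumps(request) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        # Events carry an "event" key; command replies carry "return" or "error".
        if "return" in reply or "error" in reply:
            return reply


def cancel_and_query(host="127.0.0.1", port=8888):
    """Cancel the running migration and return the status QEMU reports afterwards."""
    with socket.create_connection((host, port), timeout=10) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())            # greeting banner: {"QMP": ...}
        qmp_execute(stream, "qmp_capabilities")  # leave capability negotiation mode
        qmp_execute(stream, "migrate_cancel")
        reply = qmp_execute(stream, "query-migrate")
        return reply.get("return", {}).get("status")
```

Calling cancel_and_query() on the src host returns the status string from query-migrate, which corresponds to the "Migration status" line of "info migrate".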


Actual results:
After step 5, the migration is still not cancelled on the src host.
(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off 
Migration status: active
total time: 19296 milliseconds
expected downtime: 30 milliseconds
setup: 94 milliseconds
transferred ram: 284456 kbytes
throughput: 268.57 mbps
remaining ram: 550920 kbytes
total ram: 4211404 kbytes
duplicate: 1182023 pages
skipped: 0 pages
normal: 611672 pages
normal bytes: 2446688 kbytes

Expected results:
Running migrate_cancel cancels the migration.

Additional info:

Comment 2 Jun Li 2014-04-10 10:09:50 UTC
Also tested with qemu-kvm-rhev-1.5.3-50.el7.x86_64; it hits this issue too.

Comment 3 Qian Guo 2014-11-27 01:13:27 UTC
*** Bug 1168156 has been marked as a duplicate of this bug. ***

Comment 4 Dr. David Alan Gilbert 2015-01-08 11:13:40 UTC
Posted fix upstream to qemu-devel.

That still leaves a ~2 min timeout if you migrate to a host that's already dead; but that's a lot better than the ~15 mins that you get if it happens in the middle.
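
To illustrate why shutdown(2) helps here, below is a small Python sketch. It is not the QEMU patch. It uses loopback TCP, the function name and the 0.5 s wait are made up for the example, and it assumes Linux socket behaviour. A thread blocked in send because the peer never reads is woken with an error as soon as another thread calls shutdown() on the socket, instead of waiting for TCP to time out.

```python
import socket
import threading
import time


def demo_shutdown_unblocks_sender(wait_before_shutdown=0.5):
    """Return (unblocked, error_name) after shutting down a stalled sender."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    sender = socket.create_connection(listener.getsockname())
    # The accepted socket is never read from: it stands in for the
    # unreachable destination host.
    peer, _ = listener.accept()

    result = {}

    def pump():
        chunk = b"\0" * (1 << 20)
        try:
            while True:
                sender.sendall(chunk)  # blocks once the kernel buffers are full
        except OSError as exc:
            result["error"] = type(exc).__name__

    worker = threading.Thread(target=pump, daemon=True)
    worker.start()
    time.sleep(wait_before_shutdown)  # let the sender fill the buffers and stall

    # Shut the socket down from another thread, as a cancel request would.
    sender.shutdown(socket.SHUT_RDWR)
    worker.join(timeout=5)
    unblocked = not worker.is_alive()

    for s in (sender, peer, listener):
        s.close()
    return unblocked, result.get("error")


if __name__ == "__main__":
    print(demo_shutdown_unblocks_sender())
```

Without the shutdown() call the sending thread would stay blocked for as long as the kernel keeps retrying, which matches the stuck "active" status seen in the description.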

Comment 8 Miroslav Rezanina 2015-03-18 11:24:03 UTC
Fix included in qemu-kvm-1.5.3-87.el7

Comment 9 huiqingding 2015-03-27 07:37:31 UTC
Reproduced this bug using the following versions:
kernel-3.10.0-234.el7.x86_64
qemu-kvm-1.5.3-60.el7.x86_64

Steps to Reproduce:
1. Boot the guest with the following command line on the src host and the dst host.
src:
# /usr/libexec/qemu-kvm -M pc -m 4G -cpu SandyBridge,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time -smp 8,maxcpus=8 -qmp tcp::8888,server,nowait -vnc :1 -monitor stdio -boot menu=on -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -netdev tap,id=tap0,vhost=on,script=/etc/qemu-ifup,queues=4 -device virtio-net-pci,netdev=tap0,id=net0,mq=on,mac=24:be:05:15:11:11 -drive file=/dev/sdc,if=none,id=img,rerror=stop,werror=stop,format=raw -device virtio-blk-pci,drive=img,id=sys-img,scsi=off,addr=0x4 -drive file=/mnt/ISO/en_windows_server_2012_r2_x64_dvd_2707946.iso,if=none,id=cdrom0,rerror=stop,werror=stop -device ide-cd,drive=cdrom0,bus=ide.0,unit=0,id=disk-cdrom0,bootindex=1 -drive file=/mnt/virtio-win-1.7.0.iso,if=none,id=cdrom1,rerror=stop,werror=stop -device ide-cd,drive=cdrom1,bus=ide.1,unit=0,id=disk-cdrom1 -S
--
dst:
# /usr/libexec/qemu-kvm -M pc -m 4G -cpu SandyBridge,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time -smp 8,maxcpus=8 -qmp tcp::8888,server,nowait -vnc :1 -monitor stdio -boot menu=on -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -netdev tap,id=tap0,vhost=on,script=/etc/qemu-ifup,queues=4 -device virtio-net-pci,netdev=tap0,id=net0,mq=on,mac=24:be:05:15:11:11 -drive file=/dev/sdc,if=none,id=img,rerror=stop,werror=stop,format=raw -device virtio-blk-pci,drive=img,id=sys-img,scsi=off,addr=0x4 -drive file=/mnt/ISO/en_windows_server_2012_r2_x64_dvd_2707946.iso,if=none,id=cdrom0,rerror=stop,werror=stop -device ide-cd,drive=cdrom0,bus=ide.0,unit=0,id=disk-cdrom0,bootindex=1 -drive file=/mnt/virtio-win-1.7.0.iso,if=none,id=cdrom1,rerror=stop,werror=stop -device ide-cd,drive=cdrom1,bus=ide.1,unit=0,id=disk-cdrom1 -S \
-incoming tcp::5800,server,nowait 
2. Start the migration from the src host to the dst host.
(qemu) migrate -d tcp:10.66.9.152:5800
3. Unplug the network cable of the dst host.
4. Cancel the migration on the src host (it can no longer finish).
(qemu) migrate_cancel
5. Check on the src host whether the migration was cancelled.
(qemu) info migrate


Actual results:
After step 5, the migration is still not cancelled on the src host.
(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off 
Migration status: active
total time: 47332 milliseconds
expected downtime: 30 milliseconds
setup: 15 milliseconds
transferred ram: 472101 kbytes
throughput: 268.57 mbps
remaining ram: 3732476 kbytes
total ram: 4211404 kbytes
duplicate: 4393 pages
skipped: 0 pages
normal: 1111387 pages
normal bytes: 4445548 kbytes
(qemu) info status
VM status: running
(qemu) info status
VM status: running

Comment 10 huiqingding 2015-03-27 07:43:54 UTC
Tested this bug using the following versions:
kernel-3.10.0-234.el7.x86_64
qemu-kvm-1.5.3-87.el7.x86_64

Steps to Test:
1. Boot the guest with the following command line on the src host and the dst host.
src:
# /usr/libexec/qemu-kvm -M pc -m 4G -cpu SandyBridge,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time -smp 8,maxcpus=8 -qmp tcp::8888,server,nowait -vnc :1 -monitor stdio -boot menu=on -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -netdev tap,id=tap0,vhost=on,script=/etc/qemu-ifup,queues=4 -device virtio-net-pci,netdev=tap0,id=net0,mq=on,mac=24:be:05:15:11:11 -drive file=/dev/sdc,if=none,id=img,rerror=stop,werror=stop,format=raw -device virtio-blk-pci,drive=img,id=sys-img,scsi=off,addr=0x4 -drive file=/mnt/ISO/en_windows_server_2012_r2_x64_dvd_2707946.iso,if=none,id=cdrom0,rerror=stop,werror=stop -device ide-cd,drive=cdrom0,bus=ide.0,unit=0,id=disk-cdrom0,bootindex=1 -drive file=/mnt/virtio-win-1.7.0.iso,if=none,id=cdrom1,rerror=stop,werror=stop -device ide-cd,drive=cdrom1,bus=ide.1,unit=0,id=disk-cdrom1 -S
--
dst:
# /usr/libexec/qemu-kvm -M pc -m 4G -cpu SandyBridge,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time -smp 8,maxcpus=8 -qmp tcp::8888,server,nowait -vnc :1 -monitor stdio -boot menu=on -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -netdev tap,id=tap0,vhost=on,script=/etc/qemu-ifup,queues=4 -device virtio-net-pci,netdev=tap0,id=net0,mq=on,mac=24:be:05:15:11:11 -drive file=/dev/sdc,if=none,id=img,rerror=stop,werror=stop,format=raw -device virtio-blk-pci,drive=img,id=sys-img,scsi=off,addr=0x4 -drive file=/mnt/ISO/en_windows_server_2012_r2_x64_dvd_2707946.iso,if=none,id=cdrom0,rerror=stop,werror=stop -device ide-cd,drive=cdrom0,bus=ide.0,unit=0,id=disk-cdrom0,bootindex=1 -drive file=/mnt/virtio-win-1.7.0.iso,if=none,id=cdrom1,rerror=stop,werror=stop -device ide-cd,drive=cdrom1,bus=ide.1,unit=0,id=disk-cdrom1 -S \
-incoming tcp::5800,server,nowait 
2. Start the migration from the src host to the dst host.
(qemu) migrate -d tcp:10.66.9.152:5800
3. Unplug the network cable of the dst host.
4. Cancel the migration on the src host (it can no longer finish).
(qemu) migrate_cancel
5. Check on the src host whether the migration was cancelled.
(qemu) info migrate


Actual results:
After step 5, the migration is cancelled on the src host.
(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off 
Migration status: cancelled
total time: 0 milliseconds

Based on the above results, I think this bug has been fixed.
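
The check in step 5 can be automated by parsing the "info migrate" text. This is a minimal sketch; the function names are hypothetical and it assumes the "field: value" layout shown in the outputs in comments 9 and 10.

```python
def parse_info_migrate(text):
    """Turn `info migrate` monitor output into a dict of field -> value strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        # Skip the prompt line (no separator) and the capabilities line,
        # which packs several "name: state" pairs into one row.
        if sep and not key.startswith("capabilities"):
            fields[key.strip()] = value.strip()
    return fields


def migration_cancelled(text):
    """True if the monitor output reports a cancelled migration."""
    return parse_info_migrate(text).get("Migration status") == "cancelled"


before_fix = """(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: active
total time: 47332 milliseconds
"""

after_fix = """(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: cancelled
total time: 0 milliseconds
"""

if __name__ == "__main__":
    print(migration_cancelled(before_fix), migration_cancelled(after_fix))
```

With the two excerpts above, the first call reports that the migration was not cancelled (qemu-kvm-1.5.3-60.el7) and the second that it was (qemu-kvm-1.5.3-87.el7).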

Comment 12 Shaolong Hu 2015-06-18 09:18:52 UTC
Verified on qemu-kvm-1.5.3-92.el7.x86_64:


1. cmd:

/usr/libexec/qemu-kvm -enable-kvm -M pc -smp 4 -m 4G -name rhel6.3-64 -uuid 3f2ea5cd-3d29-48ff-aab2-23df1b6ae213 -drive file=/root/RHEL-Server-7.2-64-virtio.qcow2,cache=none,if=none,rerror=stop,werror=stop,id=drive-virtio-disk0,format=qcow2,aio=native -device virtio-blk-pci,drive=drive-virtio-disk0,id=device-virtio-disk0,bootindex=1 -netdev tap,script=/etc/qemu-ifup,id=netdev0 -device virtio-net-pci,netdev=netdev0,id=device-net0,mac=aa:54:00:11:22:33 -boot order=cd -monitor stdio -usb -device usb-tablet,id=input0 -chardev socket,id=s1,path=/tmp/s1,server,nowait -device isa-serial,chardev=s1 -monitor tcp::1234,server,nowait -vga qxl -global qxl-vga.revision=3 -spice port=5920,disable-ticketing -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vnc :10 -qmp tcp:0:5555,server,nowait

2. Start the migration; on the dst host, drop incoming migration packets with:
iptables -A INPUT -p tcp -d 10.66.84.12 --dport 5556 -j DROP

3. Cancel the migration on the src host; the migration is cancelled immediately:

(qemu) migrate_cancel 
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
Migration status: cancelled
total time: 0 milliseconds

Comment 16 errata-xmlrpc 2015-11-19 04:52:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2213.html