Bug 1449490

Summary: [q35] guest hangs after migration with virtio-scsi-pci.
Product: Red Hat Enterprise Linux 7
Reporter: xiagao
Component: qemu-kvm-rhev
Assignee: Sameeh Jubran <sjubran>
Status: CLOSED ERRATA
QA Contact: jingzhao <jinzhao>
Severity: medium
Priority: medium
Version: 7.4
CC: ailan, chayang, dgilbert, drjones, hhuang, huding, jinzhao, juzhang, knoel, lijin, michen, mrezanin, phou, qzhang, virt-bugs, virt-maint, wyu, xiagao, yvugenfi
Target Milestone: rc
Hardware: x86_64
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.9.0-7.el7
Last Closed: 2017-08-02 04:38:29 UTC
Type: Bug
Bug Blocks: 1376765

Description xiagao 2017-05-10 06:47:34 UTC
Description of problem:
The guest hangs after live migration with a SCSI controller on a q35 machine.


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-3.el7.x86_64
kernel-3.10.0-663.el7.x86_64
OVMF-20170228-4.gitc325e41585e3.el7.noarch

How reproducible:
3/3

Steps to Reproduce:
1. Boot the guest with the QEMU command line [1].
2. Start the destination guest with "-incoming tcp:0:5800".
3. Live-migrate the guest to the destination host:
{"execute": "migrate","arguments":{"uri": "tcp:0:5800"}}
4. After migration, check the guest.
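Steps 3 and 4 can also be driven from a script. The following is a minimal Python sketch, not part of the original report, assuming the source guest's QMP server from [1] (-qmp tcp:0:1235) is reachable on localhost and the destination was started as in step 2. The helper names (qmp_command, migration_commands, read_reply, migrate) are hypothetical. QMP only accepts qmp_capabilities until the capabilities handshake is done, so that command is sent before migrate; query-migrate is then polled for the migration status.

```python
import json
import socket
import time

# Hypothetical helpers for this sketch; they are not part of QEMU or any library.


def qmp_command(execute, arguments=None):
    """Encode one QMP command as a newline-terminated JSON line."""
    cmd = {"execute": execute}
    if arguments is not None:
        cmd["arguments"] = arguments
    return (json.dumps(cmd) + "\n").encode()


def migration_commands(uri):
    """Capabilities handshake first (required before other commands), then migrate."""
    return [qmp_command("qmp_capabilities"), qmp_command("migrate", {"uri": uri})]


def read_reply(reader):
    """Return the next command reply, skipping asynchronous event messages."""
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def migrate(host, port, uri, poll_seconds=1.0):
    """Start a migration over QMP and poll query-migrate until it finishes."""
    with socket.create_connection((host, port)) as sock, sock.makefile("rb") as reader:
        json.loads(reader.readline())  # discard the QMP greeting banner
        for cmd in migration_commands(uri):
            sock.sendall(cmd)
            reply = read_reply(reader)
            if "error" in reply:
                raise RuntimeError(reply["error"])
        while True:
            sock.sendall(qmp_command("query-migrate"))
            status = read_reply(reader)["return"].get("status")
            if status in ("completed", "failed", "cancelled"):
                return status
            time.sleep(poll_seconds)
```

Usage, under the same assumptions: migrate("localhost", 1235, "tcp:0:5800") returns the final status string. A "completed" status only means the migration itself finished; step 4 (checking whether the destination guest still responds) is still needed to observe this bug.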

Actual results:
The guest hangs.


Expected results:
The guest keeps running normally.

Additional info:
[1]:
/usr/libexec/qemu-kvm \
-name 'win10-64-scsi-ovmf-nfs' \
-nodefaults  \
-vga std \
-m 3G  \
-smp 4  \
-enable-kvm \
-usb -device usb-tablet \
-machine q35,smm=on,accel=kvm \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,readonly=on,unit=0 \
-drive file=/home/OVMF_VARS_xiagao22.fd,if=pflash,format=raw,unit=1 \
-debugcon file:/home/ovmf.log \
-global isa-debugcon.iobase=0x402 \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot order=cd,menu=on \
-vnc :1 \
-enable-kvm \
-monitor stdio \
-qmp tcp:0:1235,server,nowait \
-device ahci,id=ahci \
-drive file=virtio-win-prewhql-136.vfd,if=floppy,id=drive-fdc0-0-0,format=raw,cache=none \
-netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 -device e1000e,netdev=hostnet0,id=net0,mac=00:52:0a:5c:f2:1a \
-device ioh3420,bus=pcie.0,id=root1.0,slot=1 -device virtio-scsi-pci,id=scsi0,num_queues=4,bus=root1.0 \
-drive file=win1064.ovmf,if=none,id=drive-scsi-disk0,format=raw,serial=xiagao,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=scsi-disk0 \
-device ioh3420,bus=pcie.0,id=root2.0,slot=2 -device virtio-scsi-pci,id=scsi1,num_queues=4,bus=root2.0 \
-drive file=scsi-disk3.raw,if=none,id=drive-scsi-disk1,format=raw,serial=xiagao123,cache=none -device scsi-hd,bus=scsi1.0,drive=drive-scsi-disk1,id=scsi-disk1 \
-device ioh3420,bus=pcie.0,id=root3.0,slot=3 -device virtio-scsi-pci,id=scsi2,num_queues=4,bus=root3.0 \
-drive file=en_windows_10_enterprise_version_1703_updated_march_2017_x64_dvd_10189290.iso,if=none,readonly=on,format=raw,id=cdrom1,media=cdrom -device scsi-disk,bus=scsi2.0,drive=cdrom1,id=scsi2-0 \
-device ioh3420,bus=pcie.0,id=root4.0,slot=4

Comment 2 xiagao 2017-05-10 09:35:40 UTC
Hit the same issue with q35 + SeaBIOS.

Comment 3 Dr. David Alan Gilbert 2017-05-10 18:38:29 UTC
1) Can you try with a different NIC than e1000e? I'm tracking a separate e1000e migration bug.
2) Does a Linux guest fail in the same way?

Comment 4 xiagao 2017-05-11 10:14:13 UTC
(In reply to Dr. David Alan Gilbert from comment #3)
> 1) Can you try with a different NIC than e1000e - I'm tracking a separate
> e1000e migration bug
Tried with "virtio-net-pci": cannot reproduce.

> 2) Does a Linux guest fail in the same way?
No, a Linux guest does not fail in the same way.

Comment 5 Dr. David Alan Gilbert 2017-05-11 10:30:46 UTC
(In reply to xiagao from comment #4)
> (In reply to Dr. David Alan Gilbert from comment #3)
> > 1) Can you try with a different NIC than e1000e - I'm tracking a separate
> > e1000e migration bug
> Try with "virtio-net-pci", can not reproduce.

Hmm, in that case I wonder if it's related to bz 1447935, which is a Windows migration problem with e1000e.

> > 2) Does a Linux guest fail in the same way?
> Not fail in the same way.

Comment 6 Dr. David Alan Gilbert 2017-05-11 11:51:44 UTC
When you say 'guest hang', is it completely hung? Does the mouse still move?
On bz 1447935 the mouse still moves but nothing else happens, or if it does, it happens very slowly.

(My suspicion is that the network card is raising a flood of spurious interrupts, so the guest gets nothing else done.)

Comment 7 xiagao 2017-05-12 02:28:06 UTC
(In reply to Dr. David Alan Gilbert from comment #6)
> when you say 'guest hang' is it completely hung? Does the mouse still move?
> On bz 1447935 the mouse still moves, just nothing else happens - or if it
> does happen it happens very slowly.
> 
> 
> (My suspicion is the network card is giving out lots and lots of false
> interrupts so the guest gets nothing else done)

Hi, in my test environment the guest is completely hung and the mouse cannot move.

Comment 13 Sameeh Jubran 2017-05-17 12:03:36 UTC
A patch that fixes the bug for the e1000e device was sent to upstream QEMU: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04019.html

Comment 15 Dr. David Alan Gilbert 2017-05-18 10:01:38 UTC
*** Bug 1447935 has been marked as a duplicate of this bug. ***

Comment 17 Miroslav Rezanina 2017-05-30 15:03:25 UTC
Fix included in qemu-kvm-rhev-2.9.0-7.el7

Comment 18 jingzhao 2017-06-01 05:21:00 UTC
Reproduced the issue on qemu-kvm-rhev-2.9.0-3.el7.x86_64 and verified the fix on qemu-kvm-rhev-2.9.0-7.el7.x86_64.

PS: QEMU command line:

/usr/libexec/qemu-kvm \
-machine q35,smm=on,accel=kvm \
-cpu Opteron_G3 \
-nodefaults -rtc base=utc \
-m 2G \
-smp 2,sockets=2,cores=1,threads=1 \
-enable-kvm \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-k en-us \
-nodefaults \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/home/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
-serial unix:/tmp/serial0,server,nowait \
-debugcon file:/home/ovmf.log \
-global isa-debugcon.iobase=0x402 \
-boot menu=on \
-qmp tcp:0:6666,server,nowait \
-vga qxl \
-device pcie-root-port,bus=pcie.0,id=root3 \
-device virtio-scsi-pci,id=scsi1,bus=root3 \
-drive file=win10-ovmf-bk.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device scsi-hd,drive=drive-scsi-disk0,id=scsi-disk0,bus=scsi1.0 \
-device pcie-root-port,bus=pcie.0,id=root0,multifunction=on,chassis=1,addr=0xa.0 \
-device e1000e,netdev=tap10,mac=9a:6a:6b:6c:6d:6e,bus=root0 -netdev tap,id=tap10 \
-device pcie-root-port,bus=pcie.0,id=root1,chassis=11,addr=0xa.1 \
-device pcie-root-port,bus=pcie.0,id=root2,chassis=12,addr=0xa.2 \
-device pcie-root-port,bus=pcie.0,id=root8,slot=3 \
-cdrom /home/en_windows_10_enterprise_version_1703_updated_march_2017_x64_dvd_10189290.iso \
-device ahci,id=ahci0 \
-drive file=/usr/share/virtio-win/virtio-win-1.9.0.iso,if=none,id=drive-scsi-disk1,format=raw,cache=none,werror=stop,rerror=stop -device ide-cd,drive=drive-scsi-disk1,id=scsi-disk1,bus=ahci0.0 \
-monitor stdio \
-vnc :2

Comment 19 jingzhao 2017-06-01 05:21:47 UTC
Based on comment 18, changed the status to VERIFIED.

Comment 21 errata-xmlrpc 2017-08-02 04:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392