Bug 1589634

Summary: Migration failed when rebooting guest with multiple virtio videos
Product: Red Hat Enterprise Linux 7
Reporter: yafu <yafu>
Component: qemu-kvm-rhev
Assignee: Gerd Hoffmann <kraxel>
Status: CLOSED ERRATA
QA Contact: Guo, Zhiyi <zhguo>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.6
CC: chayang, fjin, jinzhao, juzhang, knoel, kraxel, lizhu, michen, nanliu, qzhang, virt-maint, yafu, yanqzhan, yuhuang, zhguo
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-8.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-01 11:10:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                           Flags
domain xml                            none
guest log on the source host          none
guest log on the target host          none
guest log on the source host 2        none
guest log on the target host 2        none
guest log on the source host 3        none
guest log on the target host 3        none
guest log on the source host 4        none
guest log on the target host 4        none
guest log on the source host - 5      none
guest log on the target host - 5      none

Description yafu 2018-06-11 03:01:27 UTC
Description of problem:
Migration fails when a guest with multiple virtio video devices is rebooted while migrating.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-3.el7.x86_64
libvirt-4.3.0-1.el7.x86_64

How reproducible:
50%

Steps to Reproduce:
1. Start a guest with multiple virtio video devices:
# virsh dumpxml iommu1
<os>
  <type arch='x86_64' machine='pc-q35-rhel7.5.0'>hvm</type>
  <boot dev='hd'/>
</os>
...
<video>
  <model type='virtio' heads='1' primary='yes'>
    <acceleration accel3d='no'/>
  </model>
  <alias name='ua-04c2decd-4e33-4023-84de-12205c777af6'/>
  <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</video>
<video>
  <model type='virtio' heads='1'>
    <acceleration accel3d='no'/>
  </model>
  <alias name='ua-04c2decd-4e35-4023-84de-12205c777af6'/>
  <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</video>
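
To confirm that a guest matches the configuration in step 1, the domain XML can be checked for more than one virtio video device. The following is a minimal sketch, not part of the original report: the helper name `virtio_video_models` is hypothetical, and the sample XML is a trimmed copy of the `<video>` elements above, wrapped in the `<domain>`/`<devices>` elements that libvirt uses.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <video> elements from the report
# (aliases and PCI addresses omitted for brevity).
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <video>
      <model type='virtio' heads='1' primary='yes'>
        <acceleration accel3d='no'/>
      </model>
    </video>
    <video>
      <model type='virtio' heads='1'>
        <acceleration accel3d='no'/>
      </model>
    </video>
  </devices>
</domain>
"""


def virtio_video_models(domain_xml):
    """Return the <model> elements of all video devices whose type is 'virtio'."""
    root = ET.fromstring(domain_xml.strip())
    return [
        model
        for model in root.findall("./devices/video/model")
        if model.get("type") == "virtio"
    ]


models = virtio_video_models(DOMAIN_XML)
# The bug was reported against guests with more than one virtio video device.
print("virtio video devices:", len(models))
print("affected configuration:", len(models) > 1)
```

In practice the XML would come from `virsh dumpxml iommu1` rather than an inline string.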

2. Migrate the guest while it is rebooting:
# virsh reboot iommu1; virsh migrate iommu1 qemu+ssh://10.66.4.101/system --live --verbose --p2p --tunnelled
Migration: [ 98 %]error: internal error: qemu unexpectedly closed the monitor: 2018-06-11T02:30:54.724223Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.4:00.0/virtio-gpu'
2018-06-11T02:30:54.724308Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724405Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724468Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724605Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724705Z qemu-kvm: load of migration failed: Invalid argument
red_channel_client_disconnect: rcc=0x5626b47eb1b0 (channel=0x5626b4320220 type=5 id=0)
red_channel_client_disconnect: rcc=0x5626bfd7e1b0 (channel=0x5626b43202e0 type=6 id=0)

3. Check the QEMU log on the target host:
# cat /var/log/libvirt/qemu/iommu1.log
2018-06-11T02:30:54.724187Z qemu-kvm: Failed to load virtio-gpu:virtio-gpu
2018-06-11T02:30:54.724223Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.4:00.0/virtio-gpu'
2018-06-11T02:30:54.724308Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724405Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724468Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724605Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724705Z qemu-kvm: load of migration failed: Invalid argument
red_channel_client_disconnect: rcc=0x5626b47eb1b0 (channel=0x5626b4320220 type=5 id=0)
red_channel_client_disconnect: rcc=0x5626bfd7e1b0 (channel=0x5626b43202e0 type=6 id=0)
2018-06-11 02:30:54.993+0000: shutting down, reason=crashed
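
The TSC warnings in the log above are noise for this bug; the relevant lines are the non-warning `qemu-kvm:` messages. The following is a rough sketch of how they could be filtered out, assuming the timestamp format shown in this log. The function name `migration_errors` and the regular expression are illustrative, not part of any QEMU or libvirt tooling.

```python
import re

# Shortened copy of the target-host log from step 3 (one TSC warning kept).
LOG = """\
2018-06-11T02:30:54.724187Z qemu-kvm: Failed to load virtio-gpu:virtio-gpu
2018-06-11T02:30:54.724223Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.4:00.0/virtio-gpu'
2018-06-11T02:30:54.724308Z qemu-kvm: warning: TSC frequency mismatch between VM (2593994 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T02:30:54.724705Z qemu-kvm: load of migration failed: Invalid argument
red_channel_client_disconnect: rcc=0x5626b47eb1b0 (channel=0x5626b4320220 type=5 id=0)
2018-06-11 02:30:54.993+0000: shutting down, reason=crashed
"""

# Matches lines such as "2018-06-11T02:30:54.724187Z qemu-kvm: <message>".
QEMU_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z qemu-kvm: (.*)$")


def migration_errors(log_text):
    """Return qemu-kvm messages from the log, skipping 'warning:' lines."""
    errors = []
    for line in log_text.splitlines():
        match = QEMU_LINE.match(line)
        if match is None:
            continue  # not a timestamped qemu-kvm line (e.g. SPICE or libvirt output)
        message = match.group(1)
        if message.startswith("warning:"):
            continue  # TSC frequency warnings are unrelated to the load failure
        errors.append(message)
    return errors


for message in migration_errors(LOG):
    print(message)
```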

Actual results:
Migration fails when a guest with multiple virtio video devices is rebooted while migrating.

Expected results:
Migration should complete successfully.

Additional info:

Comment 2 yafu 2018-06-11 03:12:35 UTC
Created attachment 1449844 [details]
domain xml

Comment 3 Gerd Hoffmann 2018-06-13 11:02:53 UTC
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16710970
Please try again with this build.
Has additional logging (both source and target) to track down what exactly fails.

Comment 4 yafu 2018-06-14 03:23:31 UTC
Created attachment 1451084 [details]
guest log on the source host

Comment 5 yafu 2018-06-14 03:25:31 UTC
Created attachment 1451085 [details]
guest log on the target host

Comment 6 yafu 2018-06-14 03:27:06 UTC
(In reply to Gerd Hoffmann from comment #3)
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16710970
> Please try again with this build.
> Has additional logging (both source and target) to track down what exactly
> fails.

Please see the log in the attachment.

Comment 7 Gerd Hoffmann 2018-06-14 07:38:56 UTC
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16720687
please try.  hopefully fixes the bug.  debug logging still in, in case it still fails attach logs please.

Comment 8 yafu 2018-06-14 08:34:36 UTC
Created attachment 1451213 [details]
guest log on the source host 2

Comment 9 yafu 2018-06-14 08:35:12 UTC
Created attachment 1451214 [details]
guest log on the target host 2

Comment 10 yafu 2018-06-14 08:36:16 UTC
(In reply to Gerd Hoffmann from comment #7)
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16720687
> please try.  hopefully fixes the bug.  debug logging still in, in case it
> still fails attach logs please.

I can still reproduce the bug with this build. Please see the logs in the attachments.

Comment 11 Gerd Hoffmann 2018-06-18 13:29:31 UTC
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16764269
more logging added, please try again.

Comment 12 yafu 2018-06-19 03:10:24 UTC
Created attachment 1452797 [details]
guest log on the source host 3

Comment 13 yafu 2018-06-19 03:11:06 UTC
Created attachment 1452798 [details]
guest log on the target host 3

Comment 14 yafu 2018-06-19 03:11:55 UTC
(In reply to Gerd Hoffmann from comment #11)
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16764269
> more logging added, please try again.

I can still reproduce the issue with this build. Please see the logs in the attachments.

Comment 15 Gerd Hoffmann 2018-06-19 09:05:59 UTC
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16776712
new build with even more logging  ...

Comment 16 yafu 2018-06-20 08:17:20 UTC
Created attachment 1453146 [details]
guest log on the source host 4

Comment 17 yafu 2018-06-20 08:18:04 UTC
Created attachment 1453147 [details]
guest log on the target host 4

Comment 18 yafu 2018-06-20 08:19:06 UTC
(In reply to Gerd Hoffmann from comment #15)
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16776712
> new build with even more logging  ...

I can still reproduce the issue with this build. Please see the logs in the attachments.

Comment 19 Gerd Hoffmann 2018-06-20 13:16:40 UTC
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16797459
next test build ...

Comment 20 yafu 2018-06-21 03:46:49 UTC
Created attachment 1453327 [details]
guest log on the source host - 5

Comment 21 yafu 2018-06-21 03:47:31 UTC
Created attachment 1453328 [details]
guest log on the target host - 5

Comment 22 yafu 2018-06-21 03:48:57 UTC
(In reply to Gerd Hoffmann from comment #19)
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16797459
> next test build ...

The issue now reproduces only occasionally (~10%). Please see the logs in the attachments.

Comment 23 Gerd Hoffmann 2018-06-21 05:52:41 UTC
> The issue now reproduces only occasionally (~10%). Please see the logs in
> the attachments.

Hmm, that appears to be a completely different issue ...

2018-06-21T03:35:41.219392Z qemu-kvm: VQ 0 size 0x40 Guest index 0xe07f inconsistent with Host index 0x436: delta 0xdc49
2018-06-21T03:35:41.219426Z qemu-kvm: Failed to load virtio-gpu:virtio
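
The numbers in the "VQ 0" message can be checked by hand. The sketch below assumes that the delta is the 16-bit wrap-around difference between the guest's ring index and the host's index, and that loading is rejected when the delta exceeds the queue size. That reading is an assumption based on the message text; the function names are illustrative and are not QEMU's.

```python
def vq_index_delta(guest_idx, host_idx):
    """Difference between guest and host ring indexes, wrapped to 16 bits."""
    return (guest_idx - host_idx) & 0xFFFF


def indexes_consistent(guest_idx, host_idx, vq_size):
    """Assumed rule: at most vq_size entries can be outstanding at any time."""
    return vq_index_delta(guest_idx, host_idx) <= vq_size


# Values taken from the log line above.
guest_idx, host_idx, vq_size = 0xE07F, 0x436, 0x40

print(hex(vq_index_delta(guest_idx, host_idx)))          # 0xdc49, matching the logged delta
print(indexes_consistent(guest_idx, host_idx, vq_size))  # False: far more than 0x40 entries
```

Under that assumption, a delta of 0xdc49 against a queue of 0x40 entries means the two indexes do not describe the same ring state, which is a different failure from the original "Invalid argument" load error.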

Comment 24 Gerd Hoffmann 2018-06-25 10:55:11 UTC
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16856011
one more test build

Comment 25 yafu 2018-06-28 03:18:15 UTC
(In reply to Gerd Hoffmann from comment #24)
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16856011
> one more test build

I ran the test in a loop and migration still fails sometimes (~5%). The error in the guest log on the target host is as follows:
2018-06-27T12:38:48.500703Z qemu-kvm: VQ 0 size 0x40 Guest index 0xbf55 inconsistent with Host index 0x43a: delta 0xbb1b
2018-06-27T12:38:48.500726Z qemu-kvm: Failed to load virtio-gpu:virtio
2018-06-27T12:38:48.500737Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.4:00.0/virtio-gpu'
2018-06-27T12:38:48.500779Z qemu-kvm: warning: TSC frequency mismatch between VM (2099996 kHz) and host (2593995 kHz), and TSC scaling unavailable
2018-06-27T12:38:48.501780Z qemu-kvm: warning: TSC frequency mismatch between VM (2099996 kHz) and host (2593995 kHz), and TSC scaling unavailable
2018-06-27T12:38:48.501873Z qemu-kvm: warning: TSC frequency mismatch between VM (2099996 kHz) and host (2593995 kHz), and TSC scaling unavailable
2018-06-27T12:38:48.501940Z qemu-kvm: warning: TSC frequency mismatch between VM (2099996 kHz) and host (2593995 kHz), and TSC scaling unavailable
2018-06-27T12:38:48.502026Z qemu-kvm: load of migration failed: Operation not permitted
2018-06-27 12:38:48.715+0000: shutting down, reason=failed
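
The loop used for this test is not shown in the report. One way to run it is sketched below, under several assumptions: the guest ping-pongs between two hosts, `virsh -c` is pointed at whichever host currently runs the guest, and a failed migration leaves the guest on the source. The host URIs and helper names are hypothetical; the `virsh reboot` and `virsh migrate` options are the ones from the reproduction steps.

```python
import subprocess


def virsh(uri, *args):
    """Build a virsh command line that connects to the given libvirt URI."""
    return ["virsh", "-c", uri, *args]


def run(cmd):
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def reboot_and_migrate(domain, src_uri, dst_uri, runner=run):
    """Reboot the guest, then migrate it right away, as in step 2 of the report."""
    runner(virsh(src_uri, "reboot", domain))  # exit status of the reboot is ignored
    status = runner(
        virsh(src_uri, "migrate", domain, dst_uri,
              "--live", "--verbose", "--p2p", "--tunnelled")
    )
    return status == 0


def count_failures(domain, uri_a, uri_b, rounds, runner=run):
    """Run the reboot+migrate sequence repeatedly and count failed migrations."""
    failures = 0
    src, dst = uri_a, uri_b
    for _ in range(rounds):
        if reboot_and_migrate(domain, src, dst, runner):
            src, dst = dst, src  # the guest now runs on the other host
        else:
            failures += 1  # assumption: the guest keeps running on the source
    return failures


# Example call (hypothetical host names, not executed here):
# count_failures("iommu1", "qemu+ssh://host-a/system", "qemu+ssh://host-b/system", 100)
```

The `runner` parameter exists so the loop can be exercised with a stub before it is pointed at real hosts.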

Comment 26 Gerd Hoffmann 2018-07-02 11:18:04 UTC
(In reply to yafu from comment #25)
> (In reply to Gerd Hoffmann from comment #24)
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16856011
> > one more test build
> 
> I do the test in a loop and migration still failed sometimes(~5%), and the
> error info in the guest log on the target host is as follows:
> 2018-06-27T12:38:48.500703Z qemu-kvm: VQ 0 size 0x40 Guest index 0xbf55
> inconsistent with Host index 0x43a: delta 0xbb1b

Ok.  As this seems to be something completely different, can you open a new bug please?  You can assign it to me.  Thanks.

Comment 28 yafu 2018-07-03 10:12:35 UTC
(In reply to Gerd Hoffmann from comment #26)
> (In reply to yafu from comment #25)
> > (In reply to Gerd Hoffmann from comment #24)
> > > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16856011
> > > one more test build
> > 
> > I do the test in a loop and migration still failed sometimes(~5%), and the
> > error info in the guest log on the target host is as follows:
> > 2018-06-27T12:38:48.500703Z qemu-kvm: VQ 0 size 0x40 Guest index 0xbf55
> > inconsistent with Host index 0x43a: delta 0xbb1b
> 
> Ok.  As this seems to be something completely different, can you open a new
> bug please?  You can assign it to me.  Thanks.

Filed a new bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1597621

Comment 30 Miroslav Rezanina 2018-07-24 14:24:44 UTC
Fix included in qemu-kvm-rhev-2.12.0-8.el7

Comment 32 liunana 2018-08-20 08:07:23 UTC
Migrated 5 times successfully on RHEL-7.6 using the steps below:
1. Boot QEMU with "-device virtio-vga,max_outputs=2" on both the source host and the target host.
2. Migrate the guest.
3. Reboot the source guest.
4. Migration succeeds with no errors in the logs.


Additional info:
# uname -r 
3.10.0-933.el7.x86_64

# rpm -qa | grep qemu
qemu-kvm-rhev-2.12.0-10.el7.x86_64
qemu-img-rhev-2.12.0-10.el7.x86_64

Comment 33 liunana 2018-08-22 11:56:54 UTC
Also migrated 10 times successfully on RHEL-7.6 with libvirt using the reproduction steps; both the source guest and the destination guest work well.

Additional info:
# uname -r 
3.10.0-933.el7.x86_64

# rpm -qa | grep qemu
qemu-kvm-rhev-2.12.0-10.el7.x86_64
qemu-img-rhev-2.12.0-10.el7.x86_64

Comment 36 errata-xmlrpc 2018-11-01 11:10:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443