Bug 1435521

Summary: Migration failed with postcopy enabled from rhel7.3.z host to rhel7.4 host: "error while loading state for instance 0x0 of device 'pci@800000020000000:05.0/virtio-rng'"

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: xianwang <xianwang>
Assignee: Laurent Vivier <lvivier>
QA Contact: xianwang <xianwang>
CC: dgibson, dgilbert, hhuang, huding, knoel, michen, mtessun, qzhang, stefanha, swadeley, virt-maint, xianwang, zhengtli
Target Milestone: rc
Keywords: ZStream
Type: Bug
Fixed In Version: qemu-kvm-rhev-2.9.0-2.el7
Doc Type: Bug Fix
Doc Text:
When using post-copy migration with the virtio Random Number Generator (RNG) device, virtio-rng continued generating data while the migration was already in post-copy mode. Consequently, the migration sometimes failed and the error message "Failed to load virtio-rng:virtio" was displayed. The source code has been improved and the problem no longer occurs in the scenario described.
Cloned To: 1450375 (view as bug list)
Bug Blocks: 1450375
Last Closed: 2017-08-02 03:39:56 UTC
Attachments: corefile

Description xianwang 2017-03-24 04:46:13 UTC
Description of problem:
When migrating from a rhel7.3.z host to a rhel7.4 host with postcopy enabled, migration failed with the error message "qemu-kvm: error while loading state for instance 0x0 of device 'pci@800000020000000:05.0/virtio-rng'".

Version-Release number of selected component (if applicable):
Host version:
(host1)
RHEL-7.3-updates-20170222.0 Server ppc64le
kernel-3.10.0-514.17.1.el7
qemu-kvm-rhev-2.6.0-28.el7_3.7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
(host2)
RHEL-7.4-20170317.n.0 Server ppc64le
kernel-3.10.0-623.el7
qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

Guest:
RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso
kernel-3.10.0-514.17.1.el7
mem: 4G
vcpu: 4

How reproducible:
2/20

Steps to Reproduce:
1. Boot a guest on the rhel7.3.z host with this qemu command line:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pseries-rhel7.3.0 \
    -nodefaults  \
    -vga std \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial0,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -chardev socket,path=/tmp/virtio_port-vs-20170207-030401-FfusAC1v,nowait,id=idQdLRHP,server \
    -device virtserialport,id=idBu8FQH,name=vs,bus=virtio_serial_pci0.0,chardev=idQdLRHP \
    -object rng-random,filename=/dev/random,id=passthrough-rOXjKxaC \
    -device virtio-rng-pci,id=virtio-rng-pci-GVn8yzUA,rng=passthrough-rOXjKxaC,bus=pci.0,addr=05 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
    -device pci-ohci,id=usb3,bus=pci.0,addr=08 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=09 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel73-ppc64le-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,bootindex=0 \
    -drive file=/root/r1.qcow2,format=qcow2,if=none,id=drive_plane,werror=stop,rerror=stop \
    -device virtio-blk-pci,drive=drive_plane,id=plane \
    -drive file=/root/r2.qcow2,if=none,id=drive-system-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop \
    -device spapr-vscsi,reg=0x1000,id=scsi0 \
    -device scsi-hd,drive=drive-system-disk,id=system-disk,bus=scsi0.0 \
    -device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0  \
    -netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -device spapr-vlan,mac=9a:4f:50:51:52:54,netdev=hostnet0,id=net0 \
    -netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 4096 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1,bootindex=1 \
    -drive id=drive_unattended,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=/root/staf-kvm-devel/vt_test_images/rhel73-ppc64le/ks.iso \
    -device scsi-cd,id=unattended,drive=drive_unattended \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -device usb-mouse,id=input1,bus=usb1.0,port=2 \
    -device usb-kbd,id=input2,bus=usb1.0,port=3 \
    -vnc :1  \
    -monitor stdio \
    -rtc base=utc,clock=host  \
    -boot order=cdn,once=c,menu=on,strict=on  \
    -enable-kvm  \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0
2. Boot a guest on the rhel7.4 host with the same qemu command line as above, appending "-incoming tcp:0:5801".
3. In the guest, compile and run a test program that continuously dirties pages:
# gcc test.c -o test
# ./test
# cat test.c
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>

/* SIGALRM handler: stop the test once the alarm fires */
void wakeup()
{
    exit(0);
}

int main()
{
    signal(SIGALRM, wakeup);
    alarm(120);                  /* run for 120 seconds */

    /* 40960 * 4096 bytes = 160 MiB; touch one byte in every KiB so the
     * whole buffer is kept dirty while the migration is running */
    char *buf = (char *) calloc(40960, 4096);
    while (1) {
        int i;
        for (i = 0; i < 40960 * 4; i++) {
            buf[i * 4096 / 4]++;
        }
        printf(".");
    }
}
4. On the rhel7.3.z host, enable postcopy and start the migration; after dirty pages have been generated, switch to postcopy mode:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.16.69.77:5801
(qemu) info migrate_capabilities 
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on 
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on 
Migration status: active
.....other migration info
dirty sync count: 4
dirty pages rate: 9539 pages
(qemu) migrate_start_postcopy 

Actual results:
Migration failed with the following error messages:

src host(rhel7.3.z)
(qemu) qemu-kvm: RP: Received invalid message 0x0000 length 0x0000
qemu-kvm: socket_writev_buffer: Got err=32 for (131328/18446744073709551615)

(qemu) info status 
VM status: paused (postmigrate)
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on 
Migration status: failed
total time: 0 milliseconds

dst host(rhel7.4)
(qemu) qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x52 - used_idx 0x53
qemu-kvm: Failed to load virtio-rng:virtio
qemu-kvm: error while loading state for instance 0x0 of device 'pci@800000020000000:05.0/virtio-rng'
qemu-kvm: load of migration failed: Operation not permitted
qemu-kvm: error while loading state section id 2(ram)
stable.sh: line 46: 66525 Segmentation fault   /usr/libexec/qemu-...qemu cli

Expected results:
Migration succeeds and the VM works well.

Additional info:

Comment 2 xianwang 2017-03-24 05:17:40 UTC
About reproducing this bug: I have tested rhel7.4 host <-> rhel7.3.z host migration with postcopy enabled about 20 times in total, roughly 14 times with "-machine pseries-rhel7.3.0" and 8 times with "-machine pseries-rhel7.2.0". The bug reproduced twice: once in the bug scenario with "-machine pseries-rhel7.3.0", and once with "-machine pseries-rhel7.2.0"; for the latter, the test information (host package versions, test steps) is the same as in the bug. The error messages are as below:
src host(rhel7.4)
(qemu) migrate_start_postcopy
(qemu) qemu-kvm: RP: Received invalid message 0x0000 length 0x0000

(qemu) info status 
VM status: paused (postmigrate)
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off 
Migration status: failed
total time: 0 milliseconds

dst host(rhel7.3.z)
(qemu) qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x5f - used_idx 0x60
qemu-kvm: error while loading state for instance 0x0 of device 'pci@800000020000000:05.0/virtio-rng'
qemu-kvm: load of migration failed: Operation not permitted
stable_dst.sh: line 45: 58244 Segmentation fault    /usr/libexec/qemu-...qemu cli

In the other 18 runs of this scenario, migration succeeded and the VM worked well. I will prepare an x86 test environment later to confirm whether this bug exists there and will update the result.

Comment 3 Dr. David Alan Gilbert 2017-03-24 10:41:14 UTC
There are at least two separate problems here:

   a) qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x5f - used_idx 0x60
          I'd seen that once before; I meant to ask Amit about it but it's a bit late :-)  Stefan's suggestion was that it might be that virtio-rng is doing an async read on the random file but then getting the data back after the source is stopped so updating the virtio state incorrectly.

   b) Even when that happens it shouldn't seg fault - so we should try and get a backtrace on that and fix that anyway.
          
xianwang:
   1) Can you reproduce this on 7.4->7.4 ?
   2) Can you capture a core dump from a failure? It would be good to get a backtrace from the segmentation fault.

Comment 4 xianwang 2017-03-26 04:38:55 UTC
This scenario can't be tested on x86: when doing rhel7.4 -> 7.3.z migration on x86, the migration fails with the error message "error while loading state for instance 0x0 of device 'apic'", as in Bug 1434784 (see my comment 3 there). For now, confirming whether this bug exists on x86 is blocked by bug 1434784; I will update the Hardware field later if this scenario can be tested on x86.

Comment 5 xianwang 2017-03-27 07:25:21 UTC
Created attachment 1266574 [details]
corefile

Comment 6 xianwang 2017-03-27 07:28:56 UTC
(In reply to Dr. David Alan Gilbert from comment #3)
> There are at least two separate problems here:
> 
>    a) qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x5f - used_idx 0x60
>           I'd seen that once before; I meant to ask Amit about it but it's a
> bit late :-)  Stefan's suggestion was that it might be that virtio-rng is
> doing an async read on the random file but then getting the data back after
> the source is stopped so updating the virtio state incorrectly.
> 
>    b) Even when that happens it shouldn't seg fault - so we should try and
> get a backtrace on that and fix that anyway.
>           
> xianwang:
>    1) Can you reproduce this on 7.4->7.4 ?
>    2) Can you capture a core dump from a failure? It would be good to get a
> backtrace from the segmentation fault.


1) OK, I will prepare a test environment for the 7.4 -> 7.4 scenario and will update the result later.
2) I tested the scenario described in the bug, with "-machine pseries-rhel7.3.0", 20 times without reproducing it. I then tested the comment 2 scenario with "-machine pseries-rhel7.2.0" more than 10 times and finally reproduced it once, capturing a core dump; it is uploaded as the "corefile" attachment.

Reproduction steps and result are as follows:
rhel7.4 host -> rhel7.3.z host
Version information is the same as in the bug.

The steps are the same as in the bug, except that "# ulimit -c unlimited" is executed before booting the guest on the dst host, so that a core dump file is produced.

Test result:
on src host:
(qemu) migrate_start_postcopy 
(qemu) qemu-kvm: RP: Received invalid message 0x0000 length 0x0000

(qemu) info status 
VM status: paused (postmigrate)
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off 
Migration status: failed
total time: 0 milliseconds

on dst host:
(qemu) qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x167 - used_idx 0x168
qemu-kvm: error while loading state for instance 0x0 of device 'pci@800000020000000:05.0/virtio-rng'
qemu-kvm: load of migration failed: Operation not permitted
stab_dst.sh: line 43: 27551 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox off -machine pseries-rhel7.2.0....... 

(gdb) core core.27551
[New LWP 27777]
[New LWP 27581]
[New LWP 27577]
[New LWP 27579]
[New LWP 27552]
[New LWP 27776]
[New LWP 27580]
[New LWP 27578]
[New LWP 27551]
Reading symbols from /usr/libexec/qemu-kvm...Reading symbols from /usr/lib/debug/usr/libexec/qemu-kvm.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/bb/a43ebbf8d0f86b15ee350386d4ef4f9577410e
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -name avocado-vt-vm1 -sandbox off -machine pseries-rhel7.'.
Program terminated with signal 11, Segmentation fault.
#0  qemu_file_is_writable (f=<optimized out>) at migration/qemu-file.c:104
104	    return f->ops->writev_buffer || f->ops->put_buffer;

Comment 7 Laurent Vivier 2017-03-30 11:09:34 UTC
(In reply to xianwang from comment #0)
> dst host(rhel7.4)
> (qemu) qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x52 - used_idx 0x53

This error comes from:

------ hw/virtio/virtio.c

   2016 int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
...
   2178             /*
   2179              * Some devices migrate VirtQueueElements that have been popped
   2180              * from the avail ring but not yet returned to the used ring.
   2181              * Since max ring size < UINT16_MAX it's safe to use modulo
   2182              * UINT16_MAX + 1 subtraction.
   2183              */
   2184             vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
   2185                                 vdev->vq[i].used_idx);
   2186             if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
   2187                 error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
   2188                              "used_idx 0x%x",
   2189                              i, vdev->vq[i].vring.num,
   2190                              vdev->vq[i].last_avail_idx,
   2191                              vdev->vq[i].used_idx);
   2192                 return -1;
   2193             }
...

The value checked has been modified by:

commit e66bcc408146730958d1a840bda85d7ad51e0cd7
Author: Halil Pasic <pasic.ibm.com>
Date:   Mon Dec 19 16:44:44 2016 +0100

    virtio: fix vq->inuse recalc after migr
    
    Correct recalculation of vq->inuse after migration for the corner case
    where the avail_idx has already wrapped but used_idx not yet.
    
    Also change the type of the VirtQueue.inuse to unsigned int. This is
    done to be consistent with other members representing sizes (VRing.num),
    and because C99 guarantees max ring size < UINT_MAX but does not
    guarantee max ring size < INT_MAX.
    
    Signed-off-by: Halil Pasic <pasic.ibm.com>
    Fixes: bccdef6b ("virtio: recalculate vq->inuse after migration")
    CC: qemu-stable
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
    Reviewed-by: Stefan Hajnoczi <stefanha>

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 933a3d7..8357218 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -93,7 +93,7 @@ struct VirtQueue
 
     uint16_t queue_index;
 
-    int inuse;
+    unsigned int inuse;
 
     uint16_t vector;
     VirtIOHandleOutput handle_output;
@@ -1878,9 +1878,11 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
             /*
              * Some devices migrate VirtQueueElements that have been popped
              * from the avail ring but not yet returned to the used ring.
+             * Since max ring size < UINT16_MAX it's safe to use modulo
+             * UINT16_MAX + 1 subtraction.
              */
-            vdev->vq[i].inuse = vdev->vq[i].last_avail_idx -
-                                vdev->vq[i].used_idx;
+            vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
+                                vdev->vq[i].used_idx);
             if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
                 error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
                              "used_idx 0x%x",

Comment 9 Laurent Vivier 2017-03-30 11:29:32 UTC
Before commit e66bcc4 ("virtio: fix vq->inuse recalc after migr"), it was working fine as "inuse" is signed and -1 < 8, now it is unsigned and becomes 0xffff > 8.

Stefan, any comment?

Comment 10 Dr. David Alan Gilbert 2017-03-30 12:38:25 UTC
(In reply to xianwang from comment #6)
> [...]
> Program terminated with signal 11, Segmentation fault.
> #0  qemu_file_is_writable (f=<optimized out>) at migration/qemu-file.c:104
> 104	    return f->ops->writev_buffer || f->ops->put_buffer;

That isn't enough information, please use the command in gdb:

 t a a bt full

that will generate a lot more information.

Dave

Comment 12 xianwang 2017-03-31 03:13:25 UTC
(In reply to Dr. David Alan Gilbert from comment #10)
> That isn't enough information, please use the command in gdb:
> 
>  t a a bt full
> 
> that will generate a lot more information.

OK, I am queuing to reserve hosts in beaker. Because the reproduction rate of this bug is low, I will update the result after I re-test this scenario with the packages Laurent provided: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12910461

Comment 13 Stefan Hajnoczi 2017-03-31 11:12:09 UTC
(In reply to Laurent Vivier from comment #9)
> Before commit e66bcc4 ("virtio: fix vq->inuse recalc after migr"), it was
> working fine as "inuse" is signed and -1 < 8, now it is unsigned and becomes
> 0xffff > 8.

Commit e66bcc4 is probably not relevant to this bug.  Before that commit the comparison was 0xffffffff > 8, now it is 0xffff > 8.  The error message would have been printed before commit e66bcc4 just the same.

C converts int > unsigned int into an unsigned comparison so it was not -1 > 8 == false (i.e. -1 < 8) as indicated in Comment 9.  Instead it already evaluated to true.
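
A minimal standalone C illustration of that conversion (an editorial sketch of plain C semantics, not QEMU code):

#include <stdio.h>

int main(void)
{
    int inuse = -1;        /* e.g. last_avail_idx - used_idx = 0x5f - 0x60 */
    unsigned int num = 8;  /* vring.num */

    /* the usual arithmetic conversions make this an unsigned
     * comparison: 0xffffffff > 8, which is true */
    printf("%d\n", inuse > num);  /* prints 1 */
    return 0;
}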

I suggest looking for a migration consistency bug as mentioned by David Gilbert:

In order to increment used_idx we must have incremented last_avail_idx first, because it's impossible for the device to "use" a descriptor before it became "available".

last_avail_idx cannot be smaller than used_idx, except for the uint16_t wrapping case covered by commit e66bcc4, but we are definitely not spanning the 65535 -> 1 overflow range in this bug report.
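
A standalone sketch of the modulo-65536 arithmetic from commit e66bcc4 (illustrative only; types taken from the diff in comment 7):

#include <stdint.h>
#include <stdio.h>

static unsigned int vq_inuse(uint16_t last_avail_idx, uint16_t used_idx)
{
    /* safe because max ring size < UINT16_MAX */
    return (uint16_t)(last_avail_idx - used_idx);
}

int main(void)
{
    /* legitimate wrap: avail index has wrapped past 0xffff, used has not */
    printf("wrapped:  %u\n", vq_inuse(1, 0xffff));     /* 2 */
    /* the inverted state from this bug report */
    printf("this bug: 0x%x\n", vq_inuse(0x5f, 0x60));  /* 0xffff > vring.num */
    return 0;
}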

used_idx == 0x60 and last_avail_idx == 0x5f is an invalid state.  last_avail_idx is migrated as part of the device's vmstate.  used_idx is loaded from guest memory.

Perhaps the vring memory page containing used_idx is more up-to-date than the device vmstate that was transferred?  I wonder what the last_avail_idx value is on the *source* QEMU when the crash occurs, perhaps it is no longer 0x5f.

Comment 14 Laurent Vivier 2017-03-31 12:25:51 UTC
Thanks Stefan.

In the virtio-rng device, the function calling virtqueue_pop() and virtqueue_push() is invoked from a timer. So I guess this function can run while the migration is running, and perhaps that explains the inconsistency between guest memory and the device vmstate?

Comment 15 Stefan Hajnoczi 2017-03-31 13:19:29 UTC
(In reply to Laurent Vivier from comment #14)
> In virtio-rng driver, the function calling virtqueue_pop() and
> virtqueue_push() is called by a timer. So I guess this function can run
> while the migration is running, and perhaps it can explain the inconsistency
> between guest memory and device vmstate?

Good idea, that's probably what triggers the bug.

Comment 16 xianwang 2017-04-01 10:35:42 UTC
This bug was also hit when testing post-copy on x86_64 with rhel7.4.
Bug reproduction is as follows:
versions (both src and dst):
3.10.0-635.el7.x86_64
qemu-img-rhev-2.9.0-0.el7.patchwork201703291116.x86_64
seabios-bin-1.10.2-1.el7.noarch
The src and dst hosts have different CPU models.

steps:
(1) Boot a guest on the src host with this qemu command line:
/usr/libexec/qemu-kvm \
    -name 'vm1'  \
    -sandbox off  \
    -machine pc-i440fx-rhel7.4.0 \
    -nodefaults  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -chardev socket,path=/tmp/virtio_port-vs-20170207-030401-FfusAC1v,nowait,id=idQdLRHP,server \
    -device virtserialport,id=idBu8FQH,name=vs,bus=virtio_serial_pci0.0,chardev=idQdLRHP \
    -object rng-random,filename=/dev/random,id=passthrough-rOXjKxaC \
    -device virtio-rng-pci,id=virtio-rng-pci-GVn8yzUA,rng=passthrough-rOXjKxaC,bus=pci.0,addr=05 \
    -device usb-ehci,id=usb1,bus=pci.0,addr=06 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=09 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,bootindex=0 \
    -object iothread,id=iothread0 \
    -drive file=/root/r1.qcow2,format=qcow2,if=none,id=drive_plane,werror=stop,rerror=stop \
    -device virtio-blk-pci,iothread=iothread0,bus=pci.0,drive=drive_plane,id=plane \
    -device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0  \
    -netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 4096 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-x86_64-dvd1.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -device usb-mouse,id=input1,bus=usb1.0,port=2 \
    -device usb-kbd,id=input2,bus=usb1.0,port=3 \
    -vnc :1 \
    -qmp tcp:0:8881,server,nowait \
    -vga std \
    -monitor stdio \
    -rtc base=utc,clock=host  \
    -boot order=cdn,once=c,menu=on,strict=off  \
    -enable-kvm  \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0
(2) Boot a guest on the dst host under gdb, using the same qemu command line as on the src host prefixed with "gdb --args" and with "-incoming tcp:0:5801" appended
(gdb) r
(3) On the src host, enable post-copy, start the migration, and switch to post-copy mode
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.66.10.208:5801
(qemu) migrate_start_postcopy 
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off 
Migration status: postcopy-active
total time: 40158 milliseconds
expected downtime: 244745 milliseconds
setup: 15 milliseconds
transferred ram: 11079 kbytes
throughput: 9.01 mbps
remaining ram: 4199736 kbytes
total ram: 4211528 kbytes
duplicate: 188 pages
skipped: 0 pages
normal: 2764 pages
normal bytes: 11056 kbytes
dirty sync count: 2
dirty pages rate: 67260 pages
postcopy request count: 16
(4) Check the migration status.

Actual result:
Migration failed and a core dump file was generated.

on src:
(qemu) qemu-kvm: RP: Received invalid message 0x0000 length 0x0000

(qemu) info status 
VM status: paused (postmigrate)
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off 
Migration status: failed
total time: 0 milliseconds

on dst:
(qemu) [New Thread 0x7ffedb5fe700 (LWP 5990)]
[New Thread 0x7ffedadfd700 (LWP 5991)]
qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x18 - used_idx 0x19
qemu-kvm: Failed to load virtio-rng:virtio
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-rng'
qemu-kvm: load of migration failed: Operation not permitted

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffedadfd700 (LWP 5991)]
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: debuginfo-install..........
(gdb) t a a bt full

Thread 13 (Thread 0x7ffedadfd700 (LWP 5991)):
#0  0x0000000000000000 in  ()
#1  0x00005555559602dc in qemu_fill_buffer (f=f@entry=0x55555acd0000) at migration/qemu-file.c:295
        len = <optimized out>
        pending = 1447
        __PRETTY_FUNCTION__ = "qemu_fill_buffer"
#2  0x0000555555960af8 in qemu_peek_buffer (f=f@entry=0x55555acd0000, buf=buf@entry=0x7ffedadfc5f0, size=size@entry=4096, offset=offset@entry=0) at migration/qemu-file.c:455
        received = <optimized out>
        pending = 1447
        index = 0
        __PRETTY_FUNCTION__ = "qemu_peek_buffer"
#3  0x0000555555960c35 in qemu_get_buffer (f=f@entry=0x55555acd0000, buf=0x7ffff7fe5000 "", size=size@entry=4096)
    at migration/qemu-file.c:493
        res = <optimized out>
        src = 0x0
        pending = 4096
        done = 0
#4  0x0000555555960cb8 in qemu_get_buffer_in_place (f=f@entry=0x55555acd0000, buf=buf@entry=0x7ffedadfc6a8, size=4096)
    at migration/qemu-file.c:540
#5  0x00005555557bec0d in ram_load (f=0x55555acd0000) at /usr/src/debug/qemu-2.9.0/migration/ram.c:2480
        host = 0x7fff9b1ce000
        block = 0x55555757e000
        addr = 3147620352
        page_buffer = <optimized out>
        place_source = 0x7ffff7fe5000
        ch = <optimized out>
        flags = <optimized out>
        ret = 0
        mis = <optimized out>
        postcopy_host_page = <optimized out>
        last_host = 0x7fff9b1ce000
        all_zero = false
---Type <return> to continue, or q <return> to quit---
        place_needed = true
        matching_page_sizes = true
        flags = 0
        ret = <optimized out>
        seq_iter = 51
        len = 0
        postcopy_running = <optimized out>
        postcopy_advised = <optimized out>
#6  0x00005555557bec0d in ram_load (f=0x55555acd0000, opaque=<optimized out>, version_id=<optimized out>)
    at /usr/src/debug/qemu-2.9.0/migration/ram.c:2540
        flags = 0
        ret = <optimized out>
        seq_iter = 51
        len = 0
        postcopy_running = <optimized out>
        postcopy_advised = <optimized out>
#7  0x00005555557bfdf5 in qemu_loadvm_state_main (mis=0x555556064360 <mis_current.31919>, f=0x55555acd0000)
    at /usr/src/debug/qemu-2.9.0/migration/savevm.c:1985
        section_id = 3
        le = 0x55555af31680
        ret = <optimized out>
        section_type = <optimized out>
        ret = 0
#8  0x00005555557bfdf5 in qemu_loadvm_state_main (f=f@entry=0x55555acd0000, mis=mis@entry=0x555556064360 <mis_current.31919>) at /usr/src/debug/qemu-2.9.0/migration/savevm.c:2016
        section_type = <optimized out>
        ret = 0
#9  0x00005555557c0750 in postcopy_ram_listen_thread (opaque=0x55555acd0000)
    at /usr/src/debug/qemu-2.9.0/migration/savevm.c:1519
        f = 0x55555acd0000
        mis = 0x555556064360 <mis_current.31919>
        load_res = <optimized out>
        __func__ = "postcopy_ram_listen_thread"
#10 0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#11 0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 12 (Thread 0x7ffedb5fe700 (LWP 5990)):
#0  0x00007fffed9e7dfd in poll () at /lib64/libc.so.6
#1  0x0000555555961b64 in postcopy_ram_fault_thread (__timeout=-1, __nfds=2, __fds=0x7ffedb5fd920)
    at /usr/include/bits/poll2.h:46
        rb_offset = 3147616256
        pfd = {{fd = 29, events = 1, revents = 0}, {fd = 31, events = 1, revents = 0}}
        mis = 0x555556064360 <mis_current.31919>
        msg = 
              {event = 18 '\022', reserved1 = 0 '\000', reserved2 = 0, reserved3 = 0, arg = {pagefault = {flags = 0, address = 140735795744898}, fork = {ufd = 0}, remap = {from = 0, to = 140735795744898, len = 0}, madv_dn = {start = 0, end = 140735795744898}, reserved = {reserved1 = 0, reserved2 = 140735795744898, reserved3 = 0}}}
        ret = <optimized out>
        rb = <optimized out>
        last_rb = 0x55555757e000
        __func__ = "postcopy_ram_fault_thread"
#2  0x0000555555961b64 in postcopy_ram_fault_thread (opaque=0x555556064360 <mis_current.31919>)
    at migration/postcopy-ram.c:437
        rb_offset = 3147616256
        pfd = {{fd = 29, events = 1, revents = 0}, {fd = 31, events = 1, revents = 0}}
        mis = 0x555556064360 <mis_current.31919>
        msg = 
              {event = 18 '\022', reserved1 = 0 '\000', reserved2 = 0, reserved3 = 0, arg = {pagefault = {flags = 0, address = 140735795744898}, fork = {ufd = 0}, remap = {from = 0, to = 140735795744898, len = 0}, madv_dn = {start = 0, end = 140735795744898}, reserved = {reserved1 = 0, reserved2 = 140735795744898, reserved3 = 0}}}
        ret = <optimized out>
        rb = <optimized out>
        last_rb = 0x55555757e000
        __func__ = "postcopy_ram_fault_thread"
#3  0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
#4  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 11 (Thread 0x7ffedbdff700 (LWP 5989)):
---Type <return> to continue, or q <return> to quit---
#0  0x00007fffed9ecee7 in madvise () at /lib64/libc.so.6
#1  0x00007fffedcc3e71 in start_thread () at /lib64/libpthread.so.0
#2  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 10 (Thread 0x7fffe2bbc700 (LWP 5988)):
#0  0x00007fffedcc2c90 in __nptl_death_event () at /lib64/libpthread.so.0
#1  0x00007fffedcc3f1c in start_thread () at /lib64/libpthread.so.0
#2  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 9 (Thread 0x7ffedddff700 (LWP 5950)):
#0  0x00007fffedcc76d5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a6a869 in qemu_cond_wait (cond=cond@entry=0x555556cb2e80, mutex=mutex@entry=0x555556cb2eb0)
    at util/qemu-thread-posix.c:133
        err = <optimized out>
        __func__ = "qemu_cond_wait"
#2  0x00005555559aa37b in vnc_worker_thread_loop (queue=queue@entry=0x555556cb2e80) at ui/vnc-jobs.c:205
        job = <optimized out>
        entry = <optimized out>
        tmp = <optimized out>
        vs = 
            {sioc = 0x0, ioc = 0x0, ioc_tag = 0, disconnecting = 0, dirty = {{0, 0, 0} <repeats 2048 times>}, lossy_rect = 0x0, vd = 0x0, need_update = 0, force_update = 0, has_dirty = 0, features = 0, absolute = 0, last_x = 0, last_y = 0, last_bmask = 0, client_width = 0, client_height = 0, share_mode = 0, vnc_encoding = 0, major = 0, minor = 0, auth = 0, subauth = 0, challenge = '\000' <repeats 15 times>, tls = 0x0, sasl = {conn = 0x0, wantSSF = false, runSSF = false, waitWriteSSF = 0, encoded = 0x0, encodedLength = 0, encodedOffset = 0, username = 0x0, mechlist = 0x0}, encode_ws = false, websocket = false, info = 0x0, output = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, input = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, write_pixels = 0x0, client_pf = {bits_per_pixel = 0 '\000', bytes_per_pixel = 0 '\000', depth = 0 '\000', rmask = 0, gmask = 0, bmask = 0, amask = 0, rshift = 0 '\000', gshift = 0 '\000', bshift = 0 '\000', ashift = 0 '\000', rmax = 0 '\000', gmax = 0 '\000', bmax = 0 '\000', amax = 0 '\000', rbits = 0 '\000', gbits = 0 '\000', bbits = 0 '\000', abits = 0 '\000'}, client_format = 0, client_be = false, audio_cap = 0x0, as = {freq = 0, nchannels = 0, fmt = AUD_FMT_U8, endianness = 0}, read_handler = 0x0, read_handler_expect = 0, modifiers_state = '\000' <repeats 255 times>, abort = false, output_mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, bh = 0x0, jobs_buffer = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, tight = {type = 0, quality = 0 '\000', compression = 0 '\000', pixel24 = 0 '\000---Type <return> to continue, or q <return> to quit---
', tight = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, tmp = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, zlib = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, gradient = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, png = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, levels = {0, 0, 0, 0}, stream = {{next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}}}, zlib = {zlib = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, tmp = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, stream = {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, level = 0}, hextile = {send_tile = 0x0}, zrle = {type = 0, fb = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, zrle = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, tmp = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, zlib = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, stream = {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, palette = {pool = {{idx = 0, color = 0, next = {le_next = 0x0, le_prev = 0x0}} <repeats 256 times>}, size = 0, max = 0, bpp = 0, table = {{lh_first = 0x0} <repeats 256 times>}}}, zywrle = {buf = {0 <repeats 4096 times>}}, mouse_mode_notifier = {notify = 0x0, node = {le_next = 0x0, le_prev = 0x0}}, next = {tqe_next = 0x0, tqe_prev = 0x0}}
        n_rectangles = <optimized out>
        saved_offset = <optimized out>
#3  0x00005555559aa8b8 in vnc_worker_thread (arg=0x555556cb2e80) at ui/vnc-jobs.c:312
        queue = 0x555556cb2e80
#4  0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 8 (Thread 0x7fffe01ae700 (LWP 5939)):
#0  0x00007fffedcc76d5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a6a869 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555605c160 <qemu_global_mutex>)
    at util/qemu-thread-posix.c:133
        err = <optimized out>
        __func__ = "qemu_cond_wait"
#2  0x0000555555797d6b in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /usr/src/debug/qemu-2.9.0/cpus.c:1054
---Type <return> to continue, or q <return> to quit---
        cpu = 0x55555755e000
        r = <optimized out>
#3  0x0000555555797d6b in qemu_kvm_cpu_thread_fn (arg=0x55555755e000) at /usr/src/debug/qemu-2.9.0/cpus.c:1092
        cpu = 0x55555755e000
        r = <optimized out>
#4  0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 7 (Thread 0x7fffe09af700 (LWP 5938)):
#0  0x00007fffedcc76d5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a6a869 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555605c160 <qemu_global_mutex>)
    at util/qemu-thread-posix.c:133
        err = <optimized out>
        __func__ = "qemu_cond_wait"
#2  0x0000555555797d6b in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /usr/src/debug/qemu-2.9.0/cpus.c:1054
        cpu = 0x55555753e000
        r = <optimized out>
#3  0x0000555555797d6b in qemu_kvm_cpu_thread_fn (arg=0x55555753e000) at /usr/src/debug/qemu-2.9.0/cpus.c:1092
        cpu = 0x55555753e000
        r = <optimized out>
#4  0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 6 (Thread 0x7fffe11b0700 (LWP 5937)):
#0  0x00007fffedcc76d5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a6a869 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555605c160 <qemu_global_mutex>)
    at util/qemu-thread-posix.c:133
        err = <optimized out>
        __func__ = "qemu_cond_wait"
#2  0x0000555555797d6b in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /usr/src/debug/qemu-2.9.0/cpus.c:1054
        cpu = 0x555557522000
        r = <optimized out>
#3  0x0000555555797d6b in qemu_kvm_cpu_thread_fn (arg=0x555557522000) at /usr/src/debug/qemu-2.9.0/cpus.c:1092
        cpu = 0x555557522000
---Type <return> to continue, or q <return> to quit---
        r = <optimized out>
#4  0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 5 (Thread 0x7fffe19b1700 (LWP 5936)):
#0  0x00007fffedcc76d5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000555555a6a869 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x55555605c160 <qemu_global_mutex>)
    at util/qemu-thread-posix.c:133
        err = <optimized out>
        __func__ = "qemu_cond_wait"
#2  0x0000555555797d6b in qemu_kvm_cpu_thread_fn (cpu=<optimized out>) at /usr/src/debug/qemu-2.9.0/cpus.c:1054
        cpu = 0x5555574a4000
        r = <optimized out>
#3  0x0000555555797d6b in qemu_kvm_cpu_thread_fn (arg=0x5555574a4000) at /usr/src/debug/qemu-2.9.0/cpus.c:1092
        cpu = 0x5555574a4000
        r = <optimized out>
#4  0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 3 (Thread 0x7fffe33bd700 (LWP 5923)):
#0  0x00007fffed9e7ebf in ppoll () at /lib64/libc.so.6
#1  0x0000555555a66b9b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  0x0000555555a66b9b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1)
    at util/qemu-timer.c:322
#3  0x0000555555a68835 in aio_poll (ctx=0x555556ceb840, blocking=blocking@entry=true) at util/aio-posix.c:622
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = -1
        start = 197397265019026
        __PRETTY_FUNCTION__ = "aio_poll"
#4  0x000055555586eebe in iothread_run (opaque=0x555556d42b00) at iothread.c:59
---Type <return> to continue, or q <return> to quit---
        iothread = 0x555556d42b00
#5  0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
#6  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7fffe3bbe700 (LWP 5922)):
#0  0x00007fffedcca1bd in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007fffedcc5d02 in _L_lock_791 () at /lib64/libpthread.so.0
#2  0x00007fffedcc5c08 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x0000555555a6a709 in qemu_mutex_lock (mutex=mutex@entry=0x55555605c160 <qemu_global_mutex>)
    at util/qemu-thread-posix.c:60
        err = <optimized out>
        __func__ = "qemu_mutex_lock"
#4  0x0000555555797bdc in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.9.0/cpus.c:1516
#5  0x0000555555a7a11e in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:257
        tries = 1
        n = 650
        node = <optimized out>
#6  0x00007fffedcc3dc5 in start_thread () at /lib64/libpthread.so.0
#7  0x00007fffed9f273d in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7f73cc0 (LWP 5918)):
#0  0x00007fffedcc4ef7 in pthread_join () at /lib64/libpthread.so.0
#1  0x0000555555a6adef in qemu_thread_join (thread=<optimized out>) at util/qemu-thread-posix.c:504
        err = <optimized out>
        ret = 0x7fffef4ca8c0
        __func__ = "qemu_thread_join"
#2  0x00005555557bf023 in migrate_decompress_threads_join () at /usr/src/debug/qemu-2.9.0/migration/ram.c:2336
        i = <optimized out>
        thread_count = <optimized out>
#3  0x000055555595bd98 in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:439
        f = <optimized out>
        mis = <optimized out>
        ps = <optimized out>
        ret = <optimized out>
---Type <return> to continue, or q <return> to quit---
#4  0x0000555555a7b29a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
        self = 0x555557121400
        co = 0x555557121400
#5  0x00007fffed941cf0 in __start_context () at /lib64/libc.so.6
#6  0x00007fffffffcf20 in  ()
#7  0x0000000000000000 in  ()
(gdb) 


(5) CPU info of the src host:
[root@dell-per630-01 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1829.718
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4800.30
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc

(6) CPU info of the dst host:
[root@dhcp-10-208 ~]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping	: 3
microcode	: 0x20
cpu MHz		: 3899.507
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bogomips	: 6784.60
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

The src host CPU is a Xeon E5-2630 v3 while the dst host CPU is a Core i7-4770.

Comment 17 Laurent Vivier 2017-04-03 08:47:47 UTC
(In reply to xianwang from comment #16)
> This bug is hit when test post-copy on x86_64 with rhel7.4
> Bug reproduction is as following:
> versions(both src and dst):
> 3.10.0-635.el7.x86_64
> qemu-img-rhev-2.9.0-0.el7.patchwork201703291116.x86_64
> seabios-bin-1.10.2-1.el7.noarch
> src and dst hosts are different cpu model.

Change hardware to all.

Comment 18 Laurent Vivier 2017-04-06 18:54:43 UTC
As I'm not able to reproduce it on my system with your disk image, could you try to reproduce with these packages:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12965632

Comment 19 xianwang 2017-04-07 10:11:38 UTC
(In reply to Laurent Vivier from comment #18)
> As I'm not able to reproduce it on my system with your disk image, could you
> try to reproduce with these packages:
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12965632

As communicated on IRC, I have re-tested this scenario 5 times with the qemu-kvm-rhev packages you specified and reproduced the bug once.

Comment 20 Laurent Vivier 2017-04-07 10:16:24 UTC
I've been able to reproduce the problem with xianwang's hosts.

It happens because it is a trans-oceanic migration (one host in Beijing, the other in BOS): the latency is very high and the post-copy migration may take several minutes.

Comment 21 Laurent Vivier 2017-04-07 18:01:10 UTC
about SEGV/ABORT

qemu-system-x86_64: VQ 0 size 0x8 < last_avail_idx 0xa - used_idx 0xb
qemu-system-x86_64: Failed to load virtio-rng:virtio
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-rng'
qemu-system-x86_64: load of migration failed: Operation not permitted
qemu-system-x86_64: /root/qemu-src/migration/qemu-file.c:441: qemu_peek_buffer: Assertion `!qemu_file_is_writable(f)' failed.

Program received signal SIGABRT, Aborted.

Message "load of migration failed" comes from "process_incoming_migration_co()"

...
    qemu_fclose(f);
    free_xbzrle_decoded_buf();

    if (ret < 0) {
        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                          MIGRATION_STATUS_FAILED);
        error_report("load of migration failed: %s", strerror(-ret));
        migrate_decompress_threads_join();
        exit(EXIT_FAILURE);
    }
...

and the abort() comes from qemu_peek_buffer(), called from postcopy_ram_listen_thread().

Both threads use the same QEMUFile pointer, but while the first one (process_incoming_migration_co) closes it, the second one keeps trying to use it. Since the memory is freed by qemu_fclose(), qemu_peek_buffer() dereferences freed memory.

Moving the qemu_fclose() below the error-handling block, so it is not reached on the exit(EXIT_FAILURE) path, fixes this problem:

qemu-system-x86_64: VQ 0 size 0x8 < last_avail_idx 0x6 - used_idx 0x7
qemu-system-x86_64: Failed to load virtio-rng:virtio
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-rng'
process_incoming_migration_co 0x55555662d2e0
qemu-system-x86_64: load of migration failed: Operation not permitted
qemu-system-x86_64: error while loading state section id 3(ram)
qemu-system-x86_64: postcopy_ram_listen_thread: loadvm failed: -1
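
A sketch of that reordering against the excerpt above (illustrative only, not the final fix):

-    qemu_fclose(f);
-    free_xbzrle_decoded_buf();
-
     if (ret < 0) {
         migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                           MIGRATION_STATUS_FAILED);
         error_report("load of migration failed: %s", strerror(-ret));
         migrate_decompress_threads_join();
         exit(EXIT_FAILURE);
     }
+
+    /* only reached on success; on the failure path the postcopy listen
+     * thread may still be reading from f, so it must not be freed there */
+    qemu_fclose(f);
+    free_xbzrle_decoded_buf();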

Comment 22 Dr. David Alan Gilbert 2017-04-07 18:50:59 UTC
(In reply to Laurent Vivier from comment #21)
> [...]
> Both threads use the same QEMUFile pointer, but while the first one
> (process_incoming_migration_co) closes it, the second one keeps trying to
> use it. Since the memory is freed by qemu_fclose(), qemu_peek_buffer()
> dereferences freed memory.
> 
> Moving the qemu_fclose() below the error-handling block, so it is not
> reached on the exit(EXIT_FAILURE) path, fixes this problem:

Hmm, OK; in theory that shouldn't happen :-)
What's supposed to happen is that if the listen thread has started up OK, the cleanup should be done at the end of that thread rather than at that point.
If you look a few lines higher up in process_incoming_migration_co you'll see a trace_process_incoming_migration_co_end; if we're not in 'advise' state then we should exit process_incoming_migration_co there and let the other thread clean up.
Given you've got a listen thread I'd expect to be in POSTCOPY_INCOMING_LISTENING state - but perhaps the ordering of setting that is wrong somewhere.
(Anyway, if you want to worry about the virtio-rng side of it, I can take that cleanup case.)

Comment 23 Laurent Vivier 2017-04-07 19:29:39 UTC
About the "VQ 0 size 0x8 < last_avail_idx 0xa - used_idx 0xb"

This happens because the rng-random backend continues to call the virtio-rng chr_read() function (which calls virtqueue_pop()/virtqueue_push()) whenever entropy is available from /dev/random, even after the post-copy migration has started. At that point the vmstate of virtio-rng (last_avail_idx) has already been migrated, so we must not modify the RAM part (used_idx) of the virtqueue: that would create an inconsistency between the part of the virtqueue migrated within the vmstate and the part migrated within the RAM.

This seems to fix the problem:

diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index 9639f4e..a46d8ad 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -18,6 +18,7 @@
 #include "sysemu/rng.h"
 #include "qom/object_interfaces.h"
 #include "trace.h"
+#include "migration/migration.h"
 
 static bool is_guest_ready(VirtIORNG *vrng)
 {
@@ -53,6 +54,14 @@ static void chr_read(void *opaque, const void *buf, size_t size)
         return;
     }
 
+    /* We can't modify the virtqueue once its vmstate has been
+     * migrated but the part living in RAM has not yet been.
+     */
+
+    if (migration_in_postcopy(migrate_get_current())) {
+        return;
+    }
+
     vrng->quota_remaining -= size;
 
     offset = 0;
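
For illustration, a minimal self-contained sketch of the consistency check that rejects the load on the destination (cf. the "VQ %d size ... < last_avail_idx ... - used_idx ..." report from virtio_load() in hw/virtio/virtio.c); the helper below is an assumption for illustration, not upstream code. last_avail_idx arrives in the device vmstate, while used_idx is read back from the migrated guest RAM:

#include <stdint.h>

/* Returns 0 when the number of in-flight descriptors fits in the ring,
 * -1 otherwise (the case the destination rejects).
 */
static int vq_indices_consistent(uint16_t vring_num,
                                 uint16_t last_avail_idx, /* from vmstate */
                                 uint16_t used_idx)       /* from guest RAM */
{
    /* 16-bit subtraction wraps, so a used_idx that ran ahead of the
     * saved last_avail_idx yields a huge nheads, e.g. 0xa - 0xb = 0xffff.
     */
    uint16_t nheads = (uint16_t)(last_avail_idx - used_idx);

    return nheads > vring_num ? -1 : 0;
}

If chr_read() keeps pushing buffers after the vmstate snapshot has been taken, used_idx in RAM moves past the saved last_avail_idx and this check fails on load.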

Comment 24 Dr. David Alan Gilbert 2017-04-07 19:49:01 UTC
Great; I think it should be possible to make it more general - i.e. never do it when the CPU is stopped.

Comment 26 Laurent Vivier 2017-04-10 09:42:16 UTC
(In reply to Dr. David Alan Gilbert from comment #24)
> Great; I think it should be possible to make it more general - i.e. never do
> it when the CPU is stopped.

Something like:

diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -53,6 +53,10 @@ static void chr_read(void *opaque, const void *buf, size_t size)
         return;
     }
 
+    if (!runstate_check(RUN_STATE_RUNNING)) {
+        return;
+    }
+
     vrng->quota_remaining -= size;
 
     offset = 0;

Comment 28 Laurent Vivier 2017-04-11 11:58:58 UTC
As suggested by dgilbert, I've been able to reproduce the problem with only one host by adding latency with "tc":

$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.030 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.009 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.008 ms
$ sudo tc qdisc add dev lo root netem delay 500ms
qdisc netem 8004: dev lo root refcnt 2 limit 1000 delay 1.0s
$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=1000 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=1000 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=1000 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=1000 ms

We can remove the latency with:

$ sudo tc qdisc del dev lo root netem

There is also a pre-requisite in the guest: the "Hardware RNG Entropy Gatherer Daemon" must be running.

$ yum install rng-tools
$ systemctl start rngd

Comment 29 Laurent Vivier 2017-04-14 10:53:06 UTC
Patch sent upstream:

http://patchwork.ozlabs.org/patch/750019/

Comment 30 Miroslav Rezanina 2017-04-28 10:07:44 UTC
Fix included in qemu-kvm-rhev-2.9.0-2.el7

Comment 32 huiqingding 2017-05-10 09:18:02 UTC
I tested qemu-kvm-rhev-2.9.0-3.el7.x86_64 on two x86 hosts, doing postcopy migration 7.3.z->7.4; the guest has four NUMA nodes, bound to host node0, with virtio-serial and virtio-rng attached. After postcopy migration, the destination QEMU quits with an error:
(qemu) 2017-05-10T07:29:00.588468Z qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x4b - used_idx 0x4c
2017-05-10T07:29:00.588513Z qemu-kvm: Failed to load virtio-rng:virtio
2017-05-10T07:29:00.588521Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:1c.0/virtio-rng'
2017-05-10T07:29:00.588595Z qemu-kvm: load of migration failed: Operation not permitted
2017-05-10T07:29:00.588627Z qemu-kvm: error while loading state section id 2(ram)

Test steps:
0. Check the node0 free memory size on the source 7.3.z host; the free size is > 8192M:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 16259 MB
node 0 free: 12834 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 16384 MB
node 1 free: 6650 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 
1. Boot a guest on the source 7.3.z host:
/usr/libexec/qemu-kvm \
-name rhel7 \
-machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off \
-cpu SandyBridge,check \
-m 8192   \
-smp 8,cores=1,threads=1,sockets=8  \
-object memory-backend-ram,policy=bind,id=mem-0,size=1024M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-0,cpus=0,cpus=1 \
-object memory-backend-ram,policy=bind,id=mem-1,size=1024M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-1,cpus=2-3 \
-object memory-backend-ram,policy=bind,id=mem-2,size=2048M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-2,cpus=4 \
-object memory-backend-ram,policy=bind,id=mem-3,size=4096M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-3,cpus=5-7 \
-drive file=/mnt/stable_guest_abi/rhel74-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=f65effa5-90a6-47f2-8487-a9f64c95d4f5,cache=none,discard=unmap,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-net none \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 \
-device virtio-rng-pci,id=rng0,bus=pci.0,addr=0x1c \
-monitor stdio \
-vga qxl \
-vnc :1 \
2. Run stress inside the guest:
# stressapptest -M 100 -s 1000
3. Check the node0 free memory size on the destination 7.4 host; the free size is > 8192M:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 16349 MB
node 0 free: 15370 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 16383 MB
node 1 free: 13005 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
4. Boot the guest on the destination 7.4 host, adding "-incoming tcp:0:5800".
5. Do postcopy migration:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.73.196.175:5800
(qemu) migrate_start_postcopy

Result:
After step 5, the destination qemu-kvm quits with an error:
(qemu) red_dispatcher_loadvm_commands: 
qemu-kvm: VQ 0 size 0x8 < last_avail_idx 0x4a - used_idx 0x4b
qemu-kvm: Failed to load virtio-rng:virtio
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:1c.0/virtio-rng'
qemu-kvm: load of migration failed: Operation not permitted
qemu-kvm: error while loading state section id 2(ram)


Additional test:
1. Bind the guest NUMA nodes to host node1 as follows and do the same postcopy migration; the migration finishes normally.
-m 8192   \
-smp 8,cores=1,threads=1,sockets=8  \
-object memory-backend-ram,policy=bind,id=mem-0,size=1024M,prealloc=yes,host-nodes=1 -numa node,memdev=mem-0,cpus=0,cpus=1 \
-object memory-backend-ram,policy=bind,id=mem-1,size=1024M,prealloc=yes,host-nodes=1 -numa node,memdev=mem-1,cpus=2-3 \
-object memory-backend-ram,policy=bind,id=mem-2,size=2048M,prealloc=yes,host-nodes=1 -numa node,memdev=mem-2,cpus=4 \
-object memory-backend-ram,policy=bind,id=mem-3,size=4096M,prealloc=yes,host-nodes=1 -numa node,memdev=mem-3,cpus=5-7 \

2. Remove virtio-serial from the command line, bind the guest NUMA nodes to host node0, and do postcopy migration; the migration finishes normally.

The packages version:
7.3.z host:
kernel-3.10.0-514.18.1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
7.4 host:
kernel-3.10.0-664.el7.x86_64
qemu-kvm-rhev-2.9.0-3.el7.x86_64

CPU info:
7.3.z host:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               2400.000
BogoMIPS:              4801.43
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31

7.4 host:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
Stepping:              4
CPU MHz:               2000.000
CPU max MHz:           2000.0000
CPU min MHz:           1200.0000
BogoMIPS:              3990.50
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

Comment 33 Dr. David Alan Gilbert 2017-05-10 09:54:03 UTC
(In reply to huiqingding from comment #32)
> I tested qemu-kvm-rhev-2.9.0-3.el7.x86_64 on two x86 hosts, doing postcopy
> migration 7.3.z->7.4; the guest has four NUMA nodes, bound to host node0,
> with virtio-serial and virtio-rng attached. After postcopy migration, the
> destination QEMU quits with an error:
> (qemu) 2017-05-10T07:29:00.588468Z qemu-kvm: VQ 0 size 0x8 < last_avail_idx
> 0x4b - used_idx 0x4c
> 2017-05-10T07:29:00.588513Z qemu-kvm: Failed to load virtio-rng:virtio
> 2017-05-10T07:29:00.588521Z qemu-kvm: error while loading state for instance
> 0x0 of device '0000:00:1c.0/virtio-rng'
> 2017-05-10T07:29:00.588595Z qemu-kvm: load of migration failed: Operation
> not permitted
> 2017-05-10T07:29:00.588627Z qemu-kvm: error while loading state section id
> 2(ram)

The bug is on the source side: it's fixed in 7.4, so you still hit it when coming from 7.3.z.

Dave

Comment 39 xianwang 2017-05-24 10:04:23 UTC
This bug is verified as PASS on powerpc, with both qemu-kvm-rhev-2.6.0-28.el7_3.10.ppc64le and qemu-kvm-rhev-2.9.0-2.el7.ppc64le.
Bug verification is as below:
Host version:
(host1)
RHEL-7.3-20161019.0 Server ppc64le
kernel-3.10.0-514.17.1.el7
qemu-kvm-rhev-2.6.0-28.el7_3.10.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
(host2)
RHEL-7.4-20170519.n.0 Server ppc64le
3.10.0-623.el7.ppc64le
qemu-kvm-rhev-2.9.0-2.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch

Guest:
RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso
3.10.0-514.21.1.el7.ppc64le
mem: 4G
vcpu: 4

scenario I:
1) Steps are the same as in the bug description.
2) Result:
migration completed and the VM works well.
I have tested this scenario 20 times, and all results are PASS.

scenario II:
Referring to comment 32, I changed the qemu command line used to boot the guest:
/usr/libexec/qemu-kvm \
    -name 'vm73z'  \
    -sandbox off  \
    -machine pseries-rhel7.3.0 \
    -nodefaults  \
    -vga std \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial0,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -chardev socket,path=/tmp/virtio_port-vs-20170207-030401-FfusAC1v,nowait,id=idQdLRHP,server \
    -device virtserialport,id=idBu8FQH,name=vs,bus=virtio_serial_pci0.0,chardev=idQdLRHP \
    -object rng-random,filename=/dev/random,id=passthrough-rOXjKxaC \
    -device virtio-rng-pci,id=virtio-rng-pci-GVn8yzUA,rng=passthrough-rOXjKxaC,bus=pci.0,addr=05 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=07 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel73-ppc64le-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,bootindex=0 \
    -device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=08  \
    -netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 4096 \
    -smp 8,cores=1,threads=1,sockets=8 \
    -object memory-backend-ram,policy=bind,id=mem-0,size=1024M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-0,cpus=0,cpus=1 \
    -object memory-backend-ram,policy=bind,id=mem-1,size=1024M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-1,cpus=2-3 \
    -object memory-backend-ram,policy=bind,id=mem-2,size=2048M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-2,cpus=4-7 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -device usb-mouse,id=input1,bus=usb1.0,port=2 \
    -device usb-kbd,id=input2,bus=usb1.0,port=3 \
    -vnc :1  \
    -qmp tcp:0:8881,server,nowait \
    -monitor stdio \
    -rtc base=utc,clock=host  \
    -boot order=cdn,once=c,menu=on,strict=on  \
    -enable-kvm \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=09
Other steps are the same as in scenario I.
Result:
migration completed and the VM works well.

So this bug is fixed for ppc. Huiqing, could you help verify this bug on x86_64? Thanks.

Comment 40 huiqingding 2017-05-24 10:54:53 UTC
Tested 7.3.z->7.4 postcopy migration as in comment #32; the migration finishes successfully. 
7.3.z host:
kernel-3.10.0-514.24.1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64

7.4 host:
kernel-3.10.0-671.el7.x86_64
qemu-kvm-rhev-2.9.0-6.el7.x86_64

The command line is as follows:
/usr/libexec/qemu-kvm \
-name rhel7 \
-S -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off \
-realtime mlock=off \
-cpu SandyBridge,check \
-sandbox off \
-m 7680   \
-smp 8,cores=1,threads=1,sockets=8  \
-object memory-backend-ram,policy=bind,id=mem-0,size=512M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-0,cpus=0,cpus=1 \
-object memory-backend-ram,policy=bind,id=mem-1,size=1024M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-1,cpus=2-3 \
-object memory-backend-ram,policy=bind,id=mem-2,size=2048M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-2,cpus=4 \
-object memory-backend-ram,policy=bind,id=mem-3,size=4096M,prealloc=yes,host-nodes=0 -numa node,memdev=mem-3,cpus=5-7 \
-uuid 49a3438a-70a3-4ba8-92ce-3a05e0934608 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=discard \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot order=c,menu=on,splash-time=3000,strict=on \
-device ich9-usb-ehci1,id=usb0,bus=pci.0,addr=0x5.0x7 \
-device ich9-usb-uhci1,masterbus=usb0.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 \
-device ich9-usb-uhci2,masterbus=usb0.0,firstport=2,bus=pci.0,addr=0x5.0x1 \
-device ich9-usb-uhci3,masterbus=usb0.0,firstport=4,bus=pci.0,addr=0x5.0x2 \
-device virtio-scsi-pci,id=scsi0,cmd_per_lun=234,bus=pci.0,addr=0x8 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 \
-drive file=/usr/share/virtio-win/virtio-win-1.9.0.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= \
-device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
-drive file=/usr/share/virtio-win/virtio-win_x86.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none \
-global isa-fdc.driveA=drive-fdc0-0-0 \
-drive file=/mnt/stable_guest_abi/rhel74-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=f65effa5-90a6-47f2-8487-a9f64c95d4f5,cache=none,discard=unmap,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-drive file=/mnt/virtio-scsi-disk,if=none,id=drive-scsi-disk,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-scsi-pci,id=scsi1,addr=0x13 \
-device scsi-hd,drive=drive-scsi-disk,bus=scsi1.0,id=data-disk2 \
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
-chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/rhel7.org.qemu.guest_agent.0,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0 \
-device intel-hda,id=sound0,bus=pci.0,addr=0x4 \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
-device intel-hda,id=sound1,bus=pci.0,addr=0x6 \
-device hda-micro,id=sound1-codec0,bus=sound1.0 \
-device intel-hda,id=sound2,bus=pci.0,addr=0x17 \
-device hda-output,id=sound2-codec0,bus=sound2.0,cad=0 \
-device ich9-intel-hda,id=sound3,bus=pci.0,addr=0x18 \
-device hda-duplex,id=sound3-codec0,bus=sound3.0,cad=0 \
-chardev spicevmc,id=charredir0,name=usbredir \
-device usb-redir,chardev=charredir0,id=redir0 \
-chardev spicevmc,id=charredir1,name=usbredir \
-device usb-redir,chardev=charredir1,id=redir1 \
-chardev spicevmc,id=charredir2,name=usbredir \
-device usb-redir,chardev=charredir2,id=redir2 \
-chardev spicevmc,id=charredir3,name=usbredir \
-device usb-redir,chardev=charredir3,id=redir3 \
-device usb-host,id=hostdev0 \
-device pvpanic,ioport=1285 -msg timestamp=on \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=b6:af:42:c8:66:18,bus=pci.0,addr=0x14 \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup  \
-device e1000,netdev=hostnet1,id=virtio-net-pci1,mac=b6:2f:a8:85:82:7c,bus=pci.0,addr=0x15,multifunction=off \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup \
-device rtl8139,netdev=hostnet2,id=virtio-net-pci2,mac=4e:63:28:bc:c1:75,bus=pci.0,addr=0x16,multifunction=off \
-netdev tap,id=hostnet3,vhost=on,script=/etc/qemu-ifup \
-device e1000-82540em,netdev=hostnet3,id=virtio-net-pci3,mac=4e:63:28:bc:c1:65,bus=pci.0,addr=0x1e,multifunction=off \
-netdev tap,id=hostnet4,vhost=on,script=/etc/qemu-ifup \
-device e1000e,netdev=hostnet4,id=virtio-net-pci4,mac=4e:63:28:bc:c1:85,bus=pci.0,addr=0x19,multifunction=off \
-drive file=/mnt/ide-disk,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop,copy-on-read=off,serial=fux-ide,media=disk \
-device ide-hd,drive=drive-data-disk,id=system-disk,logical_block_size=512,physical_block_size=512,min_io_size=32,opt_io_size=64,discard_granularity=512,ver=fuxc-ver,bus=ide.0,unit=0  \
-device ich9-usb-uhci6,id=uhci6,bus=pci.0,addr=0xa \
-device usb-kbd,id=kdb0,bus=uhci6.0 \
-device ich9-usb-uhci5,id=uhci5,bus=pci.0,addr=0xb \
-device usb-mouse,id=mouse0,bus=uhci5.0 \
-device nec-usb-xhci,id=xhci,bus=pci.0,addr=0xd \
-device usb-bot,id=bot1,bus=xhci.0 \
-drive file=/mnt/virtio-win-126.iso,if=none,id=usb-cdrom1,format=raw \
-device scsi-cd,bus=bot1.0,scsi-id=0,lun=1,drive=usb-cdrom1,id=usb-cdrom1 \
-drive file=/mnt/bot-disk1,id=usb-disk1,if=none,format=qcow2 \
-device scsi-hd,bus=bot1.0,scsi-id=0,lun=0,drive=usb-disk1,id=usb-disk1 \
-device usb-ehci,id=ehci,bus=pci.0,addr=0xe \
-device usb-bot,id=bot2,bus=ehci.0 \
-drive file=/mnt/virtio-win-126.iso,if=none,id=usb-cdrom2,format=raw \
-device scsi-cd,bus=bot2.0,scsi-id=0,lun=1,drive=usb-cdrom2,id=usb-cdrom2 \
-drive file=/mnt/bot-disk2,id=usb-disk2,if=none,format=qcow2 \
-device scsi-hd,bus=bot2.0,scsi-id=0,lun=0,drive=usb-disk2,id=usb-disk2 \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0xf \
-device usb-bot,id=bot3,bus=usb.0 \
-drive file=/mnt/virtio-win-126.iso,if=none,id=usb-cdrom3,format=raw \
-device scsi-cd,bus=bot3.0,scsi-id=0,lun=1,drive=usb-cdrom3,id=usb-cdrom3 \
-drive file=/mnt/bot-disk3,id=usb-disk3,if=none,format=qcow2 \
-device scsi-hd,bus=bot3.0,scsi-id=0,lun=0,drive=usb-disk3,id=usb-disk3 \
-device ich9-usb-uhci3,id=uhci,bus=pci.0,addr=0x10 \
-device usb-storage,drive=drive-usb-0,id=usb-0,removable=on,bus=uhci.0,port=1 \
-drive file=/mnt/usb-uhci,if=none,id=drive-usb-0,media=disk,format=qcow2 \
-device ich9-usb-ehci1,id=ehci1,bus=pci.0,addr=0x11 \
-device usb-storage,drive=drive-usb-1,id=usb-1,removable=on,bus=ehci1.0,port=1 \
-drive file=/mnt/usb-ehci,if=none,id=drive-usb-1,media=disk,format=qcow2 \
-device nec-usb-xhci,id=xhci1,bus=pci.0,addr=0x12 \
-device usb-storage,drive=drive-usb-2,id=usb-2,removable=on,bus=xhci1.0,port=1 \
-drive file=/mnt/usb-xhci,if=none,id=drive-usb-2,media=disk,format=qcow2 \
-watchdog ib700 -watchdog-action reset \
-device virtio-rng-pci,id=rng0,bus=pci.0,addr=0x1c \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x1d \
-monitor stdio \
-qmp tcp:0:4466,server,nowait -serial unix:/tmp/ttym,server,nowait \
-spice port=5910,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
-k en-us \
-device qxl-vga,id=video0,ram_size=134217728,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 \
-chardev socket,id=seabioslog_id,path=/var/tmp/seabios,server,nowait \
-device isa-debugcon,chardev=seabioslog_id,iobase=0x402 \

Comment 41 huiqingding 2017-05-24 10:55:43 UTC
Based on comments #39 and #40, setting this bug to VERIFIED.

Comment 43 errata-xmlrpc 2017-08-02 03:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392