Bug 1549087 - Migration 2.6-2.9 to master (2.11.1+) fails on ppc64 host
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.6
Hardware: ppc64
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Gibson
QA Contact: xianwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-26 11:28 UTC by Lukáš Doktor
Modified: 2018-05-05 04:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-17 02:37:45 UTC
Target Upstream Version:
Embargoed:


Attachments
Avocado-vt results of all migrate tests (6.89 MB, application/zip)
2018-02-26 11:28 UTC, Lukáš Doktor

Description Lukáš Doktor 2018-02-26 11:28:49 UTC
Created attachment 1400818 [details]
Avocado-vt results of all migrate tests

Description of problem:
Machines created by the latest released 2.6.x, 2.7.x, 2.8.x, and 2.9.x fail to migrate to the latest master (2.11.1+, 72f1094b6cd8191a5ed842bc57cffeefae95ac1e and f0fa81767555fe2c4b5f8c9e0725a80eac1d7f56). The destination reports:

09:07:12 INFO | [qemu output] qemu-system-ppc64: load of migration failed: Invalid argument
09:07:12 INFO | [qemu output] qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615)
09:07:12 INFO | [qemu output] (Process terminated with status 1)

Version-Release number of selected component (if applicable):
host: latest nightly RHEL.7, ppc64(be) POWER8
host-src-qemu: custom-compiled qemu (tag) v2.6.2, v2.7.1, v2.8.1.1, v2.9.1
host-dst-qemu: custom-compiled qemu 72f1094b6cd8191a5ed842bc57cffeefae95ac1e or f0fa81767555fe2c4b5f8c9e0725a80eac1d7f56
guest: latest nightly RHEL.7 both ppc64(be) and ppc64le

How reproducible:
always

Steps to Reproduce:
1. Run a VM using any of the host-src-qemu builds
2. Initiate migration via tcp (destination started with "-incoming tcp:0:5200", then "migrate -d tcp:localhost:5200" on the source monitor)
3. Check "info migrate", which reports "active" several times before the migration fails

Actual results:
09:07:12 INFO | [qemu output] qemu-system-ppc64: Unknown savevm section or instance 'icp/server' 12
09:07:12 INFO | [qemu output] qemu-system-ppc64: load of migration failed: Invalid argument
09:07:12 INFO | [qemu output] qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615)
09:07:12 INFO | [qemu output] (Process terminated with status 1)
09:07:13 DEBUG| Waiting for migration to complete (12.025059 secs)
09:07:13 DEBUG| (monitor avocado-vt-vm1.hmp1) Sending command 'info migrate' 
09:07:13 DEBUG| Send command: info migrate
09:07:13 DEBUG| (monitor avocado-vt-vm1.hmp1) Response to 'info migrate'
09:07:13 DEBUG| (monitor avocado-vt-vm1.hmp1)    capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
09:07:13 DEBUG| (monitor avocado-vt-vm1.hmp1)    Migration status: failed
09:07:13 DEBUG| (monitor avocado-vt-vm1.hmp1)    total time: 0 milliseconds
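
Two numbers in the socket_writev_buffer line above can be decoded mechanically. The following is a minimal sketch, not part of the original report: the helper name and the lookup table are made up for illustration, and the errno name is the Linux value, hard-coded because errno numbers differ between platforms. Reading the result as "the write failed because the destination had already exited" is an interpretation, not something the thread states.

```python
U64_MAX = 2**64 - 1


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one (two's complement)."""
    return value - 2**64 if value >= 2**63 else value


# Linux errno value only; other platforms number their errors differently.
LINUX_ERRNO_NAMES = {104: "ECONNRESET"}

# The second number in "(32768/18446744073709551615)" is the largest
# unsigned 64-bit value, i.e. -1 when read as a signed quantity.
second_number = 18446744073709551615
print(second_number == U64_MAX, as_signed_64(second_number))
print(LINUX_ERRNO_NAMES[104])
```

Under these assumptions, err=104 is a connection reset on the source side and the second number is a -1 return value printed as unsigned, which is consistent with the destination process terminating first.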

Expected results:
No error messages, "info migrate" should report "Migration status: completed" and the machine should be accessible.

Additional info:
* migration master->old_qemu works well
* migration old_qemu->master works well on ppc64le host
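
The pass/fail criterion from "Expected results" can be checked mechanically against the monitor output. This is a minimal sketch, assuming the "info migrate" response is available as plain text in the form shown in the log above; the function names are hypothetical and are not avocado-vt APIs.

```python
import re


def parse_migration_status(info_migrate_output):
    """Return the value of the 'Migration status:' line, or None if absent."""
    match = re.search(r"^\s*Migration status:\s*(\S+)",
                      info_migrate_output, re.MULTILINE)
    return match.group(1) if match else None


def migration_succeeded(info_migrate_output):
    """True only when the monitor reports the status 'completed'."""
    return parse_migration_status(info_migrate_output) == "completed"


# Response text copied from the failing run in this report.
SAMPLE_FAILED = (
    "capabilities: xbzrle: off rdma-pin-all: off auto-converge: off "
    "zero-blocks: off compress: off events: off postcopy-ram: off\n"
    "Migration status: failed\n"
    "total time: 0 milliseconds\n"
)

print(parse_migration_status(SAMPLE_FAILED))
```

A caller would poll the monitor while the status is "active" and stop on "completed" or "failed".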

Comment 3 Laurent Vivier 2018-02-26 15:22:49 UTC
(In reply to Lukas Doktor from comment #0)

> 09:07:12 INFO | [qemu output] qemu-system-ppc64: Unknown savevm section or
> instance 'icp/server' 12
> 09:07:12 INFO | [qemu output] qemu-system-ppc64: load of migration failed:
> Invalid argument
> 09:07:12 INFO | [qemu output] qemu-system-ppc64: socket_writev_buffer: Got
> err=104 for (32768/18446744073709551615)

It looks like the problem was already fixed by the following commit, but that doesn't explain why it fails with a BE host:

commit 46f7afa3709664c7fbc643b2221fd27d5d7762d3
Author: Greg Kurz <groug>
Date:   Wed Jun 14 15:29:19 2017 +0200

    spapr: fix migration of ICPState objects from/to older QEMU
    
    Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
    sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
    This is an improvement since we no longer allocate ICPState objects
    that will never be used. But it has the side-effect of breaking
    migration of older machine types from older QEMU versions.
    
    This patch allows spapr to register dummy "icp/server" entries to vmstate.
    These entries use a dedicated VMStateDescription that can swallow and
    discard state of an incoming migration stream, and that don't send anything
    on outgoing migration.
    
    As for real ICPState objects, the instance_id is the cpu_index of the
    corresponding vCPU, which happens to be equal to the generated instance_id
    of older machine types.
    
    The machine can unregister/register these entries when CPUs are dynamically
    plugged/unplugged.
    
    This is only available for pseries-2.9 and older machines, thanks to a
    compat property.
    
    Signed-off-by: Greg Kurz <groug>
    Signed-off-by: David Gibson <david.id.au>
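
The mechanism described in the commit message can be modeled in a few lines. This is a simplified sketch in Python, not QEMU code: the class, the stream format (a list of name, instance_id, payload tuples), and the helper names are all hypothetical. It only illustrates why registering dummy "icp/server" entries that swallow and discard incoming state avoids the "Unknown savevm section or instance" error.

```python
class UnknownSection(Exception):
    """Raised when the incoming stream names a section nobody registered."""


class MigrationLoader:
    def __init__(self):
        # Maps (section name, instance_id) to a callable taking the payload.
        self.handlers = {}

    def register(self, name, instance_id, handler):
        self.handlers[(name, instance_id)] = handler

    def unregister(self, name, instance_id):
        self.handlers.pop((name, instance_id), None)

    def load(self, stream):
        for name, instance_id, payload in stream:
            handler = self.handlers.get((name, instance_id))
            if handler is None:
                raise UnknownSection(
                    "Unknown savevm section or instance '%s' %d"
                    % (name, instance_id))
            handler(payload)


def discard(payload):
    """Dummy handler: accept the incoming state and throw it away."""
    return None


def make_loader(dummy_icp_count):
    """Register one dummy 'icp/server' entry per index below the count."""
    loader = MigrationLoader()
    for cpu_index in range(dummy_icp_count):
        loader.register("icp/server", cpu_index, discard)
    return loader


# Hypothetical stream from an older QEMU carrying four ICP sections.
OLD_STREAM = [("icp/server", i, b"state") for i in range(4)]

make_loader(4).load(OLD_STREAM)      # dummies present: stream is consumed
try:
    make_loader(0).load(OLD_STREAM)  # no dummies: first section is rejected
except UnknownSection as err:
    print(err)
```

With zero dummy entries registered, the very first "icp/server" section is rejected, which matches the shape of the error in comment #0.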

Comment 4 Laurent Vivier 2018-02-26 16:55:20 UTC
(In reply to Lukas Doktor from comment #0)
> Created attachment 1400818 [details]
...
> * migration old_qemu->master works well on ppc64le host

According to the logs in the attachment, it also fails with a ppc64le host:

latest/test-results/13-2-6
latest/test-results/14-2-6
latest/test-results/15-2-7
latest/test-results/16-2-7
latest/test-results/17-2-8
latest/test-results/18-2-8
latest/test-results/19-2-9
latest/test-results/20-2-9

Comment 5 Laurent Vivier 2018-02-26 17:44:53 UTC
I think it happens because we have changed the way we compute the number of XICS servers:

46f7afa370 spapr: fix migration of ICPState objects from/to older QEMU

  static inline int xics_max_server_number(void)
  {
      return DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);
  }

Now it is:

72194664c8 spapr: use spapr->vsmt to compute VCPU ids

  static int xics_max_server_number(sPAPRMachineState *spapr)
  {
      return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads);
  }

Comment 6 Greg Kurz 2018-02-27 11:51:38 UTC
(In reply to Laurent Vivier from comment #5)
> I think it happens because we have changed the way we compute the number of
> XICS servers:
> 
> 46f7afa370 spapr: fix migration of ICPState objects from/to older QEMU
> 
>   static inline int xics_max_server_number(void)
>   {
>       return DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);
>   }
> 
> Now it is:
> 
> 72194664c8 spapr: use spapr->vsmt to compute VCPU ids
> 
>   static int xics_max_server_number(sPAPRMachineState *spapr)
>   {
>       return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads);
>   }

Hi Laurent,

Indeed, but with a standard POWER8 host, both should be equal to 8. The
problem is that xics_max_server_number() is first called before spapr->vsmt
is set:

spapr_machine_init()
 xics_system_init()
  xics_max_server_number() <== return 0 because spapr->vsmt is still 0

and we don't create the dummy ICPs that are needed when we migrate a 2.9 or
older machine type, hence the "unknown savevm section" error.

So we really want the VSMT mode to be set before setting up XICS.
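
The ordering problem can be shown with plain arithmetic. This is a minimal sketch in Python that mirrors the DIV_ROUND_UP expression quoted above; the max_cpus and smp_threads values are made-up examples, and the only figure taken from the thread is that vsmt is 0 before it is set and 8 on a standard POWER8 host.

```python
def div_round_up(n, d):
    """Integer division rounding up, as the C macro DIV_ROUND_UP does."""
    return (n + d - 1) // d


def xics_max_server_number(max_cpus, vsmt, smp_threads):
    """Python model of the post-72194664c8 computation."""
    return div_round_up(max_cpus * vsmt, smp_threads)


# Example values only: 8 vCPUs, 1 thread per core.
max_cpus, smp_threads = 8, 1

before_vsmt_set = xics_max_server_number(max_cpus, 0, smp_threads)
after_vsmt_set = xics_max_server_number(max_cpus, 8, smp_threads)
print(before_vsmt_set, after_vsmt_set)
```

When the function is evaluated while vsmt is still 0, the result is 0 regardless of max_cpus, so no dummy ICP entries are created.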

Comment 8 David Gibson 2018-03-07 01:50:21 UTC
Lukáš,

Peter Maydell just merged my last pull request, which has a fix I believe will address this.  Can you retry with the current git (either the master or ppc-for-2.12 branch should be ok)?

Comment 13 David Gibson 2018-03-08 05:57:51 UTC
I've had a look at this, and the bug as originally described can't be reproduced upstream, because we don't have the earlier patch that caused it.

I thought we might hit other problems related to the lack of that earlier patch, but I had a look and wasn't able to reproduce the problems.

So, I'm dropping the rhel-7.5.z and 0day flags.  I'm keeping 7.6, so that we recheck after the rebase that we don't end up with this upstream bug.

Comment 14 Lukáš Doktor 2018-03-08 07:25:18 UTC
Hello guys,

This week the bug reproduced on both ppc64 and ppc64le hosts, so an older qemu was perhaps used on the ppc64le host previously.

Sorry for the confusion,
Lukáš

Comment 15 xianwang 2018-03-10 06:17:58 UTC
Hi, Lukas and David,
I failed to reproduce this issue on a POWER8 ppc64le host with a ppc64le guest. My test steps follow; could you give some advice on how to reproduce it? Thanks.
version:
src:
3.10.0-693.el7.ppc64le
qemu-2.9.1 (source code)

dst:
3.10.0-856.el7.ppc64le
qemu-2.11.1 (source code)

step:
1.Compile qemu package for src and dst host
src:
#wget https://download.qemu.org/qemu-2.9.1.tar.xz
#tar xvJf qemu-2.9.1.tar.xz
#cd qemu-2.9.1
# ./configure --target-list=ppc64-softmmu
#make
#make install

dst:
#wget https://download.qemu.org/qemu-2.11.1.tar.xz
#tar xvJf qemu-2.11.1.tar.xz
#cd qemu-2.11.1
# ./configure --target-list=ppc64-softmmu
#make
#make install

2.On src host, boot a rhel7.4 guest with the following qemu cli
/usr/local/bin/qemu-system-ppc64 \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-2.6 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/rhel74-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :2 \
-qmp tcp:0:8882,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm

3.On dst host, boot guest with "-incoming tcp:0:5801" mode

4.On src host, do migration 
(qemu) migrate -d tcp:10.16.69.75:5801

5.check the migration status
The migration finishes and the VM works well
src:
(qemu) info migrate 
Migration status: completed

dst:
(qemu) info status 
VM status: running

Comment 16 David Gibson 2018-03-13 02:50:16 UTC
Xianxian,

I also tried to reproduce this problem and wasn't able to.  I think the problem is only in upstream qemu, not in the downstream one.  That's why I removed the 7.5.z flag from this BZ.  I don't want to close the BZ yet, so that we don't accidentally introduce the bug downstream with future rebases and ports.

Comment 17 Lukáš Doktor 2018-03-13 14:10:59 UTC
This week the situation improved significantly and only master->2.6 fails, with a slightly different message:

Guest: ppc64/ppc64le
Host: Only ppc64 this week due to utilized machines, next week I'll get ppc64le results as well
  * kernel: 3.10.0-693.el7.ppc64
  * qemu: e4ae62b802cec437f877f2cadc4ef059cc0eca76 to 2.6.2

Comment 18 Lukáš Doktor 2018-03-13 15:12:59 UTC
Actual steps to reproduce:

cd /home/jenkins/ppc64le

cd qemu-2.6.2
git checkout v2.6.2
./configure --enable-kvm --block-drv-rw-whitelist=vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle
make -j 24

cd ../qemu-master
git checkout 22ef7ba8e8ce7fef297549b3defcac333742b804
./configure --enable-kvm --block-drv-rw-whitelist=vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle
make -j 24

avocado --show all run --vt-machine-type pseries --vt-arch ppc64le --vt-qemu-bin /home/jenkins/ppc64le/qemu-master/build/bin/qemu-system-ppc64 --vt-qemu-dst-bin /home/jenkins/ppc64le/qemu-2.6.2/build/bin/qemu-system-ppc64 --vt-extra-params machine_type=pseries-2.6 --vt-guest-os RHEL.7.devel -- migrate.default.tcp
...
|root: [qemu output] qemu-system-ppc64: error while loading state for instance 0x0 of device 'cpu'
root: [qemu output] qemu-system-ppc64: load of migration failed: Invalid argument
root: [qemu output] (Process terminated with status 1)
...
avocado.test: ERROR 1-type_specific.io-github-autotest-qemu.migrate.default.tcp -> VMDeadError: VM is dead    reason: 1    detail: "qemu-system-ppc64: error while loading state for instance 0x0 of device 'cpu'\nqemu-system-ppc64: load of migration failed: Invalid argument\n"
avocado.test: 
ERROR (154.19 s)

Comment 19 Qunfang Zhang 2018-03-14 01:03:03 UTC
Thanks Lucas for the reproducer, Xianxian could you pls give a try according to comment 18?

Comment 20 xianwang 2018-03-15 07:46:59 UTC
(In reply to Qunfang Zhang from comment #19)
> Thanks Lucas for the reproducer, Xianxian could you pls give a try according
> to comment 18?

Thanks for Lukas's help. I have reproduced this bug with both the automated test and the manual test. The reproduction is as follows:
Host:
3.10.0-693.el7.ppc64le

Guest:
3.10.0-677.el7.ppc64le on ppc64le

compile qemu (in one host compile two qemu):
# pwd
/home/xianwang
# git clone git://git.qemu.org/qemu.git
# cd qemu
# git checkout v2.6.2
# ./configure --enable-kvm --block-drv-rw-whitelist=vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle
# make -j 24
# git log
commit 529d45e151d82a772cd9b9af64bb25f88fba6567
Author: Michael Roth <mdroth.ibm.com>
Date:   Thu Sep 29 14:57:09 2016 -0500

    Update version for 2.6.2 release
    
    Signed-off-by: Michael Roth <mdroth.ibm.com>

# cd ..
# mkdir qemu-master
# cd qemu-master
# git clone git://git.qemu.org/qemu.git
# cd qemu
# git checkout 22ef7ba8e8ce7fef297549b3defcac333742b804
# ./configure --enable-kvm --block-drv-rw-whitelist=vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle
# make -j 24
# git log
commit 22ef7ba8e8ce7fef297549b3defcac333742b804
Merge: 834eddf 02f769b
Author: Peter Maydell <peter.maydell>
Date:   Tue Mar 13 11:42:45 2018 +0000

    Merge remote-tracking branch 'remotes/famz/tags/staging-pull-request' into staging
    
    docker patches
    
2.test qemu cmd
# /home/xianwang/qemu-master/qemu/ppc64-softmmu/qemu-system-ppc64
VNC server running on ::1:5900

# /home/xianwang/qemu/ppc64-softmmu/qemu-system-ppc64
VNC server running on '::1:5901'

manual test:
1.Boot a guest on newest upstream qemu with the following qemu cli:
# cat log.sh 
/home/xianwang/qemu-master/qemu/ppc64-softmmu/qemu-system-ppc64 \
    -name 'avocado-vt-vm1' \
    -machine pseries-2.6  \
    -nodefaults  \
    -vga std  \
    -chardev socket,id=hmp_id_hmp1,path=/var/tmp/socket1,server,nowait \
    -mon chardev=hmp_id_hmp1,mode=readline  \
    -chardev socket,id=hmp_id_catch_monitor,path=/var/tmp/socket2,server,nowait \
    -mon chardev=hmp_id_catch_monitor,mode=readline  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/console1,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -drive id=drive_image1,if=none,format=qcow2,file=/home/xianwang/rhel74-ppc64le-virtio-scsi.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -device virtio-net-pci,mac=9a:8b:8c:8d:8e:8f,id=idjlQN53,vectors=4,netdev=idjlQN53,bus=pci.0,addr=0x5  \
    -netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 1024  \
    -smp 12,maxcpus=12,cores=1,threads=1,sockets=12 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -monitor stdio \
    -rtc base=utc,clock=host  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm
# sh log.sh

2.boot a dst guest on qemu-2.6.2
# cat log_d.sh 
/home/xianwang/qemu/ppc64-softmmu/qemu-system-ppc64 \
    -name 'avocado-vt-vm1' \
    -machine pseries-2.6  \
    -nodefaults  \
    -vga std  \
    -chardev socket,id=hmp_id_hmp1,path=/var/tmp/socket1,server,nowait \
    -mon chardev=hmp_id_hmp1,mode=readline  \
    -chardev socket,id=hmp_id_catch_monitor,path=/var/tmp/socket2,server,nowait \
    -mon chardev=hmp_id_catch_monitor,mode=readline  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/console1,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -drive id=drive_image1,if=none,format=qcow2,file=/home/xianwang/rhel74-ppc64le-virtio-scsi.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -device virtio-net-pci,mac=9a:8b:8c:8d:8e:8f,id=idjlQN53,vectors=4,netdev=idjlQN53,bus=pci.0,addr=0x5  \
    -netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 1024  \
    -smp 12,maxcpus=12,cores=1,threads=1,sockets=12 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -incoming tcp:0:5801 \
    -monitor stdio \
    -rtc base=utc,clock=host  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm

3.do migration from newest qemu to 2.6.2
(qemu) migrate -d tcp:127.0.0.1:5801

4.result:
migration failed and vm is running on src end.
src:
(qemu) info migrate
Migration status: failed
total time: 0 milliseconds
(qemu) info status 
VM status: running

dst:
(qemu) qemu-system-ppc64: error while loading state for instance 0x0 of device 'cpu'
qemu-system-ppc64: load of migration failed: Invalid argument

Comment 21 xianwang 2018-03-15 07:58:42 UTC
reproduction via auto test:
Host:
3.10.0-693.el7.ppc64le

Guest:
3.10.0-677.el7.ppc64le on ppc64le

compile qemu (in one host compile two qemu):
# pwd
/home/xianwang
# git clone git://git.qemu.org/qemu.git
# cd qemu
# git checkout v2.6.2
# ./configure --enable-kvm --block-drv-rw-whitelist=vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle
# make -j 24
# git log
commit 529d45e151d82a772cd9b9af64bb25f88fba6567
Author: Michael Roth <mdroth.ibm.com>
Date:   Thu Sep 29 14:57:09 2016 -0500

    Update version for 2.6.2 release
    
    Signed-off-by: Michael Roth <mdroth.ibm.com>

# cd ..
# mkdir qemu-master
# cd qemu-master
# git clone git://git.qemu.org/qemu.git
# cd qemu
# git checkout 22ef7ba8e8ce7fef297549b3defcac333742b804
# ./configure --enable-kvm --block-drv-rw-whitelist=vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle
# make -j 24
# git log
commit 22ef7ba8e8ce7fef297549b3defcac333742b804
Merge: 834eddf 02f769b
Author: Peter Maydell <peter.maydell>
Date:   Tue Mar 13 11:42:45 2018 +0000

    Merge remote-tracking branch 'remotes/famz/tags/staging-pull-request' into staging
    
    docker patches
    
2.test qemu cmd
# /home/xianwang/qemu-master/qemu/ppc64-softmmu/qemu-system-ppc64
VNC server running on ::1:5900

# /home/xianwang/qemu/ppc64-softmmu/qemu-system-ppc64
VNC server running on '::1:5901'

auto test:
1.Clone and install avocado, avocado-vt and tp-qemu
# git clone git://qe-git.englab.nay.redhat.com/s2/staf-kvm-devel
# cd staf-kvm-devel/
# ./Bootstrap.sh --venv --upstream --develop
# avocado --show all run --vt-machine-type pseries --vt-arch ppc64le --vt-qemu-bin /home/xianwang/qemu-master/qemu/ppc64-softmmu/qemu-system-ppc64 --vt-qemu-dst-bin /home/xianwang/qemu/ppc64-softmmu/qemu-system-ppc64 --vt-extra-params machine_type=pseries-2.6 --vt-guest-os RHEL.7.4 -- migrate.default.tcp

result:
[qemu output] qemu-system-ppc64: error while loading state for instance 0x0 of device 'cpu'
[qemu output] qemu-system-ppc64: load of migration failed: Invalid argument

Traceback (most recent call last):

02:57:38 ERROR|   File "/home/xianwang/staf-kvm-devel/workspace/avocado/avocado/core/test.py", line 919, in _run_avocado
    raise test_exception

02:57:38 ERROR| VMMigrateFailedError: Migration failed

02:57:38 ERROR| ERROR 1-type_specific.io-github-autotest-qemu.migrate.default.tcp -> VMMigrateFailedError: Migration failed

Comment 22 Greg Kurz 2018-03-15 09:00:35 UTC
With traces enabled on the destination we get:

vmstate_load_field_error field "env.msr_mask" load failed, ret = -22

QEMU 2.6.2 expects the msr_mask to be equal in both source and destination.

This is a regression caused by this recent commit:

commit 21b786f607b11d888f90bbb8c3414500515d11e7
Author: Simon Guo <wei.guo.simon>
Date:   Mon Mar 5 18:53:48 2018 +0800

    PowerPC: Add TS bits into msr_mask
    
    During migration, after MSR bits is synced, cpu_post_load() will use
    msr_mask to determine which PPC MSR bits will be applied into the target
    side. Hardware Transaction Memory(HTM) has been supported since Power8,
    but TS0/TS1 bit was not in msr_mask yet. That will prevent target KVM
    from loading TM checkpointed values.
    
    This patch adds TS bits into msr_mask for Power8, so that transactional
    application can be migrated across qemu.
    
    Signed-off-by: Simon Guo <wei.guo.simon>
    Signed-off-by: David Gibson <david.id.au>

This calls for some compat code.
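
The failure mode can be sketched as a strict equality check on the mask. This is an illustrative Python model, not QEMU code and not the fix that was later merged: the TS bit positions (34 and 33) are assumed from the MSR layout, OLD_MASK is a placeholder value, and both function names are hypothetical. The -22 return value corresponds to the "ret = -22" in the trace above.

```python
MSR_TS0 = 34  # assumed bit position of the first transaction-state bit
MSR_TS1 = 33  # assumed bit position of the second transaction-state bit
EINVAL = 22   # "Invalid argument", seen as ret = -22 in the trace

TS_BITS = (1 << MSR_TS0) | (1 << MSR_TS1)


def load_msr_mask_strict(dest_mask, incoming_mask):
    """Model of an equality-checked field: 0 on match, -EINVAL otherwise."""
    return 0 if incoming_mask == dest_mask else -EINVAL


def mask_for_old_machine(mask):
    """Hypothetical compat step: hide the TS bits from older destinations."""
    return mask & ~TS_BITS


OLD_MASK = 0x1                 # placeholder for the mask a 2.6 build expects
NEW_MASK = OLD_MASK | TS_BITS  # the same mask after the TS bits were added

print(load_msr_mask_strict(OLD_MASK, NEW_MASK))
print(load_msr_mask_strict(OLD_MASK, mask_for_old_machine(NEW_MASK)))
```

Clearing the added bits for older machine types is only one way such compat code could look; the point of the sketch is that any difference in the mask makes the strict comparison fail.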

Comment 23 David Gibson 2018-03-19 02:51:47 UTC
No, I don't think that explains it (although it is a bug we should fix).

The msr_mask change would explain a failure migrating from current to 2.6 or 2.7.  However coming 2.6->current should ignore the incoming msr_mask, and for 2.8 and later the msr_mask is removed from the migration stream entirely.

Comment 24 David Gibson 2018-03-19 03:03:21 UTC
Duh, sorry, was thinking of the original report.  In the light of comment 17, comment 22 does indeed explain the remaining failure.

Comment 25 David Gibson 2018-03-20 02:52:16 UTC
Just posted a fix for the bug described in comment 22 for upstream review.

Comment 26 David Gibson 2018-04-12 05:53:41 UTC
Lukas,

The fix is now merged upstream (in time for 2.12).  Can you confirm that the problem is now fixed with the latest upstream snapshot?

Comment 27 David Gibson 2018-04-17 02:37:45 UTC
Lukas has confirmed this is fixed with the latest upstream snapshots.  In particular the fix made it into 2.12, and therefore will be included in RHEL7.6 by rebase.  The bug hadn't appeared in RHEL7.5, so there's no further downstream impact and we can close this.

Comment 28 Lukáš Doktor 2018-05-04 16:36:07 UTC
Hello guys,

I'm sorry for such a late response. Yes, it worked well at that time. Unfortunately, last week's results brought back a similar issue on master -> 2.6/2.7 migration on all variants (ppc64/ppc64le host, ppc64/ppc64le guest). The output is:

    19:03:05 INFO | [qemu output] qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr'
    19:03:05 INFO | [qemu output] qemu-system-ppc64: load of migration failed: No such file or directory
    19:03:05 INFO | [qemu output] (Process terminated with status 1)

More details are in the `Weekly check` report on the power-kvm-tech list.

Comment 30 Greg Kurz 2018-05-04 20:26:29 UTC
Hi Lukas,

I think the fix for this was merged upstream today:

https://git.qemu.org/?p=qemu.git;a=commit;h=aef19c04bf88e0f5f936301e6c29b239e488fbc6

Comment 31 Lukáš Doktor 2018-05-05 04:57:51 UTC
Yes, I can confirm the migrate job now passes across master (c8b7e627b4269a3bc3ae41d9f420547a47e6d9b9), 2.6.2, 2.7.1, 2.8.1.1, 2.9.1, 2.10.2, and 2.11.0. Thank you.

