Bug 1465799

Summary:	Migration from RHEL7.4 host to RHEL7.3.Z host fails on dst host with "error while loading state for instance 0x0 of device 'spapr_pci'"
Product:	Red Hat Enterprise Linux 7	Reporter:	xianwang <xianwang>
Component:	qemu-kvm-rhev	Assignee:	Laurent Vivier <lvivier>
Status:	CLOSED ERRATA	QA Contact:	xianwang <xianwang>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.4	CC:	knoel, lvivier, michen, mrezanin, mtessun, qzhang, virt-maint, xianwang
Target Milestone:	rc
Target Release:	---
Hardware:	ppc64le
OS:	Linux
Whiteboard:
Fixed In Version:	qemu-kvm-rhev-2.10.0-1.el7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-04-11 00:26:27 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1473046

Description xianwang 2017-06-28 08:50:58 UTC

Description of problem:
When migrating from a rhel7.4 host to a rhel7.3.z host, the following error message is printed on the dst host:
(qemu) qemu-kvm: error while loading state for instance 0x0 of device 'spapr_pci'
load of migration failed: Invalid argument

Version-Release number of selected component (if applicable):
(host1)
RHEL-7.3-20161019.0 Server ppc64le
kernel-3.10.0-514.27.1.el7
qemu-kvm-rhev-2.6.0-28.el7_3.10
SLOF-20160223-6.gitdbbfda4.el7.noarch
(host2)
RHEL-7.4-20170525.7 Server ppc64le
kernel-3.10.0-686.el7
qemu-kvm-rhev-2.9.0-14.el7
SLOF-20170303-4.git66d250e.el7

How reproducible:


Steps to Reproduce:
1. On src host (rhel7.4 host):
/usr/libexec/qemu-kvm -monitor stdio -M pseries-rhel7.3.0 -nodefaults
2. On dst host (rhel7.3 host):
/usr/libexec/qemu-kvm -monitor stdio -M pseries-rhel7.3.0 -nodefaults -incoming tcp:0:5801
3. On src host:
(qemu)migrate -d tcp:10.16.42.48:5801
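
The steps above can be written down as a small helper that builds the two command lines and the monitor command. This is only a sketch: the function names `qemu_argv` and `migrate_command` are hypothetical, and the flags, machine type, port and address are taken verbatim from this report. It builds strings only and does not launch anything.

```python
QEMU = "/usr/libexec/qemu-kvm"
MACHINE = "pseries-rhel7.3.0"


def qemu_argv(incoming_port=None):
    """Return the minimal qemu-kvm command line used in this report.

    With incoming_port set, the destination variant (listening mode)
    is returned; otherwise the source variant.
    """
    argv = [QEMU, "-monitor", "stdio", "-M", MACHINE, "-nodefaults"]
    if incoming_port is not None:
        argv += ["-incoming", f"tcp:0:{incoming_port}"]
    return argv


def migrate_command(dst_ip, port):
    """Return the monitor command to type at the source (qemu) prompt."""
    return f"migrate -d tcp:{dst_ip}:{port}"


print(" ".join(qemu_argv()))                    # src host
print(" ".join(qemu_argv(incoming_port=5801)))  # dst host
print(migrate_command("10.16.42.48", 5801))     # src monitor
```

Both hosts must use the same machine type (`-M pseries-rhel7.3.0`), which is why it is a single constant here.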

Actual results:
On the src host the migration completed, while the dst host printed an error message.
on src host:
(qemu) info migrate
Migration status: completed

on dst host:
(qemu) qemu-kvm: error while loading state for instance 0x0 of device 'spapr_pci'
load of migration failed: Invalid argument

Expected results:
Migration completes and the VM works well.

Additional info:

Comment 2 Laurent Vivier 2017-06-28 08:57:31 UTC

It fails in get_uint32_equal() for the dma_liobn field, because the received value is 0, whereas it should be 0x80000000. This part was reworked between 2.7 and 2.8: there is a special case in which the value is "forged" so that migration to machines before 2.8 (before RHEL 7.4) is still possible. I think the value is not initialized correctly. It works when the OS is started, I guess because the OS has put the correct value in the field.
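
To illustrate why a zero value is rejected, here is a simplified Python model of an "equal" field load. It is not QEMU's C code: the name `load_uint32_equal`, the constant `DST_DMA_LIOBN` and the big-endian encoding are assumptions of this sketch. Only the two values (0 and 0x80000000) and the "Invalid argument" result come from this report.

```python
import errno
import struct


def load_uint32_equal(stream: bytes, expected: int) -> int:
    """Model of an 'equal' field load.

    The destination does not store the incoming value; it only checks
    that it matches the value it already holds.
    """
    (received,) = struct.unpack(">I", stream[:4])  # 32-bit field, big-endian assumed
    if received == expected:
        return 0
    return -errno.EINVAL  # reported to the user as "Invalid argument"


DST_DMA_LIOBN = 0x80000000  # value the destination expects (from this comment)

# Source sent a zero field: the load is rejected.
bad = load_uint32_equal(struct.pack(">I", 0), DST_DMA_LIOBN)
# Source sent the expected value: the load succeeds.
good = load_uint32_equal(struct.pack(">I", 0x80000000), DST_DMA_LIOBN)

print(bad == -errno.EINVAL, good == 0)
```

Because the check is a strict comparison, the only way to make the destination accept the stream is for the source to send the right value.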

Comment 3 xianwang 2017-06-28 09:00:01 UTC

1. spapr-pci is a ppc-only device, so this bug affects powerpc only, not x86_64.
2. When migrating with the same qemu cli as in the bug report, but from the rhel7.3.z host to the rhel7.4 host, the result is normal, i.e. the migration completes and the VM is running on the dst host.

Comment 4 Laurent Vivier 2017-06-28 09:27:54 UTC

(In reply to xianwang from comment #3)
> 1. spapr-pci is a ppc-only device, so this bug affects powerpc only, not
> x86_64.
> 2. When migrating with the same qemu cli as in the bug report, but from the
> rhel7.3.z host to the rhel7.4 host, the result is normal, i.e. the migration
> completes and the VM is running on the dst host.

3. If the migration is started while the guest OS is running, the migration from rhel7.4 host to rhel7.3.z host works well

Comment 5 Laurent Vivier 2017-06-28 13:23:01 UTC

The problem is related to the hack that allows migration from 2.8 and later to pre-2.8.

In the pre_save function, the part copying the new fields to the migration fields is short-circuited because the function returns when the number of MSI devices is 0.

Moving the copy of the fields before this part fixes the problem.
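
The ordering problem can be sketched with a simplified Python model. This is not the QEMU source: the class `PhbState` and the field names `mig_liobn`, `pre_2_8_migration`, `msi` and `msi_devs` are assumptions made for illustration, and only `dma_liobn` and its value 0x80000000 are named in this report.

```python
from dataclasses import dataclass, field


@dataclass
class PhbState:
    """Hypothetical stand-in for the spapr_pci device state."""
    dma_liobn: int = 0x80000000     # new-style field
    mig_liobn: int = 0              # legacy field sent to pre-2.8 destinations
    pre_2_8_migration: bool = True  # compatibility mode for old machine types
    msi: dict = field(default_factory=dict)
    msi_devs: list = field(default_factory=list)


def pre_save_buggy(s: PhbState) -> None:
    s.msi_devs = []
    if not s.msi:
        return  # early return: the legacy copy below is never reached
    s.msi_devs = sorted(s.msi.items())
    if s.pre_2_8_migration:
        s.mig_liobn = s.dma_liobn


def pre_save_fixed(s: PhbState) -> None:
    if s.pre_2_8_migration:
        s.mig_liobn = s.dma_liobn  # copy first, independent of MSI state
    s.msi_devs = []
    if not s.msi:
        return
    s.msi_devs = sorted(s.msi.items())


buggy, fixed = PhbState(), PhbState()
pre_save_buggy(buggy)
pre_save_fixed(fixed)
print(hex(buggy.mig_liobn), hex(fixed.mig_liobn))
```

In this model the buggy variant also produces the right value as soon as `msi` is non-empty, which would be consistent with the observation in comment 4 that migration works once the guest OS is running.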

Comment 7 Laurent Vivier 2017-06-28 16:20:16 UTC

Upstream patch:

https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06373.html

Comment 8 xianwang 2017-06-29 05:55:10 UTC

(In reply to Laurent Vivier from comment #4)
> (In reply to xianwang from comment #3)
> > 1. spapr-pci is a ppc-only device, so this bug affects powerpc only, not
> > x86_64.
> > 2. When migrating with the same qemu cli as in the bug report, but from the
> > rhel7.3.z host to the rhel7.4 host, the result is normal, i.e. the migration
> > completes and the VM is running on the dst host.
> 
> 3. If the migration is started while the guest OS is running, the migration
> from rhel7.4 host to rhel7.3.z host works well

Hi, Laurent,
Yes, you are right. I have just re-tested this scenario with the latest version: if the guest OS is running, the migration works well, but with the simple cli the migration fails as in the bug report. Details are below:
version:
host1(rhel7.4)
3.10.0-690.el7.ppc64le
qemu-kvm-rhev-2.9.0-14.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch

host2(rhel7.3)
3.10.0-514.27.1.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.12.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

scenario I:
qemu cli:
1. On rhel7.4 host, boot guest:
/usr/libexec/qemu-kvm -monitor stdio -M pseries-rhel7.3.0 -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=09 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel74-ppc64le-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,bootindex=0 -vnc :1 -vga std   
2. On rhel7.3 host, launch qemu in listening mode with "-incoming tcp:0:5801"
3. Migrate guest from rhel7.4 host to rhel7.3 host
(qemu) migrate -d tcp:10.16.42.48:5801
4. Check the status of migration and reboot guest
on src host (rhel7.4)
(qemu) info migrate
Migration status: completed

on dst host(rhel7.3)
(qemu) info status 
VM status: running
(qemu) system_reset 
(qemu) KVM: Failed to create TCE table for liobn 0x80000001
The VM works well even though this message is printed.

5. Then migrate the guest from the rhel7.3 host back to the rhel7.4 host:
the migration completes, and after "system_reset" no message is printed.

scenario II:
Using the simple qemu cli from the bug report, the result is the same as in the bug report.

Now, I will test the build in comment 6.

Comment 10 xianwang 2017-06-29 07:34:38 UTC

Regarding the message "(qemu) KVM: Failed to create TCE table for liobn 0x80000001" in comment 8: this issue is tracked by another bug, https://bugzilla.redhat.com/show_bug.cgi?id=1440619. It is fixed for rhel7.4 but not for rhel7.3.z, because it is only triggered when the guest has very little memory and is not critical enough, so we should ignore it here.

Comment 14 Laurent Vivier 2017-09-20 13:56:44 UTC

Fixed in qemu 2.10. The fix allows migration from RHEL7.5.0 to RHEL7.3.z and earlier.

Comment 16 xianwang 2017-11-13 08:35:20 UTC

This bug is verified as passing on qemu-kvm-rhev-2.10.0-5.el7.ppc64le
version:
host1(rhel7.5)
3.10.0-776.el7.ppc64le
qemu-kvm-rhev-2.10.0-5.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

host2(rhel7.3)
3.10.0-776.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.14.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

steps:
1. On src host (rhel7.5 host):
/usr/libexec/qemu-kvm -monitor stdio -M pseries-rhel7.3.0 -nodefaults
2. On dst host (rhel7.3 host):
/usr/libexec/qemu-kvm -monitor stdio -M pseries-rhel7.3.0 -nodefaults -incoming tcp:0:5801
3. On src host:
(qemu)migrate -d tcp:10.16.42.46:5801

result:
Migration completed successfully.
src end:
(qemu) info migrate
Migration status: completed
(qemu) info status 
VM status: paused (postmigrate)

dst end:
(qemu) info status 
VM status: running

So, this bug is fixed.

Comment 18 errata-xmlrpc 2018-04-11 00:26:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104