Bug 1083973

Summary: vectors >32 cause migration fail from RHEL6.5 to RHEL7.0
Product: Red Hat Enterprise Linux 7
Reporter: FuXiangChun <xfu>
Component: qemu-kvm
Assignee: Michael S. Tsirkin <mst>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: medium
Version: 7.0
CC: bcao, dgilbert, hhuang, jherrman, juzhang, knoel, michen, mst, pbonzini, qzhang, rbalakri, rhod, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If a virtio device is created with the number of vectors set to a value higher than 32, the device behaves as if the value were zero on Red Hat Enterprise Linux 6, but not on Red Hat Enterprise Linux 7. The resulting vector setting mismatch causes a migration error if the number of vectors on any virtio device on either platform is set to 33 or higher. It is therefore not recommended to set the "vectors" value higher than 32.
Story Points: ---
Clone Of:
: 1159611 1159613
Environment:
Last Closed: 2014-12-03 22:08:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1159611, 1159613    

Description FuXiangChun 2014-04-03 10:32:28 UTC
Description of problem:
Boot a guest with a virtio-blk-pci controller and vectors=33 on a RHEL6.5 host, then migrate it from RHEL6.5 to RHEL7.0. The qemu-kvm process quits on the destination host when the migration finishes.

Version-Release number of selected component (if applicable):
rhel6.5 host:
2.6.32-452.el6.x86_64
qemu-kvm-0.12.1.2-2.415.el6_5.7.x86_64

rhel7.0 host:
3.10.0-118.el7.x86_64
qemu-kvm-1.5.3-60.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the source host:
/usr/libexec/qemu-kvm -M rhel6.5.0 -cpu Opteron_G2 -enable-kvm -m 2G  -smp 4 -name rhel6.5 -uuid 6afa5f93-2d4f-420f-81c6-e5fdddbd1c83 -drive file=/mnt/RHEL-Server-6.5-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=40c061dd-5d60-4fc5-865f-55db700407f0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,vectors=33 -net none -vnc :1  -monitor stdio -serial unix:/tmp/monitor2,server,nowait

2. On the destination host:
/usr/libexec/qemu-kvm -M rhel6.5.0 -cpu Opteron_G2 -enable-kvm -m 2G  -smp 4 -name rhel6.5 -uuid 6afa5f93-2d4f-420f-81c6-e5fdddbd1c83 -drive file=/mnt/RHEL-Server-6.5-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=40c061dd-5d60-4fc5-865f-55db700407f0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,vectors=33 -net none -vnc :1  -monitor stdio -serial unix:/tmp/monitor2,server,nowait -incoming tcp:0:5555

3. Do the migration.

Actual results:
qemu-kvm process quit.

(qemu) qemu: warning: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-blk'
load of migration failed

Expected results:


Additional info:

Comment 1 FuXiangChun 2014-04-03 10:50:04 UTC
I also tested qemu-kvm-1.5.3-50.el7.x86_64 and hit the same issue, so this might not be a regression.

If you need QE to test earlier qemu-kvm versions, please let us know.

Comment 5 Dr. David Alan Gilbert 2014-04-08 09:24:46 UTC
I've had a look at this, but I'm hitting the limits of my PCI knowledge;

using a testbed consisting of a RHEL6 qemu build (on RHEL7) and a rhel7 qemu:

rhel6/qemu-system-x86_64 -nographic -nodefaults -device virtio-serial-pci,id=virtio-serial0,vectors=33,bus=pci.0,addr=0x5 -M rhel6.5.0 --chardev socket,port=4000,host=localhost,id=mon,server,nowait,telnet -mon chardev=mon,id=mon

rhel7/qemu-kvm -nographic -nodefaults -device virtio-serial-pci,id=virtio-serial0,vectors=33,bus=pci.0,addr=0x5 -M rhel6.5.0 --chardev socket,port=4001,host=localhost,id=mon,server,nowait,telnet -mon chardev=mon,id=mon -incoming tcp:localhost:4444


vmstate_load_state loop for PCIDevice/config
get_pci_config_device: EINVAL for 6 config=0 s->config=10 cmask=10 wmask=0 w1cmask=0
vmstate_load_state loop exit(a) for PCIDevice/config ret=-22
qemu: warning: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-console'

If I'm reading that right it's objecting to bit 4 in PCI status register which I think is the 'new capabilities' bit.

Is this something like a previous limit of 32 interrupts/msi's that RHEL6 didn't check for?
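
The EINVAL in the trace above can be reproduced with a small model of the compatibility check. This is only a sketch in Python, modeled on the byte-wise comparison that upstream QEMU's get_pci_config_device performs on the incoming config space; the function name first_config_mismatch is hypothetical, and the RHEL builds may differ in detail. The values for offset 6 are the ones printed in the trace.

```python
def first_config_mismatch(incoming, local, cmask, wmask, w1cmask):
    """Return the first offset where a checked, non-writable bit differs, else None."""
    for offset, (inc, loc) in enumerate(zip(incoming, local)):
        # Only bits that are in cmask and not guest-writable are compared.
        checked = cmask[offset] & ~wmask[offset] & ~w1cmask[offset] & 0xFF
        if (inc ^ loc) & checked:
            return offset
    return None


# Minimal 8-byte model; only offset 6 (low byte of the PCI status register)
# is filled in, using the values from the trace.
size = 8
incoming = [0x00] * size   # "config" in the trace: state sent by the source
local = [0x00] * size      # "s->config" in the trace: destination device
cmask = [0x00] * size
wmask = [0x00] * size
w1cmask = [0x00] * size
local[6] = 0x10            # s->config=10
cmask[6] = 0x10            # cmask=10

print(first_config_mismatch(incoming, local, cmask, wmask, w1cmask))  # prints 6
```

In this model the load is rejected because bit 4 of byte 6 is set on the destination but clear in the incoming state, and that bit is not writable by the guest.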

Comment 6 Paolo Bonzini 2014-05-14 14:03:51 UTC
This bit is added automatically when a capability is added to the device.

Can you dump the whole contents of config and s->config?

Comment 7 Dr. David Alan Gilbert 2014-05-14 17:40:01 UTC
OK, I'll add some code to dump the full configs; however, my guess was that 'more than 32 interrupts' was itself the thing that required the capability.

Comment 8 Michael S. Tsirkin 2014-05-15 06:43:34 UTC
no, msi always requires a capability.

Comment 9 Dr. David Alan Gilbert 2014-05-15 08:39:11 UTC
This dump is with 33:

get_pci_config_device: EINVAL for 6 config=0 s->config=10 cmask=10 wmask=0 w1cmask=0
config: 0000:  f4 1a 03 10  03 00 00 00  00 00 80 07  00 00 00 00
config: 0010:  21 c0 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0020:  00 00 00 00  00 00 00 00  00 00 00 00  f4 1a 03 00
config: 0030:  00 00 00 00  00 00 00 00  00 00 00 00  0a 01 00 00
config: 0040:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0050:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0060:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0070:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0080:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0090:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00a0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00b0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00c0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00d0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00e0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00f0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0000:  f4 1a 03 10  00 00 10 00  00 00 80 07  00 00 00 00
s->config: 0010:  01 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0020:  00 00 00 00  00 00 00 00  00 00 00 00  f4 1a 03 00
s->config: 0030:  00 00 00 00  40 00 00 00  00 00 00 00  00 01 00 00
s->config: 0040:  11 00 20 00  01 00 00 00  01 08 00 00  00 00 00 00
s->config: 0050:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0060:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0070:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0080:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0090:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00a0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00b0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00c0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00d0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00e0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00f0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
vmstate_load_state loop exit(a) for PCIDevice/config ret=-22
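
The relevant rows of the dump can be decoded with a short script. This is a sketch in Python under the assumption that "config" is the state received from the RHEL6 source and "s->config" is the device created on the RHEL7 destination; the helper names (parse_dump_row, build_config, msix_vectors) are hypothetical. It walks the standard PCI capability list and reads the MSI-X table size, which is encoded as N-1 in the low 11 bits of the message control word.

```python
MSIX_CAP_ID = 0x11       # PCI capability ID for MSI-X
STATUS_CAP_LIST = 0x10   # bit 4 of the status register: capability list present


def parse_dump_row(row):
    """Turn '0040:  11 00 20 00 ...' into (0x40, b'\\x11\\x00\\x20\\x00...')."""
    offset, data = row.split(":", 1)
    return int(offset, 16), bytes.fromhex("".join(data.split()))


def build_config(rows):
    """Build a 256-byte config space; rows not given stay zero, as in the dump."""
    config = bytearray(256)
    for row in rows:
        offset, data = parse_dump_row(row)
        config[offset:offset + len(data)] = data
    return config


def msix_vectors(config):
    """Return the MSI-X table size, or 0 if no MSI-X capability is exposed."""
    if not config[0x06] & STATUS_CAP_LIST:
        return 0
    ptr = config[0x34] & 0xFC            # capabilities pointer
    seen = set()
    while ptr and ptr not in seen:       # guard against a looping list
        seen.add(ptr)
        if config[ptr] == MSIX_CAP_ID:
            control = int.from_bytes(config[ptr + 2:ptr + 4], "little")
            return (control & 0x07FF) + 1
        ptr = config[ptr + 1] & 0xFC
    return 0


# Non-zero rows copied from the dump above.
SRC_ROWS = [
    "0000:  f4 1a 03 10  03 00 00 00  00 00 80 07  00 00 00 00",
    "0010:  21 c0 00 00  00 00 00 00  00 00 00 00  00 00 00 00",
    "0020:  00 00 00 00  00 00 00 00  00 00 00 00  f4 1a 03 00",
    "0030:  00 00 00 00  00 00 00 00  00 00 00 00  0a 01 00 00",
]
DEST_ROWS = [
    "0000:  f4 1a 03 10  00 00 10 00  00 00 80 07  00 00 00 00",
    "0010:  01 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00",
    "0020:  00 00 00 00  00 00 00 00  00 00 00 00  f4 1a 03 00",
    "0030:  00 00 00 00  40 00 00 00  00 00 00 00  00 01 00 00",
    "0040:  11 00 20 00  01 00 00 00  01 08 00 00  00 00 00 00",
]

print(msix_vectors(build_config(SRC_ROWS)))   # prints 0
print(msix_vectors(build_config(DEST_ROWS)))  # prints 33
```

Read this way, the incoming state carries no capability list at all, while the destination device exposes an MSI-X capability at 0x40 with a 33-entry table.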

Comment 10 Michael S. Tsirkin 2014-05-15 16:16:23 UTC
okay so the limitation is in virtio blk in rhel6
that one can not support 33 vectors and disables msix.

we could just document the limitation for rhel6.
alternatively we could add code to make it exit immediately
for rhel6 machine type and > 32 vectors.

Comment 11 Michael S. Tsirkin 2014-11-02 08:48:07 UTC
As this didn't work properly in rhel6,  let's just document this limitation:

- in rhel6, attempts to use more than 32 vectors per device cause msix to be disabled
- additionally, migration to rhel7 will fail
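
Until the limitation is enforced by qemu-kvm itself, a command line can be checked before the guest is started. The following is a minimal sketch in Python, not part of qemu-kvm or any Red Hat tool; the function name devices_over_vector_limit and the constant MAX_SAFE_VECTORS are hypothetical, and it assumes that vectors= is given as a decimal number on a -device option whose driver name starts with "virtio-".

```python
import shlex

MAX_SAFE_VECTORS = 32  # limit taken from the limitation described in this bug


def devices_over_vector_limit(cmdline, limit=MAX_SAFE_VECTORS):
    """Return (driver, vectors) for each virtio -device with vectors above limit."""
    args = shlex.split(cmdline)
    offenders = []
    for flag, value in zip(args, args[1:]):
        if flag not in ("-device", "--device"):
            continue
        props = value.split(",")
        driver = props[0]
        if not driver.startswith("virtio-"):
            continue
        for prop in props[1:]:
            key, _, val = prop.partition("=")
            if key == "vectors" and val.isdigit() and int(val) > limit:
                offenders.append((driver, int(val)))
    return offenders


# Shortened command line using the -device option from the reproducer.
cmdline = (
    "/usr/libexec/qemu-kvm -M rhel6.5.0 -m 2G "
    "-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,"
    "drive=drive-virtio-disk0,id=virtio-disk0,vectors=33 -net none"
)
print(devices_over_vector_limit(cmdline))  # prints [('virtio-blk-pci', 33)]
```

An empty list means no virtio device on the command line asks for more than 32 vectors.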