Bug 1083973 - vectors >32 cause migration fail from RHEL6.5 to RHEL7.0
Summary: vectors >32 cause migration fail from RHEL6.5 to RHEL7.0
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Michael S. Tsirkin
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1159611 1159613
 
Reported: 2014-04-03 10:32 UTC by FuXiangChun
Modified: 2014-12-03 22:08 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If a virtio device is created with the number of vectors set to a value higher than 32, the device behaves on Red Hat Enterprise Linux 6 as if the value were zero, but not on Red Hat Enterprise Linux 7. The resulting vector-setting mismatch causes a migration error if the number of vectors on any virtio device is set to 33 or higher. It is therefore not recommended to set the "vectors" value to greater than 32.
Clone Of:
Clones: 1159611 1159613
Environment:
Last Closed: 2014-12-03 22:08:58 UTC
Target Upstream Version:


Attachments

Description FuXiangChun 2014-04-03 10:32:28 UTC
Description of problem:
Boot a guest with a virtio-blk-pci controller and vectors=33 on a RHEL6.5 host, then migrate from RHEL6.5 to RHEL7.0. The qemu-kvm process quits on the destination host when migration finishes.

Version-Release number of selected component (if applicable):
rhel6.5 host:
2.6.32-452.el6.x86_64
qemu-kvm-0.12.1.2-2.415.el6_5.7.x86_64

rhel7.0 host:
3.10.0-118.el7.x86_64
qemu-kvm-1.5.3-60.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the source host:
/usr/libexec/qemu-kvm -M rhel6.5.0 -cpu Opteron_G2 -enable-kvm -m 2G  -smp 4 -name rhel6.5 -uuid 6afa5f93-2d4f-420f-81c6-e5fdddbd1c83 -drive file=/mnt/RHEL-Server-6.5-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=40c061dd-5d60-4fc5-865f-55db700407f0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,vectors=33 -net none -vnc :1  -monitor stdio -serial unix:/tmp/monitor2,server,nowait

2. On the destination host:
/usr/libexec/qemu-kvm -M rhel6.5.0 -cpu Opteron_G2 -enable-kvm -m 2G  -smp 4 -name rhel6.5 -uuid 6afa5f93-2d4f-420f-81c6-e5fdddbd1c83 -drive file=/mnt/RHEL-Server-6.5-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=40c061dd-5d60-4fc5-865f-55db700407f0,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,vectors=33 -net none -vnc :1  -monitor stdio -serial unix:/tmp/monitor2,server,nowait -incoming tcp:0:5555

3. Do the migration.

Actual results:
The qemu-kvm process quits with:

(qemu) qemu: warning: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-blk'
load of migration failed

Expected results:


Additional info:

Comment 1 FuXiangChun 2014-04-03 10:50:04 UTC
I tested qemu-kvm-1.5.3-50.el7.x86_64 as well and hit the same issue, so this might not be a regression.

If you need QE to test an earlier qemu-kvm version, please let us know.

Comment 5 Dr. David Alan Gilbert 2014-04-08 09:24:46 UTC
I've had a look at this, but I'm hitting the limits of my PCI knowledge.

Using a testbed consisting of a RHEL6 qemu build (running on RHEL7) and a RHEL7 qemu:

rhel6/qemu-system-x86_64 -nographic -nodefaults -device virtio-serial-pci,id=virtio-serial0,vectors=33,bus=pci.0,addr=0x5 -M rhel6.5.0 --chardev socket,port=4000,host=localhost,id=mon,server,nowait,telnet -mon chardev=mon,id=mon

rhel7/qemu-kvm -nographic -nodefaults -device virtio-serial-pci,id=virtio-serial0,vectors=33,bus=pci.0,addr=0x5 -M rhel6.5.0 --chardev socket,port=4001,host=localhost,id=mon,server,nowait,telnet -mon chardev=mon,id=mon -incoming tcp:localhost:4444


vmstate_load_state loop for PCIDevice/config
get_pci_config_device: EINVAL for 6 config=0 s->config=10 cmask=10 wmask=0 w1cmask=0
vmstate_load_state loop exit(a) for PCIDevice/config ret=-22
qemu: warning: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-console'

If I'm reading that right, it's objecting to bit 4 in the PCI status register, which I think is the 'capabilities list' bit.

Is this something like a previous limit of 32 interrupts/MSIs that RHEL6 didn't check for?
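(For reference, a simplified sketch of the check that trips here. This is not QEMU's actual code; it just models how get_pci_config_device compares each byte of incoming PCI config space against the destination device under the cmask, with the values taken from the log line above.)

```python
def config_byte_mismatch(incoming: int, local: int, cmask: int) -> bool:
    """Return True if any bit covered by cmask differs between the
    incoming (migrated) config byte and the destination device's byte."""
    return (incoming ^ local) & cmask != 0

# Offset 6 is the low byte of the PCI Status register; bit 4 (0x10) is
# the Capabilities List bit. The log shows incoming=0x00, local=0x10,
# cmask=0x10, so the masked comparison fails and the load returns -EINVAL.
assert config_byte_mismatch(0x00, 0x10, 0x10)      # source lacks the cap bit
assert not config_byte_mismatch(0x10, 0x10, 0x10)  # matching bytes would pass
```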

Comment 6 Paolo Bonzini 2014-05-14 14:03:51 UTC
This bit is added automatically when a capability is added to the device.

Can you dump the whole contents of config and s->config?

Comment 7 Dr. David Alan Gilbert 2014-05-14 17:40:01 UTC
OK, I'll add some code to dump the full configs; however, my guess was that 'more than 32 interrupts' was itself the thing that required the capability.

Comment 8 Michael S. Tsirkin 2014-05-15 06:43:34 UTC
No, MSI always requires a capability.

Comment 9 Dr. David Alan Gilbert 2014-05-15 08:39:11 UTC
This dump is with 33:

get_pci_config_device: EINVAL for 6 config=0 s->config=10 cmask=10 wmask=0 w1cmask=0
config: 0000:  f4 1a 03 10  03 00 00 00  00 00 80 07  00 00 00 00
config: 0010:  21 c0 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0020:  00 00 00 00  00 00 00 00  00 00 00 00  f4 1a 03 00
config: 0030:  00 00 00 00  00 00 00 00  00 00 00 00  0a 01 00 00
config: 0040:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0050:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0060:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0070:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0080:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 0090:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00a0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00b0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00c0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00d0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00e0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
config: 00f0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0000:  f4 1a 03 10  00 00 10 00  00 00 80 07  00 00 00 00
s->config: 0010:  01 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0020:  00 00 00 00  00 00 00 00  00 00 00 00  f4 1a 03 00
s->config: 0030:  00 00 00 00  40 00 00 00  00 00 00 00  00 01 00 00
s->config: 0040:  11 00 20 00  01 00 00 00  01 08 00 00  00 00 00 00
s->config: 0050:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0060:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0070:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0080:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 0090:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00a0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00b0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00c0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00d0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00e0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
s->config: 00f0:  00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
vmstate_load_state loop exit(a) for PCIDevice/config ret=-22
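(A hedged reading of the s->config dump above, the destination/RHEL7 side. Offsets follow the standard PCI config layout; the byte values are copied from the dump, and the interpretation is mine.)

```python
# Bytes taken from the s->config dump (destination side):
STATUS_LOW = 0x10    # offset 0x06: Capabilities List bit (bit 4) is set
CAP_PTR    = 0x40    # offset 0x34: first capability lives at 0x40
CAP_ID     = 0x11    # offset 0x40: capability ID 0x11 = MSI-X
MSG_CTRL   = 0x0020  # offsets 0x42-0x43, little-endian Message Control

# MSI-X Message Control bits 0-10 encode (table size - 1):
table_size = (MSG_CTRL & 0x07FF) + 1
assert table_size == 33  # matches vectors=33 on the command line

# The incoming RHEL6 config has none of this: status bit 4 clear and a
# zero capability pointer -- it silently dropped MSI-X for vectors > 32.
```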

Comment 10 Michael S. Tsirkin 2014-05-15 16:16:23 UTC
OK, so the limitation is in virtio-blk in RHEL6: it cannot support 33 vectors and disables MSI-X instead.

We could just document the limitation for RHEL6.
Alternatively, we could add code to make qemu-kvm exit immediately
for RHEL6 machine types with > 32 vectors.
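(The second option could look roughly like this. A hypothetical sketch, not actual QEMU code: reject the configuration up front when a rhel6.x machine type is combined with more than 32 vectors on a virtio device.)

```python
MSIX_RHEL6_MAX = 32  # assumed limit, per this bug

def check_vectors(machine_type: str, vectors: int) -> None:
    """Fail fast instead of failing later at migration time."""
    if machine_type.startswith("rhel6") and vectors > MSIX_RHEL6_MAX:
        raise ValueError(
            f"{machine_type}: vectors={vectors} exceeds the RHEL6 "
            f"limit of {MSIX_RHEL6_MAX}")

check_vectors("rhel7.0.0", 33)  # fine on a rhel7 machine type
try:
    check_vectors("rhel6.5.0", 33)
except ValueError:
    pass  # rejected at startup instead of at migration time
```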

Comment 11 Michael S. Tsirkin 2014-11-02 08:48:07 UTC
As this didn't work properly in RHEL6, let's just document this limitation:

- in RHEL6, attempts to use more than 32 vectors per device cause MSI-X to be disabled
- additionally, migration to RHEL7 will fail

