Bug 1089610 - Migration failed from RHEL6.5 to RHEL7.0 host with "-global qxl-vga.vram_size<=8388608"
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Gerd Hoffmann
QA Contact: Virtualization Bugs
Reported: 2014-04-21 07:20 UTC by huiqingding
Modified: 2014-07-03 07:22 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
The minimum VRAM size for a qxl device is 16MB in Red Hat Enterprise Linux 6, but it is 4KB in Red Hat Enterprise Linux 7. Consequently, specifying a VRAM size that is less than or equal to 8MB causes the actual VRAM size to differ between the versions, and live migration from Red Hat Enterprise Linux 6 to Red Hat Enterprise Linux 7 fails. To work around this problem, do not specify VRAM sizes that are less than or equal to 8MB on Red Hat Enterprise Linux 6. The default size of 9MB does not cause the described problem.
Clone Of:
Environment:
Last Closed: 2014-07-03 07:22:07 UTC
Target Upstream Version:



Description huiqingding 2014-04-21 07:20:53 UTC
Description of problem:
Migrating a guest (tested with rhel6.5-64 and win7-32.qcow2) from a RHEL 6.5 host to a RHEL 7.0 host fails when "-global qxl-vga.vram_size" is set to a value <= 8388608.

Version-Release number of selected component (if applicable):
RHEL6.5 host:
2.6.32-456.el6.x86_64
qemu-kvm-0.12.1.2-2.423.el6.x86_64

RHEL7.0 host:
3.10.0-121.el7.x86_64
qemu-kvm-1.5.3-60.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. boot guest on src and dest host
# /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu Westmere,hv_relaxed -enable-kvm  -m 4096 -smp 4,sockets=2,cores=2,threads=1,maxcpus=160 -k en-us -drive file=/mnt/RHEL-Server-6.5-64-virtio.qcow2,if=none,id=drive-virtio-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop,media=disk,snapshot=off,bus=1,unit=1 -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk,id=virtio-disk,bus=pci.0,addr=0x7,bootindex=1 -monitor stdio -serial unix:/tmp/monitor,server,nowait -net none -vnc :1 -vga qxl -global qxl-vga.vram_size=8388608
2. do migration

Actual results:
the dest qemu-kvm quits with the error info:
(qemu) qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed


Expected results:
migration is successful.

Additional info:
1. When the qxl vram size is set to a value > 8388608, migration succeeds.
2. Local migration (rhel6.5->rhel6.5, rhel7.0->rhel7.0) with "-global qxl-vga.vram_size=8388608" succeeds.
3. libvirt can set the value of qxl-vga.vram_size, so it does not hide this issue:
<video>
      <model type='qxl' ram='7608' vram='7608' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
4. If this is not a bug, please feel free to close it.
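
To illustrate item 3, here is a minimal sketch of how one might check a libvirt <video> element for an affected size. It assumes the vram attribute is given in KiB (as the libvirt domain XML documentation states) and is passed to qemu-kvm as bytes; the function names and the 8 MiB threshold constant are illustrative, taken from the Doc Text above rather than from libvirt itself.

```python
import xml.etree.ElementTree as ET

# The <video> element from item 3, trimmed to the part that matters here.
VIDEO_XML = """<video>
  <model type='qxl' ram='7608' vram='7608' heads='1'/>
</video>"""

# Sizes at or below 8 MiB are the ones the Doc Text warns about.
AT_RISK_LIMIT = 8 * 1024 * 1024


def qxl_vram_bytes(video_xml):
    """Return the qxl vram size in bytes, or None if not a qxl device.

    Assumption: libvirt's vram attribute is expressed in KiB.
    """
    model = ET.fromstring(video_xml).find("model")
    if model is None or model.get("type") != "qxl":
        return None
    vram_kib = model.get("vram")
    if vram_kib is None:
        return None
    return int(vram_kib) * 1024


def at_risk(video_xml):
    """True if the configured size falls in the range that breaks 6 -> 7 migration."""
    size = qxl_vram_bytes(video_xml)
    return size is not None and size <= AT_RISK_LIMIT
```

With the XML above, vram='7608' corresponds to 7608 * 1024 bytes, which is below the 8388608 threshold, so this guest would be flagged.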

Comment 2 huiqingding 2014-04-21 07:48:39 UTC
This might not be a regression, since I also hit this issue with qemu-kvm-1.5.3-40.el7.x86_64 and qemu-kvm-1.5.3-10.el7.x86_64. Let me know if any further testing is needed.

Comment 3 Gerd Hoffmann 2014-04-22 14:58:36 UTC
vram sizes which are not a power of two are not valid (both RHEL 6+7).
qxl will silently round up the sizes to the next power of two.

vram sizes smaller than 16M are not valid in RHEL-6.  If you ask for smaller vram sizes RHEL-6 will silently use 16M instead.  in RHEL-7 this is relaxed and you can ask for vram sizes as small as a single page (4k).

So the root cause is that rhel6 uses a 16M vram bar whereas rhel7 uses a 8M vram bar with the specified command line, and the size mismatch leads to the migration error.

I'd suggest to adapt the test matrix to the actual rhel6 capabilities and drop the tests with vram sizes smaller than 16M.

Sounds ok?
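
The sizing rules described in this comment can be modelled with a short sketch. This is not the qxl device code; it only restates the two rules above (round up to a power of two, then apply a per-release minimum) with illustrative names, to show why 8388608 ends up as different BAR sizes on the two hosts.

```python
MIB = 1024 * 1024
PAGE = 4096

# Minimum vram sizes as described in the comment above (illustrative table).
MIN_VRAM = {"rhel6": 16 * MIB, "rhel7": PAGE}


def round_up_pow2(n):
    """Return the smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


def effective_vram(requested, host):
    """Model of the vram size the qxl device actually uses on a host."""
    return max(round_up_pow2(requested), MIN_VRAM[host])


def sizes_match(requested):
    """True if both releases end up with the same vram size."""
    return effective_vram(requested, "rhel6") == effective_vram(requested, "rhel7")
```

Under this model, a request of 8388608 gives 16M on RHEL-6 and 8M on RHEL-7, so sizes_match(8388608) is False, while any request above 8M rounds up to at least 16M on both sides.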

Comment 4 juzhang 2014-04-23 01:45:11 UTC
(In reply to Gerd Hoffmann from comment #3)
> vram sizes which are not a power of two are not valid (both RHEL 6+7).
> qxl will silently round up the sizes to the next power of two.
> 
> vram sizes smaller than 16M are not valid in RHEL-6.  If you ask for smaller
> vram sizes RHEL-6 will silently use 16M instead.  in RHEL-7 this is relaxed
> and you can ask for vram sizes as small as a single page (4k).
> 
> So the root cause is that rhel6 uses a 16M vram bar whereas rhel7 uses a 8M
> vram bar with the specified command line, and the size mismatch leads to the
> migration error.
> 
> I'd suggest to adapt the test matrix to the actual rhel6 capabilities and
> drop the tests with vram sizes smaller than 16M.
> 
> Sounds ok?

Hi Gerd,

From QE POV, it's ok if we can make sure the real customer can not hit this issue.

Hi Huding,

Can you try the migration again with vram=16M, since Gerd mentioned that this is the limit value?

If it works, we could test with vram>=16M in the future.

Best Regards,
Junyi

Comment 5 huiqingding 2014-04-23 02:48:12 UTC
> Hi Huding,
> 
> Can you try the migration again with vram=16M, since Gerd mentioned that
> this is the limit value?
> 
> If it works, we could test with vram>=16M in the future.
> 

I tested vram=16M, and the migration of the rhel6.5-64 guest from the rhel6.5 host to the rhel7.0 host is successful. The command line is as follows:

/usr/libexec/qemu-kvm -M rhel6.5.0 -cpu Westmere,hv_relaxed -enable-kvm  -m 4096 -smp 4,sockets=2,cores=2,threads=1,maxcpus=160 -k en-us -drive file=/mnt/RHEL-Server-6.5-64-virtio.q=none,id=drive-virtio-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop,media=disk,snapshot=off,bus=1,unit=1 -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk,id=virtio-disk,bus=pci.0,addr=0x7,bootindex=1 -monitor stdio -serial unix:/tmp/monitor,server,nowait -net none -vnc :1 -vga qxl -global qxl-vga.vram_size=16777216

Comment 6 Gerd Hoffmann 2014-04-23 08:17:51 UTC
> From QE POV, it's ok if we can make sure the real customer can not hit this
> issue.

The only sensible thing I can see is to document it.

The problem is in the old RHEL-6 version, not RHEL-7.  And trying to fix it in RHEL-6.6 (by making it behave like RHEL-7) will not really fix the fundamental underlying issue.  6.6 -> 7 migration will work then, but 6.5 -> 6.6 will break instead ...

Comment 7 juzhang 2014-04-23 08:29:37 UTC
(In reply to Gerd Hoffmann from comment #6)
> > From QE POV, it's ok if we can make sure the real customer can not hit this
> > issue.
> 
> The only sensible thing I can see is document it.
> 
> The problem is in the old RHEL-6 version, not RHEL-7.  And trying to fix it
> in RHEL-6.6 (by making it behave like RHEL-7) will not really fix the
> fundamental underlying issue.  6.6 -> 7 migration will work then, but 6.5 ->
> 6.6 will break instead ...

Thanks for the explanation. Yes, agreed. Documenting it in the release notes or somewhere else seems reasonable.

Best Regards,
Junyi

Comment 8 Dr. David Alan Gilbert 2014-04-23 08:33:19 UTC
We could do with adding some more diagnostics to the RAM loading so that the error is more obviously QXL related. At the moment arch_init.c:ram_load just returns -EINVAL in the case of mismatched block lengths; it could easily print the block name, the two lengths, and a message saying they didn't match.
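
As a rough illustration of the suggested diagnostic, the check could look like the following sketch. It is a Python model, not the C code in arch_init.c; the function names, the exception type, and the block name "qxl.vram" used in the usage note are all illustrative assumptions.

```python
class RamBlockMismatch(Exception):
    """Raised instead of a bare -EINVAL so the failing block is identified."""


def check_ram_block(name, incoming_len, local_len):
    """Compare the length sent by the source with the destination's block."""
    if incoming_len != local_len:
        raise RamBlockMismatch(
            f"RAM block '{name}': length mismatch, "
            f"source has {incoming_len:#x} bytes, "
            f"destination has {local_len:#x} bytes"
        )


def load_ram_blocks(incoming, local):
    """Validate every incoming block (name -> length) against the local set."""
    for name, length in incoming.items():
        if name not in local:
            raise RamBlockMismatch(f"RAM block '{name}': unknown on destination")
        check_ram_block(name, length, local[name])
```

For example, load_ram_blocks({"qxl.vram": 0x1000000}, {"qxl.vram": 0x800000}) would raise an error naming the block and both lengths, which points directly at the qxl vram size rather than at a generic 'ram' load failure.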

Comment 9 Gerd Hoffmann 2014-04-23 08:57:56 UTC
Added doc text.

Comment 10 Paolo Bonzini 2014-05-14 13:48:00 UTC
I think we could have added a property "min_vram_size" to RHEL7, and forced it to 16MB in the RHEL6 machine types.  However, changing the RHEL6 machine types after the 7.0 release would probably cause more trouble than benefit.
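
The idea in this comment can be sketched as a per-machine-type compatibility table. This is only a model of the proposal: "min_vram_size" is the hypothetical property named above, and the table layout and machine type keys are illustrative, not QEMU's actual compat-property mechanism.

```python
MIB = 1024 * 1024

# Hypothetical default for the proposed property: one 4k page, as on RHEL7.
DEFAULT_MIN_VRAM = 4096

# Hypothetical compat table: RHEL6 machine types force the old 16MB minimum.
MACHINE_COMPAT = {
    "rhel6.5.0": {"qxl-vga.min_vram_size": 16 * MIB},
    "rhel7.0.0": {},
}


def round_up_pow2(n):
    """Return the smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


def qxl_vram_size(machine, requested):
    """Model of the vram size a RHEL7 host would use for a machine type."""
    compat = MACHINE_COMPAT.get(machine, {})
    minimum = compat.get("qxl-vga.min_vram_size", DEFAULT_MIN_VRAM)
    return max(round_up_pow2(requested), minimum)
```

In this model a RHEL7 host running "-M rhel6.5.0" with vram_size=8388608 would use 16MB, matching what a RHEL6 source host uses, while native RHEL7 machine types keep the relaxed minimum.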

