Bug 1633562

Summary: VM with "hugepage" + "file backend memory" fails to migrate from RHEL7.4 to RHEL7.6
Product: Red Hat Enterprise Linux 7 Reporter: Fangge Jin <fjin>
Component: libvirt Assignee: Michal Privoznik <mprivozn>
Status: CLOSED WONTFIX QA Contact: Jing Qi <jinqi>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.6 CC: dgilbert, dyuan, fjin, jdenemar, lmen, mdeng, qzhang, xianwang, xuzhang, yalzhang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-02 16:23:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs and xml none

Description Fangge Jin 2018-09-27 09:50:53 UTC
Description of problem:
VM with "hugepage" + "file backend memory" fails to migrate from RHEL7.4 to RHEL7.6

Version-Release number of selected component (if applicable):
RHEL7.6
libvirt-4.5.0-10.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7.x86_64
kernel-3.10.0-944.el7.x86_64

RHEL7.4
libvirt-3.2.0-14.el7_4.12.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.18.x86_64
kernel-3.10.0-693.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest on a RHEL7.4 host with hugepages + file-backed memory:
# virsh dumpxml rhel7.4
...
  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <source type='file'/>
  </memoryBacking>
...

2. Start the guest and migrate it to a RHEL7.6 host with postcopy:
# virsh migrate rhel7.4 qemu+ssh://192.168.122.225/system --live --verbose --postcopy --postcopy-after-precopy
error: internal error: qemu unexpectedly closed the monitor: 2018-09-27T09:48:01.014059Z qemu-kvm: Postcopy needs matching RAM page sizes (s=1000 d=201000)
2018-09-27T09:48:01.016917Z qemu-kvm: load of migration failed: Operation not permitted

3. Compare the qemu command lines on the source host and the target host; they are different.
Source host:
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,size=536870912 -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 -object memory-backend-file,id=ram-node1,mem-path=/var/lib/libvirt/qemu/ram,size=536870912

Target host:
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-rhel7.4,size=536870912 -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-rhel7.4,size=536870912
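The comparison in step 3 can be sketched as a small script. This is only an illustration: the two strings are the command-line fragments pasted above, the helper names (`memory_backends`, `diff_backends`) are hypothetical, and the parser is simplified (it does not handle QEMU's escaped `,,` commas).

```python
import shlex

# Command-line fragments copied from step 3 of this report.
SOURCE = (
    "-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,size=536870912 "
    "-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 "
    "-object memory-backend-file,id=ram-node1,mem-path=/var/lib/libvirt/qemu/ram,size=536870912"
)
TARGET = (
    "-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-rhel7.4,size=536870912 "
    "-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 "
    "-object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-rhel7.4,size=536870912"
)


def memory_backends(cmdline):
    """Map each memory-backend-file object id to a dict of its properties."""
    args = shlex.split(cmdline)
    backends = {}
    # Walk (flag, value) pairs and keep only "-object memory-backend-file,...".
    for flag, value in zip(args, args[1:]):
        if flag != "-object" or not value.startswith("memory-backend-file,"):
            continue
        # Simplification: split on every comma, ignoring QEMU's ",," escaping.
        props = dict(item.split("=", 1) for item in value.split(",")[1:])
        backends[props["id"]] = props
    return backends


def diff_backends(src, dst):
    """Return {id: {property: (source_value, target_value)}} for differing properties."""
    result = {}
    for ident in sorted(set(src) | set(dst)):
        s, d = src.get(ident, {}), dst.get(ident, {})
        changed = {
            key: (s.get(key), d.get(key))
            for key in sorted(set(s) | set(d))
            if s.get(key) != d.get(key)
        }
        if changed:
            result[ident] = changed
    return result


differences = diff_backends(memory_backends(SOURCE), memory_backends(TARGET))
for ident, changed in differences.items():
    for key, (s, d) in changed.items():
        print(f"{ident}: {key}: source={s!r} target={d!r}")
```

For both memory backends the script reports a different `mem-path` and a `prealloc` property that exists only on the target, while `size` is identical.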


Actual results:
Migration fails
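
The `s=1000 d=201000` values in the error from step 2 look like hexadecimal bitmasks of the RAM page sizes in use on the source (`s`) and destination (`d`). That reading is an assumption based on the values themselves, not something stated in this report. Under that assumption, a minimal sketch for decoding them (the helper names are hypothetical):

```python
def decode_page_sizes(summary_hex):
    """Split a hex bitmask into the individual power-of-two page sizes it contains."""
    summary = int(summary_hex, 16)
    sizes = []
    bit = 1
    while bit <= summary:
        if summary & bit:
            sizes.append(bit)
        bit <<= 1
    return sizes


def human(size):
    """Format a byte count using the largest binary unit that divides it evenly."""
    for unit, factor in (("GiB", 1 << 30), ("MiB", 1 << 20), ("KiB", 1 << 10)):
        if size >= factor and size % factor == 0:
            return f"{size // factor} {unit}"
    return f"{size} B"


source_sizes = decode_page_sizes("1000")    # "s=" value from the error message
target_sizes = decode_page_sizes("201000")  # "d=" value from the error message

print("source:", [human(s) for s in source_sizes])
print("target:", [human(s) for s in target_sizes])
```

Read this way, the source uses only 4 KiB pages while the target also has 2 MiB pages, which matches the target's `mem-path` under `/dev/hugepages`.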

Expected results:
Migration succeeds.

Additional info:

Comment 2 Fangge Jin 2018-09-27 10:01:33 UTC
Created attachment 1487699 [details]
logs and xml

Comment 4 Michal Privoznik 2018-10-15 12:39:49 UTC
(In reply to Fangge Jin from comment #0)
> 
> 3. Compare the qemu command lines on the source host and the target host;
> they are different.
> Source host:
> -object
> memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,
> size=536870912 -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 -object
> memory-backend-file,id=ram-node1,mem-path=/var/lib/libvirt/qemu/ram,
> size=536870912
> 
> Target host:
> -object
> memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/
> libvirt/qemu/1-rhel7.4,size=536870912 -numa
> node,nodeid=0,cpus=0-7,memdev=ram-node0 -object
> memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/
> libvirt/qemu/1-rhel7.4,size=536870912
> 

The problem is not the hugepages + file combination per se, but the fact that, due to bug 1214369, libvirt preferred source='file' over hugepages. The fix for bug 1214369 corrected that preference, so newer versions of libvirt choose hugepages. However, this breaks migration.

I don't think there is much we can do here. Anyway, I am sending a patch that explicitly forbids this nonsensical combination at define time:

https://www.redhat.com/archives/libvir-list/2018-October/msg00794.html
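
For illustration only, the kind of define-time check described above could look like the sketch below. This is not the linked patch (libvirt itself is written in C); the function name `has_conflicting_memory_backing` and the minimal domain XML are hypothetical, with the `<memoryBacking>` element taken from the reproducer in comment #0.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal domain XML; only <memoryBacking> comes from the report.
DOMAIN_XML = """
<domain type='kvm'>
  <name>rhel7.4</name>
  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <source type='file'/>
  </memoryBacking>
</domain>
"""


def has_conflicting_memory_backing(domain_xml):
    """Return True if the domain requests both hugepages and file-backed memory."""
    root = ET.fromstring(domain_xml)
    backing = root.find("memoryBacking")
    if backing is None:
        return False
    wants_hugepages = backing.find("hugepages") is not None
    source = backing.find("source")
    wants_file = source is not None and source.get("type") == "file"
    return wants_hugepages and wants_file


if has_conflicting_memory_backing(DOMAIN_XML):
    print("domain combines <hugepages/> with <source type='file'/>")
```

Such a check could also be run against `virsh dumpxml` output on a 7.4 host to spot affected guests before attempting a migration.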

Comment 5 Michal Privoznik 2018-11-02 13:01:29 UTC
Upstream is not in favour of the patches. Truth be told, I am not entirely convinced myself. Ideally, the misconfiguration would have been rejected in 7.4, but that ship sailed a long time ago. As I said earlier, I don't think there is much we can do.

Fangge, what are your thoughts? I'm inclined to close this as WONTFIX/CANTFIX.

Comment 6 Fangge Jin 2018-11-02 16:08:01 UTC
(In reply to Michal Privoznik from comment #5)
> Upstream is not in favour of the patches. Truth be told, I am not
> entirely convinced myself. Ideally, the misconfiguration would have been
> rejected in 7.4, but that ship sailed a long time ago. As I said earlier, I
> don't think there is much we can do.
> 
> Fangge, what are your thoughts? I'm inclined to close this as
> WONTFIX/CANTFIX.

I agree with closing it if there is nothing we can do. As long as no customer hits this issue, it is not a problem.

Comment 7 Michal Privoznik 2018-11-02 16:23:17 UTC
Closing based on the discussion.