Bug 1449037

Summary: Dst qemu quit when migrate guest with hugepage and total memory is not a multiple of pagesize
Product: Red Hat Enterprise Linux 7 Reporter: Yumei Huang <yuhuang>
Component: qemu-kvm-rhev Assignee: Dr. David Alan Gilbert <dgilbert>
Status: CLOSED ERRATA QA Contact: xianwang <xianwang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: chayang, hhuang, jinzhao, juzhang, mdeng, mrezanin, peterx, quintela, qzhang, virt-maint, xfu
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-6.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 04:38:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1376765    

Description Yumei Huang 2017-05-09 06:22:19 UTC
Description of problem:
Boot a guest backed by hugepages whose total memory is not a multiple of the hugepage size, then do a local migration. The migration fails and the destination qemu process quits with the error message:

(qemu) qemu-kvm: Illegal RAM offset 40100000
qemu-kvm: error while loading state section id 4(ram)
qemu-kvm: load of migration failed: Invalid argument


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-3.el7
3.10.0-661.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Set up hugepages on the host

# cat /proc/meminfo  | grep -i huge
AnonHugePages:   1705984 kB
HugePages_Total:    3000
HugePages_Free:     3000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

# mount | grep mnt
none on /mnt/kvm_hugepage type hugetlbfs (rw,relatime,seclabel)


2. Boot src guest with hugepage and '-m 1025'

# /usr/libexec/qemu-kvm -m 1025 rhel74-1-1.qcow2 -mem-path /mnt/kvm_hugepage -monitor stdio -vnc :0


3. Boot dst guest with same cmdline and '-incoming tcp:0:5555'

# /usr/libexec/qemu-kvm -m 1025 rhel74-1-1.qcow2 -mem-path /mnt/kvm_hugepage -monitor stdio -vnc :1 -incoming tcp:0:5555

4. Do migration from src guest
(qemu)  migrate -d tcp:127.0.0.1:5555


Actual results:
Dst qemu quits with:
(qemu) qemu-kvm: Illegal RAM offset 40100000
qemu-kvm: error while loading state section id 4(ram)
qemu-kvm: load of migration failed: Invalid argument


Expected results:
Migration succeeds and the guest works well.

Additional info:
1. Hit the same issue with both 2M and 1G hugepages
2. Can NOT reproduce with qemu-kvm-rhev-2.6.0-28.el7
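
Whether a given '-m' value triggers the condition can be checked ahead of time. The following is a minimal sketch, not part of qemu: it parses the Hugepagesize line from /proc/meminfo output (the sample text is copied from step 1) and tests whether the guest memory size is a multiple of it. The helper names hugepage_size_kib and is_hugepage_multiple are hypothetical, and '-m' is assumed to be given in MiB.

```python
import re

# Sample copied from the 'cat /proc/meminfo | grep -i huge' output in step 1.
MEMINFO_SAMPLE = """\
AnonHugePages:   1705984 kB
HugePages_Total:    3000
HugePages_Free:     3000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""


def hugepage_size_kib(meminfo_text):
    """Return the hugepage size in KiB from /proc/meminfo-style text."""
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB\s*$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Hugepagesize line found")
    return int(match.group(1))


def is_hugepage_multiple(guest_mib, hugepage_kib):
    """True if a guest of guest_mib MiB is a whole number of hugepages."""
    return (guest_mib * 1024) % hugepage_kib == 0


size_kib = hugepage_size_kib(MEMINFO_SAMPLE)
for guest_mib in (1024, 1025):
    print(guest_mib, "MiB aligned to", size_kib, "kB hugepages:",
          is_hugepage_multiple(guest_mib, size_kib))
```

With the values from this report, 1024 MiB is a whole number of 2048 kB hugepages and 1025 MiB is not; 1025 MiB is not a multiple of a 1G hugepage either, which matches item 1 above.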

Comment 4 Dr. David Alan Gilbert 2017-05-16 17:32:25 UTC
Yes, I can repeat that here:
  2.6->2.9 works
  2.9->2.9 fails
  2.9->2.6 fails

I think the problem is that the new code makes sure it sends whole hugepages, but in this case the used_length of the RAM block is probably not a multiple of the hugepage size.

There's then a fun question of what happens on postcopy.
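
The arithmetic behind that diagnosis can be sketched as follows. This is an illustration only, not the qemu code: it assumes '-m 1025' yields a RAM block with a used length of 1025 MiB and that the source sends the final 2 MiB hugepage in full. The names round_up, used_length and hugepage are chosen for this example.

```python
MIB = 1024 * 1024


def round_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


used_length = 1025 * MIB   # guest RAM from '-m 1025' (assumed to be in MiB)
hugepage = 2 * MIB         # 'Hugepagesize: 2048 kB' on the host

# Start of the last hugepage that covers any guest RAM.
last_hugepage_start = (used_length - 1) // hugepage * hugepage
# End of the data if that last hugepage is sent whole.
sent_end = round_up(used_length, hugepage)
# Bytes sent beyond the used length of the RAM block.
overshoot = sent_end - used_length

print("used length:         ", hex(used_length))          # 0x40100000
print("last hugepage starts:", hex(last_hugepage_start))  # 0x40000000
print("whole-hugepage end:  ", hex(sent_end))             # 0x40200000
print("overshoot:           ", hex(overshoot))            # 0x100000
```

The used length comes out as 0x40100000, the same value as in the "Illegal RAM offset 40100000" message: under the assumptions above, that is the first offset past the end of the guest's RAM, and the final hugepage overshoots it by 1 MiB.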

Comment 5 Dr. David Alan Gilbert 2017-05-17 17:04:54 UTC
Fixes posted upstream:

0001-migration-Fix-non-multiple-of-page-size-migration.patch
0002-postcopy-Require-RAMBlocks-that-are-whole-pages.patch

Comment 6 Dr. David Alan Gilbert 2017-05-18 11:21:44 UTC
Upstream has Reviewed-by tags; posted downstream while waiting for the merge.

Comment 7 Miroslav Rezanina 2017-05-23 08:15:48 UTC
Fix included in qemu-kvm-rhev-2.9.0-6.el7

Comment 9 Min Deng 2017-06-01 08:55:15 UTC
QE reproduced the bug on build:
qemu-kvm-rhev-2.9.0-3.el7
Steps: please refer to comment 0.
Actual results:
[root@hp-dl385pg8-13 home]# /usr/libexec/qemu-kvm -m 1025 rhel74.qcow2 -mem-path /mnt/kvm_hugepage -monitor stdio -vnc :1 -incoming tcp:0:5555
QEMU 2.9.0 monitor - type 'help' for more information
(qemu) qemu-kvm: Illegal RAM offset 40100000
qemu-kvm: error while loading state section id 4(ram)
qemu-kvm: load of migration failed: Invalid argument
Expected results:
There is no error and the migration succeeds.

QE verified the bug on builds:
qemu-kvm-rhev-2.9.0-7.el7.x86_64
kernel-3.10.0-671.el7.x86_64
Steps: please refer to comment 0.
Actual results:
Migration succeeded.
Expected results:
Migration succeeded.

In brief, the bug has been fixed. Thanks.

Comment 10 Min Deng 2017-06-01 08:56:21 UTC
Based on comment 9, QE is moving this bug to VERIFIED.

Comment 12 errata-xmlrpc 2017-08-02 04:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392