Description of problem:
Hot-plugging memory triggers "error: kvm run failed Bad address" on ppc
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. mount -t hugetlbfs none /mnt/hugetlbfs/
2. echo 2048 > /proc/sys/vm/nr_hugepages
3. Boot up the guest with:
/usr/libexec/qemu-kvm -M pseries-rhel7.4.0 -name avocado-vt-vm1 -sandbox off -machine pseries -nodefaults -vga std -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/1,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=qmp_id_catch_monitor,path=/tmp/2,server,nowait -mon chardev=qmp_id_catch_monitor,mode=control -chardev socket,id=serial_id_serial0,path=/tmp/3,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -device pci-ohci,id=usb1,bus=pci.0,addr=03 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=04 -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=rhel73-ppc64le-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -m 1G,slots=256,maxmem=32G -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 -numa node -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :10 -rtc base=utc,clock=host -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -monitor stdio -device virtio-net-pci,mac=9a:09:0a:0b:0c:0d,id=idLLoQ97,vectors=4,netdev=hostnet0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -mem-path /mnt/hugetlbfs/
(qemu) error: kvm run failed Bad address
NIP c00000000005947c LR c0000000002a388c CTR 0000000000000200 XER 0000000000000000 CPU#0
MSR 8000000000009033 HID0 0000000000000000 HF 8000000000000000 iidx 3 didx 3
TB 00000000 00000000 DECR 00000000
GPR00 c0000000002a382c c00000002e3e7870 c0000000011c9100 c00000005fff0000
GPR04 00003fff7a790000 f00000000014ffc8 0000000000000000 0000000000000000
GPR08 f000000000150000 0000000000000080 0000000000000200 0000000000000000
GPR12 0000000028002882 c00000000fb80000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 c000000032c633c8 c000000032c63000
GPR20 00000000000003c8 0000000000000000 0000000000005fff c00000002e3e4000
GPR24 f00000000014ffc8 c00000002aac8980 0000000000000029 c0000000309b0bd0
GPR28 00003fff7a790000 0000000000000000 0000000000001bd0 c00000002e190000
CR 88002828 [ L L - - E L E L ] RES ffffffffffffffff
FPR00 72756769666e6f63 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 00003fff7a790000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
SRR0 c000000000005900 SRR1 9000000000001033 PVR 00000000004b0201 VRSAVE 00000000ffffffff
SPRG0 0000000000000000 SPRG1 c00000000fb80000 SPRG2 c00000000fb80000 SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000
HSRR0 0000000000000000 HSRR1 0000000000000000
SDR1 000000000000000a DAR 00003fff7a790000 DSISR 0000000042000000
The memory is added successfully.
There is no explicit statement that memory hot-plug is unsupported when the guest is started with "-mem-path /mnt/hugetlbfs", so QE opened this bug. Correct me if I am wrong.
QE cannot reproduce it on x86
Looks very similar to:
Author: Thomas Huth <email@example.com>
Date: Mon Jul 18 15:19:04 2016 +0200
ppc: Huge page detection mechanism fixes - Episode III
After already fixing two issues with the huge page detection mechanism
(see commit 159d2e39a860 and 86b50f2e1bef), Greg Kurz noticed another
case that caused the guest to crash where QEMU announces huge pages
though they should not be available for the guest:
qemu-system-ppc64 -enable-kvm ... -mem-path /dev/hugepages \
-object memory-backend-ram,policy=default,size=1G,id=mem-mem1 \
-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
-numa node,nodeid=0 -numa node,nodeid=1
That means if there is a global mem-path option, we still have
to look at the memory-backend objects that have been specified
additionally and return their minimum page size if that value
is smaller than the page size of the main memory.
But this commit is already in 2.8.0 (since 2.7.0).
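The rule stated in the commit message above (take the minimum page size across main memory and any extra memory-backend objects) can be sketched roughly as follows. This is only an illustration, not QEMU's actual code: the function name and the 16 MiB / 64 KiB page sizes are assumptions picked for the example.

```python
KIB = 1024
MIB = 1024 * KIB


def advertised_page_size(main_memory_page_size, backend_page_sizes):
    """Return the smallest page size among main memory and all extra backends.

    Hypothetical helper: it only mirrors the rule described in the commit
    message, it is not taken from QEMU.
    """
    return min([main_memory_page_size, *backend_page_sizes])


# Assumed sizes: 16 MiB huge pages from -mem-path, 64 KiB pages for a
# memory-backend-ram object that is present on the command line at boot.
hugepage_only = advertised_page_size(16 * MIB, [])
mixed = advertised_page_size(16 * MIB, [64 * KIB])

print("huge pages only:", hugepage_only // KIB, "KiB")
print("with memory-backend-ram:", mixed // KIB, "KiB")
```

The point of the sketch is that this value can only be computed from the backends that exist at boot time, which is why that commit does not cover a backend that is hot-plugged later.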
Looks like kvmppc_book3s_hv_page_fault() in arch/powerpc/kvm/book3s_64_mmu_hv.c of the kernel KVM code returns -EFAULT - and this then causes the "error: kvm run failed Bad address" in QEMU... I'll try to find out why this happens...
FWIW, I can also reproduce the crash with upstream QEMU, and without NUMA, by simply running:
qemu-system-ppc64 -enable-kvm -nographic -vga none \
-m 1G,slots=256,maxmem=32G -mem-path /mnt/hugetlbfs -hda disk.img
OK, I think I now understand what is happening here: if you add a "memory-backend-ram" object, it gets created with "normal" memory, i.e. without hugetlbfs backing. On POWER, the guest cannot mix normal memory regions and huge page regions (see BZ 1347498 for example), and we have to tell the guest the lowest common denominator of page sizes during boot. Since the guest has been started with huge pages only here, it assumes it can always use huge pages for all memory. So it is not possible to add memory with smaller page sizes later, and there is also no way to fix this problem on POWER: the page sizes are communicated as a CPU property during boot, so they cannot be communicated to the guest for memory regions that are added later.
On x86, we do not have this problem since page sizes are not communicated as property of the CPU, as far as I know.
So if you want to hot-plug memory in this case, you have to use a "memory-backend-file" object with the "mem-path=/mnt/hugetlbfs" property instead. The only thing that we could do here is to avoid the crash by refusing to create a "memory-backend-ram" object in this case, and inform the user with an appropriate error message instead. I'll have a look at that next.
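To make the two options above concrete, here is a minimal sketch. It is not the upstream patch: both function names are hypothetical, the page sizes (16 MiB huge pages, 64 KiB normal pages) are assumed for illustration, and the "smaller than" comparison is my reading of the explanation in this comment. The monitor commands follow the object_add / device_add syntax used elsewhere in this report.

```python
KIB = 1024
MIB = 1024 * KIB

BAD_PAGE_SIZE_MSG = ("Memory backend has bad page size. "
                     "Use 'memory-backend-file' with correct mem-path.")


def check_hotplug_backend(boot_page_size, backend_page_size):
    """Refuse a backend whose pages are smaller than what the guest was told at boot.

    Hypothetical check; the real logic lives in QEMU and may differ.
    """
    if backend_page_size < boot_page_size:
        raise ValueError(BAD_PAGE_SIZE_MSG)


def hugepage_hotplug_commands(mem_id, dimm_id, size, mem_path):
    """Build the two monitor commands for hot-plugging a hugetlbfs-backed DIMM."""
    return [
        f"object_add memory-backend-file,id={mem_id},size={size},mem-path={mem_path}",
        f"device_add pc-dimm,id={dimm_id},memdev={mem_id}",
    ]


# Workaround: back the hot-plugged DIMM with the same hugetlbfs mount.
for command in hugepage_hotplug_commands("mem1", "dimm1", "1G", "/mnt/hugetlbfs"):
    print(command)

# A memory-backend-ram object (assumed 64 KiB pages) would be refused.
try:
    check_hotplug_backend(16 * MIB, 64 * KIB)
except ValueError as err:
    print(err)
```

A backend that uses the same huge page size as the main memory passes the check unchanged, so the workaround and the refusal do not conflict.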
(In reply to Thomas Huth from comment #6)
> So if you want to hot plug memory in this case, you have got to use a
> "memory-backend-file" object with "mem-path=/mnt/hugetlbfs" property
> instead. The only thing that we could do here is to avoid the crash by
> refusing to create a "memory-backend-ram" object in this case, and inform
> the user with an appropriate error message instead. I'll have a look at that
I do not see any better solution either.
I've now suggested a patch upstream:
Patch has been merged upstream:
We'll get it via rebase to 2.9, so setting the state to POST now.
Considering comment 6 and the current test results with the following builds, QE has verified the bug.
kernel-3.10.0-655.el7.ppc64le (guest and host)
Steps: please refer to comment 0.
Actual results: an error message now pops up. Compared with the previous behavior, it is much friendlier and more accurate.
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
Memory backend has bad page size. Use 'memory-backend-file' with correct mem-path.
A proper error message should pop up.
The issue has been fixed by providing an error message, which is acceptable, so QE is moving the bug to status VERIFIED. Thanks a lot.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.