Description of problem:
Hot-plugging memory triggers "error: kvm run failed Bad address" on ppc.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.8.0-3.el7.ppc64le
kernel-3.10.0-556.el7.ppc64le

How reproducible:
3/3

Steps to Reproduce:
1. mount -t hugetlbfs none /mnt/hugetlbfs/
2. echo 2048 > /proc/sys/vm/nr_hugepages
3. Boot up the guest with:
/usr/libexec/qemu-kvm \
 -M pseries-rhel7.4.0 \
 -name avocado-vt-vm1 \
 -sandbox off \
 -machine pseries \
 -nodefaults \
 -vga std \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/1,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/tmp/2,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -chardev socket,id=serial_id_serial0,path=/tmp/3,server,nowait \
 -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
 -device pci-ohci,id=usb1,bus=pci.0,addr=03 \
 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=04 \
 -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=rhel73-ppc64le-virtio-scsi.qcow2 \
 -device scsi-hd,id=image1,drive=drive_image1 \
 -m 1G,slots=256,maxmem=32G \
 -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
 -numa node \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -vnc :10 \
 -rtc base=utc,clock=host \
 -boot order=cdn,once=c,menu=off,strict=off \
 -enable-kvm \
 -device usb-kbd,id=input0 \
 -device usb-mouse,id=input1 \
 -device usb-tablet,id=input2 \
 -monitor stdio \
 -device virtio-net-pci,mac=9a:09:0a:0b:0c:0d,id=idLLoQ97,vectors=4,netdev=hostnet0 \
 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
 -mem-path /mnt/hugetlbfs/
4. object_add memory-backend-ram,id=mem1,size=1G
5. device_add pc-dimm,id=dimm1,memdev=mem1

Actual results:
(qemu) error: kvm run failed Bad address
NIP c00000000005947c LR c0000000002a388c CTR 0000000000000200 XER 0000000000000000 CPU#0
MSR 8000000000009033 HID0 0000000000000000 HF 8000000000000000 iidx 3 didx 3
TB 00000000 00000000 DECR 00000000
GPR00 c0000000002a382c c00000002e3e7870 c0000000011c9100 c00000005fff0000
GPR04 00003fff7a790000 f00000000014ffc8 0000000000000000 0000000000000000
GPR08 f000000000150000 0000000000000080 0000000000000200 0000000000000000
GPR12 0000000028002882 c00000000fb80000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 c000000032c633c8 c000000032c63000
GPR20 00000000000003c8 0000000000000000 0000000000005fff c00000002e3e4000
GPR24 f00000000014ffc8 c00000002aac8980 0000000000000029 c0000000309b0bd0
GPR28 00003fff7a790000 0000000000000000 0000000000001bd0 c00000002e190000
CR 88002828 [ L L - - E L E L ] RES ffffffffffffffff
FPR00 72756769666e6f63 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 00003fff7a790000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000000000000
SRR0 c000000000005900 SRR1 9000000000001033 PVR 00000000004b0201 VRSAVE 00000000ffffffff
SPRG0 0000000000000000 SPRG1 c00000000fb80000 SPRG2 c00000000fb80000 SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000
HSRR0 0000000000000000 HSRR1 0000000000000000
CFAR 0000000000000000
SDR1 000000000000000a DAR 00003fff7a790000 DSISR 0000000042000000

Expected results:
The memory is added successfully.

Notes:
Nothing states that memory hot-plug is unsupported when the guest is started with "-mem-path /mnt/hugetlbfs", so QE opened this bug. Correct me if I am wrong.
QE cannot reproduce it on x86
Looks very similar to:

commit 3d4f2534834cd9f9bbb3dd145fa61fd2ac0dd535
Author: Thomas Huth <thuth@redhat.com>
Date: Mon Jul 18 15:19:04 2016 +0200

ppc: Huge page detection mechanism fixes - Episode III

After already fixing two issues with the huge page detection mechanism (see commit 159d2e39a860 and 86b50f2e1bef), Greg Kurz noticed another case that caused the guest to crash where QEMU announces huge pages though they should not be available for the guest:

qemu-system-ppc64 -enable-kvm ... -mem-path /dev/hugepages \
 -m 1G,slots=4,maxmem=32G -object memory-backend-ram,policy=default,size=1G,id=mem-mem1 \
 -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
 -numa node,nodeid=0 -numa node,nodeid=1

That means if there is a global mem-path option, we still have to look at the memory-backend objects that have been specified additionally and return their minimum page size if that value is smaller than the page size of the main memory.

But this commit is already in 2.8.0 (since 2.7.0).
Looks like kvmppc_book3s_hv_page_fault() in arch/powerpc/kvm/book3s_64_mmu_hv.c of the kernel KVM code returns -EFAULT - and this then causes the "error: kvm run failed Bad address" in QEMU... I'll try to find out why this happens...
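The link between the kernel's -EFAULT and the text QEMU prints can be checked with a short sketch. It assumes a Linux host with the default C locale, and it assumes QEMU builds the message by appending the strerror() text of the failed KVM_RUN call to "error: kvm run failed"; the variable name is only illustrative.

```python
import errno
import os

# The kernel side returns -EFAULT; user space sees errno EFAULT and
# converts it to text with strerror().
message = os.strerror(errno.EFAULT)

# Assumed message format, matching the line seen in the QEMU monitor.
print(f"error: kvm run failed {message}")
```

So "Bad address" in the monitor output is simply the standard description of EFAULT, not a QEMU-specific diagnostic.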
FWIW, I can also reproduce the crash with upstream QEMU, and without NUMA, by simply running:

qemu-system-ppc64 -enable-kvm -nographic -vga none \
 -m 1G,slots=256,maxmem=32G -mem-path /mnt/hugetlbfs -hda disk.img
OK, I think I've now understood what's happening here: If you add a "memory-backend-ram" object, it gets created with "normal" memory, i.e. without hugetlbfs backing. Now, on POWER the guest cannot mix normal memory regions and huge page regions (see BZ 1347498 for example), and we have to tell the guest the lowest common denominator of page sizes during boot. Since the guest has been started with huge pages only here, it thinks it can always use huge pages for all memory. So it is not possible to add memory with smaller page sizes later, and there is also no way to fix this problem on POWER, since the page sizes are communicated as a CPU property during boot, i.e. they cannot be communicated to the guest for memory regions that are added later. On x86, we do not have this problem since page sizes are not communicated as a property of the CPU, as far as I know.

So if you want to hot plug memory in this case, you have got to use a "memory-backend-file" object with the "mem-path=/mnt/hugetlbfs" property instead. The only thing that we could do here is to avoid the crash by refusing to create a "memory-backend-ram" object in this case, and inform the user with an appropriate error message instead. I'll have a look at that next.
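The rule described above can be modelled in a few lines. This is only a sketch of the reasoning, not the QEMU code: the Backend class and both helper functions are hypothetical, and the page sizes (64 KiB base pages, 16 MiB huge pages) are assumed values for illustration that are not stated in this report.

```python
from dataclasses import dataclass

KiB = 1024
MiB = 1024 * KiB


@dataclass
class Backend:
    """Hypothetical stand-in for a QEMU memory backend object."""
    name: str
    page_size: int  # bytes


def boot_page_size(main_memory_page_size, backends):
    """Lowest common denominator announced to the guest at boot."""
    return min([main_memory_page_size] + [b.page_size for b in backends])


def can_hotplug(backend, advertised_page_size):
    """Hot-plug is only safe if the new memory has pages at least as
    large as the size the guest was told about at boot."""
    return backend.page_size >= advertised_page_size


# Guest booted with -mem-path only: huge pages are announced.
advertised = boot_page_size(16 * MiB, [])

ram = Backend("memory-backend-ram", 64 * KiB)     # assumed base page size
hugefile = Backend("memory-backend-file", 16 * MiB)  # assumed huge page size

print(can_hotplug(ram, advertised))       # False: must be refused
print(can_hotplug(hugefile, advertised))  # True: same page size as boot memory
```

If the memory-backend-ram object is already present on the command line at boot, boot_page_size() returns the smaller size instead, which corresponds to the case handled by the "Episode III" commit quoted earlier.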
(In reply to Thomas Huth from comment #6)
> So if you want to hot plug memory in this case, you have got to use a
> "memory-backend-file" object with "mem-path=/mnt/hugetlbfs" property
> instead. The only thing that we could do here is to avoid the crash by
> refusing to create a "memory-backend-ram" object in this case, and inform
> the user with an appropriate error message instead. I'll have a look at that
> next.

I do not see any better solution either.
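For anyone scripting the suggested workaround over one of the QMP sockets from comment 0, the two commands might be built as follows. This is an untested sketch: the helper name is hypothetical, the nested "props" form of object-add is assumed to match QEMU releases of this era (newer releases take the properties directly in "arguments"), and a real client must complete the qmp_capabilities handshake before sending anything.

```python
import json


def hugepage_dimm_commands(dimm_id, backend_id, size_bytes, mem_path):
    """Build QMP commands that hot-plug a DIMM backed by hugetlbfs."""
    object_add = {
        "execute": "object-add",
        "arguments": {
            "qom-type": "memory-backend-file",
            "id": backend_id,
            # Assumed argument layout for QEMU 2.x.
            "props": {"size": size_bytes, "mem-path": mem_path},
        },
    }
    device_add = {
        "execute": "device_add",
        "arguments": {"driver": "pc-dimm", "id": dimm_id, "memdev": backend_id},
    }
    return [json.dumps(object_add), json.dumps(device_add)]


# 1 GiB DIMM backed by the hugetlbfs mount used in the reproducer.
commands = hugepage_dimm_commands("dimm1", "mem1", 1 << 30, "/mnt/hugetlbfs")
for line in commands:
    print(line)
```

The equivalent at the (qemu) prompt would replace memory-backend-ram in the reproducer with memory-backend-file and add mem-path=/mnt/hugetlbfs.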
I've now suggested a patch upstream: http://marc.info/?i=1487150504-30335-1-git-send-email-thuth@redhat.com
Patch has been merged upstream: http://git.qemu-project.org/?p=qemu.git;a=commitdiff;h=df58713396f8b2deb923e39c00b10744c5c63909 We'll get it via rebase to 2.9, so setting the state to POST now.
Based on comment 6 and the test results with the builds below, QE considers the bug verified.

kernel-3.10.0-655.el7.ppc64le (guest and host)
qemu-kvm-rhev-2.9.0-1.el7.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

Steps: please refer to comment 0.

Actual results: an error message is now shown; compared with the previous behavior, it is much friendlier and more accurate.

(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
Memory backend has bad page size. Use 'memory-backend-file' with correct mem-path.
(qemu)

Expected results: a proper error message should be shown.

The issue has been fixed by providing an error message, which is acceptable, so QE is moving the bug to VERIFIED. Thanks a lot.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392