Bug 1282833 - [PPC64LE] Guest freezes if qemu allocates smaller page table than requested
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: ppc64le Linux
Priority: high, Severity: high
Target Milestone: pre-dev-freeze
Target Release: 7.3
Assigned To: David Gibson
Virtualization Bugs
virt
: Automation, ZStream
Duplicates: 1285474
Depends On:
Blocks: 1284775 1304300 RHV4.1PPC 1285337 1288337 1305498
Reported: 2015-11-17 10:00 EST by Shira Maximov
Modified: 2016-11-30 20:16 EST (History)

See Also:
Fixed In Version: qemu-kvm-rhev-2.5.0-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1284775 1285337
Environment:
Last Closed: 2016-11-07 16:37:12 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
too much memory (13.94 KB, image/png), 2015-11-17 10:00 EST, Shira Maximov
vdsm_logs (19.51 MB, text/plain), 2015-11-19 05:03 EST, Shira Maximov
vdsm logs (14.53 MB, text/plain), 2015-11-23 07:48 EST, Shira Maximov


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 133676 None None None Never

Description Shira Maximov 2015-11-17 10:00:54 EST
Created attachment 1095502 [details]
too much memory

Description of problem:
When the VM memory is set to a large value (61 GB, for example), the VM can't start.

Version-Release number of selected component (if applicable):
 Red Hat Enterprise Virtualization Manager Version: 3.6.0.3-0.1.el6 

How reproducible:
100%

Steps to Reproduce:
1. On a ppc environment whose host has at least 60 GB of memory, create a VM with 2 GB and make sure the VM is running.
2. Shut down the VM and set the memory to 60 GB.
3. Try to run the VM.

Actual results:
The VM can't run.

Expected results:
The VM should run.

Additional info:
A screenshot is attached.
Comment 1 Michal Skrivanek 2015-11-18 06:18:07 EST
the screenshot's memory layout seems to indicate 8GB? not sure if ram_top is the right thing to look at...if it is, shouldn't it be enough anyway?
David, thoughts?
Comment 2 Yaniv Kaul 2015-11-18 14:18:43 EST
Shira - is this a RHEV or KVM bug? 
Does it work with smaller sizes? How small?
Where is the VDSM log, to at least see the command line used to launch the VM?
Comment 3 David Gibson 2015-11-18 19:21:04 EST
There's not much I can tell from that screenshot alone.  An awful lot happens between "Returning from prom_init" and the next message.

The messages do seem to show 8GiB of RAM, which should be fine.

A copy of the qemu command line for the failing run would probably be more useful.
Comment 4 Shira Maximov 2015-11-19 05:03 EST
Created attachment 1096575 [details]
vdsm_logs

The exact size of the VM is 62464 MB (you can search the logs for this string).

You can also look at this automation run and download the other logs:
https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/3.6_Dev/job/3.6-GE-compute/412/
Comment 5 Shira Maximov 2015-11-19 05:08:44 EST
Yaniv,

1. The VM can run with smaller memory sizes (for example, 4G).
2. I have attached the vdsm logs now.
3. I can't find the command right now. I will reproduce the bug and post the command as soon as I have the PPC environment again.
Comment 6 Shira Maximov 2015-11-19 06:40:47 EST
Update -
I tried to reproduce the bug again on a different PPC environment, and this time the VM started successfully.

This bug needs further investigation, which can't be done right now because of the lack of PPC environments.
I will update again soon.
Comment 7 David Gibson 2015-11-19 23:51:30 EST
I'm having trouble making any sense of the vdsm log.  I've searched for '62464' as suggested in comment 4, but all I'm finding are MigrationCreate entries, no initial VM creation commands, and no corresponding libvirt XML or qemu command lines.

What should I be looking for in the vdsm log to find where the failing instance is happening?
Comment 8 Shira Maximov 2015-11-23 07:37:07 EST
Update:
I re-ran the test and found that a single VM with 60 GB of memory runs fine, but if I run two (or more) VMs, the second VM can't run and hits the error shown in the screenshot.


The VM command:
qemu      28760      1 99 07:26 ?        00:04:48 /usr/libexec/qemu-kvm -name mom-3 -S -machine pseries-rhel7.2.0,accel=kvm,usb=off -cpu POWER8 -m size=65011712k,slots=16,maxmem=1073741824k -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=63488 -uuid 70603690-83a2-4345-aaf2-57947f6ee7a1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-mom-3/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2015-11-23T12:26:31,driftfix=slew -no-shutdown -boot strict=on -device spapr-vscsi,id=scsi0,reg=0x2000 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x3 -usb -drive if=none,id=drive-scsi0-0-0-0,readonly=on,format=raw,serial= -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -drive file=/rhev/data-center/e13ff945-73ee-48f2-a299-c8c7f0bcd49e/593927a4-3341-4034-9e00-97cb04351b7a/images/2309c586-fac3-4567-8c4b-c5345e22616d/b0ee8a4c-3be7-4291-9cea-0f6b5ecd005a,if=none,id=drive-virtio-disk0,format=qcow2,serial=2309c586-fac3-4567-8c4b-c5345e22616d,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:58,bus=pci.0,addr=0x1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/70603690-83a2-4345-aaf2-57947f6ee7a1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/70603690-83a2-4345-aaf2-57947f6ee7a1.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -device usb-kbd,id=input1 -device usb-mouse,id=input2 
-vnc 10.16.160.29:1,password -device VGA,id=video0,vgamem_mb=32,bus=pci.0,addr=0x6 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on


I will attach newer logs.
Comment 9 Shira Maximov 2015-11-23 07:48 EST
Created attachment 1097639 [details]
vdsm logs
Comment 10 Shira Maximov 2015-11-23 07:49:06 EST
For the logs, the vm_id is 70603690-83a2-4345-aaf2-57947f6ee7a1.
Comment 11 Michal Skrivanek 2015-11-23 10:16:53 EST
note -m size=65011712k,slots=16,maxmem=1073741824k
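For anyone comparing these values with the sizes quoted elsewhere in this bug, here is a small unit-conversion sketch. It assumes QEMU's `k` suffix means KiB (1024 bytes); the helper name `kib_to_gib` is made up for illustration and is not part of any tool mentioned here.

```python
def kib_to_gib(value: str) -> float:
    """Convert a size written with a 'k' suffix (assumed KiB) into GiB."""
    if not value.endswith("k"):
        raise ValueError(f"expected a value ending in 'k', got {value!r}")
    kib = int(value[:-1])
    return kib / (1024 * 1024)  # 1 GiB = 1024 * 1024 KiB


if __name__ == "__main__":
    print(kib_to_gib("65011712k"))    # size=   -> 62.0
    print(kib_to_gib("1073741824k"))  # maxmem= -> 1024.0
```

So this guest has 62 GiB of initial memory and a maxmem of 1024 GiB (1 TiB).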
Comment 12 Shira Maximov 2015-11-23 11:20:57 EST
I tested this bug again, this time with hot plug disabled, and it worked fine.

Steps to reproduce:
1. Create two VMs that are pinned to a specific host, each with 60 GB of memory.
2. Turn on both and see that one can't run.
3. Disable hot plug:

engine=# insert into vdc_options (option_name, option_value, version)  VALUES ('HotPlugMemorySupported', '{"x86_64":"true","ppc64":"false"}' ,'3.6');
INSERT 0 1
engine=# select * from vdc_options where option_name ='HotPlugMemorySupported';
 option_id |      option_name       |            option_value            | version 
-----------+------------------------+------------------------------------+---------
       178 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.0
       179 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.1
       180 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.2
       181 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.3
       182 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.4
       183 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.5
       840 | HotPlugMemorySupported | {"x86_64":"true","ppc64":"false"}  | 3.6
(7 rows)

4. Restart the engine service.
5. Stop the VMs and try to run them again.
Comment 13 Qunfang Zhang 2015-11-24 01:00:21 EST
I can reproduce this bug but the behaviours are different on different hosts.

1. On host A:

kernel-3.10.0-327.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7.ppc64le (The host is in use by others, so I did not update the packages)

Host: 256G mem

Test scenarios:

Boot a VM1 first and it boots up successfully. Then start booting VM2 and check whether VM2 succeeds.

VM1: -m 60G,slots=16,maxmem=1024G 

1) VM2: -m 60G,slots=16,maxmem=1024G ==> Reproduced
2) VM2: -m 4G,slots=4,maxmem=1024G ==> Pass
3) VM2: -m 16G,slots=4,maxmem=1024G ==> Pass
4) VM2: -m 32G,slots=4,maxmem=1024G ==> Pass
5) VM1: -m 60G, VM2: -m 60G ==> Pass


2. On host B:

kernel-3.10.0-327.2.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.3.ppc64le

Host: 128G mem

(1) Only VM1: -m 60G,slots=16,maxmem=1024G ==> Reproduced
(2) Only VM1: -m 60G,slots=16,maxmem=512G ==> Pass
(3) VM1: -m 32G,slots=4,maxmem=1024G
    VM2: -m 32G,slots=4,maxmem=1024G ==> Pass
Comment 14 Qunfang Zhang 2015-11-24 01:04:05 EST
 /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 60G,slots=16,maxmem=1024G -smp 1,sockets=1,cores=1,threads=1 -uuid 1212b7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -serial stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=rhel7.2-virtio_blk-le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -qmp tcp:0:6666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5b
VNC server running on `::1:5900'


SLOF **********************************************************************
QEMU Starting
 Build Date = Sep 18 2015 06:25:39
 FW Version = mockbuild@ release 20150313
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/v-scsi@1000
       SCSI: Looking for devices
          8000000000000000 CD-ROM   : "QEMU     QEMU CD-ROM      2.3."
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
                     00 1000 (D) : 1af4 1000    virtio [ net ]
                     00 0800 (D) : 1af4 1001    virtio [ block ]
                     00 0000 (D) : 106b 003f    serial bus [ usb-ohci ]
No NVRAM common partition, re-initializing...
Scanning USB 
  OHCI: initializing
Using default console: /vdevice/vty@71000000
     
  Welcome to Open Firmware

  Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci@800000020000000/scsi@1 ...   Successfully loaded





      Red Hat Enterprise Linux Server (3.10.0-326.el7.ppc64le) 7.2 (Maipo)      
      Red Hat Enterprise Linux Server (3.10.0-316.el7.ppc64le) 7.2 (Maipo)     
      Red Hat Enterprise Linux Server (0-rescue-2335bae863a843baa1bfb8503ab94e>

      Use the ^ and v keys to change the selection.                       
      Press 'e' to edit the selected item, or 'c' for a command prompt.   
   The selected entry will be started automatically in 0s.                     



OF stdout device is: /vdevice/vty@71000000
Preparing to boot Linux version 3.10.0-326.el7.ppc64le (mockbuild@ppc-028.build.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Fri Oct 23 11:14:00 EDT 2015
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/vmlinuz-3.10.0-326.el7.ppc64le root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 0000000005500000
  alloc_top    : 0000000030000000
  alloc_top_hi : 0000000200000000
  rmo_top      : 0000000030000000
  ram_top      : 0000000200000000
instantiating rtas at 0x000000002fff0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000005510000 -> 0x0000000005510a5a
Device tree struct  0x0000000005520000 -> 0x0000000005560000
Calling quiesce...
returning from prom_init
Comment 15 David Gibson 2015-11-24 19:20:09 EST
Qunfang,

Thanks for reproducing this with pure qemu invocations.  I've now reproduced it myself and am investigating.
Comment 16 David Gibson 2015-11-24 22:16:10 EST
I believe I've located the problem.  The second VM is getting a much smaller hash page table than qemu requests - 16MiB instead of 8GiB.  The 60GiB of actual memory is too much to map into the smaller hash table - we run out of slots for the kernel's linear mapping of memory.

We try to size the hash table as 1/128th of the max memory size (rounded up to a power of 2), so a 60GiB VM wants a 512MiB hash table - 16MiB is just not enough.
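A rough sketch of that sizing rule, for illustration only. It models just the "1/128th, rounded up to a power of 2" arithmetic described above and ignores any minimum or maximum that qemu or the host kernel may apply; the function name `hpt_size_bytes` is hypothetical and does not come from the qemu source.

```python
GiB = 1 << 30
MiB = 1 << 20


def hpt_size_bytes(mem_bytes: int, ratio: int = 128) -> int:
    """Return mem_bytes / ratio, rounded up to the next power of two."""
    target = -(-mem_bytes // ratio)  # ceiling division
    size = 1
    while size < target:
        size *= 2
    return size


if __name__ == "__main__":
    # maxmem=1024G, as set when memory hotplug is enabled
    print(hpt_size_bytes(1024 * GiB) // GiB, "GiB")  # 8 GiB
    # what a 60GiB guest would need for its actual memory
    print(hpt_size_bytes(60 * GiB) // MiB, "MiB")    # 512 MiB
```

With maxmem=1024G the request comes out at 8GiB, while 60GiB of actual memory corresponds to 512MiB, which is still far more than the 16MiB the second VM received.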

The trouble is that the hash table is unswappable, contiguous host kernel memory, and so the first VM has pretty much consumed the only big contiguous chunk we can get.

Unfortunately, it's really hard to see what we can do about this - the RHEV approach of setting a huge maxmem value whenever memory hotplug is enabled is pretty much fundamentally problematic on Power.  Apart from this problem it means that every VM, even a 1 or 2 GiB one, with memory hotplug enabled will consume an additional 8GiB of host memory.
Comment 17 David Gibson 2015-11-24 22:20:51 EST
In the short term, I can backport Bharata's patches which will make qemu abort if it doesn't get a large enough hash table from the host kernel.  That won't help the VM run, of course, but it should at least give us better error reporting.
Comment 19 IBM Bug Proxy 2015-11-30 10:21:56 EST
------- Comment From fnovak@us.ibm.com 2015-11-30 15:13 EDT-------
reverse mirror of RHBZ 1282833 - [PPC64LE] Guest freezes if qemu allocates smaller page table than requested
Comment 20 David Gibson 2015-11-30 22:05:43 EST
*** Bug 1285474 has been marked as a duplicate of this bug. ***
Comment 22 Qunfang Zhang 2016-06-03 03:17:58 EDT
Reproduced this bug using the steps from comment 13 and comment 14 with the following versions:

kernel-3.10.0-418.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.8.ppc64le.rpm

Boot up VM1 with -m 60G,slots=16,maxmem=1024G, and boot up VM2 with -m *G,slots=*,maxmem=1024G.  Then VM2 will freeze at the point shown in comment 14.

Verified this issue with qemu-kvm-rhev-2.6.0-4.el7.ppc64le:

First, still boot up VM1 with "-m 60G,slots=16,maxmem=1024G". Guest boots up successfully.

Then boot up VM2 with "-m *G,slots=*,maxmem=1024G"; it prints:

#  /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 4G,slots=16,maxmem=1024G -smp 1,sockets=1,cores=1,threads=1 -uuid 1818b7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=vm2.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive if=none,id=drive-scsi0-0-1-0,readonly=on -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -qmp tcp:0:6688,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:52 
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) VNC server running on '::1;5901'
2016-06-03T06:51:04.413108Z qemu-kvm: Failed to allocate KVM HPT of order 33 (try smaller maxmem?): Cannot allocate memory
2016-06-03T06:51:04.441339Z qemu-kvm: network script /etc/qemu-ifdown failed with status 256

Hi, David

Is this the expected result for this bug?  

Thanks!
Comment 23 David Gibson 2016-06-05 22:45:52 EDT
Hi qzhang,

Yes, that's the expected result: the guest fails to start with the error message about allocating an HPT.
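To relate the "order 33" in that message to the sizes discussed in comment 16, here is a minimal sketch. It assumes the order is the base-2 logarithm of the HPT size in bytes, which is consistent with the 8GiB figure for maxmem=1024G but is stated here as an assumption; the helper name `hpt_order_to_bytes` and the regular expression are made up for this example.

```python
import re


def hpt_order_to_bytes(log_line: str) -> int:
    """Extract the HPT order from a qemu error line and return 2**order."""
    match = re.search(r"KVM HPT of order (\d+)", log_line)
    if match is None:
        raise ValueError("no HPT order found in log line")
    return 1 << int(match.group(1))


if __name__ == "__main__":
    line = ("qemu-kvm: Failed to allocate KVM HPT of order 33 "
            "(try smaller maxmem?): Cannot allocate memory")
    print(hpt_order_to_bytes(line) // (1 << 30), "GiB")  # 8 GiB
```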
Comment 24 Qunfang Zhang 2016-06-05 22:47:59 EDT
Okay, thanks David.
Comment 26 errata-xmlrpc 2016-11-07 16:37:12 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html
