Bug 1282833

Summary: [PPC64LE] Guest freezes if qemu allocates smaller page table than requested
Product: Red Hat Enterprise Linux 7
Reporter: Shira Maximov <mshira>
Component: qemu-kvm-rhev
Assignee: David Gibson <dgibson>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.2
CC: bugproxy, bugs, dgibson, fnovak, gklein, hannsj_uhl, istein, jkurik, knoel, lance, michal.skrivanek, michen, mrezanin, mshira, qzhang, snagar, virt-maint, xuhan, xuma, zhengtli
Target Milestone: pre-dev-freeze
Keywords: Automation, ZStream
Target Release: 7.3
Hardware: ppc64le
OS: Linux
Whiteboard: virt
Fixed In Version: qemu-kvm-rhev-2.5.0-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1284775 1285337
Environment:
Last Closed: 2016-11-07 21:37:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1284775, 1285337, 1285474, 1288337, 1304300, 1305498, 1359843
Attachments:
Description        Flags
too much memory    none
vdsm_logs          none
vdsm logs          none

Description Shira Maximov 2015-11-17 15:00:54 UTC
Created attachment 1095502 [details]
too much memory

Description of problem:
When the VM memory is set to 61 GB, for example, the VM can't start.

Version-Release number of selected component (if applicable):
 Red Hat Enterprise Virtualization Manager Version: 3.6.0.3-0.1.el6 

How reproducible:
100%

Steps to Reproduce:
1. (Requires a ppc environment and a host with at least 60 GB of memory.) Create a VM with 2 GB of memory and make sure the VM is running.
2. Shut down the VM and set its memory to 60 GB.
3. Try to run the VM.

Actual results:
The VM can't run.

Expected results:
The VM should run.

Additional info:
A screenshot is attached.

Comment 1 Michal Skrivanek 2015-11-18 11:18:07 UTC
The screenshot's memory layout seems to indicate 8GB? I'm not sure if ram_top is the right thing to look at... If it is, shouldn't it be enough anyway?
David, thoughts?

Comment 2 Yaniv Kaul 2015-11-18 19:18:43 UTC
Shira - is this a RHEV or KVM bug? 
Does it work with smaller sizes? How small?
Where is the VDSM log, to at least see the command line used to launch the VM?

Comment 3 David Gibson 2015-11-19 00:21:04 UTC
There's not much I can tell from that screenshot alone.  An awful lot happens between "Returning from prom_init" and the next message.

The messages do seem to show 8GiB of RAM, which should be fine.

A copy of the qemu command line for the failing run would probably be more useful.

Comment 4 Shira Maximov 2015-11-19 10:03:10 UTC
Created attachment 1096575 [details]
vdsm_logs

The exact size of the VM is 62464 MB (you can search the logs for this string).

You can also look at this automation run and download the other logs:
https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/3.6_Dev/job/3.6-GE-compute/412/

Comment 5 Shira Maximov 2015-11-19 10:08:44 UTC
Yaniv,

1. The VM can run with smaller memory sizes (for example, 4G).
2. I have now attached the vdsm logs.
3. I can't find the command now. I will reproduce the bug and post the command as soon as I have the PPC environment again.

Comment 6 Shira Maximov 2015-11-19 11:40:47 UTC
Update:
I tried to reproduce the bug again on a different PPC environment, and this time the VM ran successfully.

I need to investigate this bug further, but due to a lack of PPC environments that can't be done now.
I will update again soon.

Comment 7 David Gibson 2015-11-20 04:51:30 UTC
I'm having trouble making any sense of the vdsm log.  I've searched for '62464' as suggested in comment 4, but all I'm finding are MigrationCreate entries, no initial VM creation commands, and no corresponding libvirt XML or qemu command lines.

What should I be looking for in the vdsm log to find where the failing instance is happening?

Comment 8 Shira Maximov 2015-11-23 12:37:07 UTC
Update:
I re-ran the test and found that a single VM with 60GB of memory runs fine, but if I run two (or more) VMs, the second VM can't run and gets the error shown in the screenshot.


The VM command:
qemu      28760      1 99 07:26 ?        00:04:48 /usr/libexec/qemu-kvm -name mom-3 -S -machine pseries-rhel7.2.0,accel=kvm,usb=off -cpu POWER8 -m size=65011712k,slots=16,maxmem=1073741824k -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=63488 -uuid 70603690-83a2-4345-aaf2-57947f6ee7a1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-mom-3/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2015-11-23T12:26:31,driftfix=slew -no-shutdown -boot strict=on -device spapr-vscsi,id=scsi0,reg=0x2000 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x3 -usb -drive if=none,id=drive-scsi0-0-0-0,readonly=on,format=raw,serial= -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -drive file=/rhev/data-center/e13ff945-73ee-48f2-a299-c8c7f0bcd49e/593927a4-3341-4034-9e00-97cb04351b7a/images/2309c586-fac3-4567-8c4b-c5345e22616d/b0ee8a4c-3be7-4291-9cea-0f6b5ecd005a,if=none,id=drive-virtio-disk0,format=qcow2,serial=2309c586-fac3-4567-8c4b-c5345e22616d,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:58,bus=pci.0,addr=0x1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/70603690-83a2-4345-aaf2-57947f6ee7a1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/70603690-83a2-4345-aaf2-57947f6ee7a1.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -device usb-kbd,id=input1 -device usb-mouse,id=input2 
-vnc 10.16.160.29:1,password -device VGA,id=video0,vgamem_mb=32,bus=pci.0,addr=0x6 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on


I will attach newer logs.

Comment 9 Shira Maximov 2015-11-23 12:48:26 UTC
Created attachment 1097639 [details]
vdsm logs

Comment 10 Shira Maximov 2015-11-23 12:49:06 UTC
For the logs: (vm_id: 70603690-83a2-4345-aaf2-57947f6ee7a1)

Comment 11 Michal Skrivanek 2015-11-23 15:16:53 UTC
Note: -m size=65011712k,slots=16,maxmem=1073741824k
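
The values in that option are easier to compare once converted to GiB. A small sketch, assuming the `k` suffix means KiB (1024 bytes); the helper name `kib_option_to_gib` is made up for illustration:

```python
def kib_option_to_gib(value):
    """Convert a size such as '65011712k' (assumed to be KiB) to GiB."""
    if not value.endswith("k"):
        raise ValueError("expected a value with a 'k' suffix: %r" % value)
    return int(value[:-1]) * 1024 / 2**30

# Split the -m argument from the command line in comment 8 into key/value pairs.
opts = dict(item.split("=") for item in
            "size=65011712k,slots=16,maxmem=1073741824k".split(","))

size_gib = kib_option_to_gib(opts["size"])      # 62.0
maxmem_gib = kib_option_to_gib(opts["maxmem"])  # 1024.0
print(size_gib, maxmem_gib)
```

So the guest starts with 62GiB of RAM but declares a 1TiB hotplug ceiling.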

Comment 12 Shira Maximov 2015-11-23 16:20:57 UTC
I tested this bug again, this time with memory hot plug disabled, and it worked fine.

Steps to reproduce:
1. Create two VMs that are pinned to a specific host, each with 60 GB of memory.
2. Start both and see that one can't run.
3. Disable hot plug:

engine=# insert into vdc_options (option_name, option_value, version)  VALUES ('HotPlugMemorySupported', '{"x86_64":"true","ppc64":"false"}' ,'3.6');
INSERT 0 1
engine=# select * from vdc_options where option_name ='HotPlugMemorySupported';
 option_id |      option_name       |            option_value            | version 
-----------+------------------------+------------------------------------+---------
       178 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.0
       179 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.1
       180 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.2
       181 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.3
       182 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.4
       183 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.5
       840 | HotPlugMemorySupported | {"x86_64":"true","ppc64":"false"}  | 3.6
(7 rows)

4. Restart the engine service.
5. Stop the VMs and try to run them again.

Comment 13 Qunfang Zhang 2015-11-24 06:00:21 UTC
I can reproduce this bug, but the behaviour differs between hosts.

1. On host A:

kernel-3.10.0-327.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7.ppc64le (the host is being used by others, so I did not update the packages)

Host: 256G mem

Test scenarios:

Boot a VM1 first and it boots up successfully. Then start booting VM2 and check whether VM2 succeeds.

VM1: -m 60G,slots=16,maxmem=1024G 

1) VM2: -m 60G,slots=16,maxmem=1024G  ==> Reproduced
2) VM2: -m 4G,slots=4,maxmem=1024G    ==> Pass
3) VM2: -m 16G,slots=4,maxmem=1024G   ==> Pass
4) VM2: -m 32G,slots=4,maxmem=1024G   ==> Pass
5) VM1: -m 60G  VM2: -m 60G           ==> Pass


2. On host B:

kernel-3.10.0-327.2.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.3.ppc64le

Host: 128G mem

(1) Only VM1: -m 60G,slots=16,maxmem=1024G  ==> Reproduced
(2) Only VM1: -m 60G,slots=16,maxmem=512G   ==> Pass
(3) VM1: -m 32G,slots=4,maxmem=1024G
    VM2: -m 32G,slots=4,maxmem=1024G        ==> Pass

Comment 14 Qunfang Zhang 2015-11-24 06:04:05 UTC
 /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 60G,slots=16,maxmem=1024G -smp 1,sockets=1,cores=1,threads=1 -uuid 1212b7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -serial stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=rhel7.2-virtio_blk-le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -qmp tcp:0:6666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5b
VNC server running on `::1:5900'


SLOF **********************************************************************
QEMU Starting
 Build Date = Sep 18 2015 06:25:39
 FW Version = mockbuild@ release 20150313
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/v-scsi@1000
       SCSI: Looking for devices
          8000000000000000 CD-ROM   : "QEMU     QEMU CD-ROM      2.3."
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
                     00 1000 (D) : 1af4 1000    virtio [ net ]
                     00 0800 (D) : 1af4 1001    virtio [ block ]
                     00 0000 (D) : 106b 003f    serial bus [ usb-ohci ]
No NVRAM common partition, re-initializing...
Scanning USB 
  OHCI: initializing
Using default console: /vdevice/vty@71000000
     
  Welcome to Open Firmware

  Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci@800000020000000/scsi@1 ...   Successfully loaded





      Red Hat Enterprise Linux Server (3.10.0-326.el7.ppc64le) 7.2 (Maipo)      
      Red Hat Enterprise Linux Server (3.10.0-316.el7.ppc64le) 7.2 (Maipo)     
      Red Hat Enterprise Linux Server (0-rescue-2335bae863a843baa1bfb8503ab94e>
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
                                                                                

      Use the ^ and v keys to change the selection.                       
      Press 'e' to edit the selected item, or 'c' for a command prompt.   
   The selected entry will be started automatically in 0s.                     



OF stdout device is: /vdevice/vty@71000000
Preparing to boot Linux version 3.10.0-326.el7.ppc64le (mockbuild.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Fri Oct 23 11:14:00 EDT 2015
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/vmlinuz-3.10.0-326.el7.ppc64le root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 0000000005500000
  alloc_top    : 0000000030000000
  alloc_top_hi : 0000000200000000
  rmo_top      : 0000000030000000
  ram_top      : 0000000200000000
instantiating rtas at 0x000000002fff0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000005510000 -> 0x0000000005510a5a
Device tree struct  0x0000000005520000 -> 0x0000000005560000
Calling quiesce...
returning from prom_init

Comment 15 David Gibson 2015-11-25 00:20:09 UTC
Qunfang,

Thanks for reproducing this with pure qemu invocations.  I've now reproduced it myself and am investigating.

Comment 16 David Gibson 2015-11-25 03:16:10 UTC
I believe I've located the problem.  The second VM is getting a much smaller hash page table than qemu requests - 16MiB instead of 8GiB.  The 60GiB of actual memory is too much to map into the smaller hash table - we run out of slots for the kernel's linear mapping of memory.
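
To put rough numbers on "run out of slots", here is a back-of-the-envelope sketch. It assumes 16-byte hash page table entries and that the guest's linear mapping uses 64KiB pages; the page size in particular is an assumption, not something stated in this report. The function names are made up for illustration.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30
HPTE_BYTES = 16  # assumed size of one hash page table entry

def hpt_slots(hpt_bytes):
    """Number of entries that fit in a hash table of the given size."""
    return hpt_bytes // HPTE_BYTES

def pages_to_map(ram_bytes, page_bytes):
    """Pages needed to cover ram_bytes, rounding up."""
    return -(-ram_bytes // page_bytes)

slots = hpt_slots(16 * MIB)               # 1048576
pages = pages_to_map(60 * GIB, 64 * KIB)  # 983040 (64KiB pages: assumption)
load = pages / slots                      # 0.9375
print(slots, pages, load)
```

Under those assumptions the linear mapping alone would need about 94% of every slot in a 16MiB table, which leaves very little room for hash collisions or for any other mappings.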

We try to size the hash table as 1/128th of the max memory size (rounded up to a power of 2), so a 60GiB VM wants a 512MiB hash table - 16MiB is just not enough.
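
The sizing rule described here can be sketched as follows. This is only an illustration of "1/128th, rounded up to a power of 2"; it is not qemu's actual code, the function names are hypothetical, and any minimum or maximum size qemu may enforce is ignored.

```python
GIB = 1 << 30
MIB = 1 << 20

def hpt_order(mem_bytes):
    """log2 of the hash table size: mem/128 rounded up to a power of 2."""
    target = -(-mem_bytes // 128)      # ceiling division
    return (target - 1).bit_length()   # smallest n with 2**n >= target

def hpt_bytes(mem_bytes):
    return 1 << hpt_order(mem_bytes)

print(hpt_bytes(60 * GIB) // MIB)    # 512  (MiB, sized from 60GiB of RAM)
print(hpt_bytes(1024 * GIB) // GIB)  # 8    (GiB, sized from maxmem=1024G)
print(hpt_order(1024 * GIB))         # 33
```

With maxmem=1024G the rule yields an order of 33, which matches the 8GiB request mentioned in the previous paragraph.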

The trouble is that the hash table is unswappable, contiguous host kernel memory, and so the first VM has pretty much consumed the only big contiguous chunk we can get.

Unfortunately, it's really hard to see what we can do about this - the RHEV approach of setting a huge maxmem value whenever memory hotplug is enabled is pretty much fundamentally problematic on Power.  Apart from this problem it means that every VM, even a 1 or 2 GiB one, with memory hotplug enabled will consume an additional 8GiB of host memory.
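
The per-VM cost can be estimated with the same 1/128th rule. The sketch below compares a hash table sized from the guest's actual RAM with one sized from a 1TiB maxmem; the helper name is hypothetical and the rule is applied without any limits qemu may add.

```python
GIB = 1 << 30
MIB = 1 << 20

def hpt_bytes(mem_bytes):
    """mem/128 rounded up to a power of 2 (illustrative only)."""
    target = -(-mem_bytes // 128)
    return 1 << (target - 1).bit_length()

ram = 2 * GIB         # a small guest
maxmem = 1024 * GIB   # the hotplug ceiling used in this report

sized_by_ram = hpt_bytes(ram)        # 16MiB
sized_by_maxmem = hpt_bytes(maxmem)  # 8GiB
extra = sized_by_maxmem - sized_by_ram

print(sized_by_ram // MIB, sized_by_maxmem // GIB, extra // MIB)  # 16 8 8176
```

For a 2GiB guest the hash table alone would then be four times the size of the guest's RAM, and all of it has to come from contiguous, unswappable host memory.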

Comment 17 David Gibson 2015-11-25 03:20:51 UTC
In the short term, I can backport Bharata's patches which will make qemu abort if it doesn't get a large enough hash table from the host kernel.  That won't help the VM run, of course, but it should at least give us better error reporting.
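
The behaviour described here amounts to a fail-fast check on the size the host kernel actually granted. A minimal sketch of that idea follows; it is not the code from the patches mentioned above, and the exception class, function name, and message wording are all made up for illustration.

```python
class HptTooSmallError(RuntimeError):
    """Raised when the host grants a smaller hash table than requested."""

def check_hpt_order(requested_order, granted_order):
    """Refuse to continue if the granted table is smaller than requested."""
    if granted_order < requested_order:
        raise HptTooSmallError(
            "requested HPT of order %d but host granted order %d"
            % (requested_order, granted_order))
    return granted_order

# The case from this bug: an 8GiB (order 33) request, a 16MiB (order 24) grant.
try:
    check_hpt_order(33, 24)
except HptTooSmallError as err:
    print("refusing to start guest:", err)
```

Failing at startup turns a silent guest freeze into an error the management layer can report.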

Comment 19 IBM Bug Proxy 2015-11-30 15:21:56 UTC
------- Comment From fnovak.com 2015-11-30 15:13 EDT-------
reverse mirror of RHBZ 1282833 - [PPC64LE] Guest freezes if qemu allocates smaller page table than requested

Comment 20 David Gibson 2015-12-01 03:05:43 UTC
*** Bug 1285474 has been marked as a duplicate of this bug. ***

Comment 22 Qunfang Zhang 2016-06-03 07:17:58 UTC
Reproduced this bug using the steps from comment 13 and comment 14 with the following versions:

kernel-3.10.0-418.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.8.ppc64le.rpm

Boot up VM1 with -m 60G,slots=16,maxmem=1024G, and boot up VM2 with -m *G,slots=*,maxmem=1024G.  VM2 then freezes at the point shown in comment 14.

Verified this issue with qemu-kvm-rhev-2.6.0-4.el7.ppc64le:

First, boot up VM1 as before with "-m 60G,slots=16,maxmem=1024G". The guest boots up successfully.

Then boot up VM2 with "-m *G,slots=*,maxmem=1024G". qemu reports:

#  /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 4G,slots=16,maxmem=1024G -smp 1,sockets=1,cores=1,threads=1 -uuid 1818b7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=vm2.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive if=none,id=drive-scsi0-0-1-0,readonly=on -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -qmp tcp:0:6688,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:52 
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) VNC server running on '::1;5901'
2016-06-03T06:51:04.413108Z qemu-kvm: Failed to allocate KVM HPT of order 33 (try smaller maxmem?): Cannot allocate memory
2016-06-03T06:51:04.441339Z qemu-kvm: network script /etc/qemu-ifdown failed with status 256

Hi, David

Is this the expected result for this bug?  

Thanks!

Comment 23 David Gibson 2016-06-06 02:45:52 UTC
Hi qzhang,

Yes, that's the expected result, the guest fails to start with the error message about allocating an HPT.

Comment 24 Qunfang Zhang 2016-06-06 02:47:59 UTC
Okay, thanks David.

Comment 26 errata-xmlrpc 2016-11-07 21:37:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html