Bug 1304300

Summary: [ppc64le] Failed to start VM: Failed to allocate HTAB of requested size
Product: [oVirt] ovirt-engine    Reporter: Israel Pinto <ipinto>
Component: BLL.Virt    Assignee: Michal Skrivanek <michal.skrivanek>
Status: CLOSED CURRENTRELEASE    QA Contact: Liran Rotenberg <lrotenbe>
Severity: high    Docs Contact: Rolfe Dlugy-Hegwer <rdlugyhe>
Priority: medium
Version: 3.6.0.3    CC: bugs, hannsj_uhl, mavital, mgoldboi, michal.skrivanek, rdlugyhe
Target Milestone: ovirt-4.3.0    Flags: michal.skrivanek: ovirt-4.3?
rule-engine: planning_ack?
ipinto: devel_ack?
rule-engine: testing_ack?
Target Release: ---   
Hardware: ppc64le   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: Release Note
Doc Text:
Virtual machines with large memory configurations impose a significant overhead on the host: the host must reserve a contiguous, non-swappable block of memory (the hash page table, or HTAB) that is 1/128th of the virtual machine's maximum memory size. Previously, this overhead was not accounted for when scheduling the virtual machine. If the memory requirement could not be satisfied, the virtual machine failed to start with an error message similar to this one: "libvirtError: internal error: process exited while connecting to monitor: ... qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem". The current release fixes this issue by using dynamic hash page table resizing.
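
For illustration only (not part of the original bug report), here is a minimal Python sketch of the 1/128 sizing rule described above, applied to a few guest memory sizes. The rounding of the maximum memory up to a power of two is an assumption based on how QEMU typically sizes the HPT, not something stated in this bug:

# Hypothetical sketch: approximate the HTAB (hash page table) overhead the host
# must reserve for a ppc64 guest, per the 1/128 rule in the doc text above.
def htab_size_bytes(maxmem_bytes):
    # Assumption: round the maximum memory up to a power of two, then take 1/128th.
    pow2 = 1 << (maxmem_bytes - 1).bit_length()
    return pow2 // 128

GiB = 1024 ** 3
for maxmem in (16 * GiB, 256 * GiB, 1024 * GiB):
    mib = htab_size_bytes(maxmem) // (1024 ** 2)
    print("maxmem %4d GiB -> HTAB ~%d MiB of contiguous host memory" % (maxmem // GiB, mib))
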
Story Points: ---
Clone Of:    Environment:
Last Closed: 2018-09-18 12:47:46 UTC    Type: Bug
Regression: ---    Mount Type: ---
Documentation: ---    CRM:
Verified Versions:    Category: ---
oVirt Team: Virt    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---    Target Upstream Version:
Embargoed:
Bug Depends On: 1282833, 1304346    
Bug Blocks: 1284775, 1305498, 1444027    
Attachments:
  engine_log (flags: none)
  vdsm_log (flags: none)
  libvirt_log (flags: none)

Description Israel Pinto 2016-02-03 09:45:45 UTC
Description of problem:
Starting a VM on PPC failed twice with the following error:
libvirtError: internal error: process exited while connecting to monitor: 2016-02-03T08:52:53.899885Z qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem



Version-Release number of selected component (if applicable):
RHEVM Version: 3.6.3-0.1.el6
vdsm-4.17.19-0.el7ev
libvirt-1.2.17-13.el7_2.3


How reproducible:
happened twice 

Steps to Reproduce:
Start VM

Actual results:
Failed to start VM

Additional info:
Engine log:
2016-02-03 10:53:13,236 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-37) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM golden_env_mixed_virtio_1_1 is down with error. Exit message: internal error: process exited while connecting to monitor: 2016-02-03T08:52:53.899885Z qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem

vdsm log:
Thread-37::DEBUG::2016-02-03 03:52:54,414::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-37::DEBUG::2016-02-03 03:52:54,414::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-127 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36090a06890b406919966e5b5232bccb6|/dev/mapper/36090a06890b46688996685b5232bdc38|/dev/mapper/36090a06890b4b68d9966b5b5232b0cc0|/dev/mapper/360a98000324669436c2b45666c594b48|/dev/mapper/360a98000324669436c2b45666c594b4a|/dev/mapper/360a98000324669436c2b45666c594b4c|/dev/mapper/360a98000324669436c2b45666c594b4e|/dev/mapper/360a98000324669436c2b45666c594b50|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 0eb900dc-f4fe-4972-b728-ca67bf051dde (cwd None)
Thread-27415::ERROR::2016-02-03 03:52:54,557::vm::759::virt.vm::(_startUnderlyingVm) vmId=`a2d23419-e7f6-45d9-b97c-a87bd2c3cd3d`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 703, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1941, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3611, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: process exited while connecting to monitor: 2016-02-03T08:52:53.899885Z qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem

Thread-27415::INFO::2016-02-03 03:52:54,559::vm::1330::virt.vm::(setDownStatus) vmId=`a2d23419-e7f6-45d9-b97c-a87bd2c3cd3d`::Changed state to Down: internal error: process exited while connecting to monitor: 2016-02-03T08:52:53.899885Z qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem
 (code=1)

Comment 1 Israel Pinto 2016-02-03 09:47:40 UTC
Created attachment 1120706 [details]
engine_log

Comment 2 Israel Pinto 2016-02-03 09:52:06 UTC
Created attachment 1120708 [details]
vdsm_log

Comment 3 Israel Pinto 2016-02-03 10:09:10 UTC
Created attachment 1120709 [details]
libvirt_log

Comment 4 Israel Pinto 2016-02-03 11:39:38 UTC
Updating with additional version information:
KVM: 2.3.0-31.el7_2.7
kernel: 3.10.0-327.10.1.el7.ppc64le

Comment 5 Israel Pinto 2016-02-03 12:54:07 UTC
Note that memory hotplug is enabled

Comment 6 Michal Skrivanek 2016-02-03 14:30:02 UTC
(In reply to Israel Pinto from comment #5)
> Note that memory hotplug is enabled

This should behave the same with hotplug disabled if you use large enough guests.
The root cause is https://bugzilla.redhat.com/show_bug.cgi?id=1282833#c16
and there is currently no good solution on the RHEV side, so this needs to be treated as a known issue until it's resolved

Comment 7 Yaniv Kaul 2016-02-04 17:58:33 UTC
(In reply to Michal Skrivanek from comment #6)
> (In reply to Israel Pinto from comment #5)
> > Note that memory hotplug is enabled
> 
> This should behave the same with hotplug disabled if you use large
> enough guests.
> The root cause is https://bugzilla.redhat.com/show_bug.cgi?id=1282833#c16

So this bug should depend on bug 1282833?
And please move it away from 3.6.4.

> and there is currently no good solution on the RHEV side, so this needs to
> be treated as a known issue until it's resolved

Comment 8 Michal Skrivanek 2016-02-08 12:30:08 UTC
The real fix is tracked in bug 1284775.
Moving this to 4.0, as the bugs that 1284775 depends on are still heavily in progress and won't be ready before RHEL 7.3.

Comment 9 Moran Goldboim 2016-03-24 08:32:04 UTC
Deferred to the next version due to the dependency on a platform bug.

Comment 12 Michal Skrivanek 2018-09-18 12:47:46 UTC
This has been fixed as of 4.2 GA by moving to the pseries-7.5.0 machine type.

Comment 13 Rolfe Dlugy-Hegwer 2019-01-21 18:03:47 UTC
Hi Michal. Please review and update the Doc Text as you see fit. In particular, please review the speculative statement, "The current release fixes this issue by accounting for the overhead when scheduling a virtual machine that has a large guest operating system.", which I added based on comment #12.

Comment 14 Michal Skrivanek 2019-01-22 12:44:02 UTC
The doc text is correct about the past state, but the fix is different: it was actually solved by dynamic hash page table resizing (bug 1305398 and bug 1305399 in qemu/kernel).
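
As a hedged illustration of the fix described above (not taken from this bug), libvirt exposes dynamic HPT resizing for ppc64 guests through the <hpt resizing='...'/> element under <features> in the domain XML. Below is a minimal sketch that checks whether a guest requests it, assuming the libvirt Python bindings are installed; the connection URI is illustrative and the VM name is simply the one from the engine log above:

import libvirt                          # libvirt Python bindings
import xml.etree.ElementTree as ET

conn = libvirt.open("qemu:///system")   # illustrative host URI
dom = conn.lookupByName("golden_env_mixed_virtio_1_1")  # VM name from the engine log

# Parse the domain XML and look for <features><hpt resizing='...'/>.
root = ET.fromstring(dom.XMLDesc())
hpt = root.find("./features/hpt")
print("hpt resizing: %s" % (hpt.get("resizing") if hpt is not None else "not set"))

conn.close()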