Bug 1304300 - [ppc64le] Failed to start VM: Failed to allocate HTAB of requested size
Summary: [ppc64le] Failed to start VM: Failed to allocate HTAB of requested size
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 3.6.0.3
Hardware: ppc64le
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.3.0
Target Release: ---
Assignee: Michal Skrivanek
QA Contact: Liran Rotenberg
Docs Contact: Rolfe Dlugy-Hegwer
URL:
Whiteboard:
Depends On: 1282833 1304346
Blocks: 1284775 1305498 1444027
Reported: 2016-02-03 09:45 UTC by Israel Pinto
Modified: 2019-01-22 15:16 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-09-18 12:47:46 UTC
oVirt Team: Virt
Embargoed:
michal.skrivanek: ovirt-4.3?
rule-engine: planning_ack?
ipinto: devel_ack?
rule-engine: testing_ack?


Attachments:
engine_log (64.19 KB, application/zip), 2016-02-03 09:47 UTC, Israel Pinto
vdsm_log (948.32 KB, application/zip), 2016-02-03 09:52 UTC, Israel Pinto
libvirt_log (2.36 MB, application/zip), 2016-02-03 10:09 UTC, Israel Pinto

Description Israel Pinto 2016-02-03 09:45:45 UTC
Description of problem:
Starting a VM on a ppc64le host failed twice with the following error:
libvirtError: internal error: process exited while connecting to monitor: 2016-02-03T08:52:53.899885Z qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem



Version-Release number of selected component (if applicable):
RHEVM Version: 3.6.3-0.1.el6
vdsm-4.17.19-0.el7ev
libvirt-1.2.17-13.el7_2.3


How reproducible:
Happened twice.

Steps to Reproduce:
Start VM

Actual results:
Failed to start VM

Additional info:
Engine log:
2016-02-03 10:53:13,236 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-37) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM golden_env_mixed_virtio_1_1 is down with error. Exit message: internal error: process exited while connecting to monitor: 2016-02-03T08:52:53.899885Z qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem

vdsm log:
Thread-37::DEBUG::2016-02-03 03:52:54,414::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-37::DEBUG::2016-02-03 03:52:54,414::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-127 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36090a06890b406919966e5b5232bccb6|/dev/mapper/36090a06890b46688996685b5232bdc38|/dev/mapper/36090a06890b4b68d9966b5b5232b0cc0|/dev/mapper/360a98000324669436c2b45666c594b48|/dev/mapper/360a98000324669436c2b45666c594b4a|/dev/mapper/360a98000324669436c2b45666c594b4c|/dev/mapper/360a98000324669436c2b45666c594b4e|/dev/mapper/360a98000324669436c2b45666c594b50|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 0eb900dc-f4fe-4972-b728-ca67bf051dde (cwd None)
Thread-27415::ERROR::2016-02-03 03:52:54,557::vm::759::virt.vm::(_startUnderlyingVm) vmId=`a2d23419-e7f6-45d9-b97c-a87bd2c3cd3d`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 703, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1941, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3611, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: process exited while connecting to monitor: 2016-02-03T08:52:53.899885Z qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem

Thread-27415::INFO::2016-02-03 03:52:54,559::vm::1330::virt.vm::(setDownStatus) vmId=`a2d23419-e7f6-45d9-b97c-a87bd2c3cd3d`::Changed state to Down: internal error: process exited while connecting to monitor: 2016-02-03T08:52:53.899885Z qemu-kvm: Failed to allocate HTAB of requested size, try with smaller maxmem
 (code=1)
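
For triage, the failure can be picked out of an engine or vdsm log by searching for the qemu-kvm message quoted above. This is a minimal sketch: the helper name find_htab_failures is hypothetical, and the pattern assumes the message text matches what is quoted in this report.

```python
import re

# Matches the timestamped qemu-kvm error as it appears in the logs above.
# The exact wording is taken from this report and may differ in other versions.
HTAB_ERROR = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z) qemu-kvm: "
    r"(?P<msg>Failed to allocate HTAB of requested size[^\n]*)"
)


def find_htab_failures(log_text):
    """Return (timestamp, message) pairs for each HTAB allocation failure."""
    return [
        (m.group("ts"), m.group("msg").strip())
        for m in HTAB_ERROR.finditer(log_text)
    ]
```

The same pattern should work on the engine log, the vdsm log, and the libvirt log, since all three embed the original qemu-kvm line.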

Comment 1 Israel Pinto 2016-02-03 09:47:40 UTC
Created attachment 1120706 [details]
engine_log

Comment 2 Israel Pinto 2016-02-03 09:52:06 UTC
Created attachment 1120708 [details]
vdsm_log

Comment 3 Israel Pinto 2016-02-03 10:09:10 UTC
Created attachment 1120709 [details]
libvirt_log

Comment 4 Israel Pinto 2016-02-03 11:39:38 UTC
Additional version information:
KVM: 2.3.0-31.el7_2.7
kernel: 3.10.0-327.10.1.el7.ppc64le

Comment 5 Israel Pinto 2016-02-03 12:54:07 UTC
Note that memory hotplug is enabled

Comment 6 Michal Skrivanek 2016-02-03 14:30:02 UTC
(In reply to Israel Pinto from comment #5)
> Note that memory hotplug is enabled

This should behave the same with hotplug disabled if you use large enough guests.
The root cause is described in https://bugzilla.redhat.com/show_bug.cgi?id=1282833#c16
and there is currently no good solution on the RHEV side, so this needs to be treated as a known issue until it is resolved.
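
As a rough illustration of why a large maxmem matters: to my understanding, QEMU's pseries machine sizes the hash page table (HTAB) at about 1/128 of the maximum guest memory, rounded up to a power of two, with a floor at shift 18 (256 KiB) and a ceiling at shift 46. That heuristic and the function names below are assumptions for illustration and are not taken from this bug; the exact rule may differ between QEMU versions.

```python
GIB = 1 << 30
TIB = 1 << 40


def htab_shift_for_maxmem(maxmem_bytes):
    """Estimate log2 of the HTAB size for a given maxmem (assumed heuristic)."""
    # ceil(log2(maxmem)): bit length of (n - 1) for n >= 1.
    shift = (maxmem_bytes - 1).bit_length()
    # Assumed ratio: the table is 1/128 (2**-7) of maximum guest memory.
    shift -= 7
    # Assumed clamp: at least 2**18 bytes, at most 2**46 bytes.
    return min(max(shift, 18), 46)


def htab_size_bytes(maxmem_bytes):
    """Estimated HTAB size in bytes for the given maxmem."""
    return 1 << htab_shift_for_maxmem(maxmem_bytes)


if __name__ == "__main__":
    for maxmem in (4 * GIB, 256 * GIB, 1 * TIB):
        size_mib = htab_size_bytes(maxmem) // (1 << 20)
        print(f"maxmem={maxmem // GIB} GiB -> HTAB ~{size_mib} MiB")
```

Under these assumptions a 4 GiB maxmem needs a 32 MiB table, while a 1 TiB maxmem needs 8 GiB. The table is sized from maxmem rather than from the memory the guest starts with, which is why enabling memory hotplug (and therefore a large maxmem) can trigger the failure even for small guests.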

Comment 7 Yaniv Kaul 2016-02-04 17:58:33 UTC
(In reply to Michal Skrivanek from comment #6)
> (In reply to Israel Pinto from comment #5)
> > Note that memory hotplug is enabled
> 
> This should behave the same with hotplug disabled if you use large
> enough guests.
> The root cause is https://bugzilla.redhat.com/show_bug.cgi?id=1282833#c16

So this bug should depend on bug 1282833?
And please move it away from 3.6.4.

> and there is currently no good solution on the RHEV side, so this needs to be
> treated as a known issue until it is resolved.

Comment 8 Michal Skrivanek 2016-02-08 12:30:08 UTC
The real fix is tracked in bug 1284775.
Moving to 4.0, as the bugs that 1284775 depends on are still heavily in progress and won't be ready before RHEL 7.3.

Comment 9 Moran Goldboim 2016-03-24 08:32:04 UTC
Deferred to the next version due to a dependency on a platform bug.

Comment 12 Michal Skrivanek 2018-09-18 12:47:46 UTC
This was fixed as of 4.2 GA by moving to the pseries-7.5.0 machine type.

Comment 13 Rolfe Dlugy-Hegwer 2019-01-21 18:03:47 UTC
Hi Michal. Please review and update the Doc Text as you see fit. In particular, please review the speculative statement, "The current release fixes this issue by accounting for the overhead when scheduling a virtual machine that has a large guest operating system.", which I added based on comment#12.

Comment 14 Michal Skrivanek 2019-01-22 12:44:02 UTC
The doc text is correct about the past state, but the fix is different.
It was actually solved by dynamic hash page table resizing (bug 1305398 and bug 1305399 in qemu/kernel).

