Bug 1522878
| Summary: | GetCapabilitiesVDS failed: Not enough resources | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Evgheni Dereveanchin <ederevea> |
| Component: | General | Assignee: | Milan Zamazal <mzamazal> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Israel Pinto <ipinto> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.20.9 | CC: | bugs, ederevea, fromani, michal.skrivanek, pkliczew |
| Target Milestone: | ovirt-4.2.0 | Flags: | rule-engine: ovirt-4.2+, rule-engine: blocker+ |
| Target Release: | 4.20.9.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-02-12 10:09:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Evgheni Dereveanchin, 2017-12-06 16:29:32 UTC)
I can reproduce this by migrating a few more VMs to the host, so it does seem related to live migration. On a side note, attempts to migrate VMs off vdsm-4.20.9 back to vdsm-4.19.31 cause VMs to crash, but that's probably material for another BZ.

It seems to be related to the well-known mom issue:

    2017-12-06 16:07:42,853+0000 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/4 running <Task <JsonRpcTask {'params': {}, 'jsonrpc': '2.0', 'method': u'Host.getAllVmIoTunePolicies', 'id': u'55585710-3120-4d94-86c8-3383db23f794'} at 0x3f17990> timeout=60, duration=60 at 0x3f17290> task#=930 at 0x3233290> (executor:358)

Comment 4 (Michal Skrivanek):

yes, but supposedly that's happening due to the worker queue being full of DriveWatermarkMonitors:

    <Executor periodic workers=4 max_workers=30 <TaskQueue periodic max_tasks=400 tasks(400)

Could be a regression in high watermark events, might be also related to bug 1522901 as a trigger. Evgheni, you can try to rule out or confirm the watermark changes by flipping the enable_block_threshold_event option in vdsm.conf to false.

Comment 5 (Yaniv Kaul):

(In reply to Michal Skrivanek from comment #4)
> yes, but supposedly that's happening due to the worker queue being full of
> DriveWatermarkMonitors

I remember asking for a feature to dump all workers when the Q is full... No idea where it is.

> <Executor periodic workers=4 max_workers=30 <TaskQueue periodic
> max_tasks=400 tasks(400)
>
> Could be a regression in high watermark events, might be also related to bug
> 1522901 as a trigger. Evgheni, you can try to rule out or confirm the
> watermark changes by flipping the enable_block_threshold_event option in
> vdsm.conf to false

(In reply to Michal Skrivanek from comment #4)
> Could be a regression in high watermark events, might be also related to bug
> 1522901 as a trigger.
> Evgheni, you can try to rule out or confirm the watermark changes by flipping
> the enable_block_threshold_event option in vdsm.conf to false

I set enable_block_threshold_event = false in [vars] of vdsm.conf and sent an inbound live migration to force a restart of VDSM on the host. I assume at this point it would re-read the config file. In any case, nothing seems to have changed: after the host came back up, incoming migrations still cause VDSM restarts.

Patch applied to the affected host. Incoming migration no longer causes VDSM to restart.

Comment 10 (Michal Skrivanek):

(In reply to Yaniv Kaul from comment #5)
> (In reply to Michal Skrivanek from comment #4)
> > yes, but supposedly that's happening due to the worker queue being full of
> > DriveWatermarkMonitors
>
> I remember asking for a feature to dump all workers when the Q is full...
> No idea where it is.

So that's there; in the log you can see it's full of DriveWatermarkMonitor threads.

(In reply to Michal Skrivanek from comment #10)
> So that's there, in the log you can see it's full of DriveWatermarkMonitor
> threads

Yep: https://gerrit.ovirt.org/#/c/81624/

Verified with:

Engine: 4.2.1.3-0.1.el7

Host 4.1:
OS Version: RHEL - 7.4 - 18.el7
Kernel Version: 3.10.0-693.17.1.el7.x86_64
KVM Version: 2.9.0-16.el7_4.13.1
LIBVIRT Version: libvirt-3.2.0-14.el7_4.7
VDSM Version: vdsm-4.20.17-1.el7ev

Host 4.2:
OS Version: RHEL - 7.4 - 18.el7
Kernel Version: 3.10.0-693.el7.x86_64
KVM Version: 2.9.0-16.el7_4.14
LIBVIRT Version: libvirt-3.2.0-14.el7_4.9
VDSM Version: vdsm-4.19.45-1.el7ev

Steps: migrate a VM from the 4.1 host to the 4.2 host.

Note: I found https://bugzilla.redhat.com/show_bug.cgi?id=1542117 while migrating a VM.
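For reference, a sketch of the diagnostic config change discussed above, assuming the option belongs in the `[vars]` section as Evgheni's comment states (verify against the `vdsm.conf` shipped with your vdsm version before applying; note that in this bug the flip did not help, and the real fix was the patch):

```ini
# /etc/vdsm/vdsm.conf -- section placement per the comment above
[vars]
# Disable block-threshold (high watermark) events to rule them out
# as the trigger; VDSM must be restarted to pick this up.
enable_block_threshold_event = false
```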
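The `<TaskQueue periodic max_tasks=400 tasks(400)` line in the log says the periodic task queue is at capacity. A minimal Python sketch of that failure mode, not vdsm's actual executor code (the `TaskQueue` class and `put` method here are illustrative assumptions that merely mirror the names in the log line):

```python
import queue

class TaskQueue:
    """Bounded task queue: once full, every new task is rejected."""

    def __init__(self, max_tasks):
        self._queue = queue.Queue(maxsize=max_tasks)

    def put(self, task):
        try:
            self._queue.put_nowait(task)
            return True
        except queue.Full:
            # The queue is saturated, so unrelated calls submitted
            # behind the flood never get to run and eventually time out.
            return False

    def __len__(self):
        return self._queue.qsize()

periodic = TaskQueue(max_tasks=400)

# A burst of watermark-monitor tasks fills the queue completely...
accepted = sum(periodic.put(("DriveWatermarkMonitor", i)) for i in range(500))
assert accepted == 400

# ...and a later, unrelated request is rejected.
assert periodic.put("Host.getAllVmIoTunePolicies") is False
```

This mirrors the reported symptom: a flood of DriveWatermarkMonitor tasks crowds out everything else, so calls like `Host.getAllVmIoTunePolicies` block past their 60-second timeout.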
This bug is included in the oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.