Bug 1660451
| Summary: | Executor queue can get full if vm.destroy takes some time to complete | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | nijin ashok <nashok> |
| Component: | vdsm | Assignee: | Milan Zamazal <mzamazal> |
| Status: | CLOSED ERRATA | QA Contact: | Guilherme Santos <gdeolive> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2.7 | CC: | fromani, gveitmic, lleistne, lsurette, michal.skrivanek, mtessun, myllynen, mzamazal, nashok, oliver.albl, rbarry, rdlugyhe, srevivo, ycui |
| Target Milestone: | ovirt-4.3.6 | Keywords: | Rebase, ZStream |
| Target Release: | 4.3.6 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | vdsm-4.30.29 | Doc Type: | Bug Fix |
| Doc Text: | Previously, some virtual machine operations unnecessarily blocked other VM operations. This led to some problems with monitoring while shutting down large virtual machines. The current release fixes these issues: it relaxes some conditions for blocking virtual machine operations and makes blocking safer. This should reduce monitoring problems experienced in some scenarios. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-10 15:36:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
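The summary above describes periodic monitoring tasks piling up behind a slow vm.destroy until the executor's bounded queue is full. The following is a minimal sketch of that failure mode, assuming made-up Executor and TooManyTasks names rather than vdsm's actual classes:

```python
# Minimal sketch (not vdsm code; class and exception names are made up) of the
# failure mode in the summary: periodic tasks go through an executor with a
# bounded queue, so when workers get stuck behind a slow vm.destroy, the queue
# eventually fills and new monitoring tasks are rejected.

import queue
import threading
import time


class TooManyTasks(Exception):
    """Raised when the executor queue is full."""


class Executor:
    def __init__(self, workers=4, max_tasks=10):
        self._tasks = queue.Queue(maxsize=max_tasks)
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def dispatch(self, func):
        try:
            self._tasks.put_nowait(func)
        except queue.Full:
            raise TooManyTasks("executor queue is full")

    def _run(self):
        while True:
            self._tasks.get()()   # a blocking task ties up this worker


def slow_destroy():
    time.sleep(300)               # e.g. QEMU slowly releasing a huge VM's memory


executor = Executor()
for _ in range(4):
    executor.dispatch(slow_destroy)       # all workers end up blocked
try:
    for _ in range(20):
        executor.dispatch(lambda: None)   # periodic monitoring tasks
except TooManyTasks as exc:
    print("monitoring task rejected:", exc)
```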
Description
nijin ashok
2018-12-18 10:57:29 UTC
*** Bug 1660452 has been marked as a duplicate of this bug. ***

Thanks for the bug, Nijin; this behaviour can indeed be triggered by a real large VM, so we had better address it. As you noticed, Vdsm is supposed to recover cleanly from those cases, but there is indeed room for improvement.

From the Vdsm perspective, suspending/stopping monitoring while the VM is shutting down makes sense. There are two caveats:
1. We need to make sure Engine is OK with this and can cope with it.
2. Monitoring is quite a performance-sensitive area for Vdsm (for various technical reasons I can elaborate on). We should be careful about adding checks there and try hard to assess their performance impact.

The real issue is #1. We can solve #2 if Engine can cope with this change of behaviour.

Nijin, could you please file a bug against libvirt(/qemu?) to inquire whether the behaviour described in https://bugzilla.redhat.com/show_bug.cgi?id=1660451#c0 could be improved? Maybe it is an unfixable design quirk, maybe it could be improved in the future.

(In reply to Francesco Romani from comment #4)
> Nijin, could you please file a bug against libvirt(/qemu?) to inquire whether
> the behaviour described in https://bugzilla.redhat.com/show_bug.cgi?id=1660451#c0
> could be improved? Maybe it is an unfixable design quirk, maybe it could be
> improved in the future.

I have opened bug 1663859.

Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both.

I can't reproduce the bug using the scenario from the bug description or any other way, and there is no vdsm.log to check what blocks in the executor. The sosreport I looked into is failing on something that was fixed before 4.20.39, so it's not relevant. Nevertheless, I tried to emulate the problem artificially, and some periodic tasks do produce blocked tasks in the executor. Adding a check for self._monitorable to Vm.isDomainReadyForCommands should remedy the problem, unless a blocked QEMU monitor blocks all libvirt calls rather than just those for the given VM; I can't check that without a reproducer. Another issue we need to take care of is vmContainerLock being held during the whole VM.destroy call.

vmContainer locking has been improved. Since I can't reproduce the original problem, I could only check that the change fixes an artificially induced blocking. It should fix the reported issue as well, unless QEMU performs global, rather than VM-specific, locking operations. This must be checked in verification.

So what are the proper verification steps, please?

Those described in Comment 0 would apply if you can reproduce the problem; the expected result is that the executor keeps working rather than getting full (this can be examined in vdsm.log). I couldn't reproduce the original problem, so please ask Nijin for help if you also have trouble reproducing it.

WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
[Found non-acked flags: '{'rhevm-4.3.z': '?'}', ]
For more info please contact: rhv-devops
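For illustration, here is a minimal sketch of the check mentioned above (guarding Vm.isDomainReadyForCommands on self._monitorable); the classes below are simplified stand-ins, not vdsm's actual implementation:

```python
# Hedged sketch, assuming simplified Vm and periodic-operation classes: once
# monitoring is disabled for a VM (e.g. while it is being destroyed), periodic
# operations skip it instead of queueing calls that would block on its QEMU
# monitor and tie up executor workers.


class Vm:
    def __init__(self, dom):
        self._dom = dom            # stand-in for the libvirt domain wrapper
        self._monitorable = True   # cleared when monitoring is disabled

    def disable_monitoring(self):
        self._monitorable = False

    def isDomainReadyForCommands(self):
        # Skip VMs whose monitoring is disabled or whose domain is gone;
        # the real check in vdsm involves further conditions omitted here.
        return self._monitorable and self._dom is not None


class UpdateVolumes:
    """Stand-in for a periodic operation dispatched by the executor."""

    def __init__(self, vm):
        self._vm = vm

    def required(self):
        # The periodic scheduler only dispatches the operation when the VM
        # reports it is ready for commands, so a shutting-down VM is skipped.
        return self._vm.isDomainReadyForCommands()


vm = Vm(dom=object())
op = UpdateVolumes(vm)
print(op.required())       # True: VM is monitorable
vm.disable_monitoring()    # e.g. destroy in progress
print(op.required())       # False: periodic work is skipped
```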
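The vmContainerLock concern discussed above can be sketched as follows; the names (vm_container, vm_container_lock) are assumptions for illustration, not vdsm's actual code, and the point is only that the container lock should cover bookkeeping rather than the whole destroy call:

```python
# Illustrative sketch: narrow the scope of the global VM-container lock so a
# potentially slow vm.destroy does not block every other container user.

import threading

vm_container = {}                        # vmId -> Vm object (assumed layout)
vm_container_lock = threading.Lock()


def destroy_vm_coarse(vm_id):
    # Problematic pattern: everything else that needs the container waits
    # until this possibly minutes-long destroy finishes.
    with vm_container_lock:
        vm = vm_container[vm_id]
        vm.destroy()
        del vm_container[vm_id]


def destroy_vm_fine_grained(vm_id):
    # Hold the lock only for the lookup and the removal; the slow destroy
    # itself runs without blocking other container users.
    with vm_container_lock:
        vm = vm_container[vm_id]
    vm.destroy()
    with vm_container_lock:
        vm_container.pop(vm_id, None)
```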
Verified on: vdsm-4.30.30-1.el7ev.x86_64

Steps:
1. Run "stress -m 12 -i 4" (from the stress yum package) and wait until almost 100% of memory gets used.
2. Migrate the VM and, during the migration, shut down the VM.
3. Check the vdsm log.

Results: No executor-related log messages are produced for the operation.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3009
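For the log check in step 3 above, a small helper sketch that counts executor-related warnings in vdsm.log may be handy; the message patterns below ("Worker blocked", "Too many tasks") are assumptions and may differ between vdsm versions, so adjust them to what your version actually logs:

```python
# Hedged helper: scan vdsm.log for warnings that suggest blocked executor
# workers or a full executor queue. Patterns are assumptions, not guaranteed
# message texts.

import re

PATTERN = re.compile(r"Worker blocked|[Tt]oo many tasks")


def count_executor_warnings(path="/var/log/vdsm/vdsm.log"):
    hits = 0
    with open(path, errors="replace") as log:
        for line in log:
            if "WARN" in line and PATTERN.search(line):
                hits += 1
    return hits


if __name__ == "__main__":
    print("executor-related warnings:", count_executor_warnings())
```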