Bug 1723881
Summary: | [RHOS-15] User configurable time out for instance destroy to prevent error: libvirtError: Failed to terminate process <pid> with SIGKILL: Device or resource busy | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Kashyap Chamarthy <kchamart> |
Component: | openstack-nova | Assignee: | Kashyap Chamarthy <kchamart> |
Status: | CLOSED ERRATA | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 15.0 (Stein) | CC: | dasmith, jhakimra, kchamart, lyarwood, mbooth, sbauza, sgordon, vromanso |
Target Milestone: | beta | Keywords: | Triaged |
Target Release: | 15.0 (Stein) | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | openstack-nova-19.0.2-0.20190701170413.b01bc2f.el8ost | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-09-21 11:23:34 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1636190, 1759125, 1789339 |
Description
Kashyap Chamarthy
2019-06-25 15:39:31 UTC
This is already merged upstream in the following commit[*]:

commit 10d50ca4e210039aeae84cb9bd5d18895948af54
Author: Kashyap Chamarthy <kchamart>
Date: Mon Feb 25 13:26:24 2019 +0100

libvirt: Rework 'EBUSY' (SIGKILL) error handling code path

Change ID I128bf6b939 (libvirt: handle code=38 + sigkill (ebusy) in _destroy()) handled the case where a QEMU process "refuses to die" within a given timeout period set by libvirt.

Originally, libvirt sent SIGTERM (allowing the process to clean up resources), then waited 10 seconds. If the guest didn't go away, it then sent the more lethal SIGKILL and waited another 5 seconds for it to take effect.

From libvirt v4.7.0 onwards, libvirt increased[1][2] the time it waits for a guest hard shutdown to complete. It now waits for 30 seconds for SIGKILL to work (instead of 5). Also, additional wait time is added if there are assigned PCI devices, as some of those tend to slow things down.

In this change:

- Increment the counter to retry the _destroy() call from 3 to 6, thus increasing the total time from 15 to 30 seconds, before SIGKILL takes effect. This matches the (more graceful) behaviour of libvirt v4.7.0. It also gives breathing room for Nova instances running in environments with large compute nodes with high instance creation or delete churn, where the current timeout may not be sufficient.

- Retry the _destroy() API call _only_ if MIN_LIBVIRT_VERSION is lower than 4.7.0.

[1] https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=9a4e4b9 (process: wait longer 5->30s on hard shutdown)
[2] https://libvirt.org/git/?p=libvirt.git;a=commit;h=be2ca04 ("process: wait longer on kill per assigned Hostdev")

Related-bug: #1353939
Change-Id: If2035cac931c42c440d61ba97ebc7e9e92141a28
Signed-off-by: Kashyap Chamarthy <kchamart>

[*] https://opendev.org/openstack/nova/commit/10d50ca4e2 ("libvirt: Rework 'EBUSY' (SIGKILL) error handling code path")

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811
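
For illustration, the retry behaviour described in the commit message can be sketched in Python roughly as follows. This is a minimal sketch, not Nova's actual code: `DomainBusyError`, `destroy_with_retry`, and the constants are hypothetical names, the `destroy` callable stands in for the libvirt destroy call, and the version is passed as a parameter here, whereas the commit checks `MIN_LIBVIRT_VERSION`. The 5-second figure is the per-call SIGKILL wait of older libvirt quoted in the commit message.

```python
# Hypothetical stand-ins for illustration; none of these names come from
# Nova or libvirt.


class DomainBusyError(Exception):
    """Stands in for the libvirtError 'Failed to terminate process <pid>
    with SIGKILL: Device or resource busy'."""


# From this libvirt version on, libvirt itself waits 30 seconds for SIGKILL.
LONGER_SIGKILL_WAIT_VERSION = (4, 7, 0)
# Per-call SIGKILL wait in older libvirt, as stated in the commit message.
SIGKILL_WAIT_SECONDS = 5
# Raised from 3 to 6 by the commit: 6 calls * 5 seconds = 30 seconds.
MAX_ATTEMPTS = 6


def destroy_with_retry(destroy, libvirt_version, max_attempts=MAX_ATTEMPTS):
    """Call destroy(), retrying on DomainBusyError only for older libvirt.

    Returns the number of destroy() calls that were made. Re-raises
    DomainBusyError once the allowed attempts are exhausted.
    """
    # Newer libvirt already waits long enough, so a single call suffices.
    if libvirt_version < LONGER_SIGKILL_WAIT_VERSION:
        attempts_allowed = max_attempts
    else:
        attempts_allowed = 1

    for attempt in range(1, attempts_allowed + 1):
        try:
            destroy()
            return attempt
        except DomainBusyError:
            if attempt == attempts_allowed:
                raise
```

With a libvirt older than 4.7.0, a guest that stays busy for three calls would be destroyed on the fourth; with 4.7.0 or newer, the first busy error is raised to the caller without a retry.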