| Summary: | [RFE] Always provide timeout for operations blocked in the QEMU driver | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Francesco Romani <fromani> |
| Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> |
| Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.2 | CC: | berrange, jdenemar, michal.skrivanek, pkrempa, rbalakri |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-07 08:27:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Francesco Romani
2016-11-07 07:49:18 UTC
The "always" part is impossible on our side.

Once we send a command to the monitor there's no way to cancel it if it got stuck. This means that if the storage unblocks for any reason the command will be executed. If libvirt reported an error due to timeout the user or applications on top of that would assume that the command failed and not that it will be eventually finished later.

Most of the libvirt APIs are synchronous in this aspect.

(In reply to Peter Krempa from comment #1)
> The "always" part is impossible on our side.
>
> Once we send a command to the monitor there's no way to cancel it if it got
> stuck. This means that if the storage unblocks for any reason the command
> will be executed. If libvirt reported an error due to timeout the user or
> applications on top of that would assume that the command failed and not
> that it will be eventually finished later.
>
> Most of the libvirt APIs are synchronous in this aspect.

I wonder if for any 'read' kind of command the 'always' makes sense though.

> Sooner (minutes) or later (hours) all the libvirtd worker threads will get
> stuck trying to access the unresponsive qemu monitor

NB, libvirt added a concept of "high priority" worker threads explicitly to let management apps get themselves out of this problem. Certain API calls that we know can never block always get directed to the dedicated pool of high priority worker threads. In particular, virDomainDestroy is high priority, so it is always possible to unblock the worker threads by killing the guest that is non-responsive.
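
A toy model of the two-pool idea described above, using only the Python standard library. This is not libvirt code: the pool sizes, `query_stuck_guest`, `destroy_guest` and `dispatch` are hypothetical names chosen for illustration, and the "stuck monitor" is simulated with an event.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def demo():
    # Simulated guest state: set once the non-responsive guest is gone.
    guest_gone = threading.Event()

    def query_stuck_guest():
        # Ordinary call: blocks for as long as the monitor does not answer.
        guest_gone.wait()
        return "monitor reply"

    def destroy_guest():
        # High-priority call: assumed never to wait on the monitor.
        guest_gone.set()
        return "destroyed"

    high_priority = {destroy_guest}

    with ThreadPoolExecutor(max_workers=2) as ordinary, \
            ThreadPoolExecutor(max_workers=1) as priority:

        def dispatch(call):
            # Calls known never to block always go to the dedicated pool.
            pool = priority if call in high_priority else ordinary
            return pool.submit(call)

        # Two stuck calls occupy every ordinary worker.
        stuck = [dispatch(query_stuck_guest) for _ in range(2)]
        # A third ordinary call cannot even start: it waits in the queue.
        queued = dispatch(query_stuck_guest)
        try:
            queued.result(timeout=0.2)
            starved = False
        except FutureTimeout:
            starved = True

        # The destroy call still runs, because its pool is never saturated
        # by blocking work. Killing the guest releases the stuck workers.
        killed = dispatch(destroy_guest).result(timeout=1)
        replies = [f.result(timeout=1) for f in stuck + [queued]]

    return starved, killed, replies
```

In this sketch the ordinary pool is fully starved until the destroy call runs, which is the escape hatch the comment describes. It does not help a caller that only wants to read data from the healthy guests without killing anything.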

It does still require the monitoring part of the application to deal with that when we use bulk calls to get stats for all VMs. Or the management has to be intelligent enough to monitor for stuck calls and shut down VMs in this case, which can be a bit too harsh and anyway not good enough, as such detection would take some time, and for that time the monitoring is not going to work for all the other VMs.

Are you suggesting that the problem is with the bulk stats API, which gets blocked because some domains are in D state? If so, please file a bug for that issue and we'll work on a reasonable solution for it.

(In reply to Jiri Denemark from comment #5)
> Are you suggesting that the problem is with the bulk stats API, which gets
> blocked because some domains are in D state? If so, please file a bug for
> that issue and we'll work on a reasonable solution for it.

Yes, this is one of the few APIs we care most about. I will review the list and file more fine-grained RFEs about them.
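
For illustration only, here is one way a monitoring application could keep a single stuck domain from blocking stats for the others, sketched with the Python standard library. It assumes per-domain queries rather than one bulk call; `poll_stats` and `fake_fetch` are hypothetical names, and no libvirt API is used. The sketch also shows the caveat raised at the top of this thread: a call that times out on the caller's side is not cancelled and still completes once the storage comes back.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def poll_stats(domains, fetch, timeout):
    """Query each domain in its own thread and split answered from stuck."""
    pool = ThreadPoolExecutor(max_workers=len(domains))
    futures = {pool.submit(fetch, name): name for name in domains}
    done, pending = wait(futures, timeout=timeout)
    # Do not wait for stuck calls; their worker threads stay occupied.
    pool.shutdown(wait=False)
    stats = {futures[f]: f.result() for f in done}
    stuck = {futures[f]: f for f in pending}
    return stats, stuck


def demo():
    storage_back = threading.Event()

    def fake_fetch(name):
        # Simulated query: "vm2" hangs until its storage responds again.
        if name == "vm2":
            storage_back.wait()
        return {"name": name, "state": "running"}

    stats, stuck = poll_stats(["vm1", "vm2", "vm3"], fake_fetch, timeout=0.2)

    # The timeout only ended the caller's wait. Once the storage unblocks,
    # the "failed" call finishes anyway.
    storage_back.set()
    late = stuck["vm2"].result(timeout=1)
    return sorted(stats), sorted(stuck), late
```

This is harmless for read-only queries, which is the case the 'read' remark earlier in the thread is about. For a command that changes guest state, reporting a timeout while the command later succeeds is exactly the misleading outcome described in comment #1.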