Bug 1392316 - [RFE] Always provide timeout for operations blocked in the QEMU driver
Summary: [RFE] Always provide timeout for operations blocked in the QEMU driver
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Libvirt Maintainers
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-07 07:49 UTC by Francesco Romani
Modified: 2016-11-07 11:38 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-07 08:27:06 UTC
Target Upstream Version:



Description Francesco Romani 2016-11-07 07:49:18 UTC
Description of problem:
The libvirt QEMU driver uses the QEMU monitor to query the hypervisor state.
If the hypervisor encounters a storage layer error, perhaps because it uses shared storage and there is a network failure, it can get stuck in I/O inside the kernel and enter the D state.
In this case the QEMU monitor can become unresponsive, and all the libvirt APIs which need to enter the QEMU monitor can block, ultimately leading to the exhaustion of the libvirt worker pool.

This in turn makes life harder for the management application (e.g. oVirt) and can lead to a chain of failures.

We would like libvirt to always time out, or signal the error to the upper layer, instead of sometimes blocking forever.

Version-Release number of selected component (if applicable):
Experienced with libvirt 1.3.3 and QEMU 2.6, but libvirt 2.0.0 should behave the same (I'm not aware of any reason it would differ).

We acknowledge that fixing this bug may require changes to QEMU (the monitor protocol?), and that this is a complex scenario. This bug is to track progress on fixing it.

How reproducible:
100% given enough time, frequent enough libvirt usage and unresponsive storage

Steps to Reproduce:
1. Set up one or more QEMU VMs on shared storage, e.g. NFS or iSCSI.
2. Make sure libvirt is used frequently, involving APIs which need to access the QEMU monitor; the bulk stats API is a good example (but not the only one) — see the sketch after these steps.
3. Wait for libvirt APIs to block forever. How long this takes depends on many factors, but it will happen.
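
A minimal sketch of such a polling loop, assuming the Python libvirt bindings (the URI, interval and stats flags are arbitrary choices for illustration):

    import time
    import libvirt  # libvirt-python bindings

    # Roughly what a management application does: every few seconds, fetch
    # bulk stats for all domains.  Each call has to enter the QEMU monitor,
    # so a QEMU process stuck in D state makes this call (and, one by one,
    # every libvirtd worker serving it) block.
    conn = libvirt.open("qemu:///system")
    while True:
        for dom, stats in conn.getAllDomainStats(
                libvirt.VIR_DOMAIN_STATS_STATE | libvirt.VIR_DOMAIN_STATS_BLOCK):
            print(dom.name(), stats)
        time.sleep(5)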

Actual results:
Sooner (minutes) or later (hours), all the libvirtd worker threads get stuck trying to access the unresponsive QEMU monitor.

Expected results:
Instead of blocking forever, all such APIs should quickly (within seconds) return a timeout or an error in every case.

Comment 1 Peter Krempa 2016-11-07 08:14:49 UTC
The "always" part is impossible on our side.

Once we send a command to the monitor there's no way to cancel it if it got stuck. This means that if the storage unblocks for any reason the command will be executed. If libvirt reported an error due to timeout the user or applications on top of that would assume that the command failed and not that it will be eventually finished later.

Most of the libvirt APIs are synchronous in this aspect.

Comment 2 Yaniv Kaul 2016-11-07 09:55:34 UTC
(In reply to Peter Krempa from comment #1)
> The "always" part is impossible on our side.
> 
> Once we send a command to the monitor there's no way to cancel it if it got
> stuck. This means that if the storage unblocks for any reason the command
> will be executed. If libvirt reported an error due to timeout the user or
> applications on top of that would assume that the command failed and not
> that it will be eventually finished later.
> 
> Most of the libvirt APIs are synchronous in this aspect.

I wonder, though, whether the 'always' part could make sense for 'read'-type commands.

Comment 3 Daniel Berrangé 2016-11-07 10:00:53 UTC
> Sooner (minutes) or later (hours) all the libvirtd worker threads will get
> stuck trying to access the unresponsive qemu monitor

NB, libvirt added a concept of "high priority" worker threads explicitly to let management apps get themselves out of this problem. Certain API calls that we know can never block are always directed to a dedicated pool of high-priority worker threads. In particular virDomainDestroy is high priority, so it is always possible to unblock the worker threads by killing the guest that is non-responsive.
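
As a rough sketch (using the Python bindings; the 30-second deadline and the "stuck-guest" domain name are made up for illustration), a management app can run the stats call with its own deadline and fall back to virDomainDestroy when it does not return:

    import concurrent.futures
    import libvirt

    conn = libvirt.open("qemu:///system")

    def fetch_stats():
        # Served by a normal-priority libvirtd worker; blocks if a QEMU
        # monitor is unresponsive.
        return conn.getAllDomainStats(libvirt.VIR_DOMAIN_STATS_STATE)

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch_stats)
    try:
        stats = future.result(timeout=30)       # client-side deadline
    except concurrent.futures.TimeoutError:
        # virDomainDestroy is served by the high priority worker pool, so
        # it gets through even while the normal workers are all stuck.
        dom = conn.lookupByName("stuck-guest")  # hypothetical domain name
        dom.destroy()
    pool.shutdown(wait=False)                   # don't wait for the stuck call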

Comment 4 Michal Skrivanek 2016-11-07 10:31:44 UTC
It does still require the monitoring part of the application to deal with that when we use bulk calls to get stats for all VMs, or the management to be intelligent enough to monitor for stuck calls and shut down VMs in that case. That can be a bit too harsh, and it is not good enough anyway: such detection takes some time, and during that time monitoring does not work for any of the other VMs.

Comment 5 Jiri Denemark 2016-11-07 11:22:46 UTC
Are you suggesting that the problem is with the bulk stats API, which gets blocked because some domains are in D state? If so, please file a bug for that issue and we'll work on a reasonable solution for it.

Comment 6 Francesco Romani 2016-11-07 11:38:10 UTC
(In reply to Jiri Denemark from comment #5)
> Are you suggesting that the problem is with the bulk stats API, which gets
> blocked because some domains are in D state? If so, please file a bug for
> that issue and we'll work on a reasonable solution for it.

Yes, this is one of the few APIs we care about most. I will review the list and file more fine-grained RFEs about them.

