Bug 665820

Summary: [RFE] Send monitor event upon stuck storage
Product: Red Hat Enterprise Linux 7
Reporter: Dor Laor <dlaor>
Component: qemu-kvm-rhev
Assignee: Stefan Hajnoczi <stefanha>
Status: ASSIGNED
QA Contact: CongLi <coli>
Severity: medium
Priority: medium
Version: 7.0
CC: aliang, areis, bazulay, chayang, coli, danken, drjones, hachen, huding, istein, juzhang, kwolf, lyarwood, meyang, mgoldboi, michal.skrivanek, michen, mkalinin, mkenneth, ngu, pingl, rpacheco, srevivo, stefanha, tburke, virt-maint, xuwei
Target Milestone: rc
Keywords: FutureFeature, Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Doc Type: Enhancement
Clone Of: 654023
Last Closed: 2012-06-08 11:35:03 EDT
Bug Depends On: 1035038, 654023
Bug Blocks: 580948, 756082

Comment 7 Dor Laor 2011-06-13 15:37:46 EDT
*** Bug 711374 has been marked as a duplicate of this bug. ***
Comment 9 Ademar Reis 2012-02-17 12:40:00 EST
Please check https://bugzilla.redhat.com/show_bug.cgi?id=695082#c15 for a description of the root problem and why there's no simple way of fixing it.
Comment 10 Markus Armbruster 2012-02-17 12:45:51 EST
This bug is related to bug 695082 - KVM: QEMU stuck when attempting to read from unreachable CD-ROM (disconnected 'nfs-iso-storage-domain').  See https://bugzilla.redhat.com/show_bug.cgi?id=695082#c15 for an analysis of the underlying problem, and why we can't avoid QEMU getting stuck, at least not in RHEL-6.

Unlike bug 695082, this bug doesn't ask for avoiding the hang.  Instead, it asks for QEMU emitting an event when it happens.  I'm afraid that isn't in the cards, either.

We can't know we're stuck until some timeout has elapsed after getting stuck (the analysis referenced above explains why).  The shorter that timeout, the greater the chance of turning a recoverable network hiccup into a non-recoverable virtual hardware failure.  Thus, the timeout value is policy, and policy needs to be set by the management application.

However, I can't think of a way to implement a "detect system call got stuck on NFS" feature in QEMU that doesn't require major architectural surgery, with a considerable risk of introducing nasty bugs.  That would be quite inappropriate for a RHEL minor release even if it weren't entirely infeasible.

Perhaps management applications could try one or more of the following to better deal with NFS outages:

* Monitor the NFS storage in the management application instead of relying on QEMU detecting and reporting outages in a timely manner.

* Poll QEMU to detect when it hangs.  Pretty schlocky :) (a rough sketch follows after this list)

* Use NFS mount options to implement the desired timeout at the NFS level (example after this list).
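
As an illustration of the polling idea, a management process could send a QMP command and treat a missing reply as a sign that QEMU's main loop is wedged. This is only a sketch: the socket path is a placeholder, and query-status is a standard QMP command whose reply merely proves the main loop is still dispatching.

    import json
    import socket

    # Rough liveness probe: if QEMU's main loop is wedged on storage, the
    # monitor stops answering, so any step below can hit the timeout.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(5.0)                        # the "how long counts as stuck?" policy knob
    try:
        s.connect("/var/run/qemu/vm1.qmp")   # placeholder QMP socket path
        s.recv(4096)                         # QMP greeting banner
        s.sendall(b'{"execute": "qmp_capabilities"}\n')
        s.recv(4096)
        s.sendall(b'{"execute": "query-status"}\n')
        print(json.loads(s.recv(4096)))      # e.g. {"return": {"status": "running", ...}}
    except socket.timeout:
        print("no reply within 5s: QEMU may be stuck on storage")

For the mount-option approach, a "soft" NFS mount makes the kernel fail hung requests with EIO once the retry budget is spent, instead of blocking the process forever. A sketch of such an fstab entry, with placeholder server, path, and timing values that would need tuning per deployment:

    # soft: return EIO instead of retrying forever
    # timeo: per-try timeout in tenths of a second; retrans: retries before failing
    nfs-server:/export/iso  /mnt/iso  nfs  soft,timeo=200,retrans=3  0 0

Note that this is exactly the policy trade-off described above: a soft mount turns a long network hiccup into I/O errors visible to the guest.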
Comment 11 Ademar Reis 2012-06-08 11:35:03 EDT
As said in comment 10, there's no way we can fix this at the qemu level without major architectural changes and a considerable risk of introducing bugs. Maybe management layers can work around this, as suggested in comment 10.

Closing as WONTFIX.
Comment 12 Dor Laor 2012-06-17 02:14:05 EDT
(In reply to comment #11)
> As said in comment 10, there's no way we can fix this at the qemu level
> without major architectural changes and a considerable risk of introducing
> bugs. Maybe management layers can work around this, as suggested in comment
> 10.
> 
> Closing as WONTFIX.

It may be fine to close it for RHEL 6.x, but we should keep it open for RHEL 7.
Comment 13 Dor Laor 2012-07-16 05:40:38 EDT
IMHO some qemu threads shouldn't get stuck. One of them is the monitor.
The problem is that today the monitor is executed by the main loop and needs the global qemu lock. If a thread that accesses the storage with the lock held (is there such a case?) gets stuck, it will cause a deadlock. The monitor should always be functional, and we should add an info monitor command that provides data about the outstanding I/O requests and the time since they were sent. With this data, management would be able to make the right decision.
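
Purely as an illustration of the kind of data being asked for, a reply from a hypothetical query-pending-io QMP command (the name and every field here are invented for this sketch; no such interface exists) might look like:

    -> { "execute": "query-pending-io" }
    <- { "return": [
           { "device": "drive-virtio-disk0",
             "inflight-requests": 12,
             "oldest-request-ms": 48210 } ] }

Management could then apply its own timeout policy to something like oldest-request-ms, which keeps the policy decision out of QEMU, as comment 10 argues it should be.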

Kevin/Markus, what's your take on the above?
Comment 14 Kevin Wolf 2012-07-16 05:50:47 EDT
Then I guess the monitor would have to run in a separate thread and make sure to call the block layer only from bottom halves that are executed in the I/O thread, which is the thread that can get stuck. This is easy to say, but I'm relatively sure it would be a quite massive change.
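
The following toy Python sketch (not QEMU code; every name in it is invented for illustration) shows the shape of that pattern: only the I/O thread ever touches the block layer, and the monitor thread merely enqueues bottom halves, so it stays responsive even while a block request hangs.

    import queue
    import threading
    import time

    bottom_halves = queue.Queue()            # work deferred to the I/O thread

    def hung_block_request():
        time.sleep(3600)                     # stand-in for a read stuck on dead NFS

    def io_thread():
        while True:
            bottom_halves.get()()            # only this thread calls the "block layer"

    def monitor():
        bottom_halves.put(hung_block_request)    # a block command is deferred...
        for _ in range(3):                       # ...but the monitor keeps answering
            print("monitor alive, queued bottom halves:", bottom_halves.qsize())
            time.sleep(1)

    threading.Thread(target=io_thread, daemon=True).start()
    monitor()

The catch raised later in this bug applies here too: any monitor command that itself needs to touch or drain the hanging device would still end up waiting on the I/O thread.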

Also, what do you think is the right decision that management could take? If the maximum number of threads in the thread pool (or the maximum number of Linux AIO requests) is used up, you can either wait and hope that the situation resolves itself, or you can kill qemu. I don't see many more options.
Comment 17 Michal Skrivanek 2013-06-12 01:31:47 EDT
The bug is in ASSIGNED; is anything going on? The "right decision for management" is a matter of identifying whether there is any pending I/O still to be written. Then we can only wait, or pause the VM and migrate it to a different host that doesn't have the storage issue.
Comment 18 Kevin Wolf 2013-06-12 03:57:24 EDT
It's unlikely that anything is going to happen as long as we haven't even defined what we want to have implemented in the end.

You cannot migrate the VM to a different host as long as requests are hanging in the kernel, because qemu must flush them first before it has a consistent state that can be migrated.

That leaves you with waiting until the network comes back, which you already do today. The difference is then merely that the monitor will still be responsive while the block subsystem is blocked. I think we'll eventually get there with the dataplane work, but that's of course a massive project and certainly not a quick fix.

This doesn't look much like RHEL 7.0 material to me, if not outright WONTFIX.
Comment 19 Ademar Reis 2014-04-24 10:26:54 EDT
*** Bug 1053561 has been marked as a duplicate of this bug. ***
Comment 20 Kevin Wolf 2014-07-17 09:53:40 EDT
Actually, even dataplane probably doesn't help much if the monitor then tries accessing a hanging block device or draining its requests. Moving to 7.2, still with the prospect of closing as WONTFIX eventually.
Comment 21 Stefan Hajnoczi 2015-08-24 12:47:02 EDT
Ademar has started closing bugs like this as WONTFIX.  For now, I prefer to defer to RHEL 7.3 since we do need to work on reducing code paths where QEMU blocks on pending I/O.
Comment 22 Ademar Reis 2016-05-20 10:22:32 EDT
(In reply to Stefan Hajnoczi from comment #21)
> Ademar has started closing bugs like this as WONTFIX.  For now, I prefer to
> defer to RHEL 7.3 since we do need to work on reducing code paths where QEMU
> blocks on pending I/O.

Still working on it as a long term goal. Deferring it to 7.4.
Comment 23 Kevin Wolf 2016-06-21 11:16:38 EDT
Now actually deferring it to 7.4. Also moving it to Stefan, both because he is the I/O path maintainer and because he requested to leave it open in comment 21.
Comment 24 Stefan Hajnoczi 2017-01-16 11:36:47 EST
Moving this long-term architectural BZ to RHEL 7.5.