Bug 1451470
Summary: RHEL 7.2 based VM (Virtual Machine) hung for several hours apparently waiting for lock held by main_loop
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm
Version: 7.2
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Hardware: Unspecified
OS: Linux
Reporter: Bimal Chollera <bcholler>
Assignee: Fam Zheng <famz>
QA Contact: Sitong Liu <siliu>
CC: chayang, coli, dvacek, ehabkost, famz, jcoscia, jherrman, juzhang, knoel, michen, mkalinin, mrezanin, pbonzini, rbalakri, rhodain, salmy, sjohnsto, slopezpa, tlavigne, virt-bugs, virt-maint, xfu
Target Milestone: rc
Keywords: ZStream
Fixed In Version: qemu-kvm-1.5.3-140.el7
Doc Type: Bug Fix
Doc Text: Previously, guest virtual machines in some cases became unresponsive when the "pty" back end of a serial device performed an irregular I/O communication. This update improves the handling of serial I/O on guests, which prevents the described problem from occurring.
Type: Bug
Last Closed: 2017-08-01 17:49:19 UTC
Bug Blocks: 1452331, 1452332
Description
Bimal Chollera
2017-05-16 17:53:18 UTC
> I believe that Fam's scratch build will fix it. serial_xmit() is the only
> caller of qemu_chr_fe_add_watch() that ignores its return value.
It wouldn't necessarily fix it, but it would assert as soon as two watches are set up at the same time.
My plan is to:
1) try and reproduce with downstream QEMU
2) try and reproduce with Fam's patch
3) try moving the patch to add the assertion early in the series, to see which patch fixes it
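
To illustrate the "two watches at the same time" condition mentioned above, here is a minimal sketch in C. It is not QEMU code: `DemoSerial`, `demo_add_watch()`, `demo_arm_retry()` and `demo_watch_fired()` are hypothetical names, and the sketch assumes that a nonzero tag means a watch is installed. Where the patch discussed above would assert, this sketch returns false so the condition can be observed without aborting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a serial device's transmit state. */
typedef struct {
    unsigned watch_tag; /* 0 means no watch is currently pending */
    unsigned next_tag;  /* counter used to hand out nonzero tags */
} DemoSerial;

/* Hypothetical stand-in for registering a watch on the back end.
 * Assumption: the returned tag is nonzero when a watch was installed. */
static unsigned demo_add_watch(DemoSerial *s)
{
    return ++s->next_tag;
}

/* Arm a retry only if no watch is pending. The return value of
 * demo_add_watch() is stored rather than ignored, so a second call
 * while the first watch is outstanding is detected and refused. */
static bool demo_arm_retry(DemoSerial *s)
{
    if (s->watch_tag != 0) {
        return false; /* a watch is already set up; do not add another */
    }
    s->watch_tag = demo_add_watch(s);
    return s->watch_tag != 0;
}

/* Called when the pending watch fires; clears the slot for reuse. */
static void demo_watch_fired(DemoSerial *s)
{
    s->watch_tag = 0;
}
```

Calling `demo_arm_retry()` twice in a row succeeds the first time and is refused the second time, until `demo_watch_fired()` clears the pending tag.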
(In reply to Paolo Bonzini from comment #14)
> > I believe that Fam's scratch build will fix it. serial_xmit() is the only
> > caller of qemu_chr_fe_add_watch() that ignores its return value.
>
> It wouldn't necessarily fix it, but it would assert as soon as two watches
> are set up at the same time.

That's true about a1df76da57aa8772a75e7c49f8e3829d07b4c46c. However, the scratch build also includes commit f702e62a193e9ddb41cef95068717e5582b39a64, which rewrites the retry logic completely.

Right. So I think this hunk is the one that fixes the bug:

@@ -293,7 +298,9 @@ static void serial_ioport_write(void *opaque,
             s->thr_ipending = 0;
             s->lsr &= ~UART_LSR_THRE;
             serial_update_irq(s);
-            serial_xmit(NULL, G_IO_OUT, s);
+            if (s->tsr_retry <= 0) {
+                serial_xmit(NULL, G_IO_OUT, s);
+            }
         }
         break;
     case 1:

although I still support backporting all the patches that Fam identified, especially 0d931d70 ("serial: clean up THRE/TEMT handling") and 62c339c52 ("qemu-char: ignore flow control if a PTY's slave is not connected").

Fix included in qemu-kvm-1.5.3-138.el7

Not sure about 2), but O_NONBLOCK was added to pty fd in upstream:

commit fac6688a18574b6f2caa8c699a936e729ed53ece
Author: Don Slutz <dslutz>
Date:   Mon Dec 22 10:04:00 2014 -0500

    Do not hang on full PTY

    Signed-off-by: Don Slutz <dslutz>
    Reviewed-by: Paolo Bonzini <pbonzini>
    Signed-off-by: Michael Tokarev <mjt.ru>

diff --git a/qemu-char.c b/qemu-char.c
index 5430b87..98d4342 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -1402,6 +1402,7 @@ static CharDriverState *qemu_chr_open_pty(const char *id,
     }
     close(slave_fd);
+    qemu_set_nonblock(master_fd);
     chr = qemu_chr_alloc();

(In reply to Fam Zheng from comment #41)
> Not sure about 2), but O_NONBLOCK was added to pty fd in upstream:

I've tried just with O_NONBLOCK first, and while QEMU no longer blocks on write(), the cost of writing to the serial port is so high for the VM that it still feels like it's hung.
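
For readers unfamiliar with what O_NONBLOCK changes for a full file descriptor, the following is a rough sketch using plain POSIX calls. It is not the QEMU implementation: `demo_set_nonblock()` and `demo_write_until_full()` are hypothetical helpers, and a pipe is used in place of a pty master purely to keep the example self-contained. Once the flag is set, write() on a full buffer fails with EAGAIN instead of blocking the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the flags of an open fd.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int demo_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: write zero bytes until the kernel buffer is full.
 * Returns the number of bytes accepted before write() reported EAGAIN,
 * or -1 on any other error. With a blocking fd and no reader, this loop
 * would never return. */
static long demo_write_until_full(int fd)
{
    char buf[4096] = {0};
    long total = 0;

    for (;;) {
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR) {
            continue; /* interrupted by a signal, retry */
        }
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return total; /* buffer full: the caller is not blocked */
        }
        return -1;
    }
}
```

A caller that simply retries on EAGAIN avoids the hang but still cannot make progress until the other end drains the buffer, which is consistent with the observation above that the guest can still feel hung with O_NONBLOCK alone.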
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1856