Bug 1592817 - Retrying on serial_xmit if the pipe is broken may compromise the Guest
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: Marc-Andre Lureau
QA Contact: Qianqian Zhu
 
Reported: 2018-06-19 11:22 UTC by Sergio Lopez
Modified: 2019-12-03 06:09 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.12.0-8.el7
Last Closed: 2018-11-01 11:10:36 UTC

Links
System: Red Hat Product Errata
ID: RHBA-2018:3443
Priority: None
Status: None
Summary: None
Last Updated: 2018-11-01 11:12:30 UTC

Description Sergio Lopez 2018-06-19 11:22:17 UTC
Description of problem:

When a QEMU process has a serial device redirected to a pipe (a common setup when virtlogd is configured as the log backend) and the pipe breaks, QEMU installs a callback to retry the write a number of times, even though a broken pipe is not a recoverable error.

Additionally, if the vCPU issuing the request and the emulator thread happen (by chance or by pinning) to share the same pCPU, the guest's stability may be compromised, as both threads compete for pCPU time and for qemu_global_mutex.
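
The difference between a pipe that is merely full and a pipe that is broken can be seen at the system-call level. The following is a minimal sketch in Python, not QEMU code. It assumes a Linux host, and the helper names `classify_pipe_write` and `demo` are made up for illustration. A non-blocking write to a full pipe fails with EAGAIN, which may succeed later once the reader drains the pipe. A write to a pipe whose read end is gone fails with EPIPE, and retrying cannot help.

```python
import errno
import os


def classify_pipe_write(fd, data=b"x"):
    """Attempt one write and report whether a failure is worth retrying.

    Hypothetical helper for illustration only; it is not part of QEMU.
    """
    try:
        os.write(fd, data)
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EAGAIN:
            return "retry"      # pipe is full, the reader may drain it later
        if exc.errno == errno.EPIPE:
            return "give up"    # no reader left, retrying cannot succeed
        raise


def demo():
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)

    results = []
    # Fill the pipe until the kernel refuses more data (EAGAIN).
    while True:
        outcome = classify_pipe_write(write_fd, b"x" * 4096)
        if outcome != "ok":
            results.append(outcome)
            break

    # Simulate the reader (for example the "cat" process) going away.
    os.close(read_fd)
    results.append(classify_pipe_write(write_fd))
    os.close(write_fd)
    return results


if __name__ == "__main__":
    print(demo())
```

Python ignores SIGPIPE by default, so the broken pipe surfaces as an EPIPE error rather than killing the process. On Linux, `demo()` should return `['retry', 'give up']`.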

Some debugging info from a simulation:

 - vCPU thread waiting to acquire qemu_global_mutex:

Thread 3 (Thread 0x7fb6c7fff700 (LWP 31641)):
#0  0x00007fb6e591c4cd in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fb6e5917dcb in _L_lock_812 () at /lib64/libpthread.so.0
#2  0x00007fb6e5917c98 in __GI___pthread_mutex_lock (address@hidden <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000556dfaebcfd7 in qemu_mutex_lock_impl (address@hidden <qemu_global_mutex>, address@hidden "/root/Projects/qemu/cpus.c", address@hidden)
    at util/qemu-thread-posix.c:67
#4  0x0000556dfaacfe58 in qemu_mutex_lock_iothread () at /root/Projects/qemu/cpus.c:1765
#5  0x0000556dfaa90bcd in prepare_mmio_access (mr=0x556dfe71c0e0, mr=0x556dfe71c0e0)
    at /root/Projects/qemu/exec.c:3068
#6  0x0000556dfaa95f98 in flatview_read_continue (address@hidden, address@hidden, attrs=..., address@hidden, address@hidden "", address@hidden, addr1=5, l=1, mr=0x556dfe71c0e0)
    at /root/Projects/qemu/exec.c:3189
#7  0x0000556dfaa9617c in flatview_read (fv=0x7fb6c028aa80, addr=1021, attrs=..., buf=0x7fb6e739f000 "", len=1) at /root/Projects/qemu/exec.c:3255
#8  0x0000556dfaa9629f in address_space_read_full (as=<optimized out>, address@hidden, attrs=..., buf=<optimized out>, address@hidden) at /root/Projects/qemu/exec.c:3268
#9  0x0000556dfaa963fa in address_space_rw (as=<optimized out>, address@hidden, attrs=...,
    address@hidden, buf=<optimized out>, address@hidden, address@hidden)
    at /root/Projects/qemu/exec.c:3298
#10 0x0000556dfaaf6536 in kvm_cpu_exec (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=1021) at /root/Projects/qemu/accel/kvm/kvm-all.c:1730
#11 0x0000556dfaaf6536 in kvm_cpu_exec (address@hidden)
    at /root/Projects/qemu/accel/kvm/kvm-all.c:1970
#12 0x0000556dfaad0006 in qemu_kvm_cpu_thread_fn (arg=0x556dfd3d8020) at /root/Projects/qemu/cpus.c:1215
#13 0x00007fb6e5915dd5 in start_thread (arg=0x7fb6c7fff700) at pthread_create.c:308
#14 0x00007fb6d9e97b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


 - The owner is LWP 31634, the main thread

(gdb) p qemu_global_mutex
$1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 31634, __nusers = 1, __kind = 0, __spins = 0,
      __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = "\002\000\000\000\000\000\000\000\222{\000\000\001", '\000' <repeats 26 times>,
    __align = 2}, initialized = true}


 - The main thread is in the callback:

Thread 1 (Thread 0x7fb6e7355cc0 (LWP 31634)):
#0  0x0000556dfac59420 in serial_watch_cb (chan=0x556dfd321400, cond=G_IO_OUT, opaque=0x556dfe71c000)
    at hw/char/serial.c:233
#1  0x00007fb6e68638f9 in g_main_context_dispatch (context=0x556dfd314210) at gmain.c:3146
#2  0x00007fb6e68638f9 in g_main_context_dispatch (address@hidden) at gmain.c:3811
#3  0x0000556dfaeba126 in main_loop_wait () at util/main-loop.c:215
#4  0x0000556dfaeba126 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:263
#5  0x0000556dfaeba126 in main_loop_wait (address@hidden) at util/main-loop.c:522
#6  0x0000556dfaa89a2f in main () at vl.c:1943
#7  0x0000556dfaa89a2f in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4679


Version-Release number of selected component (if applicable):

All QEMU versions including upstream.


How reproducible:

Always.


Steps to Reproduce:
1. Launch a QEMU process with a serial port redirected to a PIPE, and all threads constrained to a single pCPU:

taskset -c 0 qemu-system-x86_64 -enable-kvm -m 8G -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -drive file=/home/VirtualMachines/rhel74-16gb-1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,snapshot=on -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -net user,hostfwd=tcp::6666-:22 -net nic,model=virtio -device isa-serial,chardev=charserial0,id=serial0 -chardev stdio,id=charserial0 -spice port=5910,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1 2> /tmp/qemu.err | cat > /tmp/qemu.out

2. On the guest, redirect the console to the serial port and have it generate output regularly. An easy way to do this is to add a LOG rule for the ICMP protocol to the iptables OUTPUT chain, and then leave a continuous ping running against a reachable address.

3. Kill the "cat" process serving the PIPE.


Actual results:

From the host's perspective, the thread backing the vCPU starts hogging the pCPU. From the guest's perspective, the machine becomes increasingly unresponsive.


Expected results:

The Guest shouldn't be affected by this condition.


Additional info:

There's already a patch upstream, please consider backporting it:

https://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg00827.html

Comment 2 Marc-Andre Lureau 2018-07-04 10:44:41 UTC
The patch has been applied upstream:

commit 019288bf137183bf3407c9824655b753bfafc99f
Author: Sergio Lopez <slp@redhat.com>
Date:   Tue Jun 5 03:54:55 2018 -0400

    hw/char/serial: Only retry if qemu_chr_fe_write returns 0

I'll send a backport.

Comment 3 Marc-Andre Lureau 2018-07-10 23:33:47 UTC
The upstream patch isn't enough and introduces some regression.

Comment 4 Marc-Andre Lureau 2018-07-18 12:21:32 UTC
sent
[RHEL-7.6 qemu-kvm-rhev PATCH] hw/char/serial: retry write if EAGAIN
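
Taken together, the two patch titles in this report ("Only retry if qemu_chr_fe_write returns 0" and "retry write if EAGAIN") suggest a retry rule along the following lines. This is a sketch in Python based only on those titles, not the actual QEMU change. The function `should_retry`, its parameters, and the `MAX_RETRIES` value are all hypothetical.

```python
import errno

MAX_RETRIES = 4  # hypothetical cap, chosen only for this example


def should_retry(rc, err, retries, max_retries=MAX_RETRIES):
    """Decide whether a 1-byte write that did not complete deserves a retry.

    rc:      return value of the write (bytes written, or -1 on error)
    err:     errno value observed when rc == -1, ignored otherwise
    retries: number of retries already attempted
    """
    if retries >= max_retries:
        return False                     # retry budget exhausted
    if rc == 0:
        return True                      # nothing written, backend is busy
    if rc == -1 and err == errno.EAGAIN:
        return True                      # would block, try again later
    return False                         # success, or an unrecoverable error


if __name__ == "__main__":
    print(should_retry(0, 0, retries=0))
    print(should_retry(-1, errno.EAGAIN, retries=1))
    print(should_retry(-1, errno.EPIPE, retries=0))
    print(should_retry(0, 0, retries=MAX_RETRIES))
```

Under this rule a broken pipe (EPIPE) is dropped immediately instead of installing a retry callback, while a temporarily full backend still gets a bounded number of retries.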

Comment 7 Miroslav Rezanina 2018-07-24 14:31:36 UTC
Fix included in qemu-kvm-rhev-2.12.0-8.el7

Comment 9 FuXiangChun 2018-07-25 09:13:36 UTC
Sergio,

Following comment 0, QE cannot reproduce this bug. The detailed test procedure is below. If my understanding is wrong, please correct me.

1. Boot the guest like this:

#taskset -c 0 /usr/libexec/qemu-kvm -enable-kvm -m 8G -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -drive file=/home/rhel76-64-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,snapshot=on -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -net user,hostfwd=tcp::6666-:22 -net nic,model=virtio -device isa-serial,chardev=charserial0,id=serial0 -chardev stdio,id=charserial0 -spice port=5910,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1 2> /tmp/qemu.err | cat > /tmp/qemu.out

2. Inside the guest:

2.1) # iptables -A OUTPUT -p ICMP -j LOG

2.2) # ping localhost

2.3) Run a small script:

while true;
do
echo "this is console testing" >/dev/ttyS0
done

3. Check the content of /tmp/qemu.out:
# tailf /tmp/qemu.out

Result: the guest console's output is shown.

4. Kill the cat process on the host:

# kill -9 `pidof cat`

Result: after 4 hours, the qemu-kvm process and the guest still work well.

Comment 10 FuXiangChun 2018-08-28 08:48:01 UTC
Sergio,
As QE cannot reproduce this bug, can QE use the chardev sanity test to verify it?

Comment 11 FuXiangChun 2018-09-06 01:37:18 UTC
As QE cannot reproduce this bug, QE ran the chardev sanity test to verify it. No regression was found. I will set this bug as verified. If you need any other tests, please let QE know.

This is the test run result:

https://polarion.engineering.redhat.com/polarion/#/project/RedHatEnterpriseLinux7/testrun?id=virtkvmqe-chardev-sanity-test-RHEL76

Comment 12 errata-xmlrpc 2018-11-01 11:10:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443

