Bug 1592817
| Summary: | Retrying on serial_xmit if the pipe is broken may compromise the Guest | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Sergio Lopez <slopezpa> |
| Component: | qemu-kvm-rhev | Assignee: | Marc-Andre Lureau <marcandre.lureau> |
| Status: | CLOSED ERRATA | QA Contact: | Qianqian Zhu <qizhu> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.5 | CC: | chayang, juzhang, marjones, michen, qizhu, slopezpa, virt-maint, xfu |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-rhev-2.12.0-8.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-01 11:10:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
The patch has been applied upstream:
commit 019288bf137183bf3407c9824655b753bfafc99f
Author: Sergio Lopez <slp>
Date: Tue Jun 5 03:54:55 2018 -0400
hw/char/serial: Only retry if qemu_chr_fe_write returns 0
I'll send a backport.
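
The policy named in the commit subject above can be illustrated with a small sketch. This is not QEMU's code: QEMU schedules the retry through a watch callback rather than a loop, and the names `should_retry`, `transmit`, `write_once` and `max_retries` are hypothetical, chosen only for this illustration. The sketch assumes a write function that returns a positive byte count on success, 0 when nothing could be written yet, and a negative value on error.

```python
def should_retry(ret):
    """Retry only when the write reported no progress and no error (ret == 0)."""
    return ret == 0


def transmit(write_once, max_retries=4):
    """Call write_once() until it makes progress, fails, or retries run out.

    write_once is a hypothetical callable standing in for the character
    backend write; max_retries is an illustrative bound, not QEMU's value.
    Returns a (status, attempts) tuple.
    """
    for attempt in range(1, max_retries + 2):
        ret = write_once()
        if ret > 0:
            # Data was accepted by the backend.
            return ("sent", attempt)
        if not should_retry(ret):
            # An error such as a broken pipe: retrying cannot help.
            return ("dropped", attempt)
    # Still no progress after the initial attempt plus max_retries retries.
    return ("dropped", max_retries + 1)
```

With this policy a write that fails outright is dropped after a single attempt, so no retry callback keeps competing with the vCPU thread for the global mutex.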
The upstream patch isn't enough and creates some regressions.

Sent: [RHEL-7.6 qemu-kvm-rhev PATCH] hw/char/serial: retry write if EAGAIN

Fix included in qemu-kvm-rhev-2.12.0-8.el7

Sergio,

Following the steps in comment 0, QE cannot reproduce this bug. The detailed testing process is below. If my understanding is wrong, please correct me.

1. Boot the Guest like this:

    # taskset -c 0 /usr/libexec/qemu-kvm -enable-kvm -m 8G -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -drive file=/home/rhel76-64-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,snapshot=on -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -net user,hostfwd=tcp::6666-:22 -net nic,model=virtio -device isa-serial,chardev=charserial0,id=serial0 -chardev stdio,id=charserial0 -spice port=5910,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1 2> /tmp/qemu.err | cat > /tmp/qemu.out

2. Inside the guest:

    2.1) # iptables -A OUTPUT -p ICMP -j LOG
    2.2) # ping localhost
    2.3) Execute a small script:

        while true; do
            echo "this is console testing" >/dev/ttyS0
        done

3. Check the content of /tmp/qemu.out:

    # tailf /tmp/qemu.out

    Result: the guest console output is shown.

4. Kill the cat process on the host:

    # kill -9 `pidof cat`

    Result: after 4 hours, the qemu-kvm process and the guest still work well.

Sergio,

As QE cannot reproduce this bug, can QE use the chardev sanity test to verify it?

As the bug cannot be reproduced, QE verified it with a chardev sanity test run and found no regressions. I will set this bug as verified. If you need any other tests, please let QE know. This is the test run result:
https://polarion.engineering.redhat.com/polarion/#/project/RedHatEnterpriseLinux7/testrun?id=virtkvmqe-chardev-sanity-test-RHEL76

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:3443
Description of problem:

On a QEMU process with a serial device redirected to a PIPE (a common circumstance when virtlogd is configured as log backend), if the PIPE is broken QEMU will install a callback to retry the operation a number of times, even though this is not a recoverable error.

Additionally, if the vCPU issuing the request and the emulator thread happen (by chance or pinning) to share the same pCPU, the Guest stability may get compromised, as both threads will be competing for pCPU time and the qemu_global_mutex.

Some debugging info from a simulation:

- vCPU thread waiting to acquire qemu_global_mutex:

    Thread 3 (Thread 0x7fb6c7fff700 (LWP 31641)):
    #0 0x00007fb6e591c4cd in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    #1 0x00007fb6e5917dcb in _L_lock_812 () at /lib64/libpthread.so.0
    #2 0x00007fb6e5917c98 in __GI___pthread_mutex_lock (address@hidden <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:79
    #3 0x0000556dfaebcfd7 in qemu_mutex_lock_impl (address@hidden <qemu_global_mutex>, address@hidden "/root/Projects/qemu/cpus.c", address@hidden) at util/qemu-thread-posix.c:67
    #4 0x0000556dfaacfe58 in qemu_mutex_lock_iothread () at /root/Projects/qemu/cpus.c:1765
    #5 0x0000556dfaa90bcd in prepare_mmio_access (mr=0x556dfe71c0e0, mr=0x556dfe71c0e0) at /root/Projects/qemu/exec.c:3068
    #6 0x0000556dfaa95f98 in flatview_read_continue (address@hidden, address@hidden, attrs=..., address@hidden, address@hidden "", address@hidden, addr1=5, l=1, mr=0x556dfe71c0e0) at /root/Projects/qemu/exec.c:3189
    #7 0x0000556dfaa9617c in flatview_read (fv=0x7fb6c028aa80, addr=1021, attrs=..., buf=0x7fb6e739f000 "", len=1) at /root/Projects/qemu/exec.c:3255
    #8 0x0000556dfaa9629f in address_space_read_full (as=<optimized out>, address@hidden, attrs=..., buf=<optimized out>, address@hidden) at /root/Projects/qemu/exec.c:3268
    #9 0x0000556dfaa963fa in address_space_rw (as=<optimized out>, address@hidden, attrs=..., address@hidden, buf=<optimized out>, address@hidden, address@hidden) at /root/Projects/qemu/exec.c:3298
    #10 0x0000556dfaaf6536 in kvm_cpu_exec (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=1021) at /root/Projects/qemu/accel/kvm/kvm-all.c:1730
    #11 0x0000556dfaaf6536 in kvm_cpu_exec (address@hidden) at /root/Projects/qemu/accel/kvm/kvm-all.c:1970
    #12 0x0000556dfaad0006 in qemu_kvm_cpu_thread_fn (arg=0x556dfd3d8020) at /root/Projects/qemu/cpus.c:1215
    #13 0x00007fb6e5915dd5 in start_thread (arg=0x7fb6c7fff700) at pthread_create.c:308
    #14 0x00007fb6d9e97b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

- The owner is LWP 31634, the main thread:

    (gdb) p qemu_global_mutex
    $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 31634, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\222{\000\000\001", '\000' <repeats 26 times>, __align = 2}, initialized = true}

- The main thread is in the callback:

    Thread 1 (Thread 0x7fb6e7355cc0 (LWP 31634)):
    #0 0x0000556dfac59420 in serial_watch_cb (chan=0x556dfd321400, cond=G_IO_OUT, opaque=0x556dfe71c000) at hw/char/serial.c:233
    #1 0x00007fb6e68638f9 in g_main_context_dispatch (context=0x556dfd314210) at gmain.c:3146
    #2 0x00007fb6e68638f9 in g_main_context_dispatch (address@hidden) at gmain.c:3811
    #3 0x0000556dfaeba126 in main_loop_wait () at util/main-loop.c:215
    #4 0x0000556dfaeba126 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:263
    #5 0x0000556dfaeba126 in main_loop_wait (address@hidden) at util/main-loop.c:522
    #6 0x0000556dfaa89a2f in main () at vl.c:1943
    #7 0x0000556dfaa89a2f in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4679

Version-Release number of selected component (if applicable):
All QEMU versions including upstream.

How reproducible:
Always.

Steps to Reproduce:

1. Launch a QEMU process with a serial port redirected to a PIPE, and all threads constrained to a single pCPU:

    taskset -c 0 qemu-system-x86_64 -enable-kvm -m 8G -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -drive file=/home/VirtualMachines/rhel74-16gb-1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,snapshot=on -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -net user,hostfwd=tcp::6666-:22 -net nic,model=virtio -device isa-serial,chardev=charserial0,id=serial0 -chardev stdio,id=charserial0 -spice port=5910,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1 2> /tmp/qemu.err | cat > /tmp/qemu.out

2. On the Guest, redirect the console to the serial port, and have it generate some output regularly. An easy way to do this is to add a LOG rule for the ICMP protocol to iptables' OUTPUT chain, and then leave a continuous ping running to a reachable address.

3. Kill the "cat" process serving the PIPE.

Actual results:
From the Host's perspective, the thread backing the vCPU will start hogging the pCPU. From the Guest, the machine will become increasingly unresponsive.

Expected results:
The Guest shouldn't be affected by this condition.

Additional info:
There's already a patch upstream, please consider backporting it:
https://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg00827.html
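
The difference between a pipe that is merely full and one whose reader has gone away (step 3 above) can be seen without QEMU. The following is a minimal sketch using only Python's standard `os` module on a POSIX host; `classify_write` and `demo` are hypothetical helper names, and the labels "retry" and "fatal" are this sketch's own reading of the report and of the backport title "retry write if EAGAIN", not QEMU identifiers. It assumes SIGPIPE is ignored, which is Python's default, so the broken pipe surfaces as an EPIPE error rather than killing the process.

```python
import os


def classify_write(fd, data):
    """Attempt one non-blocking write and classify the outcome."""
    try:
        os.write(fd, data)
        return "ok"
    except BlockingIOError:
        # EAGAIN: the pipe is full; the reader may drain it later.
        return "retry"
    except BrokenPipeError:
        # EPIPE: no reader is left; waiting will not fix this.
        return "fatal"


def demo():
    """Return the outcomes for an empty, a full, and a broken pipe."""
    read_end, write_end = os.pipe()
    os.set_blocking(write_end, False)

    # Empty pipe: the first write succeeds.
    first = classify_write(write_end, b"x" * 512)

    # Keep writing until the pipe buffer is full.
    outcome = first
    while outcome == "ok":
        outcome = classify_write(write_end, b"x" * 512)
    full = outcome

    # Close the read end, the equivalent of killing the "cat" process.
    os.close(read_end)
    broken = classify_write(write_end, b"x")

    os.close(write_end)
    return first, full, broken
```

Calling `demo()` reports the three outcomes in order. Only the full-pipe case is worth a retry, which matches the report's point that a broken PIPE is not a recoverable error.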