Created attachment 1732362 [details]
backtraces, client logs, vm xml

qemu deadlock after poweroff in guest during backup in nbd_export_close_all()

Description of problem:

If you poweroff a guest during backup, qemu deadlocks during shutdown, leaving the vm in "shutdown" state. The qemu monitor blocks, blocking libvirt. When this happens in RHV, there is no way to recover from this state without killing qemu manually. This is worse than a crash, since it causes unlimited downtime until manual intervention.

Version-Release number of selected component (if applicable):
qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64

How reproducible:
50% (4 of 8 attempts)

Steps to Reproduce:
1. Start a full backup in RHV (imageio creates 4 connections to the qemu NBD server and starts reading data extents concurrently)
2. Wait until the download has started
3. In the guest, poweroff

Actual results:
After a few seconds, qemu deadlocks. Libvirt reports the vm in "shutdown" state.

Expected results:
qemu terminates cleanly.

Additional info:
It would be nicer if qemu could pause the guest cpus and keep running until the backup completed, but that is a topic for another bug.

Looks like nbd_export_close_all() deadlocks, waiting for io events that will never arrive since all clients are disconnected.
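For reference, step 1's concurrent extent download can be sketched like this; download_extents and read_extent are hypothetical helpers for illustration, not imageio's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def download_extents(extents, read_extent, connections=4):
    """Read all data extents concurrently, one worker per NBD connection.

    extents is a list of (offset, length) pairs; read_extent is a callable
    reading one extent over its own connection. Both names are illustrative
    stand-ins for what imageio does internally.
    """
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # map() preserves extent order even though reads run concurrently
        return list(pool.map(read_extent, extents))
```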
Here is a typical backtrace of qemu when this happens:

Thread 7 (Thread 0x7fc9875fe700 (LWP 34822)):
#0  0x00007fc9dfa682fc in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000561d5c94d1fd in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x561d5f14f438, file=0x561d5cace868 "/builddir/build/BUILD/qemu-5.1.0/ui/vnc-jobs.c", line=214)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:174
#2  0x0000561d5c8742d1 in vnc_worker_thread_loop (queue=queue@entry=0x561d5f14f400)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/ui/vnc-jobs.c:214
#3  0x0000561d5c874c00 in vnc_worker_thread (arg=0x561d5f14f400)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/ui/vnc-jobs.c:324
#4  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d8bd0a0)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#5  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 6 (Thread 0x7fc987dff700 (LWP 34821)):
#0  0x00007fc9df788ca1 in poll () from target:/lib64/libc.so.6
#1  0x00007fc9e3d899b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007fc9e3d89d72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x00007fc9e196e8db in red_worker_main () from target:/lib64/libspice-server.so.1
#4  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#5  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 5 (Thread 0x7fc9d2fde700 (LWP 34818)):
#0  0x00007fc9dfa682fc in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000561d5c94d1fd in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x561d5d1841c0 <qemu_global_mutex>, file=0x561d5ca125d8 "/builddir/build/BUILD/qemu-5.1.0/softmmu/cpus.c", line=1145)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:174
#2  0x0000561d5c6a9277 in qemu_wait_io_event (cpu=cpu@entry=0x561d5d618eb0)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/cpus.c:1145
#3  0x0000561d5c6aaaa8 in qemu_kvm_cpu_thread_fn (arg=0x561d5d618eb0)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/cpus.c:1193
#4  qemu_kvm_cpu_thread_fn (arg=0x561d5d618eb0)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/cpus.c:1160
#5  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d63f7e0)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#6  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#7  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 4 (Thread 0x7fc9d37df700 (LWP 34817)):
#0  0x00007fc9df788ca1 in poll () from target:/lib64/libc.so.6
#1  0x00007fc9e3d899b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007fc9e3d89d72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x0000561d5c726931 in iothread_run (opaque=0x561d5d4abe60)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/iothread.c:82
#4  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d616730)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#5  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 3 (Thread 0x7fc9d901e700 (LWP 34812)):
#0  0x00007fc9df788d96 in ppoll () from target:/lib64/libc.so.6
#1  0x0000561d5c963bf9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-timer.c:335
#3  0x0000561d5c94ae21 in fdmon_poll_wait (ctx=0x561d5d59c300, ready_list=0x7fc9d901d6b0, timeout=-1)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/fdmon-poll.c:79
#4  0x0000561d5c94a62e in aio_poll (ctx=0x561d5d59c300, blocking=blocking@entry=true)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/aio-posix.c:600
#5  0x0000561d5c726904 in iothread_run (opaque=0x561d5d4b9800)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/iothread.c:75
#6  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d59c810)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#7  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#8  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 2 (Thread 0x7fc9d981f700 (LWP 34811)):
#0  0x00007fc9df78e78d in syscall () from target:/lib64/libc.so.6
#1  0x0000561d5c94d6af in qemu_futex_wait (val=<optimized out>, f=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x561d5d1999e8 <rcu_call_ready_event>)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:460
#3  0x0000561d5c9558f2 in call_rcu_thread (opaque=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/rcu.c:258
#4  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d3b0760)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#5  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 1 (Thread 0x7fc9e4a97680 (LWP 34807)):
#0  0x00007fc9df788d96 in ppoll () from target:/lib64/libc.so.6
#1  0x0000561d5c963bf9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-timer.c:335
#3  0x0000561d5c94ae21 in fdmon_poll_wait (ctx=0x561d5d597a20, ready_list=0x7ffe8876ffb0, timeout=-1)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/fdmon-poll.c:79
#4  0x0000561d5c94a62e in aio_poll (ctx=0x561d5d597a20, blocking=blocking@entry=true)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/aio-posix.c:600
#5  0x0000561d5c8e2db2 in nbd_export_close_all ()
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/nbd/server.c:1751
#6  0x0000561d5c8808a9 in bdrv_close_all ()
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/block.c:4427
#7  0x0000561d5c6be2bb in qemu_cleanup ()
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/vl.c:4531
#8  0x0000561d5c5d6bd7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/main.c:50

In the client, we see that connections were closed:

2020-11-22 12:53:26,859 ERROR (Thread-15) [http] Request aborted after starting response
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 774, in __call__
    self.dispatch(req, resp)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 819, in dispatch
    return method(req, resp, *match.groups())
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/cors.py", line 84, in wrapper
    return func(self, req, resp, *args)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/images.py", line 148, in get
    ticket.run(op)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/auth.py", line 174, in run
    return operation.run()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 57, in run
    res = self._run()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 100, in _run
    self._read_chunk()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 112, in _read_chunk
    count = self._src.readinto(view)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/nbd.py", line 110, in readinto
    self._client.readinto(self._position, view)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 416, in readinto
    self._recv_reply(cmd)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 935, in _recv_reply
    magic = self._recv_fmt("!I")[0]
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 1190, in _recv_fmt
    data = self._recv(s.size)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 1200, in _recv
    self._recv_into(buf)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 1213, in _recv_into
    .format(pos, length))
ovirt_imageio._internal.nbd.ProtocolError: Server closed the connection, read 0 bytes, expected 4 bytes

2020-11-22 17:54:25,831 ERROR (Thread-23) [http] Server error
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 774, in __call__
    self.dispatch(req, resp)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 819, in dispatch
    return method(req, resp, *match.groups())
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/cors.py", line 84, in wrapper
    return func(self, req, resp, *args)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/images.py", line 148, in get
    ticket.run(op)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/auth.py", line 174, in run
    return operation.run()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 57, in run
    res = self._run()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 100, in _run
    self._read_chunk()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 112, in _read_chunk
    count = self._src.readinto(view)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/nbd.py", line 110, in readinto
    self._client.readinto(self._position, view)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 415, in readinto
    self._send_command(cmd)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 928, in _send_command
    self._send(cmd.to_bytes())
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 1196, in _send
    self._sock.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe
After qemu hangs, libvirt reports the vm as:

# virsh -r list
 Id   Name         State
--------------------------------
 1    backup-raw   in shutdown

And any operation (e.g. stopping backup) fails with:

libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=qemu-event)

Libvirt is waiting for events from qemu monitor:

Thread 18 (Thread 0x7fddcf15f700 (LWP 9155)):
#0  0x00007fde2f5d02fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fde32e2491a in virCondWait () from /lib64/libvirt.so.0
#2  0x00007fdde9367924 in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007fdde9377605 in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007fdde9387eac in qemuMonitorJSONGetJobInfo () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007fdde92e9fb9 in qemuBlockJobUpdate () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007fdde93bd968 in qemuProcessEventHandler () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007fde32e2546f in virThreadPoolWorker () from /lib64/libvirt.so.0
#8  0x00007fde32e24a6b in virThreadHelper () from /lib64/libvirt.so.0
#9  0x00007fde2f5ca14a in start_thread () from /lib64/libpthread.so.0
#10 0x00007fde2eedff23 in clone () from /lib64/libc.so.6

Peter, is this expected or should I file a libvirt bug for this issue?
If qemu doesn't respond to the QMP command, as shown by the backtrace ending in qemuMonitorSend(), there isn't much that libvirt can do. If 'virsh destroy $VMNAME' properly terminates the VM, it's not a libvirt bug. virsh destroy is supposed to kill the process, which in turn closes the monitor socket, which should make the qemuMonitorSend() call return failure.
In RHV you cannot destroy the VM while it is in the middle of a backup, and we cannot stop the backup because all libvirt calls for this vm fail :-) Is it possible to stop the backup in this state, when the qemu monitor is blocked? We need to fix RHV to allow destroying the VM in this case first.
No, stopping a backup requires issuing QMP commands to tear down the backup (stop NBD, remove scratch files, delete temporary bitmaps), so if qemu is deadlocked you can't stop the backup. Additionally, it wouldn't help much if we allowed any other command, as that would get stuck on a deadlocked qemu too. The only two options are:

1) qemu un-deadlocks
2) qemu is killed (virsh destroy or equivalent)
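For illustration, the teardown steps roughly correspond to QMP commands like the following. nbd-server-stop, blockdev-del, and block-dirty-bitmap-remove are real QMP commands, but the exact sequence and all node/bitmap names here are assumptions for the sketch, not what libvirt literally sends:

```python
import json

def backup_teardown_cmds(scratch_node, bitmap_node, bitmap_name):
    """Build a plausible QMP command sequence to tear down a pull-mode backup.

    scratch_node, bitmap_node, and bitmap_name are hypothetical placeholders;
    none of this can be sent while the qemu monitor is deadlocked, which is
    exactly why the backup cannot be stopped in this state.
    """
    return [
        # Stop the NBD server serving the backup exports
        {"execute": "nbd-server-stop"},
        # Drop the scratch (copy-on-write) node backing the export
        {"execute": "blockdev-del",
         "arguments": {"node-name": scratch_node}},
        # Delete the temporary dirty bitmap used for the backup
        {"execute": "block-dirty-bitmap-remove",
         "arguments": {"node": bitmap_node, "name": bitmap_name}},
    ]
```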
I hit this bug while working on BZ#1900326.

There's a circular dependency problem in bdrv_close_all():

void bdrv_close_all(void)
{
    assert(job_next(NULL) == NULL);

    /* Drop references from requests still in flight, such as canceled block
     * jobs whose AIO context has not been polled yet */
    blk_exp_close_all();
    bdrv_drain_all();

    blk_remove_all_bs();
    blockdev_close_all_bdrv_states();

    assert(QTAILQ_EMPTY(&all_bdrv_states));
}

blk_exp_close_all() requests the export to shut down all client connections, and waits with AIO_WAIT_WHILE until the export has been stopped completely. The problem is that the nbd server may have created coroutines that will wait in the block layer until it is fully drained. On the other hand, we can't drain the block layer until all of the export's client connections are closed, so they don't queue more requests.

I'm going to propose a modification of blk_exp_close_all() so it receives an argument indicating whether to wait for all exports to be completely stopped. That way, we can use it here twice: once at the beginning of the function to close all client connections without waiting, and again near the end, when the block layer has been quiesced, to actually wait and confirm that all exports are stopped.
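The proposed two-phase shutdown can be modeled with a small toy sketch; all names here are illustrative, and this is not qemu code:

```python
class Export:
    """Toy stand-in for a block export; not qemu's real BlockExport."""

    def __init__(self):
        self.clients_closed = False
        self.stopped = False

    def request_shutdown(self):
        # Phase 1: ask clients to disconnect, but do not wait here.
        self.clients_closed = True

    def wait_stopped(self):
        # Phase 2: only safe to wait once the block layer is quiesced
        # and no client can queue new requests.
        assert self.clients_closed
        self.stopped = True

def close_all_exports(exports, drain_all):
    """Model the proposed ordering: close connections, drain, then wait."""
    for exp in exports:
        exp.request_shutdown()   # blk_exp_close_all(wait=false), roughly
    drain_all()                  # stands in for bdrv_drain_all()
    for exp in exports:
        exp.wait_stopped()       # blk_exp_close_all(wait=true), roughly
```

Waiting before draining, as the current code does, is the circular dependency: the wait needs the coroutines to finish, and the coroutines need the drain.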
v2 of Sergio's series: https://lists.nongnu.org/archive/html/qemu-devel/2020-12/msg03742.html seems to be getting closer to a solution, and since he's working on it, I'll change the assignment. Once a solution is accepted upstream, we'll need to clone this bug for RHEL-AV 8.4.0 as well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0639