Bug 1900505

Summary: [incremental_backup] qemu deadlock after poweroff in guest during backup in nbd_export_close_all()
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: qemu-kvm
Sub component: NBD
Version: 8.3
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Keywords: Triaged
Reporter: Nir Soffer <nsoffer>
Assignee: Sergio Lopez <slopezpa>
QA Contact: Gu Nini <ngu>
CC: aliang, areis, coli, eblake, ehadley, jferlan, jinzhao, juzhang, pkrempa, qzhang, slopezpa, virt-maint, zhencliu, zixchen
Target Milestone: rc
Target Release: 8.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-5.1.0-19.module+el8.3.1+9795+4ce2a535
Type: Bug
Clones: 1918968 (view as bug list)
Bug Blocks: 1892672, 1918968
Last Closed: 2021-02-22 15:39:41 UTC
Attachments: backtraces, client logs, vm xml

Description Nir Soffer 2020-11-23 08:50:41 UTC
Created attachment 1732362 [details]
backtraces, client logs, vm xml

qemu deadlock after poweroff in guest during backup in nbd_export_close_all()

Description of problem:

If you power off a guest during a backup, qemu deadlocks during shutdown,
leaving the vm in the "shutdown" state. The qemu monitor blocks, which in
turn blocks libvirt. When this happens in RHV, there is no way to recover
from this state without killing qemu manually.

This is worse than a crash, since it causes unlimited downtime until
manual intervention.

Version-Release number of selected component (if applicable):
qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64

How reproducible:
50% (4 of 8 attempts)

Steps to Reproduce:
1. Start full backup in RHV
   (imageio creates 4 connections to the qemu nbd server and starts
    reading data extents concurrently)
2. Wait until the download has started
3. In the guest, run poweroff

Actual results:
After a few seconds, qemu deadlocks. Libvirt reports the vm in the
"shutdown" state.

Expected results:
qemu terminates cleanly.

Additional info:
It would be nicer if qemu could pause the guest cpus and keep running
until the backup completed, but that is a topic for another bug.

It looks like nbd_export_close_all() deadlocks, waiting for io events that
will never arrive since all clients are disconnected.
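
For context, here is a simplified sketch of the wait loop nbd_export_close_all()
appears to be stuck in (reconstructed from the backtrace below; not an exact copy
of nbd/server.c):

void nbd_export_close_all(void)
{
    NBDExport *exp, *next;

    /* Ask every export to close; client connections are torn down
     * asynchronously by coroutines. */
    QTAILQ_FOREACH_SAFE(exp, &exports, next, next) {
        nbd_export_close(exp);
    }

    /* Poll until every export is actually gone.  This is where thread 1
     * below is stuck: the coroutines that would drop the last export
     * references are themselves waiting on the block layer, so no event
     * ever wakes this loop up. */
    AIO_WAIT_WHILE(NULL, !QTAILQ_EMPTY(&exports));
}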

Here is a typical backtrace of qemu when this happens:

Thread 7 (Thread 0x7fc9875fe700 (LWP 34822)):
#0  0x00007fc9dfa682fc in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000561d5c94d1fd in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x561d5f14f438, file=0x561d5cace868 "/builddir/build/BUILD/qemu-5.1.0/ui/vnc-jobs.c", line=214) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:174
#2  0x0000561d5c8742d1 in vnc_worker_thread_loop (queue=queue@entry=0x561d5f14f400) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/ui/vnc-jobs.c:214
#3  0x0000561d5c874c00 in vnc_worker_thread (arg=0x561d5f14f400) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/ui/vnc-jobs.c:324
#4  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d8bd0a0) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#5  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 6 (Thread 0x7fc987dff700 (LWP 34821)):
#0  0x00007fc9df788ca1 in poll () from target:/lib64/libc.so.6
#1  0x00007fc9e3d899b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007fc9e3d89d72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x00007fc9e196e8db in red_worker_main () from target:/lib64/libspice-server.so.1
#4  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#5  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 5 (Thread 0x7fc9d2fde700 (LWP 34818)):
#0  0x00007fc9dfa682fc in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x0000561d5c94d1fd in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x561d5d1841c0 <qemu_global_mutex>, file=0x561d5ca125d8 "/builddir/build/BUILD/qemu-5.1.0/softmmu/cpus.c", line=1145) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:174
#2  0x0000561d5c6a9277 in qemu_wait_io_event (cpu=cpu@entry=0x561d5d618eb0) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/cpus.c:1145
#3  0x0000561d5c6aaaa8 in qemu_kvm_cpu_thread_fn (arg=0x561d5d618eb0) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/cpus.c:1193
#4  qemu_kvm_cpu_thread_fn (arg=0x561d5d618eb0) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/cpus.c:1160
#5  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d63f7e0) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#6  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#7  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 4 (Thread 0x7fc9d37df700 (LWP 34817)):
#0  0x00007fc9df788ca1 in poll () from target:/lib64/libc.so.6
#1  0x00007fc9e3d899b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007fc9e3d89d72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x0000561d5c726931 in iothread_run (opaque=0x561d5d4abe60) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/iothread.c:82
#4  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d616730) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#5  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 3 (Thread 0x7fc9d901e700 (LWP 34812)):
#0  0x00007fc9df788d96 in ppoll () from target:/lib64/libc.so.6
#1  0x0000561d5c963bf9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-timer.c:335
#3  0x0000561d5c94ae21 in fdmon_poll_wait (ctx=0x561d5d59c300, ready_list=0x7fc9d901d6b0, timeout=-1) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/fdmon-poll.c:79
#4  0x0000561d5c94a62e in aio_poll (ctx=0x561d5d59c300, blocking=blocking@entry=true) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/aio-posix.c:600
#5  0x0000561d5c726904 in iothread_run (opaque=0x561d5d4b9800) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/iothread.c:75
#6  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d59c810) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#7  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#8  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 2 (Thread 0x7fc9d981f700 (LWP 34811)):
#0  0x00007fc9df78e78d in syscall () from target:/lib64/libc.so.6
#1  0x0000561d5c94d6af in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x561d5d1999e8 <rcu_call_ready_event>) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:460
#3  0x0000561d5c9558f2 in call_rcu_thread (opaque=<optimized out>) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/rcu.c:258
#4  0x0000561d5c94ce44 in qemu_thread_start (args=0x561d5d3b0760) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-thread-posix.c:521
#5  0x00007fc9dfa6214a in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007fc9df793f23 in clone () from target:/lib64/libc.so.6

Thread 1 (Thread 0x7fc9e4a97680 (LWP 34807)):
#0  0x00007fc9df788d96 in ppoll () from target:/lib64/libc.so.6
#1  0x0000561d5c963bf9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/qemu-timer.c:335
#3  0x0000561d5c94ae21 in fdmon_poll_wait (ctx=0x561d5d597a20, ready_list=0x7ffe8876ffb0, timeout=-1) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/fdmon-poll.c:79
#4  0x0000561d5c94a62e in aio_poll (ctx=0x561d5d597a20, blocking=blocking@entry=true) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/util/aio-posix.c:600
#5  0x0000561d5c8e2db2 in nbd_export_close_all () at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/nbd/server.c:1751
#6  0x0000561d5c8808a9 in bdrv_close_all () at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/block.c:4427
#7  0x0000561d5c6be2bb in qemu_cleanup () at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/vl.c:4531
#8  0x0000561d5c5d6bd7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /usr/src/debug/qemu-kvm-5.1.0-13.module+el8.3.0+8424+e82f331d.x86_64/softmmu/main.c:50


In the client, we see that connections were closed:

2020-11-22 12:53:26,859 ERROR   (Thread-15) [http] Request aborted after starting response
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 774, in __call__
    self.dispatch(req, resp)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 819, in dispatch
    return method(req, resp, *match.groups())
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/cors.py", line 84, in wrapper
    return func(self, req, resp, *args)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/images.py", line 148, in get
    ticket.run(op)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/auth.py", line 174, in run
    return operation.run()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 57, in run
    res = self._run()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 100, in _run
    self._read_chunk()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 112, in _read_chunk
    count = self._src.readinto(view)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/nbd.py", line 110, in readinto
    self._client.readinto(self._position, view)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 416, in readinto
    self._recv_reply(cmd)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 935, in _recv_reply
    magic = self._recv_fmt("!I")[0]
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 1190, in _recv_fmt
    data = self._recv(s.size)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 1200, in _recv
    self._recv_into(buf)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 1213, in _recv_into
    .format(pos, length))
ovirt_imageio._internal.nbd.ProtocolError: Server closed the connection, read 0 bytes, expected 4 bytes

2020-11-22 17:54:25,831 ERROR   (Thread-23) [http] Server error
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 774, in __call__
    self.dispatch(req, resp)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 819, in dispatch
    return method(req, resp, *match.groups())
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/cors.py", line 84, in wrapper
    return func(self, req, resp, *args)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/images.py", line 148, in get
    ticket.run(op)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/auth.py", line 174, in run
    return operation.run()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 57, in run
    res = self._run()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 100, in _run
    self._read_chunk()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/ops.py", line 112, in _read_chunk
    count = self._src.readinto(view)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/nbd.py", line 110, in readinto
    self._client.readinto(self._position, view)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 415, in readinto
    self._send_command(cmd)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 928, in _send_command
    self._send(cmd.to_bytes())
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 1196, in _send
    self._sock.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

Comment 2 Nir Soffer 2020-11-23 09:02:28 UTC
After qemu hangs, libvirt reports the vm as:

# virsh -r list
 Id   Name         State
--------------------------------
 1    backup-raw   in shutdown

And any operation (e.g. stopping the backup) fails with:

    libvirt.libvirtError: Timed out during operation: cannot acquire state
    change lock (held by monitor=qemu-event)

Libvirt is waiting for events from the qemu monitor:

Thread 18 (Thread 0x7fddcf15f700 (LWP 9155)):
#0  0x00007fde2f5d02fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fde32e2491a in virCondWait () from /lib64/libvirt.so.0
#2  0x00007fdde9367924 in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007fdde9377605 in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007fdde9387eac in qemuMonitorJSONGetJobInfo () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007fdde92e9fb9 in qemuBlockJobUpdate () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007fdde93bd968 in qemuProcessEventHandler () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007fde32e2546f in virThreadPoolWorker () from /lib64/libvirt.so.0
#8  0x00007fde32e24a6b in virThreadHelper () from /lib64/libvirt.so.0
#9  0x00007fde2f5ca14a in start_thread () from /lib64/libpthread.so.0
#10 0x00007fde2eedff23 in clone () from /lib64/libc.so.6

Peter, is this expected or should I file a libvirt bug for this issue?

Comment 3 Peter Krempa 2020-11-23 09:58:23 UTC
If qemu doesn't respond to the QMP command, as shown by the backtrace ending in qemuMonitorSend(), there isn't much that libvirt can do.

If 'virsh destroy $VMNAME' properly terminates the VM, it's not a libvirt bug. virsh destroy is supposed to kill the process, which in turn closes the monitor socket, which should make the qemuMonitorSend() call return failure.

Comment 4 Nir Soffer 2020-11-23 10:04:34 UTC
In RHV you cannot destroy the VM when it is in the middle of a backup, and we
cannot stop the backup because all libvirt calls for this vm fail :-)

Is it possible to stop the backup in this state when qemu monitor is blocked?

We need to fix RHV to allow destroying the VM in this case first.

Comment 5 Peter Krempa 2020-11-23 10:37:11 UTC
No, stopping a backup requires issuing QMP commands to tear down the backup (stop NBD, remove scratch files, delete temporary bitmaps), so if qemu is deadlocked you can't stop the backup. Additionally, it wouldn't help much if we allowed any other command, as that would get stuck on the deadlocked qemu too.

The only two options are:
1) qemu un-deadlocks
2) qemu is killed (virsh destroy or equivalent)

Comment 7 Sergio Lopez 2020-12-11 16:28:14 UTC
I hit this bug while working on BZ#1900326. There's a circular dependency problem in bdrv_close_all():

void bdrv_close_all(void)
{
    assert(job_next(NULL) == NULL);
    blk_exp_close_all();

    /* Drop references from requests still in flight, such as canceled block
     * jobs whose AIO context has not been polled yet */
    bdrv_drain_all();

    blk_remove_all_bs();
    blockdev_close_all_bdrv_states();

    assert(QTAILQ_EMPTY(&all_bdrv_states));
}

blk_exp_close_all() will request the exports to shut down all client connections, and will wait until every export has been stopped completely using AIO_WAIT_WHILE. The problem is that the nbd server may have created coroutines that will keep waiting in the block layer until it's fully drained. On the other hand, we can't drain the block layer until all of the exports' client connections are closed, so that they don't queue more requests.

I'm going to propose a modification of blk_exp_close_all() so it receives an argument indicating whether we want it to wait for all exports to be completely stopped. That way, we can use it here twice: once at the beginning of the function to close all client connections without waiting, and again near the end, once the block layer has been quiesced, to actually wait and confirm that all exports are stopped.
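
For illustration, a minimal sketch of how bdrv_close_all() could look with that change. The boolean parameter and the exact call placement are assumptions based on the description above, not the actual patch:

void bdrv_close_all(void)
{
    assert(job_next(NULL) == NULL);

    /* 1. Ask every export to drop its client connections, but do not
     *    wait for the exports to stop yet (hypothetical 'wait' flag). */
    blk_exp_close_all(false);

    /* 2. Drain the block layer; coroutines created by the NBD server
     *    that are still waiting inside the block layer can now finish. */
    bdrv_drain_all();

    /* 3. With the block layer quiesced, wait until every export has
     *    actually been stopped. */
    blk_exp_close_all(true);

    blk_remove_all_bs();
    blockdev_close_all_bdrv_states();

    assert(QTAILQ_EMPTY(&all_bdrv_states));
}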

Comment 8 John Ferlan 2020-12-22 17:05:20 UTC
v2 of Sergio's series, https://lists.nongnu.org/archive/html/qemu-devel/2020-12/msg03742.html, seems to be getting closer to a solution, and since he's working on it, I'll change the assignment.

Once a solution is accepted upstream we'll need to clone this bug for RHEL-AV 8.4.0 as well.

Comment 19 errata-xmlrpc 2021-02-22 15:39:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0639