Bug 1524273
| Summary: | [data-plane] qemu core dump when trying to resume block stream job after glusterfs server reboot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Qianqian Zhu <qizhu> |
| Component: | qemu-kvm | Assignee: | Sergio Lopez <slopezpa> |
| Status: | CLOSED ERRATA | QA Contact: | aihua liang <aliang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | --- | CC: | aliang, chayang, coli, ddepaula, juzhang, kanderso, knoel, lolyu, mtessun, ngu, rbalakri, stefanha, virt-maint, xfu |
| Target Milestone: | rc | | |
| Target Release: | 8.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-11-06 07:10:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Qianqian Zhu, 2017-12-11 06:00:27 UTC)
See also Bug 1524254.

(In reply to Qianqian Zhu from comment #0)
> #2 0x00005555557989eb in error_exit (err=<optimized out>, msg=msg@entry=0x555555baae90 <__func__.23806> "qemu_mutex_unlock") at util/qemu-thread-posix.c:36
> #3 0x0000555555abe816 in qemu_mutex_unlock (mutex=mutex@entry=0x555556d35b20) at util/qemu-thread-posix.c:96
>         err = <optimized out>
>         __PRETTY_FUNCTION__ = "qemu_mutex_unlock"
>         __func__ = "qemu_mutex_unlock"
> #4 0x0000555555aba3c9 in aio_context_release (ctx=ctx@entry=0x555556d35ac0) at util/async.c:494
> #5 0x0000555555a47ec0 in bdrv_prwv_co (child=child@entry=0x555558fa4b40, offset=offset@entry=126615552, qiov=qiov@entry=0x7fffffffd6c0, is_write=is_write@entry=true, flags=flags@entry=0) at block/io.c:626
>         waited_ = true
>         busy_ = true
>         bs_ = 0x55555941a000
>         ctx_ = 0x555556d35ac0
>         co = <optimized out>
>         rwco = {child = 0x555558fa4b40, offset = 126615552, qiov = 0x7fffffffd6c0, is_write = true, ret = 2147483647, flags = 0}
>         __PRETTY_FUNCTION__ = "bdrv_prwv_co"
> #6 0x0000555555a48299 in bdrv_pwrite (qiov=0x7fffffffd6c0, offset=126615552, child=0x555558fa4b40) at block/io.c:764
>         ret = <optimized out>
>         qiov = {iov = 0x7fffffffd6b0, niov = 1, nalloc = -1, size = 65536}
>         iov = {iov_base = 0x55555a78c000, iov_len = 65536}
> #7 0x0000555555a48299 in bdrv_pwrite (child=0x555558fa4b40, offset=126615552, buf=<optimized out>, bytes=<optimized out>) at block/io.c:785
>         qiov = {iov = 0x7fffffffd6b0, niov = 1, nalloc = -1, size = 65536}
>         iov = {iov_base = 0x55555a78c000, iov_len = 65536}

The qemu_mutex_unlock() EPERM error from pthread_mutex_unlock() occurs when the mutex is owned by another thread. Perhaps BDRV_POLL_WHILE() got an AioContext that isn't held by the main loop thread. That could mean the BDS's AioContext changed sometime during bdrv_close_all(). A minimal illustration of that failure mode follows.
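As a standalone sketch of the suspected failure mode (illustrative only, not QEMU source; the thread function and mutex here are hypothetical): with an error-checking mutex, pthread_mutex_unlock() fails with EPERM when the calling thread is not the owner, which is exactly the error qemu_mutex_unlock() turns into error_exit() in frames #2-#3 above. Whether a given build uses an error-checking mutex depends on configuration; with the default mutex type the same misuse is undefined behavior rather than a clean EPERM.

#define _GNU_SOURCE   /* for PTHREAD_MUTEX_ERRORCHECK on glibc */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t mutex;

/* Unlock from a thread that does not own the mutex, like a main loop
 * releasing an AioContext lock that is actually held by an iothread. */
static void *non_owner_thread(void *arg)
{
    int ret = pthread_mutex_unlock(&mutex);
    printf("unlock from non-owner: %s\n", strerror(ret)); /* EPERM */
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);

    pthread_mutex_lock(&mutex);   /* this thread becomes the owner */
    pthread_create(&tid, NULL, non_owner_thread, NULL);
    pthread_join(tid, NULL);
    pthread_mutex_unlock(&mutex); /* owner unlock succeeds */
    return 0;
}

Compile with `gcc -pthread`; where this sketch merely prints the errno, QEMU's wrapper calls error_exit() and aborts, producing the core dump in this bug.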
Reproduced it on qemu-kvm-rhev-2.12.0-18.el7_6.2.x86_64: guest and qemu hang when quitting the VM.

Test env:
kernel version: 3.10.0-957.1.2.el7.x86_64
qemu-kvm-rhev version: qemu-kvm-rhev-2.12.0-18.el7_6.2.x86_64
glusterfs version: glusterfs-server-3.12.2-32.el7rhgs.x86_64

Test steps:
1. Start guest with qemu cmds:
/usr/libexec/qemu-kvm \
-M q35,accel=kvm,kernel-irqchip=split \
-m 3G \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,iothread=iothread1 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=/home/kvm_autotest_root/images/rhel76-64-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-device virtio-net-pci,mac=9a:89:8a:8b:8c:8d,id=idTqeiKU,netdev=idU17IIx,bus=pcie.0 \
-object iothread,id=iothread1 \
-netdev tap,id=idU17IIx \
-cpu host \
-vnc :0 \
-enable-kvm \
-qmp tcp::5555,server,nowait \
-monitor stdio \
-object iothread,id=iothread2 \
-object iothread,id=iothread3 \
-device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pcie.0,iothread=iothread2 \
-drive id=drive_image2,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=/home/data1.qcow2 \
-device scsi-hd,id=image2,drive=drive_image2 \

2. Create snapshot sn1 of the system disk on the gluster server:
{"execute": "blockdev-snapshot-sync", "arguments": { "device": "drive_image1", "snapshot-file": "gluster://intel-e52650-16-4.englab.nay.redhat.com/aliang/sn1", "mode": "absolute-paths", "format": "qcow2" } }

3. Continue to create snapshot sn2 on the gluster server:
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive_image1", "snapshot-file": "gluster://intel-e52650-16-4.englab.nay.redhat.com/aliang/sn2", "mode": "absolute-paths", "format": "qcow2" } }

4. Check block info:
(qemu) info block
drive_image1 (#block1397): gluster://intel-e52650-16-4.englab.nay.redhat.com/aliang/sn2 (qcow2)
    Attached to: image1
    Cache mode: writethrough
    Backing file: gluster://intel-e52650-16-4.englab.nay.redhat.com/aliang/sn1 (chain depth: 2)

drive_image2 (#block337): /home/data1.qcow2 (qcow2)
    Attached to: image2
    Cache mode: writethrough

sd0: [not inserted]
    Removable device: not locked, tray closed

5. Do block stream and check block-job info:
{ "execute": "block-stream", "arguments": { "device": "drive_image1"}}

Output:
{"timestamp": {"seconds": 1544520178, "microseconds": 230653}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_image1"}}
{"timestamp": {"seconds": 1544520178, "microseconds": 230741}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_image1"}}
{"return": {}}

(qemu) info block-jobs
Streaming device drive_image1: Completed 3231252480 of 21474836480 bytes, speed limit 0 bytes/s

6. During block stream running, reboot the gluster server.

Output on HMP and QMP:
(qemu) [2018-12-11 09:24:01.583704] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-aliang-client-0: server 10.66.144.34:49152 has not responded in the last 42 seconds, disconnecting.
[2018-12-11 09:24:01.584076] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fec1f2e369b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fec1f0a796e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fec1f0a7a8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fec1f0a9210] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x300)[0x7fec1f0a9d60] ))))) 0-aliang-client-0: forced unwinding frame type(GlusterFS 3.3) op(FSYNC(16)) called at 2018-12-11 09:23:04.814263 (xid=0x93b)
[2018-12-11 09:24:01.584222] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fec1f2e369b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fec1f0a796e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fec1f0a7a8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7fec1f0a9210] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x300)[0x7fec1f0a9d60] ))))) 0-aliang-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2018-12-11 09:23:19.582306 (xid=0x93c)
[2018-12-11 09:25:15.734398] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-aliang-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-12-11 09:25:20.916017] E [socket.c:2369:socket_connect_finish] 0-gfapi: connection to 10.66.144.34:24007 failed (Connection timed out); disconnecting socket
{"timestamp": {"seconds": 1544520241, "microseconds": 584347}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "__com.redhat_reason": "eother", "node-name": "#block1397", "reason": "Transport endpoint is not connected", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1544520241, "microseconds": 584738}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "__com.redhat_reason": "eother", "node-name": "#block1397", "reason": "No medium found", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1544520241, "microseconds": 584798}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "__com.redhat_reason": "eother", "node-name": "#block1397", "reason": "No medium found", "operation": "write", "action": "report"}}

7. After the gluster server rebooted, check its info:
# gluster volume status
Status of volume: aliang
Gluster process                                  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick intel-e52650-16-4.englab.nay.redhat.com:/home/aliang  49152  0  Y  6395

Task Status of Volume aliang
------------------------------------------------------------------------------
There are no active volume tasks

8. Check block-job status and block info:
(qemu) info block-jobs
No active jobs
(qemu) info block
drive_image1 (#block1397): gluster://intel-e52650-16-4.englab.nay.redhat.com/aliang/sn2 (qcow2)
    Attached to: image1
    Cache mode: writethrough
    Backing file: gluster://intel-e52650-16-4.englab.nay.redhat.com/aliang/sn1 (chain depth: 2)

drive_image2 (#block337): /home/data1.qcow2 (qcow2)
    Attached to: image2
    Cache mode: writethrough

sd0: [not inserted]
    Removable device: not locked, tray closed

9. Redo block stream:
{ "execute": "block-stream", "arguments": { "device": "drive_image1"}}

Output:
{"timestamp": {"seconds": 1544520538, "microseconds": 984417}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_image1"}}
{"timestamp": {"seconds": 1544520538, "microseconds": 984499}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_image1"}}
{"return": {}}
{"timestamp": {"seconds": 1544520539, "microseconds": 89662}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive_image1", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1544520539, "microseconds": 89716}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "drive_image1"}}
{"timestamp": {"seconds": 1544520539, "microseconds": 89754}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 21474836480, "offset": 3231776768, "speed": 0, "type": "stream", "error": "No medium found"}}
{"timestamp": {"seconds": 1544520539, "microseconds": 89779}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "drive_image1"}}
{"timestamp": {"seconds": 1544520539, "microseconds": 89810}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "drive_image1"}}

---> Block stream failed to execute with a "No medium found" error (see the errno note after step 10).
10. Quit the VM:
(qemu) quit
--> qemu and guest hang
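As referenced in step 9: the two error strings in the events above are plain errno texts. "No medium found" is strerror(ENOMEDIUM), which QEMU's block layer returns when it considers a node's backend no longer present, and "Transport endpoint is not connected" is strerror(ENOTCONN) bubbling up from the dead gluster connection. A tiny standalone check (illustrative, not QEMU code; errno wording is glibc's):

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* The two errnos surfaced in the QMP events above. */
    printf("ENOMEDIUM (%d): %s\n", ENOMEDIUM, strerror(ENOMEDIUM));
    printf("ENOTCONN  (%d): %s\n", ENOTCONN, strerror(ENOTCONN));
    return 0;
}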
Additional info:
[root@ibm-x3250m4-05 ~]# pstack 21681
Thread 18 (Thread 0x7fec13f4f700 (LWP 21682)):
#0 0x00007fec1b71a1c9 in syscall () at /lib64/libc.so.6
#1 0x0000560ddb9e2970 in qemu_event_wait (val=<optimized out>, f=<optimized out>) at /usr/src/debug/qemu-2.12.0/include/qemu/futex.h:29
#2 0x0000560ddb9e2970 in qemu_event_wait (ev=ev@entry=0x560ddc669aa8 <rcu_call_ready_event>) at util/qemu-thread-posix.c:445
#3 0x0000560ddb9f2e9e in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
#4 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 17 (Thread 0x7fec1374e700 (LWP 21683)):
#0 0x00007fec1b7152cf in ppoll () at /lib64/libc.so.6
#1 0x0000560ddb9de96b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x0000560ddb9de96b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3 0x0000560ddb9e06e7 in aio_poll (ctx=0x560ddd727900, blocking=blocking@entry=true) at util/aio-posix.c:645
#4 0x0000560ddb7b070e in iothread_run (opaque=0x560ddd745ce0) at iothread.c:64
#5 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#6 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 16 (Thread 0x7fec12f4d700 (LWP 21684)):
#0 0x00007fec1b7152cf in ppoll () at /lib64/libc.so.6
#1 0x0000560ddb9de96b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x0000560ddb9de96b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3 0x0000560ddb9e06e7 in aio_poll (ctx=0x560ddd727a40, blocking=blocking@entry=true) at util/aio-posix.c:645
#4 0x0000560ddb7b070e in iothread_run (opaque=0x560ddd745dc0) at iothread.c:64
#5 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#6 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 15 (Thread 0x7fec1274c700 (LWP 21685)):
#0 0x00007fec1b7152cf in ppoll () at /lib64/libc.so.6
#1 0x0000560ddb9de96b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x0000560ddb9de96b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3 0x0000560ddb9e06e7 in aio_poll (ctx=0x560ddd727b80, blocking=blocking@entry=true) at util/aio-posix.c:645
#4 0x0000560ddb7b070e in iothread_run (opaque=0x560ddd745ea0) at iothread.c:64
#5 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#6 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 14 (Thread 0x7fec11649700 (LWP 21703)):
#0 0x00007fec1b7152cf in ppoll () at /lib64/libc.so.6
#1 0x0000560ddb9de96b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x0000560ddb9de96b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3 0x0000560ddb9e06e7 in aio_poll (ctx=0x560dddad6000, blocking=blocking@entry=true) at util/aio-posix.c:645
#4 0x0000560ddb7b070e in iothread_run (opaque=0x560ddd81c0e0) at iothread.c:64
#5 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#6 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 13 (Thread 0x7fec10e48700 (LWP 21704)):
#0 0x00007fec1b9fa965 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x0000560ddb9e2549 in qemu_cond_wait_impl (cond=<optimized out>, mutex=mutex@entry=0x560ddc22ffc0 <qemu_global_mutex>, file=file@entry=0x560ddba76868 "/builddir/build/BUILD/qemu-2.12.0/cpus.c", line=line@entry=1176) at util/qemu-thread-posix.c:164
#2 0x0000560ddb6c4b9f in qemu_wait_io_event (cpu=cpu@entry=0x560dddcbe000) at /usr/src/debug/qemu-2.12.0/cpus.c:1176
#3 0x0000560ddb6c62a0 in qemu_kvm_cpu_thread_fn (arg=0x560dddcbe000) at /usr/src/debug/qemu-2.12.0/cpus.c:1220
#4 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 12 (Thread 0x7feb4ebff700 (LWP 21713)):
#0 0x00007fec1b9fa965 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x0000560ddb9e2549 in qemu_cond_wait_impl (cond=cond@entry=0x560ddd6f0ab0, mutex=mutex@entry=0x560ddd6f0ae8, file=file@entry=0x560ddbb4be67 "ui/vnc-jobs.c", line=line@entry=212) at util/qemu-thread-posix.c:164
#2 0x0000560ddb8fd28f in vnc_worker_thread_loop (queue=queue@entry=0x560ddd6f0ab0) at ui/vnc-jobs.c:212
#3 0x0000560ddb8fd858 in vnc_worker_thread (arg=0x560ddd6f0ab0) at ui/vnc-jobs.c:319
#4 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 11 (Thread 0x7feb38f77700 (LWP 21814)):
#0 0x00007fec1b9fad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x00007fec1f31ce68 in syncenv_task () at /lib64/libglusterfs.so.0
#2 0x00007fec1f31dd30 in syncenv_processor () at /lib64/libglusterfs.so.0
#3 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#4 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 10 (Thread 0x7feb38776700 (LWP 21815)):
#0 0x00007fec1b9fad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x00007fec1f31ce68 in syncenv_task () at /lib64/libglusterfs.so.0
#2 0x00007fec1f31dd30 in syncenv_processor () at /lib64/libglusterfs.so.0
#3 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#4 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 9 (Thread 0x7fec11f4b700 (LWP 21831)):
#0 0x00007fec1b6e6e2d in nanosleep () at /lib64/libc.so.6
#1 0x00007fec1b6e6cc4 in sleep () at /lib64/libc.so.6
#2 0x00007fec1f30a20d in pool_sweeper () at /lib64/libglusterfs.so.0
#3 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#4 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 8 (Thread 0x7feb2ec44700 (LWP 21832)):
#0 0x00007fec1b9fad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x00007fec1f31ce68 in syncenv_task () at /lib64/libglusterfs.so.0
#2 0x00007fec1f31dd30 in syncenv_processor () at /lib64/libglusterfs.so.0
#3 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#4 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 7 (Thread 0x7feb2d406700 (LWP 21833)):
#0 0x00007fec1b9fad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x00007fec1f31ce68 in syncenv_task () at /lib64/libglusterfs.so.0
#2 0x00007fec1f31dd30 in syncenv_processor () at /lib64/libglusterfs.so.0
#3 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#4 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 6 (Thread 0x7feb2abb7700 (LWP 21834)):
#0 0x00007fec1b9fde3d in nanosleep () at /lib64/libpthread.so.0
#1 0x00007fec1f2ef9d6 in gf_timer_proc () at /lib64/libglusterfs.so.0
#2 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#3 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 5 (Thread 0x7feb2bc55700 (LWP 21835)):
#0 0x00007fec1b9f7f47 in pthread_join () at /lib64/libpthread.so.0
#1 0x00007fec1f33faf8 in event_dispatch_epoll () at /lib64/libglusterfs.so.0
#2 0x00007fec1f5bd634 in glfs_poller () at /lib64/libgfapi.so.0
#3 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#4 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 4 (Thread 0x7feb2b454700 (LWP 21836)):
#0 0x00007fec1b720483 in epoll_wait () at /lib64/libc.so.6
#1 0x00007fec1f33f392 in event_dispatch_epoll_worker () at /lib64/libglusterfs.so.0
#2 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#3 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 3 (Thread 0x7feb29318700 (LWP 21838)):
#0 0x00007fec1b720483 in epoll_wait () at /lib64/libc.so.6
#1 0x00007fec1f33f392 in event_dispatch_epoll_worker () at /lib64/libglusterfs.so.0
#2 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#3 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7feb5007c700 (LWP 21843)):
#0 0x00007fec1b9fad12 in pthread_cond_timedwait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x00007feb2975fd3c in iot_worker () at /usr/lib64/glusterfs/3.12.2/xlator/performance/io-threads.so
#2 0x00007fec1b9f6dd5 in start_thread () at /lib64/libpthread.so.0
#3 0x00007fec1b71fead in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7fec34e06dc0 (LWP 21681)):
#0 0x00007fec1b7152cf in ppoll () at /lib64/libc.so.6
#1 0x0000560ddb9de96b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x0000560ddb9de96b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3 0x0000560ddb9e06e7 in aio_poll (ctx=0x560ddd7277c0, blocking=blocking@entry=true) at util/aio-posix.c:645
#4 0x0000560ddb95c59a in bdrv_prwv_co (child=child@entry=0x560ddeea4140, offset=offset@entry=131072, qiov=qiov@entry=0x7ffc92beda50, is_write=is_write@entry=true, flags=flags@entry=0) at block/io.c:830
#5 0x0000560ddb95c949 in bdrv_pwrite (qiov=0x7ffc92beda50, offset=131072, child=0x560ddeea4140) at block/io.c:966
#6 0x0000560ddb95c949 in bdrv_pwrite (child=0x560ddeea4140, offset=131072, buf=<optimized out>, bytes=<optimized out>) at block/io.c:987
#7 0x0000560ddb9395f5 in qcow2_cache_entry_flush (bs=bs@entry=0x560ddf39e800, c=c@entry=0x560dddfef880, i=i@entry=0) at block/qcow2-cache.c:227
#8 0x0000560ddb93971d in qcow2_cache_write (bs=bs@entry=0x560ddf39e800, c=0x560dddfef880) at block/qcow2-cache.c:248
#9 0x0000560ddb9394ce in qcow2_cache_flush (bs=bs@entry=0x560ddf39e800, c=<optimized out>) at block/qcow2-cache.c:259
#10 0x0000560ddb93957e in qcow2_cache_entry_flush (c=0x560ddee9ee40, c=0x560ddee9ee40, bs=0x560ddf39e800) at block/qcow2-cache.c:170
#11 0x0000560ddb93957e in qcow2_cache_entry_flush (bs=bs@entry=0x560ddf39e800, c=c@entry=0x560ddee9ee40, i=i@entry=6) at block/qcow2-cache.c:194
#12 0x0000560ddb93971d in qcow2_cache_write (bs=bs@entry=0x560ddf39e800, c=0x560ddee9ee40) at block/qcow2-cache.c:248
#13 0x0000560ddb9394ce in qcow2_cache_flush (bs=bs@entry=0x560ddf39e800, c=<optimized out>) at block/qcow2-cache.c:259
#14 0x0000560ddb92a9cc in qcow2_inactivate (bs=bs@entry=0x560ddf39e800) at block/qcow2.c:2124
#15 0x0000560ddb92aa9f in qcow2_close (bs=0x560ddf39e800) at block/qcow2.c:2153
#16 0x0000560ddb90a252 in bdrv_unref (bs=0x560ddf39e800) at block.c:3358
#17 0x0000560ddb90a252 in bdrv_unref (bs=0x560ddf39e800) at block.c:3542
#18 0x0000560ddb90a252 in bdrv_unref (bs=0x560ddf39e800) at block.c:4598
#19 0x0000560ddb94be51 in blk_remove_bs (blk=blk@entry=0x560ddd802000) at block/block-backend.c:784
#20 0x0000560ddb94beab in blk_remove_all_bs () at block/block-backend.c:482
#21 0x0000560ddb90782f in bdrv_close_all () at block.c:3412
#22 0x0000560ddb6829ab in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776
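The main-thread stack (Thread 1) explains the hang: bdrv_close_all() flushes the qcow2 metadata caches, bdrv_pwrite() submits the write to the now-dead gluster backend, and the poll-until-done wait in bdrv_prwv_co() (frames #0-#4) never sees a completion. A self-contained model of that wait loop, where backend_poll() is a hypothetical stand-in for aio_poll() (illustrative names, not QEMU source):

#include <stdbool.h>
#include <stdio.h>

static bool completed; /* would be set by the I/O completion callback */

/* Hypothetical stand-in for aio_poll(ctx, true): blocks until an event
 * fires and returns how many events were handled. With the gluster
 * connection gone there is no completion event to deliver; the real
 * aio_poll() simply blocks in ppoll() (frame #0 above). */
static int backend_poll(void)
{
    return 0;
}

int main(void)
{
    /* BDRV_POLL_WHILE()-style wait: spin the event loop until the request
     * completes. The real loop has no timeout or escape hatch, which is
     * why "quit" hangs forever against an unreachable backend. */
    while (!completed) {
        if (backend_poll() == 0) {
            printf("no completion event will ever arrive; QEMU blocks here\n");
            break; /* added only so this demo terminates */
        }
    }
    return 0;
}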
Verified on qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57.x86_64: the core dump issue no longer exists. Setting the bug's status to "Verified".

Test steps:
1. Start guest with qemu cmds:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-machine q35 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1 \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdj,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idbJPqrG \
-chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
-device isa-serial,id=serial0,chardev=chardev_serial0 \
-chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/install.qcow2 \
-device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0,iothread=iothread0 \
-device scsi-hd,id=image1,drive=drive_image1,bus=scsi0.0,bootindex=0 \
-drive id=drive_cd1,if=none,snapshot=off,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/linux/RHEL8.1.0-BaseOS-x86_64.iso \
-device ide-cd,id=cd1,drive=drive_cd1,bootindex=2,bus=ide.0,unit=0 \
-device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
-drive id=drive_data1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/data.qcow2 \
-device virtio-scsi-pci,id=scsi1,iothread=iothread1,bus=pcie.0-root-port-6,addr=0x0 \
-device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
-device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0 \
-netdev tap,id=idHzG7Zk,vhost=on \
-m 2048 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-cpu 'Skylake-Client',+kvm_pv_unhalt \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-monitor stdio \
-device virtio-serial-pci,id=virtio-serial0,bus=pcie_extra_root_port_0,addr=0x0 \
-chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
-device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 \
-qmp tcp:0:3000,server,nowait \

2. Do snapshots to the gluster server:
{"execute": "blockdev-snapshot-sync", "arguments": { "device": "drive_image1","snapshot-file": "gluster://intel-5405-32-2.englab.nay.redhat.com/aliang/sn1","mode": "absolute-paths", "format": "qcow2" } } {"return": {}} {"execute": "blockdev-snapshot-sync", "arguments": { "device": "drive_image1","snapshot-file": "gluster://intel-5405-32-2.englab.nay.redhat.com/aliang/sn2","mode": "absolute-paths", "format": "qcow2" } } {"return": {}} 3. Do block stream. { "execute": "block-stream", "arguments": { "device": "drive_image1"}} {"timestamp": {"seconds": 1568776861, "microseconds": 444698}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_image1"}} {"timestamp": {"seconds": 1568776861, "microseconds": 444746}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_image1"}} {"return": {}} 4. During block stream, stop gluster volume (gluster server)#gluster volume stop aliang {"timestamp": {"seconds": 1568776870, "microseconds": 948800}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive_image1", "operation": "read", "action": "report"}} {"timestamp": {"seconds": 1568776870, "microseconds": 948955}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "drive_image1"}} {"timestamp": {"seconds": 1568776870, "microseconds": 948991}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 21474836480, "offset": 3704487936, "speed": 0, "type": "stream", "error": "Transport endpoint is not connected"}} {"timestamp": {"seconds": 1568776870, "microseconds": 949049}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "drive_image1"}} {"timestamp": {"seconds": 1568776870, "microseconds": 949084}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "drive_image1"}} {"timestamp": {"seconds": 1568776908, "microseconds": 811333}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776908, "microseconds": 819193}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776908, "microseconds": 827249}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776908, "microseconds": 835242}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776908, "microseconds": 843249}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776908, "microseconds": 851220}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 232756}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint 
is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 241696}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 249699}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 257620}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 265695}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 273698}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 274119}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 282483}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 290499}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 298501}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 306501}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} {"timestamp": {"seconds": 1568776934, "microseconds": 314472}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": false, "node-name": "#block1527", "reason": "Transport endpoint is not connected", "operation": "write", "action": "report"}} 5. Quit vm (qemu)quit After step5, vm quit successfully with info: (qemu) quit qemu-kvm: Failed to flush the L2 table cache: Transport endpoint is not connected qemu-kvm: Failed to flush the refcount block cache: Transport endpoint is not connected [2019-09-18 03:24:42.821220] I [io-stats.c:4027:fini] 0-aliang: io-stats translator unloaded Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. 
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723