Bug 1461827
Summary: | QEMU hangs in aio wait when trying to access NBD volume over TLS | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Berrangé <berrange> |
Component: | qemu-kvm-rhev | Assignee: | Paolo Bonzini <pbonzini> |
Status: | CLOSED ERRATA | QA Contact: | Suqin Huang <shuang> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.4 | CC: | aliang, berrange, chayang, coli, famz, juzhang, knoel, lmiksik, michen, mrezanin, pbonzini, qzhang, stefanha, virt-maint |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-rhev-2.9.0-12.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-02 04:43:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1300770 |
Description
Daniel Berrangé
2017-06-15 12:16:00 UTC
Testing with latest upstream GIT master, I managed to capture a trace of the NBD coroutine:

    #0 0x0000555555b60710 in qemu_coroutine_switch (from_=from_@entry=0x55555897df50, to_=to_@entry=0x7ffff7edae58, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:176
    #1 0x0000555555b5f821 in qemu_coroutine_yield () at util/qemu-coroutine.c:172
    #2 0x0000555555ada920 in nbd_co_receive_reply (request=0x7fffc23decc0, request=0x7fffc23decc0, qiov=0x7fffffffd090, reply=<synthetic pointer>, s=0x555556795ed0) at block/nbd-client.c:169
    #3 0x0000555555ada920 in nbd_client_co_preadv (bs=0x555556790c40, offset=<optimized out>, bytes=<optimized out>, qiov=0x7fffffffd090, flags=0) at block/nbd-client.c:226
    #4 0x0000555555ad3f53 in bdrv_driver_preadv (bs=bs@entry=0x555556790c40, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7fffffffd090, flags=0) at block/io.c:834
    #5 0x0000555555ad7495 in bdrv_aligned_preadv (child=child@entry=0x55555678cb10, req=req@entry=0x7fffc23dee90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7fffffffd090, flags=0) at block/io.c:1083
    #6 0x0000555555ad7743 in bdrv_co_preadv (child=0x55555678cb10, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7fffffffd090, flags=flags@entry=0) at block/io.c:1177
    #7 0x0000555555ac810a in blk_co_preadv (blk=0x555556789be0, offset=0, bytes=512, qiov=0x7fffffffd090, flags=0) at block/block-backend.c:990
    #8 0x0000555555ac81ec in blk_read_entry (opaque=0x7fffffffd0b0) at block/block-backend.c:1037
    #9 0x0000555555b6077a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
    #10 0x00007fffdabdebf0 in __start_context () at /lib64/libc.so.6
    #11 0x00007fffffffc8f0 in ()
    #12 0x0000000000000000 in ()

(NB, the stack trace in the bug description is from the qemu-kvm-rhev RPM source, while this stack trace is from today's upstream git master, so beware of filenames/line numbers.)

IIUC, the NBD coroutine has yielded, waiting for the response
from the NBD server - presumably it is expecting that response to trigger the main loop event callback to yield back into the coroutine. The main thread hasn't got as far as running the main loop yet though, so AFAICT, the yield will never come.

(In reply to Daniel Berrange from comment #0)
> Current upstream GIT master suffers the same flaw, and git bisect shows this
> is a regression introduced in the 2.9.0 release with this commit:
>
>
> commit ff82911cd3f69f028f2537825c9720ff78bc3f19
> Author: Paolo Bonzini <pbonzini>
> Date: Mon Feb 13 14:52:24 2017 +0100
>
> nbd: convert to use qio_channel_yield
>
> [...]
>
> Reviewed-by: Stefan Hajnoczi <stefanha>
> Reviewed-by: Fam Zheng <famz>

Reassigning to Paolo (Eric would be my second choice, but he's on PTO).

(In reply to Ademar Reis from comment #4)
> (In reply to Daniel Berrange from comment #0)
> > Current upstream GIT master suffers the same flaw, and git bisect shows this
> > is a regression introduced in the 2.9.0 release with this commit:
> >
> >
> > commit ff82911cd3f69f028f2537825c9720ff78bc3f19
> > Author: Paolo Bonzini <pbonzini>
> > Date: Mon Feb 13 14:52:24 2017 +0100
> >
> > nbd: convert to use qio_channel_yield
> >
> > [...]
> >
> > Reviewed-by: Stefan Hajnoczi <stefanha>
> > Reviewed-by: Fam Zheng <famz>
>
> Reassigning to Paolo (Eric would be my second choice, but he's on PTO).

For real this time. :-/

Patch is ready.

Fix included in qemu-kvm-rhev-2.9.0-12.el7

Result: booted up and logged in to the guest successfully.

Server:
qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-sJF --tls-creds tls0 rhel74-64-virtio-scsi.qcow2 -p 9000 -t

Client:
-object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-sJF \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=nbd://hp-dl388g8-16.rhts.eng.pek2.redhat.com:9000,file.tls-creds=tls0 \

Package: qemu-kvm-rhev-2.9.0-12.el7.x86_64

Hi Paolo,
Do I need to run any other tests?

Thanks
Suqin

No, it's okay. Thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
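
For readers unfamiliar with the failure mode analysed earlier in this report (a coroutine yields while waiting for a reply, but the event loop that would dispatch the reply callback is not running yet), the following toy model may help. It is only an illustrative sketch in Python, not QEMU code, and it says nothing about how the actual fix was implemented. All names in it (`ToyLoop`, `read_request`, `sync_read`, `dispatch_events`, `max_spins`) are hypothetical, and the wait is bounded so that the "hang" case terminates instead of blocking forever.

```python
from collections import deque


class ToyLoop:
    """Hypothetical single-threaded event loop: a queue of pending callbacks."""

    def __init__(self):
        self.pending = deque()

    def run_once(self):
        """Dispatch one pending callback; return False if nothing was queued."""
        if not self.pending:
            return False
        self.pending.popleft()()
        return True


def read_request(log):
    """Toy coroutine: send a request, then yield until a reply is delivered."""
    log.append("request sent")
    reply = yield  # suspended here, waiting to be re-entered with the reply
    log.append(f"got {reply}")


def sync_read(loop, dispatch_events, max_spins=3):
    """Start the coroutine and wait for it synchronously.

    Returns (completed, log). With dispatch_events=False the waiter never
    runs the loop, so the reply callback is never invoked and the coroutine
    stays suspended (the hang, cut short here by max_spins).
    """
    log = []
    co = read_request(log)
    next(co)  # run the coroutine up to its yield

    done = []

    def on_reply():
        # Event callback: re-enter the coroutine with the reply.
        try:
            co.send("reply")
        except StopIteration:
            done.append(True)

    loop.pending.append(on_reply)  # the server's reply becomes a loop event

    for _ in range(max_spins):
        if done:
            break
        if dispatch_events:
            loop.run_once()
    return bool(done), log
```

Calling `sync_read(ToyLoop(), dispatch_events=True)` lets the callback re-enter the coroutine, so the read completes. With `dispatch_events=False` the event stays queued and the coroutine never progresses past its yield, which mirrors the situation described in the analysis.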