RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1461827 - QEMU hangs in aio wait when trying to access NBD volume over TLS
Summary: QEMU hangs in aio wait when trying to access NBD volume over TLS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: rc
: ---
Assignee: Paolo Bonzini
QA Contact: Suqin Huang
URL:
Whiteboard:
Depends On:
Blocks: 1300770
TreeView+ depends on / blocked
 
Reported: 2017-06-15 12:16 UTC by Daniel Berrangé
Modified: 2017-08-02 04:43 UTC (History)
14 users (show)

Fixed In Version: qemu-kvm-rhev-2.9.0-12.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-02 04:43:29 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2392 0 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2017-08-01 20:04:36 UTC

Description Daniel Berrangé 2017-06-15 12:16:00 UTC
Description of problem:

Export a disk image using qemu-nbd, enabling use of TLS. For example

 $ qemu-nbd  --object tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/security/qemutls --tls-creds tls0  /home/berrange/Fedora-Workstation-Live-x86_64-25-1.3.iso -p 9000 -t

Now run QEMU, pointing it to this disk image

 $ qemu-system-x86_64 -object tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/security/qemutls -drive driver=nbd,host=127.0.0.1,port=9000,id=drv0,if=none,tls-creds=tls0    -device virtio-blk-pci,id=image1,drive=drv0,bootindex=0 

QEMU will hang at this point, never getting as far as running the main loop. A GDB stack trace is:

#0  0x00007f1e1b335aff in __GI_ppoll (fds=0x55c0411e3600, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x000055c03c97d3fb in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055c03c97d3fb in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x000055c03c97f0b5 in aio_poll (ctx=ctx@entry=0x55c03f955700, blocking=<optimized out>) at util/aio-posix.c:622
#4  0x000055c03c8ffa64 in blk_prw (blk=blk@entry=0x55c03f980000, offset=offset@entry=0, buf=buf@entry=0x7ffc4f03f2c0 "@", bytes=bytes@entry=512, co_entry=co_entry@entry=0x55c03c900f40 <blk_read_entry>, flags=flags@entry=0) at block/block-backend.c:1052
#5  0x000055c03c90115a in blk_pread_unthrottled (count=512, buf=0x7ffc4f03f2c0, offset=0, blk=0x55c03f980000) at block/block-backend.c:1201
---Type <return> to continue, or q <return> to quit---
#6  0x000055c03c90115a in blk_pread_unthrottled (blk=blk@entry=0x55c03f980000, offset=offset@entry=0, buf=buf@entry=0x7ffc4f03f2c0 "@", count=count@entry=512)
    at block/block-backend.c:1069
#7  0x000055c03c7c8e9f in guess_disk_lchs (blk=blk@entry=0x55c03f980000, pcylinders=pcylinders@entry=0x7ffc4f03f50c, pheads=pheads@entry=0x7ffc4f03f510, psectors=psectors@entry=0x7ffc4f03f514) at hw/block/hd-geometry.c:70
#8  0x000055c03c7c9007 in hd_geometry_guess (blk=0x55c03f980000, pcyls=pcyls@entry=0x55c03f977974, pheads=pheads@entry=0x55c03f977978, psecs=psecs@entry=0x55c03f97797c, ptrans=ptrans@entry=0x55c03f977990) at hw/block/hd-geometry.c:135
#9  0x000055c03c7c8b82 in blkconf_geometry (conf=conf@entry=0x55c03f977958, ptrans=ptrans@entry=0x55c03f977990, cyls_max=cyls_max@entry=65535, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7ffc4f03f5f0) at hw/block/block.c:123
#10 0x000055c03c8123cf in ide_dev_initfn (dev=0x55c03f9778e0, kind=IDE_HD) at hw/ide/qdev.c:194
#11 0x000055c03c7d8944 in device_realize (dev=0x55c03f9778e0, errp=0x7ffc4f03f680) at hw/core/qdev.c:228
#12 0x000055c03c7da1c1 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc4f03f770) at hw/core/qdev.c:939
#13 0x000055c03c8c069e in property_set_bool (obj=0x55c03f9778e0, v=<optimized out>, name=<optimized out>, opaque=0x55c040960a70, errp=0x7ffc4f03f770) at qom/object.c:1860
#14 0x000055c03c8c435f in object_property_set_qobject (obj=0x55c03f9778e0, value=<optimized out>, name=0x55c03c9ea28b "realized", errp=0x7ffc4f03f770) at qom/qom-qobject.c:27
#15 0x000055c03c8c21d0 in object_property_set_bool (obj=0x55c03f9778e0, value=<optimized out>, name=0x55c03c9ea28b "realized", errp=0x7ffc4f03f770) at qom/object.c:1163
#16 0x000055c03c7d8f92 in qdev_init_nofail (dev=dev@entry=0x55c03f9778e0) at hw/core/qdev.c:373
#17 0x000055c03c812794 in ide_create_drive (bus=bus@entry=0x55c0411989f0, unit=unit@entry=0, drive=0x55c03f91b400) at hw/ide/qdev.c:132
#18 0x000055c03c812f9e in pci_ide_create_devs (dev=dev@entry=0x55c041198000, hd_table=hd_table@entry=0x7ffc4f03f870) at hw/ide/pci.c:430
#19 0x000055c03c8136e5 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7ffc4f03f870, devfn=<optimized out>) at hw/ide/piix.c:231
#20 0x000055c03c711ce5 in pc_init1 (machine=0x55c03f95e3c0, pci_type=0x55c03c9b094a "i440FX", host_type=0x55c03c9b0951 "i440FX-pcihost")
    at /usr/src/debug/qemu-2.9.0/hw/i386/pc_piix.c:249
#21 0x000055c03c66cb60 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4579

Current upstream GIT master suffers the same flaw, and git bisect shows this is a regression introduced in the 2.9.0 release with this commit:


commit ff82911cd3f69f028f2537825c9720ff78bc3f19
Author: Paolo Bonzini <pbonzini>
Date:   Mon Feb 13 14:52:24 2017 +0100

    nbd: convert to use qio_channel_yield
    
    In the client, read the reply headers from a coroutine, switching the
    read side between the "read header" coroutine and the I/O coroutine that
    reads the body of the reply.
    
    In the server, if the server can read more requests it will create a new
    "read request" coroutine as soon as a request has been read.  Otherwise,
    the new coroutine is created in nbd_request_put.
    
    Reviewed-by: Stefan Hajnoczi <stefanha>
    Signed-off-by: Paolo Bonzini <pbonzini>
    Reviewed-by: Fam Zheng <famz>
    Reviewed-by: Daniel P. Berrange <berrange>
    Message-id: 20170213135235.12274-8-pbonzini
    Signed-off-by: Stefan Hajnoczi <stefanha>





Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-10.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1.  $ qemu-nbd  --object tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/security/qemutls --tls-creds tls0  /home/berrange/Fedora-Workstation-Live-x86_64-25-1.3.iso -p 9000 -t
2.  $ qemu-system-x86_64 -object tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/security/qemutls -drive driver=nbd,host=127.0.0.1,port=9000,id=drv0,if=none,tls-creds=tls0    -device virtio-blk-pci,id=image1,drive=drv0,bootindex=0 


Actual results:
QEMU hangs when initializing machine

Expected results:
QEMU runs normally

Comment 3 Daniel Berrangé 2017-06-15 13:02:06 UTC
Testing with lastest upstream GIT master, I managed to capture a trace of the NBD coroutine:

#0  0x0000555555b60710 in qemu_coroutine_switch (from_=from_@entry=0x55555897df50, to_=to_@entry=0x7ffff7edae58, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:176
#1  0x0000555555b5f821 in qemu_coroutine_yield () at util/qemu-coroutine.c:172
#2  0x0000555555ada920 in nbd_co_receive_reply (request=0x7fffc23decc0, request=0x7fffc23decc0, qiov=0x7fffffffd090, reply=<synthetic pointer>, s=0x555556795ed0) at block/nbd-client.c:169
#3  0x0000555555ada920 in nbd_client_co_preadv (bs=0x555556790c40, offset=<optimized out>, bytes=<optimized out>, qiov=0x7fffffffd090, flags=0)
    at block/nbd-client.c:226
#4  0x0000555555ad3f53 in bdrv_driver_preadv (bs=bs@entry=0x555556790c40, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7fffffffd090, flags=0) at block/io.c:834
#5  0x0000555555ad7495 in bdrv_aligned_preadv (child=child@entry=0x55555678cb10, req=req@entry=0x7fffc23dee90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7fffffffd090, flags=0) at block/io.c:1083
#6  0x0000555555ad7743 in bdrv_co_preadv (child=0x55555678cb10, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7fffffffd090, flags=flags@entry=0) at block/io.c:1177
#7  0x0000555555ac810a in blk_co_preadv (blk=0x555556789be0, offset=0, bytes=512, qiov=0x7fffffffd090, flags=0) at block/block-backend.c:990
#8  0x0000555555ac81ec in blk_read_entry (opaque=0x7fffffffd0b0) at block/block-backend.c:1037
#9  0x0000555555b6077a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
#10 0x00007fffdabdebf0 in __start_context () at /lib64/libc.so.6
#11 0x00007fffffffc8f0 in  ()
#12 0x0000000000000000 in  ()


(NB, the stack trace in bug description is qemu-kvm-rhev RPM source, this stack trace is today's git master upstream, so beware of filenames/line numbers)


IIUC, the NBD coroutine has yielded, waiting for the response from the NBD server - presumably it is expecting that response to trigger the main loop event callback to yield back into the coroutine. The main thread hasn't got as far as running the main loop yet though, so AFAICT, the yield will never come.

Comment 4 Ademar Reis 2017-06-15 16:48:26 UTC
(In reply to Daniel Berrange from comment #0)
> Current upstream GIT master suffers the same flaw, and git bisect shows this
> is a regression introduced in the 2.9.0 release with this commit:
> 
> 
> commit ff82911cd3f69f028f2537825c9720ff78bc3f19
> Author: Paolo Bonzini <pbonzini>
> Date:   Mon Feb 13 14:52:24 2017 +0100
> 
>     nbd: convert to use qio_channel_yield
> 
> [...]
> 
>     Reviewed-by: Stefan Hajnoczi <stefanha>
>     Reviewed-by: Fam Zheng <famz>

Reassigning to Paolo (Eric would be my second choice, but he's on PTO).

Comment 5 Ademar Reis 2017-06-15 16:53:38 UTC
(In reply to Ademar Reis from comment #4)
> (In reply to Daniel Berrange from comment #0)
> > Current upstream GIT master suffers the same flaw, and git bisect shows this
> > is a regression introduced in the 2.9.0 release with this commit:
> > 
> > 
> > commit ff82911cd3f69f028f2537825c9720ff78bc3f19
> > Author: Paolo Bonzini <pbonzini>
> > Date:   Mon Feb 13 14:52:24 2017 +0100
> > 
> >     nbd: convert to use qio_channel_yield
> > 
> > [...]
> > 
> >     Reviewed-by: Stefan Hajnoczi <stefanha>
> >     Reviewed-by: Fam Zheng <famz>
> 
> Reassigning to Paolo (Eric would be my second choice, but he's on PTO).

For real this time. :-/

Comment 7 Paolo Bonzini 2017-06-16 10:29:11 UTC
Patch is ready.

Comment 8 Miroslav Rezanina 2017-06-20 06:03:13 UTC
Fix included in qemu-kvm-rhev-2.9.0-12.el7

Comment 10 Suqin Huang 2017-06-20 06:50:21 UTC
Result:
boot up and login the guest successfully

Server:
qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-sJF --tls-creds tls0  rhel74-64-virtio-scsi.qcow2 -p 9000 -t


Client:

    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-sJF \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=nbd://hp-dl388g8-16.rhts.eng.pek2.redhat.com:9000,file.tls-creds=tls0 \

Package:
qemu-kvm-rhev-2.9.0-12.el7.x86_64

Comment 11 Suqin Huang 2017-06-20 06:50:58 UTC
Hi Paolo,

Any other test do i need to run?

Thanks
Suqin

Comment 12 Paolo Bonzini 2017-06-21 10:19:51 UTC
No, it's okay.  Thanks!

Comment 15 errata-xmlrpc 2017-08-02 04:43:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392


Note You need to log in before you can comment on or make changes to this bug.