Bug 1461827

Summary: QEMU hangs in aio wait when trying to access NBD volume over TLS
Product: Red Hat Enterprise Linux 7 Reporter: Daniel Berrangé <berrange>
Component: qemu-kvm-rhevAssignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA QA Contact: Suqin Huang <shuang>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.4CC: aliang, berrange, chayang, coli, famz, juzhang, knoel, lmiksik, michen, mrezanin, pbonzini, qzhang, stefanha, virt-maint
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-12.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 04:43:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1300770    

Description Daniel Berrangé 2017-06-15 12:16:00 UTC
Description of problem:

Export a disk image using qemu-nbd, enabling use of TLS. For example

 $ qemu-nbd  --object tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/security/qemutls --tls-creds tls0  /home/berrange/Fedora-Workstation-Live-x86_64-25-1.3.iso -p 9000 -t

Now run QEMU, pointing it to this disk image

 $ qemu-system-x86_64 -object tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/security/qemutls -drive driver=nbd,host=127.0.0.1,port=9000,id=drv0,if=none,tls-creds=tls0    -device virtio-blk-pci,id=image1,drive=drv0,bootindex=0 

QEMU will hang at this point, never getting as far as running the main loop. A GDB stack trace is:

#0  0x00007f1e1b335aff in __GI_ppoll (fds=0x55c0411e3600, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x000055c03c97d3fb in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055c03c97d3fb in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x000055c03c97f0b5 in aio_poll (ctx=ctx@entry=0x55c03f955700, blocking=<optimized out>) at util/aio-posix.c:622
#4  0x000055c03c8ffa64 in blk_prw (blk=blk@entry=0x55c03f980000, offset=offset@entry=0, buf=buf@entry=0x7ffc4f03f2c0 "@", bytes=bytes@entry=512, co_entry=co_entry@entry=0x55c03c900f40 <blk_read_entry>, flags=flags@entry=0) at block/block-backend.c:1052
#5  0x000055c03c90115a in blk_pread_unthrottled (count=512, buf=0x7ffc4f03f2c0, offset=0, blk=0x55c03f980000) at block/block-backend.c:1201
---Type <return> to continue, or q <return> to quit---
#6  0x000055c03c90115a in blk_pread_unthrottled (blk=blk@entry=0x55c03f980000, offset=offset@entry=0, buf=buf@entry=0x7ffc4f03f2c0 "@", count=count@entry=512)
    at block/block-backend.c:1069
#7  0x000055c03c7c8e9f in guess_disk_lchs (blk=blk@entry=0x55c03f980000, pcylinders=pcylinders@entry=0x7ffc4f03f50c, pheads=pheads@entry=0x7ffc4f03f510, psectors=psectors@entry=0x7ffc4f03f514) at hw/block/hd-geometry.c:70
#8  0x000055c03c7c9007 in hd_geometry_guess (blk=0x55c03f980000, pcyls=pcyls@entry=0x55c03f977974, pheads=pheads@entry=0x55c03f977978, psecs=psecs@entry=0x55c03f97797c, ptrans=ptrans@entry=0x55c03f977990) at hw/block/hd-geometry.c:135
#9  0x000055c03c7c8b82 in blkconf_geometry (conf=conf@entry=0x55c03f977958, ptrans=ptrans@entry=0x55c03f977990, cyls_max=cyls_max@entry=65535, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7ffc4f03f5f0) at hw/block/block.c:123
#10 0x000055c03c8123cf in ide_dev_initfn (dev=0x55c03f9778e0, kind=IDE_HD) at hw/ide/qdev.c:194
#11 0x000055c03c7d8944 in device_realize (dev=0x55c03f9778e0, errp=0x7ffc4f03f680) at hw/core/qdev.c:228
#12 0x000055c03c7da1c1 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc4f03f770) at hw/core/qdev.c:939
#13 0x000055c03c8c069e in property_set_bool (obj=0x55c03f9778e0, v=<optimized out>, name=<optimized out>, opaque=0x55c040960a70, errp=0x7ffc4f03f770) at qom/object.c:1860
#14 0x000055c03c8c435f in object_property_set_qobject (obj=0x55c03f9778e0, value=<optimized out>, name=0x55c03c9ea28b "realized", errp=0x7ffc4f03f770) at qom/qom-qobject.c:27
#15 0x000055c03c8c21d0 in object_property_set_bool (obj=0x55c03f9778e0, value=<optimized out>, name=0x55c03c9ea28b "realized", errp=0x7ffc4f03f770) at qom/object.c:1163
#16 0x000055c03c7d8f92 in qdev_init_nofail (dev=dev@entry=0x55c03f9778e0) at hw/core/qdev.c:373
#17 0x000055c03c812794 in ide_create_drive (bus=bus@entry=0x55c0411989f0, unit=unit@entry=0, drive=0x55c03f91b400) at hw/ide/qdev.c:132
#18 0x000055c03c812f9e in pci_ide_create_devs (dev=dev@entry=0x55c041198000, hd_table=hd_table@entry=0x7ffc4f03f870) at hw/ide/pci.c:430
#19 0x000055c03c8136e5 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7ffc4f03f870, devfn=<optimized out>) at hw/ide/piix.c:231
#20 0x000055c03c711ce5 in pc_init1 (machine=0x55c03f95e3c0, pci_type=0x55c03c9b094a "i440FX", host_type=0x55c03c9b0951 "i440FX-pcihost")
    at /usr/src/debug/qemu-2.9.0/hw/i386/pc_piix.c:249
#21 0x000055c03c66cb60 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4579

Current upstream GIT master suffers the same flaw, and git bisect shows this is a regression introduced in the 2.9.0 release with this commit:


commit ff82911cd3f69f028f2537825c9720ff78bc3f19
Author: Paolo Bonzini <pbonzini>
Date:   Mon Feb 13 14:52:24 2017 +0100

    nbd: convert to use qio_channel_yield
    
    In the client, read the reply headers from a coroutine, switching the
    read side between the "read header" coroutine and the I/O coroutine that
    reads the body of the reply.
    
    In the server, if the server can read more requests it will create a new
    "read request" coroutine as soon as a request has been read.  Otherwise,
    the new coroutine is created in nbd_request_put.
    
    Reviewed-by: Stefan Hajnoczi <stefanha>
    Signed-off-by: Paolo Bonzini <pbonzini>
    Reviewed-by: Fam Zheng <famz>
    Reviewed-by: Daniel P. Berrange <berrange>
    Message-id: 20170213135235.12274-8-pbonzini
    Signed-off-by: Stefan Hajnoczi <stefanha>





Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-10.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1.  $ qemu-nbd  --object tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/security/qemutls --tls-creds tls0  /home/berrange/Fedora-Workstation-Live-x86_64-25-1.3.iso -p 9000 -t
2.  $ qemu-system-x86_64 -object tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/security/qemutls -drive driver=nbd,host=127.0.0.1,port=9000,id=drv0,if=none,tls-creds=tls0    -device virtio-blk-pci,id=image1,drive=drv0,bootindex=0 


Actual results:
QEMU hangs when initializing machine

Expected results:
QEMU runs normally

Comment 3 Daniel Berrangé 2017-06-15 13:02:06 UTC
Testing with lastest upstream GIT master, I managed to capture a trace of the NBD coroutine:

#0  0x0000555555b60710 in qemu_coroutine_switch (from_=from_@entry=0x55555897df50, to_=to_@entry=0x7ffff7edae58, action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:176
#1  0x0000555555b5f821 in qemu_coroutine_yield () at util/qemu-coroutine.c:172
#2  0x0000555555ada920 in nbd_co_receive_reply (request=0x7fffc23decc0, request=0x7fffc23decc0, qiov=0x7fffffffd090, reply=<synthetic pointer>, s=0x555556795ed0) at block/nbd-client.c:169
#3  0x0000555555ada920 in nbd_client_co_preadv (bs=0x555556790c40, offset=<optimized out>, bytes=<optimized out>, qiov=0x7fffffffd090, flags=0)
    at block/nbd-client.c:226
#4  0x0000555555ad3f53 in bdrv_driver_preadv (bs=bs@entry=0x555556790c40, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7fffffffd090, flags=0) at block/io.c:834
#5  0x0000555555ad7495 in bdrv_aligned_preadv (child=child@entry=0x55555678cb10, req=req@entry=0x7fffc23dee90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7fffffffd090, flags=0) at block/io.c:1083
#6  0x0000555555ad7743 in bdrv_co_preadv (child=0x55555678cb10, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7fffffffd090, flags=flags@entry=0) at block/io.c:1177
#7  0x0000555555ac810a in blk_co_preadv (blk=0x555556789be0, offset=0, bytes=512, qiov=0x7fffffffd090, flags=0) at block/block-backend.c:990
#8  0x0000555555ac81ec in blk_read_entry (opaque=0x7fffffffd0b0) at block/block-backend.c:1037
#9  0x0000555555b6077a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
#10 0x00007fffdabdebf0 in __start_context () at /lib64/libc.so.6
#11 0x00007fffffffc8f0 in  ()
#12 0x0000000000000000 in  ()


(NB, the stack trace in bug description is qemu-kvm-rhev RPM source, this stack trace is today's git master upstream, so beware of filenames/line numbers)


IIUC, the NBD coroutine has yielded, waiting for the response from the NBD server - presumably it is expecting that response to trigger the main loop event callback to yield back into the coroutine. The main thread hasn't got as far as running the main loop yet though, so AFAICT, the yield will never come.

Comment 4 Ademar Reis 2017-06-15 16:48:26 UTC
(In reply to Daniel Berrange from comment #0)
> Current upstream GIT master suffers the same flaw, and git bisect shows this
> is a regression introduced in the 2.9.0 release with this commit:
> 
> 
> commit ff82911cd3f69f028f2537825c9720ff78bc3f19
> Author: Paolo Bonzini <pbonzini>
> Date:   Mon Feb 13 14:52:24 2017 +0100
> 
>     nbd: convert to use qio_channel_yield
> 
> [...]
> 
>     Reviewed-by: Stefan Hajnoczi <stefanha>
>     Reviewed-by: Fam Zheng <famz>

Reassigning to Paolo (Eric would be my second choice, but he's on PTO).

Comment 5 Ademar Reis 2017-06-15 16:53:38 UTC
(In reply to Ademar Reis from comment #4)
> (In reply to Daniel Berrange from comment #0)
> > Current upstream GIT master suffers the same flaw, and git bisect shows this
> > is a regression introduced in the 2.9.0 release with this commit:
> > 
> > 
> > commit ff82911cd3f69f028f2537825c9720ff78bc3f19
> > Author: Paolo Bonzini <pbonzini>
> > Date:   Mon Feb 13 14:52:24 2017 +0100
> > 
> >     nbd: convert to use qio_channel_yield
> > 
> > [...]
> > 
> >     Reviewed-by: Stefan Hajnoczi <stefanha>
> >     Reviewed-by: Fam Zheng <famz>
> 
> Reassigning to Paolo (Eric would be my second choice, but he's on PTO).

For real this time. :-/

Comment 7 Paolo Bonzini 2017-06-16 10:29:11 UTC
Patch is ready.

Comment 8 Miroslav Rezanina 2017-06-20 06:03:13 UTC
Fix included in qemu-kvm-rhev-2.9.0-12.el7

Comment 10 Suqin Huang 2017-06-20 06:50:21 UTC
Result:
boot up and login the guest successfully

Server:
qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-sJF --tls-creds tls0  rhel74-64-virtio-scsi.qcow2 -p 9000 -t


Client:

    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-sJF \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=nbd://hp-dl388g8-16.rhts.eng.pek2.redhat.com:9000,file.tls-creds=tls0 \

Package:
qemu-kvm-rhev-2.9.0-12.el7.x86_64

Comment 11 Suqin Huang 2017-06-20 06:50:58 UTC
Hi Paolo,

Any other test do i need to run?

Thanks
Suqin

Comment 12 Paolo Bonzini 2017-06-21 10:19:51 UTC
No, it's okay.  Thanks!

Comment 15 errata-xmlrpc 2017-08-02 04:43:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392