Bug 1300770

Summary: RFE: add support for native TLS encryption on NBD client/server transports
Product: Red Hat Enterprise Linux 7
Reporter: Daniel Berrangé <berrange>
Component: qemu-kvm-rhev
Assignee: Daniel Berrangé <berrange>
Status: CLOSED ERRATA
QA Contact: Suqin Huang <shuang>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: aliang, berrange, chayang, coli, hachen, huding, jen, juzhang, knoel, meyang, michen, mrezanin, ngu, pbonzini, pingl, virt-maint, xfu, xuwei
Target Milestone: rc
Keywords: FutureFeature, TestOnly
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.6.0-1.el7
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
: 1300772 (view as bug list)
Environment:
Last Closed: 2017-08-01 23:29:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1461827
Bug Blocks: 1300772, 1544869
Attachments:
  gdb debug info (flags: none)

Description Daniel Berrangé 2016-01-21 16:50:29 UTC
Description of problem:
The NBD protocol currently runs in clear text, offering no security protection for the data transferred, unless it is tunnelled over some external transport like SSH. Such tunnelling is inefficient and inconvenient to manage, so there is a desire to add explicit support for TLS to the NBD clients & servers provided by QEMU.

A particular focus is on the need to have encryption of NBD channels used for disk copy during migration.

Latest patch series implementing TLS for NBD is

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03440.html
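
For reference, a minimal sketch of the intended command-line usage, assembled from the syntax exercised in the test comments below (the image path, certificate directory, port and hostname are placeholders):

  # Server side: export an image over NBD with TLS enabled
  qemu-nbd -f raw -t -p 10809 \
      --object tls-creds-x509,id=tls0,endpoint=server,dir=/path/to/pki \
      --tls-creds tls0 \
      /path/to/disk.img

  # Client side: attach the encrypted export as a drive
  qemu-kvm \
      -object tls-creds-x509,id=tls0,endpoint=client,dir=/path/to/pki \
      -drive driver=nbd,host=server.example.com,port=10809,tls-creds=tls0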

Comment 3 Jeff Nelson 2017-05-22 13:20:20 UTC
Mirek: Add to 7.4 advisory?

Comment 6 Suqin Huang 2017-06-01 08:50:32 UTC
Retried with the following setup:

Server: 
# qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-t0f/ --tls-creds tls0  /var/lib/avocado/data/avocado-vt/images/rhel74-64-virtio.qcow2 -p 9000 -t


Client:

    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-t0f/ \
    -drive driver=nbd,host=10.73.196.167,port=9000,tls-creds=tls0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=03 \


Error:
qemu-kvm: -drive driver=nbd,host=10.73.196.167,port=9000,tls-creds=tls0: Certificate does not match the hostname 10.73.196.167

Comment 7 Daniel Berrangé 2017-06-01 09:12:58 UTC
(In reply to Suqin Huang from comment #6)
> Error:
> qemu-kvm: -drive driver=nbd,host=10.73.196.167,port=9000,tls-creds=tls0:
> Certificate does not match the hostname 10.73.196.167

Your 'genx509dir.sh' script uses the output of 'hostname' as the certificate hostname. I expect this is an actual hostname, not the IP address 10.73.196.167. IOW, you need to use host=$HOSTNAME in the qemu -drive arg, not the IP address. Alternatively, you need to improve your genx509dir.sh script so that it includes the IP address in a subject-alt-name field.
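
For reference, one way to get the IP address into the certificate's subject-alt-name is a gnutls certtool template; this is only a hedged sketch of the general approach (file names are placeholders, not the actual genx509dir.sh contents), reusing the hostname and IP address from this bug:

  # server.info -- certtool template
  cn = hp-dl385g7-04.lab.eng.pek2.redhat.com
  dns_name = hp-dl385g7-04.lab.eng.pek2.redhat.com
  ip_address = 10.73.196.167
  tls_www_server
  encryption_key
  signing_key

  # sign the server key with the existing CA from the x509 dir
  certtool --generate-certificate \
      --load-privkey server-key.pem \
      --load-ca-certificate ca-cert.pem --load-ca-privkey ca-key.pem \
      --template server.info --outfile server-cert.pem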

Comment 8 Suqin Huang 2017-06-01 09:19:52 UTC
The qemu-kvm process hangs:



  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                              
25893 root      20   0 2713508  30012  12300 S   0.0  0.1   0:00.19 qemu-kvm  


kvm statistics - summary

 Event                                        Total Current


Server:

# qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-gP9/ --tls-creds tls0  /var/lib/avocado/data/avocado-vt/images/rhel74-64-virtio.qcow2 -p 9000 -t

Client:

    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-gP9 \
    -drive driver=nbd,host=hp-dl385g7-04.lab.eng.pek2.redhat.com,port=9000,tls-creds=tls0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=03 \

Comment 9 Suqin Huang 2017-06-09 07:16:18 UTC
Hi Dan,

Could you check comment 8? The guest hangs when booting with tls-creds.

Regards
Suqin

Comment 10 Daniel Berrangé 2017-06-15 09:53:06 UTC
Can you try to run the qemu process with some debugging turned on? E.g. add

  -d trace:qio*,trace:qcrypto*

to the command line args of the qemu-kvm process and then post the debug messages here.

Comment 11 Suqin Huang 2017-06-15 10:01:47 UTC
Log items (comma separated):
out_asm         show generated host assembly code for each compiled TB
in_asm          show target assembly code for each compiled TB
op              show micro ops for each compiled TB
op_opt          show micro ops after optimization
op_ind          show micro ops before indirect lowering
int             show interrupts/exceptions in short format
exec            show trace before each executed TB (lots of logs)
cpu             show CPU registers before entering a TB (lots of logs)
mmu             log MMU-related activities
pcall           x86 only: show protected mode far calls/returns/exceptions
cpu_reset       show CPU state before CPU resets
unimp           log unimplemented functionality
guest_errors    log when the guest OS does something invalid (eg accessing a non-existent register)
page            dump pages at beginning of user mode emulation
nochain         do not chain compiled TBs so that "exec" and "cpu" show complete traces

Comment 12 Suqin Huang 2017-06-15 10:05:49 UTC
Package: qemu-kvm-rhev-2.9.0-10.el7.x86_64

Comment 13 Daniel Berrangé 2017-06-15 10:13:27 UTC
Urgh, unfortunately this shows that the debug feature is disabled in RHEL. Can you instead try to attach to the QEMU process with GDB? First you'll need to install the debuginfo RPM:

http://download-ipv4.eng.brq.redhat.com/brewroot/packages/qemu-kvm-rhev/2.9.0/10.el7/x86_64/qemu-kvm-rhev-debuginfo-2.9.0-10.el7.x86_64.rpm

Then run QEMU normally, and use 'ps' to find its PID. Now you can attach with GDB

  $ gdb -p PID-OF-QEMU

and then, when you see the "(gdb)" prompt, run 'thread apply all bt' and save the (very long!) output to a file and attach it to this bug.
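
If it is more convenient, the same backtrace can be captured non-interactively in one go, for example:

  gdb -p PID-OF-QEMU -batch -ex 'set pagination off' -ex 'thread apply all bt' > qemu-backtrace.txt 2>&1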

Comment 14 Suqin Huang 2017-06-15 10:47:47 UTC
Created attachment 1287996 [details]
gdb debug info

Comment 15 Daniel Berrangé 2017-06-15 10:52:32 UTC
Ok, the trace from the main thread does appear to show it waiting for I/O to complete:

#0  0x00007f1e1b335aff in __GI_ppoll (fds=0x55c0411e3600, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x000055c03c97d3fb in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055c03c97d3fb in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x000055c03c97f0b5 in aio_poll (ctx=ctx@entry=0x55c03f955700, blocking=<optimized out>) at util/aio-posix.c:622
#4  0x000055c03c8ffa64 in blk_prw (blk=blk@entry=0x55c03f980000, offset=offset@entry=0, buf=buf@entry=0x7ffc4f03f2c0 "@", bytes=bytes@entry=512, co_entry=co_entry@entry=0x55c03c900f40 <blk_read_entry>, flags=flags@entry=0) at block/block-backend.c:1052
#5  0x000055c03c90115a in blk_pread_unthrottled (count=512, buf=0x7ffc4f03f2c0, offset=0, blk=0x55c03f980000) at block/block-backend.c:1201
---Type <return> to continue, or q <return> to quit---
#6  0x000055c03c90115a in blk_pread_unthrottled (blk=blk@entry=0x55c03f980000, offset=offset@entry=0, buf=buf@entry=0x7ffc4f03f2c0 "@", count=count@entry=512)
    at block/block-backend.c:1069
#7  0x000055c03c7c8e9f in guess_disk_lchs (blk=blk@entry=0x55c03f980000, pcylinders=pcylinders@entry=0x7ffc4f03f50c, pheads=pheads@entry=0x7ffc4f03f510, psectors=psectors@entry=0x7ffc4f03f514) at hw/block/hd-geometry.c:70
#8  0x000055c03c7c9007 in hd_geometry_guess (blk=0x55c03f980000, pcyls=pcyls@entry=0x55c03f977974, pheads=pheads@entry=0x55c03f977978, psecs=psecs@entry=0x55c03f97797c, ptrans=ptrans@entry=0x55c03f977990) at hw/block/hd-geometry.c:135
#9  0x000055c03c7c8b82 in blkconf_geometry (conf=conf@entry=0x55c03f977958, ptrans=ptrans@entry=0x55c03f977990, cyls_max=cyls_max@entry=65535, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7ffc4f03f5f0) at hw/block/block.c:123
#10 0x000055c03c8123cf in ide_dev_initfn (dev=0x55c03f9778e0, kind=IDE_HD) at hw/ide/qdev.c:194
#11 0x000055c03c7d8944 in device_realize (dev=0x55c03f9778e0, errp=0x7ffc4f03f680) at hw/core/qdev.c:228
#12 0x000055c03c7da1c1 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc4f03f770) at hw/core/qdev.c:939
#13 0x000055c03c8c069e in property_set_bool (obj=0x55c03f9778e0, v=<optimized out>, name=<optimized out>, opaque=0x55c040960a70, errp=0x7ffc4f03f770) at qom/object.c:1860
#14 0x000055c03c8c435f in object_property_set_qobject (obj=0x55c03f9778e0, value=<optimized out>, name=0x55c03c9ea28b "realized", errp=0x7ffc4f03f770) at qom/qom-qobject.c:27
#15 0x000055c03c8c21d0 in object_property_set_bool (obj=0x55c03f9778e0, value=<optimized out>, name=0x55c03c9ea28b "realized", errp=0x7ffc4f03f770) at qom/object.c:1163
#16 0x000055c03c7d8f92 in qdev_init_nofail (dev=dev@entry=0x55c03f9778e0) at hw/core/qdev.c:373
#17 0x000055c03c812794 in ide_create_drive (bus=bus@entry=0x55c0411989f0, unit=unit@entry=0, drive=0x55c03f91b400) at hw/ide/qdev.c:132
#18 0x000055c03c812f9e in pci_ide_create_devs (dev=dev@entry=0x55c041198000, hd_table=hd_table@entry=0x7ffc4f03f870) at hw/ide/pci.c:430
#19 0x000055c03c8136e5 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7ffc4f03f870, devfn=<optimized out>) at hw/ide/piix.c:231
#20 0x000055c03c711ce5 in pc_init1 (machine=0x55c03f95e3c0, pci_type=0x55c03c9b094a "i440FX", host_type=0x55c03c9b0951 "i440FX-pcihost")
    at /usr/src/debug/qemu-2.9.0/hw/i386/pc_piix.c:249
#21 0x000055c03c66cb60 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4579


We will have to investigate what is going on here, as it could be a real bug.

Comment 16 Daniel Berrangé 2017-06-15 12:17:43 UTC
I've filed a new bug to track this problem, since it is a recent regression caused by changes in AIO handling for NBD servers: https://bugzilla.redhat.com/show_bug.cgi?id=1461827

Comment 17 Paolo Bonzini 2017-06-16 10:30:24 UTC
> Urgh, unfortunately this shows that the debug feature is disabled in RHEL.

Yes, you're supposed to use systemtap.  However, now that the tracing log has become more useful, maybe we could undo that (it does cause slightly worse performance).

Comment 18 Daniel Berrangé 2017-06-16 10:37:28 UTC
(In reply to Paolo Bonzini from comment #17)
> > Urgh, unfortunately this shows that the debug feature is disabled in RHEL.
> 
> Yes, you're supposed to use systemtap.  However, now that the tracing log
> has become more useful, maybe we could undo that (it does cause slightly
> worse performance).

Or provide a helper that invokes stap in a simplified manner, simply printing a message for each tracepoint, e.g. 'qemu-log -d qcrypto* -d qio* /usr/libexec/qemu-kvm ....other args'. That would be almost as simple as the built-in log, but without the additional performance penalty.
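
A rough sketch of what such a wrapper could boil down to, assuming the qemu-kvm-rhev tapset exposes the tracepoints as probe aliases under a qemu.kvm.* namespace (that prefix, like the 'qemu-log' name above, is an assumption rather than an existing tool):

  # print the probe name for every matching qcrypto/qio tracepoint hit
  stap -e 'probe qemu.kvm.qcrypto*, qemu.kvm.qio* { printf("%s\n", pn()) }' \
      -c '/usr/libexec/qemu-kvm ....other args'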

Comment 19 Suqin Huang 2017-06-20 06:45:10 UTC
Result:
The guest boots up and login succeeds.

Server:
qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-sJF --tls-creds tls0  rhel74-64-virtio-scsi.qcow2 -p 9000 -t


Client:

    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-sJF \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=nbd://hp-dl388g8-16.rhts.eng.pek2.redhat.com:9000,file.tls-creds=tls0 \

Package:
qemu-kvm-rhev-2.9.0-12.el7.x86_64
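
As an aside, the TLS export can also be sanity-checked without booting a full guest by pointing qemu-img at it; a hedged sketch, reusing the hostname and credentials directory from this comment:

  qemu-img info \
      --object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-sJF \
      --image-opts driver=nbd,host=hp-dl388g8-16.rhts.eng.pek2.redhat.com,port=9000,tls-creds=tls0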

Comment 20 Suqin Huang 2017-06-20 06:46:19 UTC
Hi Dan,

Are there any other tests I need to run?

Thanks
Suqin

Comment 21 Daniel Berrangé 2017-06-20 12:03:01 UTC
That looks good to me.

Comment 24 errata-xmlrpc 2017-08-01 23:29:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
