Bug 1300770 - RFE: add support for native TLS encryption on NBD client/server transports
RFE: add support for native TLS encryption on NBD client/server transports
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Daniel Berrange
QA Contact: Suqin Huang
Keywords: FutureFeature, TestOnly
Depends On: 1461827
Blocks: 1300772
 
Reported: 2016-01-21 11:50 EST by Daniel Berrange
Modified: 2017-08-01 23:24 EDT
CC List: 18 users

See Also:
Fixed In Version: qemu-kvm-rhev-2.6.0-1.el7
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Clones: 1300772
Environment:
Last Closed: 2017-08-01 19:29:42 EDT
Type: Bug


Attachments
gdb debug info (16.71 KB, text/plain)
2017-06-15 06:47 EDT, Suqin Huang

Description Daniel Berrange 2016-01-21 11:50:29 EST
Description of problem:
The NBD protocol currently runs in clear text, offering no security protection for the data transferred, unless it is tunnelled over some external transport like SSH. Such tunnelling is inefficient and inconvenient to manage, so there is a desire to add explicit support for TLS to the NBD clients & servers provided by QEMU.

A particular focus is on the need to have encryption of NBD channels used for disk copy during migration.

The latest patch series implementing TLS for NBD is:

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03440.html
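For context on what "native" TLS means for NBD (as opposed to an SSH tunnel): as we read the NBD protocol specification, a client using fixed-newstyle negotiation sends an NBD_OPT_STARTTLS option request, and once the server acknowledges it, both ends run a TLS handshake on the same TCP socket before any further negotiation or data transfer. The sketch below only encodes that option request and checks the server's reply framing; it is illustrative, not QEMU's implementation, and the helper names are hypothetical.

```python
import struct

# Constants from the NBD protocol (fixed newstyle negotiation); all fields are big-endian.
IHAVEOPT = 0x49484156454F5054       # ASCII "IHAVEOPT", prefixes every client option request
NBD_REP_MAGIC = 0x0003E889045565A9  # prefixes every server option reply
NBD_OPT_STARTTLS = 5                # ask the server to upgrade this connection to TLS
NBD_REP_ACK = 1                     # server accepted the option


def encode_option(option: int, data: bytes = b"") -> bytes:
    """Client option request: magic (64 bit), option (32 bit), length (32 bit), payload."""
    return struct.pack(">QII", IHAVEOPT, option, len(data)) + data


def parse_option_reply(buf: bytes) -> tuple:
    """Split a server option reply into (option, reply_type, payload)."""
    if len(buf) < 20:
        raise ValueError("option reply header is 20 bytes")
    magic, option, reply_type, length = struct.unpack(">QIII", buf[:20])
    if magic != NBD_REP_MAGIC:
        raise ValueError(f"bad reply magic {magic:#x}")
    payload = buf[20:20 + length]
    if len(payload) != length:
        raise ValueError("truncated reply payload")
    return option, reply_type, payload


def starttls_accepted(reply: bytes) -> bool:
    """True if the server ACKed NBD_OPT_STARTTLS, i.e. the TLS handshake should start now."""
    option, reply_type, _ = parse_option_reply(reply)
    return option == NBD_OPT_STARTTLS and reply_type == NBD_REP_ACK


if __name__ == "__main__":
    print(encode_option(NBD_OPT_STARTTLS))  # b'IHAVEOPT\x00\x00\x00\x05\x00\x00\x00\x00'
    ack = struct.pack(">QIII", NBD_REP_MAGIC, NBD_OPT_STARTTLS, NBD_REP_ACK, 0)
    print(starttls_accepted(ack))           # True
```

After the ACK, a client would hand the same socket to a TLS library and continue option negotiation inside the encrypted channel.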
Comment 3 Jeff Nelson 2017-05-22 09:20:20 EDT
Mirek: Add to 7.4 advisory?
Comment 6 Suqin Huang 2017-06-01 04:50:32 EDT
Retried with:

Server: 
# qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-t0f/ --tls-creds tls0  /var/lib/avocado/data/avocado-vt/images/rhel74-64-virtio.qcow2 -p 9000 -t


Client:

    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-t0f/ \
    -drive driver=nbd,host=10.73.196.167,port=9000,tls-creds=tls0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=03 \


Error:
qemu-kvm: -drive driver=nbd,host=10.73.196.167,port=9000,tls-creds=tls0: Certificate does not match the hostname 10.73.196.167
Comment 7 Daniel Berrange 2017-06-01 05:12:58 EDT
(In reply to Suqin Huang from comment #6)
> Error:
> qemu-kvm: -drive driver=nbd,host=10.73.196.167,port=9000,tls-creds=tls0:
> Certificate does not match the hostname 10.73.196.167

Your 'genx509dir.sh' script uses the output of 'hostname' as the certificate hostname. I expect this is an actual hostname, not the IP address 10.73.196.167. IOW, you need to use host=$HOSTNAME in the qemu -drive arg, not the IP address. Alternatively, you need to improve your genx509dir.sh script so that it includes the IP address in a subjectAltName field.
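The failure above can be illustrated with a simplified model of the name check a TLS client performs against the server certificate. This is a sketch in the spirit of RFC 6125, assuming (as the comment suggests) that the generated certificate names the host but carries no IP address; it is not GnuTLS's actual algorithm, which has extra compatibility rules and wildcard handling, and the function and host names are hypothetical.

```python
import ipaddress


def _as_ip(host: str):
    """Return an ip_address object if host is an IPv4/IPv6 literal, else None."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        return None


def cert_matches_host(host: str, common_name: str = "",
                      dns_sans: tuple = (), ip_sans: tuple = ()) -> bool:
    """Simplified X.509 name check (no wildcards).

    - An IP literal only matches an IP subjectAltName entry.
    - A DNS name matches a DNS subjectAltName entry, or the CN when the
      certificate has no DNS subjectAltName entries at all (legacy fallback).
    """
    ip = _as_ip(host)
    if ip is not None:
        return any(_as_ip(entry) == ip for entry in ip_sans)
    name = host.rstrip(".").lower()
    if dns_sans:
        return any(name == entry.rstrip(".").lower() for entry in dns_sans)
    return bool(common_name) and name == common_name.rstrip(".").lower()


if __name__ == "__main__":
    cn = "server.example.com"  # hypothetical output of `hostname` on the NBD server
    print(cert_matches_host("192.0.2.10", common_name=cn))                          # False
    print(cert_matches_host("server.example.com", common_name=cn))                  # True
    print(cert_matches_host("192.0.2.10", common_name=cn, ip_sans=("192.0.2.10",)))  # True
```

With GnuTLS's certtool, adding the address would mean an `ip_address` entry in the certificate template alongside the hostname.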
Comment 8 Suqin Huang 2017-06-01 05:19:52 EDT
The qemu-kvm process hangs:



  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                              
25893 root      20   0 2713508  30012  12300 S   0.0  0.1   0:00.19 qemu-kvm  


kvm statistics - summary

 Event                                        Total Current


Server:

# qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-gP9/ --tls-creds tls0  /var/lib/avocado/data/avocado-vt/images/rhel74-64-virtio.qcow2 -p 9000 -t

Client:

    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-gP9 \
    -drive driver=nbd,host=hp-dl385g7-04.lab.eng.pek2.redhat.com,port=9000,tls-creds=tls0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=03 \
Comment 9 Suqin Huang 2017-06-09 03:16:18 EDT
Hi Dan,

Could you check comment 8? The guest hangs when booting with tls-creds.

Regards
Suqin
Comment 10 Daniel Berrange 2017-06-15 05:53:06 EDT
Can you try running the qemu process with some debugging turned on? For example, add

  -d trace:qio*,trace:qcrypto*

to the command line args of the qemu-kvm process and then post the debug messages here.
Comment 11 Suqin Huang 2017-06-15 06:01:47 EDT
Log items (comma separated):
out_asm         show generated host assembly code for each compiled TB
in_asm          show target assembly code for each compiled TB
op              show micro ops for each compiled TB
op_opt          show micro ops after optimization
op_ind          show micro ops before indirect lowering
int             show interrupts/exceptions in short format
exec            show trace before each executed TB (lots of logs)
cpu             show CPU registers before entering a TB (lots of logs)
mmu             log MMU-related activities
pcall           x86 only: show protected mode far calls/returns/exceptions
cpu_reset       show CPU state before CPU resets
unimp           log unimplemented functionality
guest_errors    log when the guest OS does something invalid (eg accessing a non-existent register)
page            dump pages at beginning of user mode emulation
nochain         do not chain compiled TBs so that "exec" and "cpu" show complete traces
Comment 12 Suqin Huang 2017-06-15 06:05:49 EDT
Package: qemu-kvm-rhev-2.9.0-10.el7.x86_64
Comment 13 Daniel Berrange 2017-06-15 06:13:27 EDT
Urgh, unfortunately this shows that the debug feature is disabled in RHEL. Can you instead try attaching to the QEMU process with GDB? First you'll need to install the debuginfo rpm:

http://download-ipv4.eng.brq.redhat.com/brewroot/packages/qemu-kvm-rhev/2.9.0/10.el7/x86_64/qemu-kvm-rhev-debuginfo-2.9.0-10.el7.x86_64.rpm

Then run QEMU normally, and use 'ps' to find its PID. Now you can attach with GDB:

  $ gdb -p PID-OF-QEMU

Then, when you see the "(gdb)" prompt, run 'thread apply all bt', save the (very long!) output to a file, and attach it to this bug.
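The interactive steps above can also be done non-interactively. A minimal sketch, assuming gdb is installed and the caller has permission to attach (typically root): gdb's `-batch` mode runs the `-ex` commands, exits, and disables the pager, so the dump is not interrupted by "Type <return> to continue" prompts. The helper names and output file name are hypothetical.

```python
import subprocess


def gdb_backtrace_cmd(pid: int) -> list:
    """gdb argv that attaches to `pid`, dumps every thread's stack, then exits.

    -batch exits after the -ex commands and turns pagination off.
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def save_backtraces(pid: int, outfile: str) -> int:
    """Run gdb against a live process, write all output to `outfile`, return gdb's exit code.

    Readable frames require the matching debuginfo package to be installed first.
    """
    with open(outfile, "w") as out:
        result = subprocess.run(gdb_backtrace_cmd(pid), stdout=out,
                                stderr=subprocess.STDOUT, check=False)
    return result.returncode


if __name__ == "__main__":
    print(" ".join(gdb_backtrace_cmd(25893)))  # gdb -p 25893 -batch -ex thread apply all bt
```

For example, `save_backtraces(25893, "qemu-bt.txt")` would write the full dump to a file ready to attach to the bug.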
Comment 14 Suqin Huang 2017-06-15 06:47 EDT
Created attachment 1287996 [details]
gdb debug info
Comment 15 Daniel Berrange 2017-06-15 06:52:32 EDT
Ok, the trace from the main thread does appear to show it waiting for I/O to complete:

#0  0x00007f1e1b335aff in __GI_ppoll (fds=0x55c0411e3600, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x000055c03c97d3fb in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055c03c97d3fb in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x000055c03c97f0b5 in aio_poll (ctx=ctx@entry=0x55c03f955700, blocking=<optimized out>) at util/aio-posix.c:622
#4  0x000055c03c8ffa64 in blk_prw (blk=blk@entry=0x55c03f980000, offset=offset@entry=0, buf=buf@entry=0x7ffc4f03f2c0 "@", bytes=bytes@entry=512, co_entry=co_entry@entry=0x55c03c900f40 <blk_read_entry>, flags=flags@entry=0) at block/block-backend.c:1052
#5  0x000055c03c90115a in blk_pread_unthrottled (count=512, buf=0x7ffc4f03f2c0, offset=0, blk=0x55c03f980000) at block/block-backend.c:1201
#6  0x000055c03c90115a in blk_pread_unthrottled (blk=blk@entry=0x55c03f980000, offset=offset@entry=0, buf=buf@entry=0x7ffc4f03f2c0 "@", count=count@entry=512)
    at block/block-backend.c:1069
#7  0x000055c03c7c8e9f in guess_disk_lchs (blk=blk@entry=0x55c03f980000, pcylinders=pcylinders@entry=0x7ffc4f03f50c, pheads=pheads@entry=0x7ffc4f03f510, psectors=psectors@entry=0x7ffc4f03f514) at hw/block/hd-geometry.c:70
#8  0x000055c03c7c9007 in hd_geometry_guess (blk=0x55c03f980000, pcyls=pcyls@entry=0x55c03f977974, pheads=pheads@entry=0x55c03f977978, psecs=psecs@entry=0x55c03f97797c, ptrans=ptrans@entry=0x55c03f977990) at hw/block/hd-geometry.c:135
#9  0x000055c03c7c8b82 in blkconf_geometry (conf=conf@entry=0x55c03f977958, ptrans=ptrans@entry=0x55c03f977990, cyls_max=cyls_max@entry=65535, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7ffc4f03f5f0) at hw/block/block.c:123
#10 0x000055c03c8123cf in ide_dev_initfn (dev=0x55c03f9778e0, kind=IDE_HD) at hw/ide/qdev.c:194
#11 0x000055c03c7d8944 in device_realize (dev=0x55c03f9778e0, errp=0x7ffc4f03f680) at hw/core/qdev.c:228
#12 0x000055c03c7da1c1 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc4f03f770) at hw/core/qdev.c:939
#13 0x000055c03c8c069e in property_set_bool (obj=0x55c03f9778e0, v=<optimized out>, name=<optimized out>, opaque=0x55c040960a70, errp=0x7ffc4f03f770) at qom/object.c:1860
#14 0x000055c03c8c435f in object_property_set_qobject (obj=0x55c03f9778e0, value=<optimized out>, name=0x55c03c9ea28b "realized", errp=0x7ffc4f03f770) at qom/qom-qobject.c:27
#15 0x000055c03c8c21d0 in object_property_set_bool (obj=0x55c03f9778e0, value=<optimized out>, name=0x55c03c9ea28b "realized", errp=0x7ffc4f03f770) at qom/object.c:1163
#16 0x000055c03c7d8f92 in qdev_init_nofail (dev=dev@entry=0x55c03f9778e0) at hw/core/qdev.c:373
#17 0x000055c03c812794 in ide_create_drive (bus=bus@entry=0x55c0411989f0, unit=unit@entry=0, drive=0x55c03f91b400) at hw/ide/qdev.c:132
#18 0x000055c03c812f9e in pci_ide_create_devs (dev=dev@entry=0x55c041198000, hd_table=hd_table@entry=0x7ffc4f03f870) at hw/ide/pci.c:430
#19 0x000055c03c8136e5 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7ffc4f03f870, devfn=<optimized out>) at hw/ide/piix.c:231
#20 0x000055c03c711ce5 in pc_init1 (machine=0x55c03f95e3c0, pci_type=0x55c03c9b094a "i440FX", host_type=0x55c03c9b0951 "i440FX-pcihost")
    at /usr/src/debug/qemu-2.9.0/hw/i386/pc_piix.c:249
#21 0x000055c03c66cb60 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4579


I'll have to investigate what is going on here, as it could be a real bug.
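A dump this long is easier to read when each frame is reduced to its function name; doing so makes the chain in the main thread visible: ppoll() under aio_poll(), waiting on the 512-byte read at offset 0 that guess_disk_lchs() issues while the IDE disk is being realized. A small hypothetical helper (not part of gdb or QEMU) that does this, as a sketch that only handles gdb's usual "#N 0xADDR in func (args)" frame format:

```python
import re

# Matches gdb frame lines such as
#   "#3  0x000055c03c97f0b5 in aio_poll (ctx=...) at util/aio-posix.c:622"
# as well as frames printed without the "0x... in" address part.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([\w.:<>~]+) \(")


def call_chain(backtrace: str) -> list:
    """Return function names from a gdb backtrace, innermost frame first.

    Continuation lines (e.g. "    at file.c:123") and pager prompts are skipped.
    """
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


if __name__ == "__main__":
    # Trimmed excerpt of the main-thread trace above (arguments shortened).
    sample = (
        "#0  0x00007f1e1b335aff in __GI_ppoll (fds=0x55c0411e3600, nfds=1) at ppoll.c:56\n"
        "#3  0x000055c03c97f0b5 in aio_poll (ctx=ctx@entry=0x55c03f955700) at util/aio-posix.c:622\n"
        "#4  0x000055c03c8ffa64 in blk_prw (blk=blk@entry=0x55c03f980000) at block/block-backend.c:1052\n"
        "#7  0x000055c03c7c8e9f in guess_disk_lchs (blk=blk@entry=0x55c03f980000) at hw/block/hd-geometry.c:70\n"
    )
    print(" <- ".join(call_chain(sample)))  # __GI_ppoll <- aio_poll <- blk_prw <- guess_disk_lchs
```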
Comment 16 Daniel Berrange 2017-06-15 08:17:43 EDT
I've filed a new bug to track this problem, since it is a recent regression caused by changes in AIO handling for NBD servers https://bugzilla.redhat.com/show_bug.cgi?id=1461827
Comment 17 Paolo Bonzini 2017-06-16 06:30:24 EDT
> Urgh, unfortunately this shows that the debug feature is disabled in RHEL.

Yes, you're supposed to use systemtap.  However, now that the tracing log has become more useful, maybe we could undo that (it does cause slightly worse performance).
Comment 18 Daniel Berrange 2017-06-16 06:37:28 EDT
(In reply to Paolo Bonzini from comment #17)
> > Urgh, unfortunately this shows that the debug feature is disabled in RHEL.
> 
> Yes, you're supposed to use systemtap.  However, now that the tracing log
> has become more useful, maybe we could undo that (it does cause slightly
> worse performance).

Or provide a helper that invokes stap in a simplified manner, printing a message for each tracepoint, e.g. 'qemu-log -d qcrypto* -d qio* /usr/libexec/qemu-kvm ....other args'. That would be almost as simple as the built-in log, but without its performance penalty.
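No such helper exists in this thread; as a rough sketch of the idea, a wrapper could turn each trace-event glob into a SystemTap probe that prints the probe point that fired, then run QEMU under stap via its `-e` (inline script) and `-c` (run command) options. The function names, the probe prefix "qemu.kvm", and the assumption that the installed tapset exposes QEMU trace events under that prefix are all hypothetical; the code only builds the command line and starts nothing.

```python
import shlex


def stap_script(patterns, prefix="qemu.kvm"):
    """One probe per trace-event glob; each prints the probe point that fired.

    ASSUMPTION: the tapset names QEMU trace events "<prefix>.<event>".
    """
    return "".join(
        f'probe {prefix}.{pattern} {{ printf("%s\\n", pp()) }}\n'
        for pattern in patterns
    )


def stap_command(patterns, qemu_argv, prefix="qemu.kvm"):
    """stap argv: -e runs the generated script, -c starts QEMU and exits when it does."""
    return ["stap", "-e", stap_script(patterns, prefix), "-c", shlex.join(qemu_argv)]


if __name__ == "__main__":
    cmd = stap_command(["qcrypto*", "qio*"], ["/usr/libexec/qemu-kvm", "-m", "2048"])
    print(cmd[2], end="")  # the generated script, one probe line per pattern
```

Keeping the prefix a parameter means the same wrapper could target a differently named tapset without changing the probe-generation logic.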
Comment 19 Suqin Huang 2017-06-20 02:45:10 EDT
Result:
Booted and logged in to the guest successfully.

Server:
qemu-nbd -f raw --object tls-creds-x509,id=tls0,endpoint=server,dir=/root/spice_x509-sJF --tls-creds tls0  rhel74-64-virtio-scsi.qcow2 -p 9000 -t


Client:

    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/spice_x509-sJF \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=nbd://hp-dl388g8-16.rhts.eng.pek2.redhat.com:9000,file.tls-creds=tls0 \

Package:
qemu-kvm-rhev-2.9.0-12.el7.x86_64
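The working pairing above can be captured in a small, hypothetical helper that assembles both sides from the same parameters, which makes the dependencies explicit: both ends load TLS credentials from a directory under the same object id, the client's `host` must match the name in the server certificate, and because qemu-nbd exports the image file's bytes unchanged (`-f raw`), the qcow2 layer is opened on the client with `format=qcow2` on top of the nbd protocol driver. It is a sketch only: it omits the snapshot/aio/cache options used above and does not start any process.

```python
def nbd_tls_commands(creds_dir, image, host, port=9000, fmt="qcow2"):
    """Return (server_argv, client_args) mirroring the setup in this comment."""
    server = [
        "qemu-nbd", "-f", "raw",  # export the file's bytes as-is
        "--object", f"tls-creds-x509,id=tls0,endpoint=server,dir={creds_dir}",
        "--tls-creds", "tls0",
        image, "-p", str(port),
        "-t",                     # persistent: keep serving after a client disconnects
    ]
    client = [
        "-object", f"tls-creds-x509,id=tls0,endpoint=client,dir={creds_dir}",
        # host must match the certificate name; file.tls-creds reaches the nbd child.
        "-drive", (f"id=drive_image1,if=none,format={fmt},"
                   f"file=nbd://{host}:{port},file.tls-creds=tls0"),
    ]
    return server, client


if __name__ == "__main__":
    srv, cli = nbd_tls_commands("/root/certs", "disk.qcow2", "nbd.example.com")
    print(" ".join(srv))
    print(" ".join(cli))
```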
Comment 20 Suqin Huang 2017-06-20 02:46:19 EDT
Hi Dan,

Are there any other tests I need to run?

Thanks
Suqin
Comment 21 Daniel Berrange 2017-06-20 08:03:01 EDT
That looks good to me.
Comment 24 errata-xmlrpc 2017-08-01 19:29:42 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392