Bug 1430620

Summary: TLS encryption migration via exec failed with "TLS handshake failed: The TLS connection was non-properly terminated"
Product: Red Hat Enterprise Linux 7
Reporter: xianwang <xianwang>
Component: qemu-kvm-rhev
Assignee: Daniel Berrangé <berrange>
Status: CLOSED ERRATA
QA Contact: xianwang <xianwang>
Severity: unspecified
Priority: unspecified
Version: 7.4
CC: berrange, chayang, juzhang, knoel, michen, mrezanin, qzhang, virt-maint
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: qemu-kvm-rhev-2.9.0-6.el7
Last Closed: 2017-08-02 03:39:56 UTC
Type: Bug

Description xianwang 2017-03-09 06:56:28 UTC
Description of problem:
When migrating via exec: with TLS encryption, the migration fails with the error:
qemu-kvm: TLS handshake failed: The TLS connection was non-properly terminated

Version-Release number of selected component (if applicable):
3.10.0-514.15.1.el7.x86_64
qemu-kvm-rhev-2.8.0-5.el7.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Generate the CA and certificates:
a) CA key and self-signed CA certificate:
# certtool --generate-privkey > x509-ca-key.pem
# echo 'cn = hp-dl385g7-08.lab.eng.pek2.redhat.com' > ca.tmpl
# echo 'ca' >> ca.tmpl
# echo 'cert_signing_key' >> ca.tmpl
# certtool --generate-self-signed --load-privkey x509-ca-key.pem --template ca.tmpl --outfile x509-ca.pem

b) Server certificate:
# certtool --generate-privkey > x509-server-key.pem
# echo 'organization = GnuTLS test server' > server.tmpl
# echo 'cn = hp-dl385g7-08.lab.eng.pek2.redhat.com' >> server.tmpl
# echo 'tls_www_server' >> server.tmpl
# echo 'encryption_key' >> server.tmpl
# echo 'signing_key' >> server.tmpl
# echo 'dns_name = hp-dl385g7-08' >> server.tmpl
# echo 'ip_address = 10.73.196.159' >> server.tmpl
# certtool --generate-certificate --load-privkey x509-server-key.pem --load-ca-certificate x509-ca.pem --load-ca-privkey x509-ca-key.pem --template server.tmpl --outfile x509-server.pem

c) Client certificate:
# certtool --generate-privkey > x509-client-key.pem
# echo 'cn = hp-dl385g7-08.lab.eng.pek2.redhat.com' > client.tmpl
# echo 'tls_www_client' >> client.tmpl
# echo 'encryption_key' >> client.tmpl
# echo 'signing_key' >> client.tmpl
# echo 'ip_address = 10.73.196.159' >> client.tmpl
# echo 'dns_name = hp-dl385g7-08' >> client.tmpl
# certtool --generate-certificate --load-privkey x509-client-key.pem --load-ca-certificate x509-ca.pem --load-ca-privkey x509-ca-key.pem --template client.tmpl --outfile x509-client.pem

d) Copy the files to the names the tls-creds-x509 object looks for:
# cp x509-ca.pem ca-cert.pem
# cp x509-server.pem server-cert.pem
# cp x509-server-key.pem server-key.pem
# cp x509-client.pem client-cert.pem
# cp x509-client-key.pem client-key.pem
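The tls-creds-x509 object is given only a directory (dir=/root on the server, dir=/root/mount_point on the client) and finds its credentials there by fixed file names, which is why step d renames the certtool output. A minimal pre-flight sketch in Python, assuming the file names from step d; `missing_tls_files` is a hypothetical helper, and it checks the files this report creates for each endpoint rather than QEMU's exact list of mandatory files.

```python
import os
import tempfile

# File names copied from step d. Which endpoint uses which files is an
# assumption based on the endpoint=server / endpoint=client split here.
EXPECTED_FILES = {
    "server": ["ca-cert.pem", "server-cert.pem", "server-key.pem"],
    "client": ["ca-cert.pem", "client-cert.pem", "client-key.pem"],
}


def missing_tls_files(cred_dir, endpoint):
    """Return the expected credential files that are absent from cred_dir."""
    return [
        name
        for name in EXPECTED_FILES[endpoint]
        if not os.path.isfile(os.path.join(cred_dir, name))
    ]


if __name__ == "__main__":
    # Demo: a scratch directory holding only the CA certificate.
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, "ca-cert.pem"), "w").close()
        print(missing_tls_files(scratch, "server"))
        # prints ['server-cert.pem', 'server-key.pem']
```

On the real hosts, missing_tls_files("/root", "server") and missing_tls_files("/root/mount_point", "client") would be expected to return an empty list before the guests are booted.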

2. Boot a guest with the following qemu command line on the dst host (TLS server end and NFS server end):
/usr/libexec/qemu-kvm -object tls-creds-x509,id=tls0,endpoint=server,dir=/root -drive id=drive_image1,if=none,cache=none,format=qcow2,file=/root/RHEL.7.3.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0 -monitor stdio -vnc :1 -incoming defer
3. Boot a guest with the same qemu command line on the src host (TLS client end and NFS client end), using the NFS mount point for the credentials and the image:
/usr/libexec/qemu-kvm -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/mount_point -drive id=drive_image1,if=none,cache=none,format=qcow2,file=/root/mount_point/RHEL.7.3.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0 -monitor stdio -vnc :1
4. On the dst host:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate_incoming "exec:socat TCP4-LISTEN:9002 -"

On the src host:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate "exec:socat - TCP4:hp-dl385g7-08:9002"
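The same step can also be driven over QMP instead of the HMP monitor used above. A minimal Python sketch that only builds the JSON messages, assuming the QMP commands qmp_capabilities (which QMP requires first), migrate-set-parameters, migrate-incoming and migrate; it does not open a monitor socket, and the helper names `qmp` and `tls_exec_migration` are hypothetical. migrate-incoming only applies because the dst guest was started with -incoming defer.

```python
import json


def qmp(command, **arguments):
    """Serialize one QMP command; argument names use '_' in place of '-'."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = {k.replace("_", "-"): v for k, v in arguments.items()}
    return json.dumps(msg)


def tls_exec_migration(role, uri, creds="tls0", hostname=None):
    """Return the QMP messages for one side ("dst" or "src") of the migration."""
    params = {"tls_creds": creds}
    if hostname is not None:
        params["tls_hostname"] = hostname
    msgs = [qmp("qmp_capabilities"), qmp("migrate-set-parameters", **params)]
    if role == "dst":
        msgs.append(qmp("migrate-incoming", uri=uri))
    else:
        msgs.append(qmp("migrate", uri=uri))
    return msgs


if __name__ == "__main__":
    for line in tls_exec_migration("dst", "exec:socat TCP4-LISTEN:9002 -"):
        print(line)
```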


Actual results:
On the src host:
qemu-kvm: No hostname available for TLS

Then I set tls-hostname on the src host, but the migration still failed.
On the src host:
(qemu) migrate_set_parameter tls-hostname hp-dl385g7-08
(qemu) migrate "exec:socat - TCP4:hp-dl385g7-08:9002"
qemu-kvm: TLS handshake failed: The TLS connection was non-properly terminated.
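The first error appears because an exec: URI carries no host name against which the TLS client can check the server certificate, so tls-hostname has to be set explicitly (here hp-dl385g7-08, which matches dns_name in server.tmpl). The Python sketch below is a simplified model of that selection rule, not QEMU's code; `pick_tls_hostname` is a hypothetical name, and the tcp: fallback to the URI's host part is an assumption about QEMU's socket transport.

```python
def pick_tls_hostname(uri, tls_hostname=None):
    """Simplified model: choose the name the TLS client will verify.

    An explicit tls-hostname wins. Otherwise only a tcp: URI offers a
    host to fall back on (assumed); exec: offers none, which matches
    the "No hostname available for TLS" error.
    """
    if tls_hostname:
        return tls_hostname
    if uri.startswith("tcp:"):
        # "tcp:HOST:PORT" -> HOST (IPv6 literals are not handled here).
        return uri[len("tcp:"):].rsplit(":", 1)[0]
    raise ValueError("No hostname available for TLS")


if __name__ == "__main__":
    uri = "exec:socat - TCP4:hp-dl385g7-08:9002"
    print(pick_tls_hostname(uri, "hp-dl385g7-08"))  # hp-dl385g7-08
```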


Expected results:
Migration completes and the VM works well.

Additional info:

Comment 2 xianwang 2017-03-09 08:00:32 UTC
I discussed this with Daniel by e-mail and he confirmed the problem. He suggested reporting this bug and assigning it to him to track it, so I assigned it to Daniel, with myself as QA contact.

Comment 3 Daniel Berrangé 2017-04-21 11:24:37 UTC
Patch posted upstream

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03623.html

Comment 5 Daniel Berrangé 2017-05-17 10:47:50 UTC
Patch is now merged in upstream GIT master as

commit 062d81f0e968fe1597474735f3ea038065027372
Author: Daniel P. Berrange <berrange>
Date:   Fri Apr 21 12:12:20 2017 +0100

    migration: setup bi-directional I/O channel for exec: protocol
    
    Historically the migration data channel has only needed to be
    unidirectional. Thus the 'exec:' protocol was requesting an
    I/O channel with O_RDONLY on incoming side, and O_WRONLY on
    the outgoing side.
    
    This is fine for classic migration, but if you then try to run
    TLS over it, this fails because the TLS handshake requires a
    bi-directional channel.
    
    Signed-off-by: Daniel P. Berrange <berrange>
    Reviewed-by: Juan Quintela <quintela>
    Signed-off-by: Juan Quintela <quintela>
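The commit's point is that a TLS handshake is a two-way exchange: the client sends a hello and must read the server's reply before any migration data flows. The Python sketch below illustrates this with a toy two-message exchange (not real TLS): over a one-way os.pipe(), standing in for the old O_RDONLY/O_WRONLY exec: channel, the receiver has no way to answer, while a socket.socketpair() carries both directions. The function names are illustrative only.

```python
import os
import socket


def toy_handshake(client_sock, server_sock):
    """Toy hello/reply exchange standing in for a TLS handshake (not real TLS)."""
    client_sock.sendall(b"HELLO")  # client -> server
    if server_sock.recv(5) != b"HELLO":
        raise ConnectionError("unexpected client hello")
    server_sock.sendall(b"WELCOME")  # server -> client: needs a two-way channel
    return client_sock.recv(7)


def reply_over_pipe():
    """Try the same reply over a one-way pipe."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, b"HELLO")
        os.read(read_fd, 5)
        # The receiver only holds the read end; on Linux writing to it
        # fails with EBADF, so it cannot answer the hello.
        os.write(read_fd, b"WELCOME")
    except OSError as err:
        return "reply failed: " + err.strerror
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return "reply sent"


if __name__ == "__main__":
    a, b = socket.socketpair()
    with a, b:
        print(toy_handshake(a, b))  # b'WELCOME'
    print(reply_over_pipe())
```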

Comment 7 Miroslav Rezanina 2017-05-23 08:14:02 UTC
Fix included in qemu-kvm-rhev-2.9.0-6.el7

Comment 9 xianwang 2017-06-01 05:16:41 UTC
I have tested two scenarios for "exec": local migration and migration between two hosts. The local migration completed, but the two-host migration failed. Test results are as follows:
version info:
3.10.0-671.el7.ppc64le
qemu-kvm-rhev-2.9.0-6.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch

I) Local migration
TLS server end:
# /usr/libexec/qemu-kvm -object tls-creds-x509,id=tls0,endpoint=server,dir=/root -monitor stdio -vnc :1 -incoming defer

TLS client end:
# /usr/libexec/qemu-kvm -object tls-creds-x509,id=tls0,endpoint=client,dir=/root -monitor stdio -vnc :2

server end:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate_set_parameter tls-hostname ibm-p8-rhevm-02
(qemu) migrate_incoming "exec:socat TCP4-LISTEN:9002 -"

client end:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate_set_parameter tls-hostname ibm-p8-rhevm-02
(qemu) migrate "exec:socat - TCP4:ibm-p8-rhevm-02:9002"

Check the migration status:
server end:
(qemu) info status 
VM status: running

client end:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: completed

So this scenario passes.
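When results are checked by script rather than by eye, the Migration status line of info migrate is the relevant field. A minimal Python parsing sketch, assuming HMP output shaped like the samples in this comment; `migration_status` is a hypothetical helper, and the HMP output format is not a stable interface.

```python
import re


def migration_status(info_migrate_output):
    """Extract the status word from HMP 'info migrate' output, or None."""
    match = re.search(r"^Migration status: (\S+)", info_migrate_output, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "capabilities: xbzrle: off rdma-pin-all: off auto-converge: off\n"
        "Migration status: completed\n"
    )
    print(migration_status(sample))  # completed
```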

II) Two-host migration
server end:
# /usr/libexec/qemu-kvm -object tls-creds-x509,id=tls0,endpoint=server,dir=/root -monitor stdio -vnc :1 -incoming defer
client end:
# /usr/libexec/qemu-kvm -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/mount_point -monitor stdio -vnc :1

server end:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate_set_parameter tls-hostname ibm-p8-rhevm-02
(qemu) migrate_incoming "exec:socat TCP4-LISTEN:9002 -"

client end:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate_set_parameter tls-hostname ibm-p8-rhevm-02
(qemu) migrate "exec:socat - TCP4:ibm-p8-rhevm-02:9002"
2017/06/01 00:50:56 socat[3264] E getaddrinfo("ibm-p8-rhevm-02", "NULL", {1,2,1,6}, {}): Name or service not known
qemu-kvm: TLS handshake failed: The TLS connection was non-properly terminated.
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: failed (TLS handshake failed: The TLS connection was non-properly terminated.)
total time: 0 milliseconds

So this scenario fails.
In summary, this bug may still have a problem.
Daniel, could you please help check it?

Comment 10 Daniel Berrangé 2017-06-01 09:14:53 UTC
The 'socat' you are telling QEMU to run is failing with an error:

2017/06/01 00:50:56 socat[3264] E getaddrinfo("ibm-p8-rhevm-02", "NULL", {1,2,1,6}, {}): Name or service not known

so it is expected that QEMU migration then fails.
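Because the exec: transport surfaces socat's connection failure only as the generic TLS handshake error, it can help to confirm that the target name resolves before starting the migration. A minimal sketch using Python's socket.getaddrinfo; `resolves` is a hypothetical helper, and it only checks name resolution on the machine running it, which should be the src host where socat runs.

```python
import socket


def resolves(host, port=9002):
    """Return True if host:port resolves to at least one TCP address."""
    try:
        return bool(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM))
    except socket.gaierror:
        return False


if __name__ == "__main__":
    for name in ("127.0.0.1", "ibm-p8-rhevm-02"):
        print(name, "resolves" if resolves(name) else "does not resolve")
```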

Comment 11 xianwang 2017-06-02 05:32:56 UTC
(In reply to Daniel Berrange from comment #10)
> The 'socat' you are telling QEMU to run is failing with an error:
> 
> 2017/06/01 00:50:56 socat[3264] E getaddrinfo("ibm-p8-rhevm-02", "NULL",
> {1,2,1,6}, {}): Name or service not known
> 
> so it is expected that QEMU migration then fails.

Yes, I noticed this error message, but does it mean my steps are wrong, or is there still a problem with this issue?

If my steps are wrong, could you please give the correct steps to verify this bug? Thanks.

Comment 12 Daniel Berrangé 2017-06-02 07:54:24 UTC
(In reply to xianwang from comment #11)
> (In reply to Daniel Berrange from comment #10)
> > The 'socat' you are telling QEMU to run is failing with an error:
> > 
> > 2017/06/01 00:50:56 socat[3264] E getaddrinfo("ibm-p8-rhevm-02", "NULL",
> > {1,2,1,6}, {}): Name or service not known
> > 
> > so it is expected that QEMU migration then fails.
> 
> yes, I have detected this error message, but I wonder this means my step is
> wrong? or, there is still some problem with this issue ?

The hostname 'ibm-p8-rhevm-02' can't be resolved  - that's either a wrong hostname or your DNS is broken.

Comment 13 xianwang 2017-06-05 05:36:30 UTC
(In reply to Daniel Berrange from comment #12)
> (In reply to xianwang from comment #11)
> > (In reply to Daniel Berrange from comment #10)
> > > The 'socat' you are telling QEMU to run is failing with an error:
> > > 
> > > 2017/06/01 00:50:56 socat[3264] E getaddrinfo("ibm-p8-rhevm-02", "NULL",
> > > {1,2,1,6}, {}): Name or service not known
> > > 
> > > so it is expected that QEMU migration then fails.
> > 
> > yes, I have detected this error message, but I wonder this means my step is
> > wrong? or, there is still some problem with this issue ?
> 
> The hostname 'ibm-p8-rhevm-02' can't be resolved  - that's either a wrong
> hostname or your DNS is broken.

Yes, Daniel is right. I tested this issue on another ppc host and it passed, so something was probably wrong with the src host used in comment 9.

I have re-tested this issue on both x86_64 and ppc64le, and it passed on both. The steps are as follows:

I) x86_64 host:
3.10.0-671.el7.x86_64
qemu-kvm-rhev-2.9.0-6.el7.x86_64

Test steps are the same as in the bug description.
Result:
Migration completed, and the VM works well.

II) ppc64le host:
3.10.0-675.el7.ppc64le
qemu-kvm-rhev-2.9.0-7.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch

Test steps are the same as in the bug description.
Result:
Migration completed, and the VM works well.

So, this bug is fixed.

Comment 15 errata-xmlrpc 2017-08-02 03:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392