Bug 1724911 - qemu reports "Cannot load client cert & key" during encrypted native live migration, although they should not be needed.
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: Dr. David Alan Gilbert
QA Contact: Li Xiaohui
Reported: 2019-06-28 04:47 UTC by Fangge Jin
Modified: 2020-11-14 09:13 UTC (History)
Last Closed: 2020-09-16 05:58:22 UTC
Type: Bug

Description Fangge Jin 2019-06-28 04:47:43 UTC
Description of problem:
qemu reports "Cannot load client cert & key" when doing encrypted native live migration, although the client cert and key should not be needed.

Version-Release number of selected component (if applicable):
libvirt-5.0.0-11.module+el8.0.1+3459+e357ef2f.x86_64
qemu-kvm-3.1.0-27.module+el8.0.1+3253+c5371cb3.x86_64

How reproducible:
100%

Steps to Reproduce:
0. Prepare tls env for live migration on both source and target host.
1) Src host should have ca cert 
2) Target host should have ca cert and server cert&key

By default, the target (server) host doesn't need to verify the src (client) host's cert, so there is no need to create a client cert on the src host. However, I did have a client cert & key on the src host (the client key has no read permission for others):
# ll /etc/pki/qemu/
total 32
drwxr-xr-x. 2 root root    6 Jun 28 12:42 bak
-rw-r--r--. 1 root root 1237 Jun 28 10:16 ca-cert.pem
-rw-------. 1 root root 1743 Jun 28 10:16 ca-key.pem
-rw-r--r--. 1 root root  916 Jun 28 10:17 client-cert.pem
-rw-------. 1 root root  887 Jun 28 10:17 client-key.pem
-rw-------. 1 root root  887 Jun 28 10:17 client-key.pem.secure
-rw-r--r--. 1 root root 1090 Jun 28 10:16 server-cert.pem
-rw-r--r--. 1 root root 1679 Jun 28 10:16 server-key.pem
-rw-------. 1 root root 1679 Jun 28 10:16 server-key.pem.secure


1. Start vm on source host

2. Do encrypted native live migration:
# virsh migrate rhel7.6-1 qemu+ssh://10.66.4.240/system --live --verbose --p2p --undefinesource --persistent  --tls
error: internal error: unable to execute QEMU command 'object-add': Cannot load certificate '/etc/pki/qemu/client-cert.pem' & key '/etc/pki/qemu/client-key.pem': Error while reading file.

3. From libvirtd log, I can see:
2019-06-28 04:32:17.571+0000: 9141: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8140349e0 msg={"execute":"object-add","arguments":{"qom-type":"tls-creds-x509","id":"objlibvirt_migrate_tls0","props":{"dir":"/etc/pki/qemu","endpoint":"client","verify-peer":true}},"id":"libvirt-105"}
2019-06-28 04:32:17.573+0000: 9139: debug : qemuMonitorJSONIOProcessLine:196 : Line [{"id": "libvirt-105", "error": {"class": "GenericError", "desc": "Cannot load certificate '/etc/pki/qemu/client-cert.pem' & key '/etc/pki/qemu/client-key.pem': Error while reading file."}}]

4. If I remove client cert&key on src host, migration can succeed

5. If I add read permission to client key for others on src host, migration can succeed.
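
The QMP command that libvirt sent in step 3 can be extracted from the log line and inspected programmatically. The following is a minimal Python sketch, not a libvirt tool: the helper name `qmp_message_from_log` is hypothetical, and it assumes the JSON payload runs from `msg=` to the end of the line, as in the QEMU_MONITOR_SEND_MSG line quoted above.

```python
import json

def qmp_message_from_log(line, marker="msg="):
    """Extract the JSON command that follows `marker` in a libvirtd log line."""
    _, sep, payload = line.partition(marker)
    if not sep:
        # The marker is not present, so this is not a monitor-send line.
        return None
    return json.loads(payload)

# Log line copied from step 3 of this report.
line = ('2019-06-28 04:32:17.571+0000: 9141: info : qemuMonitorSend:1081 : '
        'QEMU_MONITOR_SEND_MSG: mon=0x7ff8140349e0 msg={"execute":"object-add",'
        '"arguments":{"qom-type":"tls-creds-x509","id":"objlibvirt_migrate_tls0",'
        '"props":{"dir":"/etc/pki/qemu","endpoint":"client","verify-peer":true}},'
        '"id":"libvirt-105"}')

props = qmp_message_from_log(line)["arguments"]["props"]
print(props["endpoint"], props["verify-peer"])  # client True
```

Running the same helper over the target host's log makes it easy to compare the "endpoint" and "verify-peer" properties that libvirt sets on each side.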


Actual results:


Expected results:
qemu should not load client cert&key when they are not needed

Additional info:

Comment 2 Li Xiaohui 2019-07-02 12:01:52 UTC
Hi all,
I tested TLS-encrypted migration on a rhel8.0.1 host (using only one host, i.e. local migration) and didn't hit this issue.
host info: kernel-4.18.0-80.4.2.el8_0.x86_64 & qemu-img-3.1.0-27.module+el8.0.1+3253+c5371cb3.x86_64
guest info: kernel-3.10.0-957.27.1.el7.x86_64

Tested 2 situations:
1) with client cert & key in the right directory:
migration finishes successfully; qemu and guest run normally
2) without client cert & key in the right directory, migration fails as shown below. So I think the client cert & key are necessary for TLS-encrypted migration.
on dst qemu, this is printed:
(qemu) qemu-kvm: Verify failed: No certificate was found.
on src qemu, the migration is reported as failed:
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off 
Migration status: failed
total time: 0 milliseconds


Steps to reproduce:
1. CA files generated:
for situation 1), like this:
[root@intel-e31225-8-3 tls]# pwd
/home/work/qemu-sh/tls
[root@intel-e31225-8-3 tls]# ls
ca-cert.pem  client-cert.pem  client.tmpl      server-key.pem  x509-ca-key.pem
ca.tmpl      client-key.pem   server-cert.pem  server.tmpl

for situation 2), like this:
[root@intel-e31225-8-3 tls]# ls 
ca-cert.pem  server-cert.pem  server.tmpl
ca.tmpl      server-key.pem   x509-ca-key.pem

2. Boot guest on src host with this command:
/usr/libexec/qemu-kvm \
-enable-kvm \
-machine q35  \
-m 2G  \
-smp 2  \
-cpu 'SandyBridge' \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/mnt/nfs/rhel76-64-virtio-scsi.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bus=pcie.0-root-port-2,addr=0x0 \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=4 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:e6:67:a7:1c,bus=pcie.0-root-port-4,vectors=10,mq=on \
-object tls-creds-x509,id=tls0,endpoint=client,dir=/home/work/qemu-sh/tls \
-vnc :0  \
-device VGA \
-monitor stdio \
-qmp tcp:0:1234,server,nowait

3. Boot guest on dst host with "-incoming defer":
/usr/libexec/qemu-kvm \
-enable-kvm \
-machine q35  \
-m 2G  \
-smp 2  \
-cpu 'SandyBridge' \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/mnt/nfs/rhel76-64-virtio-scsi.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bus=pcie.0-root-port-2,addr=0x0 \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=4 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:e6:67:a7:1c,bus=pcie.0-root-port-4,vectors=10,mq=on \
-object tls-creds-x509,id=tls0,endpoint=server,dir=/home/work/qemu-sh/tls \
-vnc :1  \
-device VGA \
-monitor stdio \
-qmp tcp:0:2345,server,nowait \
-incoming defer

4. Set tls-creds and start migration on dst and src qemu:
dst qemu:
(qemu) migrate_set_parameter tls-creds tls0 
(qemu) migrate_incoming tcp:10.66.148.4:5801

src qemu:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate -d tcp:10.66.148.4:5801
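
Since both guests above are started with a "-qmp" socket, the HMP steps in step 4 could also be driven over QMP. The sketch below only builds the JSON lines and does not open a connection. It assumes that the QMP commands `migrate-set-parameters`, `migrate` and `migrate-incoming` correspond to the HMP commands used above, which should be checked against the QMP reference for the QEMU version in use. The helper names `qmp` and `tls_migration_commands` are hypothetical.

```python
import json

def qmp(command, **arguments):
    """Serialise one QMP command as a JSON line."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)

def tls_migration_commands(uri, creds_id="tls0", incoming=False):
    """QMP lines mirroring the HMP steps above for one side of the migration."""
    return [
        # QMP handshake; it must be sent before any other command.
        qmp("qmp_capabilities"),
        # Point the migration code at the tls-creds-x509 object id.
        qmp("migrate-set-parameters", **{"tls-creds": creds_id}),
        # Destination listens, source connects.
        qmp("migrate-incoming" if incoming else "migrate", uri=uri),
    ]

if __name__ == "__main__":
    uri = "tcp:10.66.148.4:5801"  # address taken from step 4 above
    for side, incoming in (("dst", True), ("src", False)):
        for line in tls_migration_commands(uri, incoming=incoming):
            print(side, line)
```

The destination lines would be written to the QMP socket of the dst qemu first, then the source lines to the src qemu, in the same order as the HMP steps.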

Comment 3 Li Xiaohui 2019-07-02 12:18:23 UTC
Adding details of the CA files from comment 2:
for situation 1)
[root@intel-e31225-8-3 tls]# ls -lt
total 48
-rw-r--r--. 1 root root 1635 Jul  2 07:31 client-cert.pem
-rw-r--r--. 1 root root 8167 Jul  2 07:31 client-key.pem
-rw-r--r--. 1 root root 1675 Jul  2 07:31 server-cert.pem
-rw-r--r--. 1 root root 8180 Jul  2 07:31 server-key.pem
-rw-r--r--. 1 root root 1517 Jul  2 07:30 ca-cert.pem
-rw-r--r--. 1 root root 8177 Jul  2 07:30 x509-ca-key.pem
-rw-r--r--. 1 root root  173 Jul  2 07:30 server.tmpl
-rw-r--r--. 1 root root  140 Jul  2 07:30 client.tmpl
-rw-r--r--. 1 root root   64 Jul  2 07:29 ca.tmpl

for situation 2)
-rw-r--r--. 1 root root 1675 Jul  2 07:31 server-cert.pem
-rw-r--r--. 1 root root 8180 Jul  2 07:31 server-key.pem
-rw-r--r--. 1 root root 1517 Jul  2 07:30 ca-cert.pem
-rw-r--r--. 1 root root 8177 Jul  2 07:30 x509-ca-key.pem
-rw-r--r--. 1 root root  173 Jul  2 07:30 server.tmpl
-rw-r--r--. 1 root root   64 Jul  2 07:29 ca.tmpl

Comment 4 Li Xiaohui 2019-07-02 12:22:49 UTC
Situation 3 test:
3) with a client-key.pem file on the host that has no read permission, migration finishes successfully; qemu and guest run normally:
[root@intel-e31225-8-3 tls]# ls -lt
total 48
-rw-r--r--. 1 root root 1635 Jul  2 07:31 client-cert.pem
-rw-------. 1 root root 8167 Jul  2 07:31 client-key.pem
-rw-r--r--. 1 root root 1675 Jul  2 07:31 server-cert.pem
-rw-r--r--. 1 root root 8180 Jul  2 07:31 server-key.pem
-rw-r--r--. 1 root root 1517 Jul  2 07:30 ca-cert.pem
-rw-r--r--. 1 root root 8177 Jul  2 07:30 x509-ca-key.pem
-rw-r--r--. 1 root root  173 Jul  2 07:30 server.tmpl
-rw-r--r--. 1 root root  140 Jul  2 07:30 client.tmpl
-rw-r--r--. 1 root root   64 Jul  2 07:29 ca.tmpl

Comment 5 Daniel Berrangé 2019-09-17 13:42:21 UTC
(In reply to Fangge Jin from comment #0)
> Description of problem:
> qemu reports that "Cannot load client cert & key" when do encrypted native
> live migration but actually they should not be needed.

Out of the box, libvirt *does* enable client key verification on the server QEMU. This means you must provide a client key & cert on the source host.

This is controlled in /etc/libvirt/qemu.conf via

default_tls_x509_verify = 1

Or

migrate_tls_x509_verify = 1

They default to '1', which means client certs are required. Setting either to 0 will disable client certs.

The latter takes priority if both are set. 
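
The precedence between the two settings can be sketched as follows. This is only an illustration of the rule stated in this comment, not libvirt's code: the function name `effective_migrate_verify` is hypothetical, and the built-in default is passed in as a parameter because the sketch does not assume its value.

```python
def effective_migrate_verify(settings, builtin_default):
    """Resolve whether migration TLS verifies client certificates.

    `settings` holds only the values explicitly set in qemu.conf.
    `builtin_default` is the value used when neither option is set; it is
    a parameter because this sketch makes no assumption about it.
    """
    # The migration-specific option wins over the general one.
    for key in ("migrate_tls_x509_verify", "default_tls_x509_verify"):
        if key in settings:
            return bool(settings[key])
    return bool(builtin_default)

if __name__ == "__main__":
    conf = {"default_tls_x509_verify": 1, "migrate_tls_x509_verify": 0}
    print(effective_migrate_verify(conf, builtin_default=1))
```

With both options set, the migration-specific value decides the result, so the example prints False.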

(In reply to Li Xiaohui from comment #2)
> 1) with client cert & key on right directory:
> migration finish successfully, qemu and guest run normally 
> 2) without client cert & key on right directory, migration will fail,like
> followings. So I think Client cert & key are necessary for tls encryption
> migration.
> on dst qemu, will prompt:

Yes, this is correct out-of-the-box behaviour with libvirt's TLS setup.

(2) will only succeed if you have modified qemu.conf as shown above.

Comment 6 Dr. David Alan Gilbert 2019-11-07 18:22:25 UTC
Hi Fangge,
  Can you please check Daniel's answer in #5 - if it's sufficient then please close as not-a-bug, else clarify.

Comment 8 yafu 2019-11-13 06:36:35 UTC
(In reply to Dr. David Alan Gilbert from comment #6)
> Hi Fangge,
>   Can you please check Daniel's answer in #5 - if it's sufficient then
> please close as not-a-bug, else clarify.

libvirt does not enable client key verification on the target host by default.

In comment 0, the reproduction steps also did not enable client key verification on the target host (the default was used), so the migration can succeed without a client cert & key on the src host (please see step 4 in comment 0).

So I think it's still a bug. If client key verification is not enabled on the target host, qemu should not load the client cert & key.

Comment 9 Daniel Berrangé 2019-11-13 10:46:51 UTC
(In reply to yafu from comment #8)
> (In reply to Dr. David Alan Gilbert from comment #6)
> > Hi Fangge,
> >   Can you please check Daniel's answer in #5 - if it's sufficient then
> > please close as not-a-bug, else clarify.
> 
> libvirt does not enable client key verification on the target host in
> default.

Libvirt *does* enable client key verification by default.


> In the comment 0, the reproduce step also not enable client key verification
> on the target host in default, so the migration can succeed without client
> cert&key on src host (please see step 4 in comment 0).
> 
> So i think it's still a bug. If client key verification is not enabled on
> the target host, qemu should not load client cert&key.

The client QEMU process doesn't know whether the target is requiring a client key or not. So if it finds a client key on disk it always has to load it and send it, just in case.


If the client key pem file did not exist on the client host at all there would not have been any problem. QEMU would simply have connected & not sent a key. This would have worked if the server did not require keys.


The problem in comment 0 arises because the client key pem file *does* exist, but permissions prevent QEMU from reading it.

There is no right answer about what to do in this case.

QEMU could ignore the EPERM error when reading the file and carry on, just in case the server doesn't require client keys.

The downside of this is that it leads to silent failures if the user really did want client key to be used, and simply messed up the permissions.

On balance, given that it is very hard to diagnose TLS failures, I took the approach that we should report an error if the key exists, but cannot be loaded as this is almost certainly an admin mistake.


IOW, I think the test scenario is invalid. To test migration when client keys are disabled on the server, remove the client keys from disk entirely on the client; don't simply change permissions.
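
The policy described in this comment distinguishes three cases: the key file is absent, present and readable, or present but unreadable. A minimal Python sketch of that decision follows. It illustrates the reasoning only and is not QEMU's implementation; the function name `load_client_key` and the injectable `opener` parameter are hypothetical.

```python
def load_client_key(path, opener=open):
    """Return ("absent", None) or ("loaded", data); raise if the key is unreadable."""
    try:
        with opener(path, "rb") as fh:
            return "loaded", fh.read()
    except FileNotFoundError:
        # No key on disk: connect without sending a client certificate.
        # This works as long as the server does not require one.
        return "absent", None
    except OSError as exc:
        # The key exists but cannot be read (for example, bad permissions).
        # Fail loudly instead of silently continuing without the key.
        raise RuntimeError(f"Cannot load key '{path}': {exc}") from exc
```

Treating the unreadable case as an error is the trade-off argued for above: a missing file is taken as a deliberate choice, while an unreadable file is most likely an admin mistake that would otherwise be hard to diagnose.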

Comment 10 yafu 2019-11-15 04:59:01 UTC
(In reply to Daniel Berrangé from comment #9)
> (In reply to yafu from comment #8)
> > (In reply to Dr. David Alan Gilbert from comment #6)
> > > Hi Fangge,
> > >   Can you please check Daniel's answer in #5 - if it's sufficient then
> > > please close as not-a-bug, else clarify.
> > 
> > libvirt does not enable client key verification on the target host in
> > default.
> 
> Libvirt *does* enable client key verification by default.
> 
> 
> > In the comment 0, the reproduce step also not enable client key verification
> > on the target host in default, so the migration can succeed without client
> > cert&key on src host (please see step 4 in comment 0).
> > 
> > So i think it's still a bug. If client key verification is not enabled on
> > the target host, qemu should not load client cert&key.
> 
> The client QEMU process doesn't know whether the target is requiring a
> client key or not. So if it finds a client key on disk it always has to load
> it and send it, just in case.
> 
> 
> If the client key pem file did not exist on the client host at all there
> would not have been any problem. QEMU would simply have connected & not sent
> a key. This would have worked if the server did not require keys.
> 
> 
> The problem in comment 0 arises because the client key pem file *does* exist,
> but permissions prevent QEMU from reading it.
> 
> There is no right answer about what to do in this case.
> 
> QEMU could ignore the EPERM error when reading the file and carry on, just
> in case the server doesn't require client keys.
> 
> The downside of this is that it leads to silent failures if the user really
> did want client key to be used, and simply messed up the permissions.
> 
> On balance, given that it is very hard to diagnose TLS failures, I took the
> approach that we should report an error if the key exists, but cannot be
> loaded as this is almost certainly an admin mistake.
> 
> 
> IOW, I think the test scenario is invalid. To test migration when client
> keys are disabled on the server, remove the client keys from disk entirely
> on the client, don't simply change permissions.

Thanks for your detailed explanation. I will close the bug as not-a-bug.

But I still have a small question. I did the TLS migration without changing the settings in qemu.conf on either the source or the target, and with the client cert & key files present on the source host. After migration I checked libvirtd.log on the target host and found that "verify-peer" was false. Does this mean libvirtd does not enable client key verification by default?

# cat /var/log/libvirt/libvirtd.log | grep -i "verify-peer"
2019-11-15 03:37:45.206+0000: 15571: debug : virJSONValueToString:2005 : result={"execute":"object-add","arguments":{"qom-type":"tls-creds-x509","id":"objlibvirt_migrate_tls0","props":{"dir":"/etc/pki/qemu","endpoint":"server","verify-peer":false}},"id":"libvirt-14"}
2019-11-15 03:37:45.206+0000: 15571: debug : qemuMonitorJSONCommandWithFd:305 : Send command '{"execute":"object-add","arguments":{"qom-type":"tls-creds-x509","id":"objlibvirt_migrate_tls0","props":{"dir":"/etc/pki/qemu","endpoint":"server","verify-peer":false}},"id":"libvirt-14"}' for write with FD -1
2019-11-15 03:37:45.206+0000: 15571: info : qemuMonitorSend:1083 : QEMU_MONITOR_SEND_MSG: mon=0x7fecb0017990 msg={"execute":"object-add","arguments":{"qom-type":"tls-creds-x509","id":"objlibvirt_migrate_tls0","props":{"dir":"/etc/pki/qemu","endpoint":"server","verify-peer":false}},"id":"libvirt-14"}
2019-11-15 03:37:45.206+0000: 15566: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7fecb0017990 buf={"execute":"object-add","arguments":{"qom-type":"tls-creds-x509","id":"objlibvirt_migrate_tls0","props":{"dir":"/etc/pki/qemu","endpoint":"server","verify-peer":false}},"id":"libvirt-14"}

Comment 11 Daniel Berrangé 2019-11-15 09:56:32 UTC
(In reply to yafu from comment #10)
> But i still have a small question. I did the tls migration without changing
> the setting in qemu.conf both on the source and target, and having client
> cert&key file on the source host. After migration i checked the libvirtd.log
> on the target host. I found the "verify-peer" was false. Do it mean libvirtd
> does not enable client key verification by default?
> 
> 2019-11-15 03:37:45.206+0000: 15566: info : qemuMonitorIOWrite:551 :
> QEMU_MONITOR_IO_WRITE: mon=0x7fecb0017990
> buf={"execute":"object-add","arguments":{"qom-type":"tls-creds-x509","id":
> "objlibvirt_migrate_tls0","props":{"dir":"/etc/pki/qemu","endpoint":"server",
> "verify-peer":false}},"id":"libvirt-14"}


Yes, verify-peer:false means there was no client verification. AFAIK, that should not be the default.

Did you change /etc/libvirt/qemu.conf in any way? That file controls verification usage.

Comment 12 yafu 2019-11-18 04:55:43 UTC
(In reply to Daniel Berrangé from comment #11)
> (In reply to yafu from comment #10)
> > But i still have a small question. I did the tls migration without changing
> > the setting in qemu.conf both on the source and target, and having client
> > cert&key file on the source host. After migration i checked the libvirtd.log
> > on the target host. I found the "verify-peer" was false. Do it mean libvirtd
> > does not enable client key verification by default?
> > 
> > 2019-11-15 03:37:45.206+0000: 15566: info : qemuMonitorIOWrite:551 :
> > QEMU_MONITOR_IO_WRITE: mon=0x7fecb0017990
> > buf={"execute":"object-add","arguments":{"qom-type":"tls-creds-x509","id":
> > "objlibvirt_migrate_tls0","props":{"dir":"/etc/pki/qemu","endpoint":"server",
> > "verify-peer":false}},"id":"libvirt-14"}
> 
> 
> Yes, verify-peer:false means there was no client verification. AFAIK, that
> should not be the default.
> 
> Did you change the /etc/libvirt/qemu.conf in any way, as that file controls
> verification usage.

No.

Comment 13 Ademar Reis 2020-02-05 22:59:49 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 14 Fangge Jin 2020-09-16 06:02:22 UTC
(In reply to Daniel Berrangé from comment #11)
> (In reply to yafu from comment #10)
> > But i still have a small question. I did the tls migration without changing
> > the setting in qemu.conf both on the source and target, and having client
> > cert&key file on the source host. After migration i checked the libvirtd.log
> > on the target host. I found the "verify-peer" was false. Do it mean libvirtd
> > does not enable client key verification by default?
> > 
> > 2019-11-15 03:37:45.206+0000: 15566: info : qemuMonitorIOWrite:551 :
> > QEMU_MONITOR_IO_WRITE: mon=0x7fecb0017990
> > buf={"execute":"object-add","arguments":{"qom-type":"tls-creds-x509","id":
> > "objlibvirt_migrate_tls0","props":{"dir":"/etc/pki/qemu","endpoint":"server",
> > "verify-peer":false}},"id":"libvirt-14"}
> 
> 
> Yes, verify-peer:false means there was no client verification. AFAIK, that
> should not be the default.
> 
> Did you change the /etc/libvirt/qemu.conf in any way, as that file controls
> verification usage.

Hi Daniel
I just closed the bug. But there is still one thing that I need to confirm with you:
whether the server needs to verify client's cert by default.

For now, the test result is "server doesn't verify client's cert", and this is 
consistent with the description in /etc/libvirt/qemu.conf:

# The default TLS configuration only uses certificates for the server
# allowing the client to verify the server's identity and establish
# an encrypted channel.
#
# It is possible to use x509 certificates for authentication too, by
# issuing an x509 certificate to every client who needs to connect.
#
# Enabling this option will reject any client who does not have a
# certificate signed by the CA in /etc/pki/qemu/ca-cert.pem
#
# The default_tls_x509_cert_dir directory must also contain
#
#  client-cert.pem - the client certificate signed with the ca-cert.pem
#  client-key.pem - the client private key
#
#default_tls_x509_verify = 1
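
Note that the last line of the excerpt is commented out, so the option is not explicitly set in this file, and the value shown in a commented line does not by itself say what the built-in default is. A minimal Python sketch for listing only the options that are actually set is given below. The helper name `explicit_int_settings` is hypothetical, and the parser handles only simple "name = integer" lines, which is an assumption; the real qemu.conf syntax also allows strings and lists.

```python
import re

# Matches lines such as "migrate_tls_x509_verify = 0".
_SETTING = re.compile(r"^\s*(\w+)\s*=\s*(\d+)\s*$")

def explicit_int_settings(conf_text):
    """Return integer settings that are actually set; '#' lines are ignored."""
    found = {}
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            # Commented-out examples do not change the configuration.
            continue
        match = _SETTING.match(line)
        if match:
            found[match.group(1)] = int(match.group(2))
    return found

if __name__ == "__main__":
    sample = "#default_tls_x509_verify = 1\nmigrate_tls_x509_verify = 0\n"
    print(explicit_int_settings(sample))
```

For the sample text, only `migrate_tls_x509_verify` is reported, because the `default_tls_x509_verify` line is a comment.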

Comment 15 Daniel Berrangé 2020-09-16 09:30:22 UTC
(In reply to Fangge Jin from comment #14)
> (In reply to Daniel Berrangé from comment #11)
> > 
> > Yes, verify-peer:false means there was no client verification. AFAIK, that
> > should not be the default.
> > 
> > Did you change the /etc/libvirt/qemu.conf in any way, as that file controls
> > verification usage.
> 
> Hi Daniel
> I just closed the bug. But there is still one thing that needs to confirm
> with you:
> whether the server needs to verify client's cert by default.
> 
> For now, the test result is "server doesn't verify client's cert", and this
> is 
> consistent with the description in /etc/libvirt/qemu.conf:
> 
> # The default TLS configuration only uses certificates for the server
> # allowing the client to verify the server's identity and establish
> # an encrypted channel.
> #
> # It is possible to use x509 certificates for authentication too, by
> # issuing an x509 certificate to every client who needs to connect.
> #
> # Enabling this option will reject any client who does not have a
> # certificate signed by the CA in /etc/pki/qemu/ca-cert.pem
> #
> # The default_tls_x509_cert_dir directory must also contain
> #
> #  client-cert.pem - the client certificate signed with the ca-cert.pem
> #  client-key.pem - the client private key
> #
> #default_tls_x509_verify = 1


Hmm, yes, I think you are right. This feels like a bug in libvirt's TLS setup. The default_tls_x509_verify setting does indeed appear to default to 0. This is likely inherited from when we generalized the previous vnc_tls_x509_verify option. This default was fine for VNC, as typically there's an authentication process applied on top of the TLS layer.

For NBD/chardev/migration, this default is not a very good idea, as there's no extra auth layer. So this is a bug in libvirt defaults we should consider changing.

