+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1648190 +++

======================================================================

Description of problem:

After upgrading the host to the latest RHVH 4.2.7 (RHEL 7.6), libvirt fails to start because it cannot load /etc/pki/vdsm/certs/cacert.pem. It is not a permission issue: gnutls fails to parse the file.

The customer has an old cacert.pem on the host, generated in 2015, most likely by a 3.5 RHV-M install. There does not seem to be anything special about the customer's cacert.pem, except that it uses SHA-1 instead of SHA-256. Still, the error seems to be something else (see additional info); I could not reach a conclusion. If I place a freshly generated cacert.pem (from a 4.2.7 engine), it works.

Version-Release number of selected component (if applicable):

FAIL: gnutls-3.3.29-8.el7.x86_64 (7.6)
      libvirt-4.5.0-10.el7_6.2.x86_64
OK:   gnutls-3.3.26-9.el7.x86_64 (7.5)
      libvirt-4.5.0-10.el7_6.2.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Place the provided cacert.pem in /etc/pki/vdsm/certs/cacert.pem
2. Or compile and run the provided code to simulate it; it hits the same error:
   # gcc -o test test.c -lgnutls

Actual results:
Libvirt fails to start:

2018-11-09 02:06:45.139+0000: 12963: error : virNetTLSContextLoadCACertListFromFile:550 : Unable to import CA certificate list /etc/pki/vdsm/certs/cacert.pem

Expected results:
Libvirt starts

Additional info:

After the upgrade of RHVH 4.2.3 to 4.2.7 (RHEL 7.6), libvirtd won't start anymore:

error : virNetTLSContextLoadCACertListFromFile:550 : Unable to import CA certificate list /etc/pki/vdsm/certs/cacert.pem

The error comes from here:

    if (gnutls_x509_crt_list_import(certs, &certMax, &data,
                                    GNUTLS_X509_FMT_PEM, 0) < 0) {
        virReportError(VIR_ERR_SYSTEM_ERROR,
                       _("Unable to import CA certificate list %s"),
                       certFile);
        goto cleanup;
    }

This means the CA cert /etc/pki/vdsm/certs/cacert.pem cannot be loaded.
The related packages in use are:

gnutls-3.3.29-8.el7.x86_64
libvirt-4.5.0-10.el7_6.2.x86_64

But if I downgrade gnutls to the RHEL 7.5 version, it loads the cert just fine:

gnutls-utils-3.3.26-9.el7.x86_64
gnutls-3.3.26-9.el7.x86_64
gnutls-dane-3.3.26-9.el7.x86_64

So, for gnutls:
RHEL 7.5: gnutls-3.3.26-9.el7.x86_64
RHEL 7.6: gnutls-3.3.29-8.el7.x86_64 (doesn't load old RHV certs?)

The certificate in question was generated in 2015 by an older engine, but it looks exactly the same as a freshly generated one from a 4.2.7 engine. The only difference seems to be that the 2015 certificate uses signature algorithm sha1WithRSAEncryption instead of sha256WithRSAEncryption.

When enabling debug on gnutls, it seems to stop here in function gnutls_x509_crt_list_import, returning non-zero:

    ret = gnutls_x509_crt_import(certs[count], &tmp, GNUTLS_X509_FMT_PEM);
    if (ret < 0) {
        gnutls_assert();    <------------
        goto error;

On further inspection with GDB, it seems to be failing here, in function gnutls_x509_crt_import:

    result = _asn1_strict_der_decode(&cert->cert, cert->der.data,
                                     cert->der.size, NULL);
    if (result != ASN1_SUCCESS) {
        result = _gnutls_asn2err(result);
        gnutls_assert();
        goto cleanup;
    }

This is an inline function, which I was not able to debug further. Still, it seems to return ASN1_DER_ERROR:

(gdb) finish
Run till exit from #0  gnutls_x509_crt_import (cert=0x61d7c0, data=data@entry=0x7fffffffe190, format=format@entry=GNUTLS_X509_FMT_PEM) at x509.c:252
gnutls_x509_crt_list_import (certs=0x7fffffffe220, cert_max=0x7fffffffe20c, data=0x7fffffffe210, format=<optimized out>, flags=0) at x509.c:3495
3495            if (ret < 0) {
Value returned is $1 = -69

Which means we are hitting this in gnutls:

    #define GNUTLS_E_ASN1_DER_ERROR -69

which is translated from libtasn1's ASN1_DER_ERROR, meaning: "the der encoding doesn't match the structure ELEMENT."

This doesn't look related to SHA-1 vs SHA-256. I could not reach a conclusion. Is there a problem with old RHV CA certs? Or are gnutls/libtasn1 broken?
(Originally by Germano Veit Michel)
Forgot this; it also points to the same error:

# GNUTLS_DEBUG_LEVEL=9 libvirtd --listen
<...>
gnutls[3]: ASSERT: x509.c:311    <-- ASN1_DER_ERROR
gnutls[3]: ASSERT: x509.c:3496

(Originally by Germano Veit Michel)
More info: in function gnutls_x509_crt_import, gnutls uses:

RHEL 7.5 (v3.3.26):
    asn1_der_decoding(&cert->cert, cert->der.data, cert->der.size, NULL);

RHEL 7.6 (v3.3.29):
    _asn1_strict_der_decode(&cert->cert, cert->der.data, cert->der.size, NULL);

Changed here:
https://gitlab.com/gnutls/gnutls/commit/5691dc3e4331000765c694bc101b445a80c0a9e2

So in RHEL 7.6 it is using libtasn1's asn1_der_decoding2 function with a strict option, ASN1_DECODE_FLAG_STRICT_DER. Not sure if related, but the libtasn1 documentation says:

 * If %ASN1_DECODE_FLAG_STRICT_DER flag is set then the function will
 * not decode any BER-encoded elements.

I tried to 'watch' the result return value in gdb to find where it is returning from, but it is optimized out.

So gnutls is now using a different, stricter function from libtasn1, which for some reason does not like that certificate.

(Originally by Germano Veit Michel)
Ran out of time today... here is some more info I got.

Not sure if related. Breaking down the certs, I found two other differences:

1) The "from" date (UTCTIME type) is encoded slightly differently:

bad:  110:d=3  hl=2 l=  17 prim: UTCTIME :150517165500+0000
good:  94:d=3  hl=2 l=  13 prim: UTCTIME :181022044108Z

2) The working cert uses UTF8_STRING instead of PRINTABLE_STRING.

Maybe I need to recompile libtasn1 to track down more precisely where it returns ASN1_DER_ERROR; with optimizations it is a bit hard.

(Originally by Germano Veit Michel)
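For reference, the breakdown above is the kind of output openssl asn1parse produces. The following is a hedged illustration (the throwaway cert and /tmp paths are mine, not from the customer environment) showing what a well-formed UTCTIME looks like: the 13-byte "YYMMDDhhmmssZ" form, whereas the failing 2015 cert carries a 17-byte "YYMMDDhhmmss+0000" value.

```shell
# Create a throwaway self-signed cert to compare against (demo paths)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
    -keyout /tmp/demo-ca.key -out /tmp/demo-ca.pem -days 365 2>/dev/null

# Dump the ASN.1 structure; the two validity fields are UTCTIME.
# A strictly DER-valid UTCTIME is 13 bytes and ends in 'Z'; a
# "+0000" offset (as in the 2015 cert) is only BER-acceptable.
openssl asn1parse -in /tmp/demo-ca.pem | grep UTCTIME
```

Running the same asn1parse/grep pipeline against a suspect cacert.pem makes the l=17 / "+0000" encoding easy to spot.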
(In reply to Germano Veit Michel from comment #6)
> Ran out of time today... here is some more info I got.
>
> Not sure if related. Breaking down the certs, found 2 other differences.
>
> 1) from date (UTCTIME type) is slightly different:
> 110:d=3 hl=2 l= 17 prim: UTCTIME :150517165500+0000
> 94:d=3 hl=2 l= 13 prim: UTCTIME :181022044108Z

According to https://bugzilla.redhat.com/show_bug.cgi?id=1636023#c15 and https://github.com/openssl/openssl/pull/2668, the issue is the malformed UTCTIME value: it was accepted in the past, but now openssl and gnutls refuse it.

(Originally by Simone Tiraboschi)
On the valid cert example we see:

        Validity
            Not Before: Oct 22 04:41:08 2018 GMT
            Not After : Oct 20 04:41:08 2028 GMT

while on the bad one:

        Validity
            Not Before: May 17 16:55:00 2015
            Not After : May 15 16:55:00 2025 GMT

so the issue is specific to the 'Not Before:' value.

But engine-setup already has code to handle that case:
https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/plugins/ovirt-engine-setup/ovirt-engine/pki/ca.py#L275

Now we have to understand whether it never triggered, or whether it triggered correctly on the engine side but the CA cert was never redistributed to all the hosts.

(Originally by Simone Tiraboschi)
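Note how openssl prints the malformed 'Not Before:' without the trailing "GMT". As a hedged sketch (the helper name and heuristic are mine, not part of engine-setup), this makes a quick check for affected host CAs possible:

```shell
# Hypothetical helper: flag a cert whose notBefore was emitted with
# the malformed "+0000" UTCTIME; openssl then prints the date
# without the trailing "GMT".
check_ca_dates() {
    start=$(openssl x509 -in "$1" -noout -startdate)
    case "$start" in
        *GMT) echo "$1: notBefore looks well-formed" ;;
        *)    echo "$1: malformed notBefore, CA needs renewal" ;;
    esac
}

# Demo against a freshly generated, well-formed cert (throwaway paths)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
    -keyout /tmp/check-ca.key -out /tmp/check-ca.pem -days 365 2>/dev/null
check_ca_dates /tmp/check-ca.pem
```

On an affected host, pointing the same helper at /etc/pki/vdsm/certs/cacert.pem would take the second branch.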
According to the engine-setup logs, it seems that engine-setup correctly detected the issue and offered to start the PKI renewal process, as per https://access.redhat.com/solutions/1572983:

2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human human.queryString:145 query OVESETUP_RENEW_PKI
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND One or more of the certificates should be renewed, because they expire soon, or include an invalid expiry date, or do not include the subjectAltName extension, which can cause them to be rejected by recent browsers.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND If you choose "No", you will be asked again the next time you run Setup.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND See https://access.redhat.com/solutions/1572983 for more details.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew certificates?
(Yes, No) [No]:
2018-05-02 10:51:51 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2018-05-02 10:51:51 DEBUG otopi.context context.dumpEnvironment:770 ENV OVESETUP_PKI/renew=bool:'False'

The user rejected it at least 5 times:

[root@t470s setup]# grep -R "Renew PKI"
ovirt-engine-setup-20171213144824-oppz1d.log:2017-12-13 14:49:16 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False
ovirt-engine-setup-20171213144427-y922jr.log:2017-12-13 14:45:36 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False
ovirt-engine-setup-20180102112424-dmayx8.log:2018-01-02 11:26:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False
ovirt-engine-setup-20180102182847-cf5zz9.log:2018-01-02 18:29:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False
ovirt-engine-setup-20180502105103-z9uh1h.log:2018-05-02 10:52:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False

I have to add that the default response is "No", while "Yes" probably makes more sense now, also because of this bug.

(Originally by Simone Tiraboschi)
My customer already had the ca.pem on the engine renewed and corrected long ago:

openssl x509 -in ca.pem -noout -dates
notBefore=Mar 24 06:33:24 2017 GMT
notAfter=Mar 22 06:33:24 2027 GMT

But the host still has the old CA:

openssl x509 -in certs/cacert.pem -noout -dates
notBefore=Jun 14 13:41:58 2015
notAfter=Jun 12 13:41:58 2025 GMT

I think engine-setup should automatically push the CA certificate to the hosts instead of asking the user to manually enroll the certificates, since only the CA has changed and the vdsm certificates are still valid. Can we do this automatically during engine-setup?

(Originally by Nijin Ashok)
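Until such automation exists, the renewed CA can in principle be copied to the host manually. Treat the following as a sketch, not official guidance: the /etc/pki paths are the ones from this report, the procedure is simulated here with /tmp stand-ins so it can be run safely, and on a real host it must be done with the host in maintenance, followed by restarting libvirtd and vdsmd.

```shell
# /tmp stand-ins for /etc/pki/ovirt-engine/ca.pem (engine side) and
# /etc/pki/vdsm/certs/cacert.pem (host side)
mkdir -p /tmp/engine-pki /tmp/host-pki

# Simulate the renewed engine CA with a throwaway self-signed cert
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=renewed-ca" \
    -keyout /tmp/engine-pki/ca.key -out /tmp/engine-pki/ca.pem \
    -days 3650 2>/dev/null

cp /tmp/engine-pki/ca.pem /tmp/host-pki/cacert.pem   # scp in real life

# Verify the dates on the "host" copy now carry the GMT suffix
openssl x509 -in /tmp/host-pki/cacert.pem -noout -dates

# systemctl restart libvirtd vdsmd   # on the real host, in maintenance
```

The reason this can work without re-enrolling host certs is the point made above: only the CA cert changed, and the existing vdsm certificates remain valid against it.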
(In reply to nijin ashok from comment #15)
> My customer already has the ca.pem in the engine renewed and corrected long back.
...
> I think the engine-setup should automatically push the CA certificate
> automatically to the hosts instead of asking the user to manually enroll the
> certificate as it's only CA has been changed and the vdsm certificates are
> still valid. Can we do this automatically during engine-setup?

For sure not during engine-setup: to make the change effective we have to restart libvirt and vdsm, so the host has to be in maintenance. I think we should instead re-enroll host certs (only if needed?) during host upgrades, at least when the upgrade is triggered from the engine.

Martin, do you know if we are already re-enrolling certs in host upgrades led by the engine?

Nijin, do you know if the customer triggered the host upgrade from the engine, as recommended?

(Originally by Simone Tiraboschi)
What are the reproduction steps, or how do I reach the change linked to this bug? Should the bad certificate attached to this bug now work correctly?
Verified on ovirt-engine-4.2.8.1-0.0.master.20181127093701.gite34295b.el7.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0121