Description of problem:

After upgrading the host to the latest RHVH 4.2.7 (RHEL 7.6), libvirt fails to start because it cannot load /etc/pki/vdsm/certs/cacert.pem. It is not a permission issue; gnutls fails to open it. The customer has an old cacert.pem on the host, generated in 2015, so most likely at RHV-M 3.5 install time. There does not seem to be anything special about the customer's cacert.pem, except that it is signed with SHA-1 instead of SHA-256. Still, the error seems to be something else (see additional info); I could not reach a conclusion. If I place a freshly generated cacert.pem (from a 4.2.7 engine), it works.

Version-Release number of selected component (if applicable):
FAIL: gnutls-3.3.29-8.el7.x86_64 (7.6)
      libvirt-4.5.0-10.el7_6.2.x86_64
OK:   gnutls-3.3.26-9.el7.x86_64 (7.5)
      libvirt-4.5.0-10.el7_6.2.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Place the provided cacert.pem at /etc/pki/vdsm/certs/cacert.pem
2. Or compile and run the provided code to simulate it; it produces the same error:
   # gcc -o test test.c -lgnutls

Actual results:
libvirt fails to start:
2018-11-09 02:06:45.139+0000: 12963: error : virNetTLSContextLoadCACertListFromFile:550 : Unable to import CA certificate list /etc/pki/vdsm/certs/cacert.pem

Expected results:
libvirt starts

Additional info:

After upgrading RHVH 4.2.3 to 4.2.7 (RHEL 7.6), libvirtd won't start anymore:

error : virNetTLSContextLoadCACertListFromFile:550 : Unable to import CA certificate list /etc/pki/vdsm/certs/cacert.pem

The error comes from here:

    if (gnutls_x509_crt_list_import(certs, &certMax, &data,
                                    GNUTLS_X509_FMT_PEM, 0) < 0) {
        virReportError(VIR_ERR_SYSTEM_ERROR,
                       _("Unable to import CA certificate list %s"),
                       certFile);
        goto cleanup;
    }

This means the CA cert /etc/pki/vdsm/certs/cacert.pem cannot be loaded.
The related packages in use are:

gnutls-3.3.29-8.el7.x86_64
libvirt-4.5.0-10.el7_6.2.x86_64

But if I downgrade gnutls to the RHEL 7.5 version, it loads the cert just fine:

gnutls-utils-3.3.26-9.el7.x86_64
gnutls-3.3.26-9.el7.x86_64
gnutls-dane-3.3.26-9.el7.x86_64

So, for gnutls:
RHEL 7.5: gnutls-3.3.26-9.el7.x86_64
RHEL 7.6: gnutls-3.3.29-8.el7.x86_64 (doesn't load old RHV certs?)

The certificate in question was generated in 2015 by an older engine, but it looks exactly the same as one freshly generated by a 4.2.7 engine. The only difference seems to be that the 2015 certificate uses signature algorithm sha1WithRSAEncryption instead of sha256WithRSAEncryption.

When enabling debug on gnutls, it seems to stop here in function gnutls_x509_crt_list_import, returning non-zero:

    ret = gnutls_x509_crt_import(certs[count], &tmp, GNUTLS_X509_FMT_PEM);
    if (ret < 0) {
        gnutls_assert();   <------------
        goto error;

On further inspection with GDB, it seems to be failing here, in function gnutls_x509_crt_import:

    result = _asn1_strict_der_decode(&cert->cert, cert->der.data,
                                     cert->der.size, NULL);
    if (result != ASN1_SUCCESS) {
        result = _gnutls_asn2err(result);
        gnutls_assert();
        goto cleanup;
    }

This is an inline function, so I was not able to debug it further. Still, it seems to return ASN1_DER_ERROR:

(gdb) finish
Run till exit from #0  gnutls_x509_crt_import (cert=0x61d7c0, data=data@entry=0x7fffffffe190, format=format@entry=GNUTLS_X509_FMT_PEM) at x509.c:252
gnutls_x509_crt_list_import (certs=0x7fffffffe220, cert_max=0x7fffffffe20c, data=0x7fffffffe210, format=<optimized out>, flags=0) at x509.c:3495
3495            if (ret < 0) {
Value returned is $1 = -69

Which means we are hitting this in gnutls:

#define GNUTLS_E_ASN1_DER_ERROR -69

which maps to libtasn1's ASN1_DER_ERROR, meaning: "the der encoding doesn't match the structure ELEMENT."

That does not look related to sha1 vs sha256, and I could not reach a conclusion. Is there a problem with old RHV CA certs, or are gnutls/libtasn1 broken?
Forgot to add this: it also points to the same error:

# GNUTLS_DEBUG_LEVEL=9 libvirtd --listen
<...>
gnutls[3]: ASSERT: x509.c:311   <-- ASN1_DER_ERROR
gnutls[3]: ASSERT: x509.c:3496
More info: in function gnutls_x509_crt_import, gnutls uses:

RHEL 7.5 (v3.3.26): asn1_der_decoding(&cert->cert, cert->der.data, cert->der.size, NULL);
RHEL 7.6 (v3.3.29): _asn1_strict_der_decode(&cert->cert, cert->der.data, cert->der.size, NULL);

Changed here:
https://gitlab.com/gnutls/gnutls/commit/5691dc3e4331000765c694bc101b445a80c0a9e2

So in RHEL 7.6, gnutls calls libtasn1's asn1_der_decoding2 function with a strict option, ASN1_DECODE_FLAG_STRICT_DER. Not sure if related, but the libtasn1 documentation says:

 * If %ASN1_DECODE_FLAG_STRICT_DER flag is set then the function will
 * not decode any BER-encoded elements.

I tried to 'watch' the result return value in gdb to find where it returns from, but it is optimized out.

So gnutls now uses a different, stricter libtasn1 function, which for some reason does not like this certificate.
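To make the BER-vs-DER distinction concrete, here is a toy decoder, purely illustrative and not gnutls or libtasn1 code, showing one rule a strict DER decoder enforces that a BER-tolerant one does not: length octets must use the minimal encoding. The function name and byte strings are made up for the example.

```python
# Toy sketch (not libtasn1 code): BER permits the long form of the
# length octets even for short contents; DER requires the minimal
# (short) form. Both byte strings below carry the same OCTET STRING.
def parse_octet_string(tlv: bytes, strict: bool) -> bytes:
    if tlv[0] != 0x04:  # 0x04 is the ASN.1 OCTET STRING tag
        raise ValueError("not an OCTET STRING")
    if tlv[1] & 0x80:   # long-form length: next (tlv[1] & 0x7f) bytes
        n = tlv[1] & 0x7F
        length = int.from_bytes(tlv[2:2 + n], "big")
        if strict and length < 0x80:
            raise ValueError("DER error: non-minimal length encoding")
        start = 2 + n
    else:               # short-form length
        length = tlv[1]
        start = 2
    return tlv[start:start + length]

der = b"\x04\x02\xca\xfe"      # short-form length: valid BER and DER
ber = b"\x04\x81\x02\xca\xfe"  # long-form length for 2 bytes: BER only

parse_octet_string(ber, strict=False)  # lenient decode succeeds
# parse_octet_string(ber, strict=True) raises, the analogue of
# libtasn1 returning ASN1_DER_ERROR under ASN1_DECODE_FLAG_STRICT_DER
```

The upgrade from 3.3.26 to 3.3.29 effectively flipped the `strict` switch here, so any BER-only construct in an old certificate that previously slipped through now fails the import.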
Ran out of time today... here is some more info I got. Not sure if related.

Breaking down the certs, I found two other differences.

1) The from date (UTCTIME type) is encoded slightly differently:
110:d=3  hl=2 l=  17 prim: UTCTIME :150517165500+0000
 94:d=3  hl=2 l=  13 prim: UTCTIME :181022044108Z

2) The working cert uses UTF8_STRING instead of PRINTABLE_STRING.

Maybe I need to recompile libtasn1 to track down more precisely where it returns ASN1_DER_ERROR; with optimizations enabled it is a bit hard.
(In reply to Germano Veit Michel from comment #6)
> Ran out of time today... here is some more info I got.
>
> Not sure if related. Breaking down the certs, found 2 other differences.
>
> 1) from date (UTCTIME type) is slightly different:
> 110:d=3 hl=2 l= 17 prim: UTCTIME :150517165500+0000
> 94:d=3 hl=2 l= 13 prim: UTCTIME :181022044108Z

According to https://bugzilla.redhat.com/show_bug.cgi?id=1636023#c15 and https://github.com/openssl/openssl/pull/2668 , the issue is caused by the malformed UTCTIME value: it was accepted in the past, but openssl and gnutls now refuse it.
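To illustrate the finding with a toy sketch (again, not gnutls/libtasn1 code): DER requires UTCTime to be exactly YYMMDDHHMMSSZ, while BER also tolerates a numeric time-zone offset, which is what the 2015 certificate's "150517165500+0000" uses. The `parse_utctime` helper and hand-built TLV byte strings below are hypothetical.

```python
# Toy illustration of why the old cert's notBefore fails strict DER
# decoding while the new cert's passes. Both TLVs are well-formed BER,
# but DER forbids the "+0000" offset form of UTCTime.
def parse_utctime(tlv: bytes, strict: bool) -> str:
    if tlv[0] != 0x17:  # 0x17 is the ASN.1 UTCTime tag
        raise ValueError("not a UTCTime element")
    length = tlv[1]
    value = tlv[2:2 + length].decode("ascii")
    if strict and not (len(value) == 13 and value.endswith("Z")):
        raise ValueError("DER error: UTCTime must be YYMMDDHHMMSSZ")
    return value

new_cert_time = b"\x17\x0d" + b"181022044108Z"      # freshly generated cert
old_cert_time = b"\x17\x11" + b"150517165500+0000"  # 2015 cert

parse_utctime(old_cert_time, strict=False)  # lenient decode succeeds (gnutls 3.3.26)
parse_utctime(new_cert_time, strict=True)   # strict decode also succeeds
# parse_utctime(old_cert_time, strict=True) raises, analogous to
# GNUTLS_E_ASN1_DER_ERROR (-69) from the strict decoder in 3.3.29
```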
On the valid cert example we see:

        Validity
            Not Before: Oct 22 04:41:08 2018 GMT
            Not After : Oct 20 04:41:08 2028 GMT

while on the bad one:

        Validity
            Not Before: May 17 16:55:00 2015
            Not After : May 15 16:55:00 2025 GMT

so the issue is specific to the 'Not Before:' value.

But in engine-setup we already have code to handle that case:
https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/plugins/ovirt-engine-setup/ovirt-engine/pki/ca.py#L275

Now we have to understand whether it never triggered, or whether it triggered correctly on the engine side but the renewed CA cert was never redistributed to all the hosts.
According to the engine-setup logs, it seems that engine-setup correctly detected the issue and offered to start the PKI renewal process as per https://access.redhat.com/solutions/1572983 :

2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human human.queryString:145 query OVESETUP_RENEW_PKI
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND         One or more of the certificates should be renewed, because they expire soon, or include an invalid expiry date, or do not include the subjectAltName extension, which can cause them to be rejected by recent browsers.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND         If you choose "No", you will be asked again the next time you run Setup.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND         See https://access.redhat.com/solutions/1572983 for more details.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND         Renew certificates?
(Yes, No) [No]:
2018-05-02 10:51:51 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2018-05-02 10:51:51 DEBUG otopi.context context.dumpEnvironment:770 ENV OVESETUP_PKI/renew=bool:'False'

The user rejected it at least 5 times:

[root@t470s setup]# grep -R "Renew PKI"
ovirt-engine-setup-20171213144824-oppz1d.log:2017-12-13 14:49:16 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False
ovirt-engine-setup-20171213144427-y922jr.log:2017-12-13 14:45:36 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False
ovirt-engine-setup-20180102112424-dmayx8.log:2018-01-02 11:26:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False
ovirt-engine-setup-20180102182847-cf5zz9.log:2018-01-02 18:29:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False
ovirt-engine-setup-20180502105103-z9uh1h.log:2018-05-02 10:52:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Renew PKI : False

I have to add that the default answer is No, while Yes probably makes more sense now, also because of this bug.
My customer already had the ca.pem in the engine renewed and corrected long ago:

openssl x509 -in ca.pem -noout -dates
notBefore=Mar 24 06:33:24 2017 GMT
notAfter=Mar 22 06:33:24 2027 GMT

But the host still has the old CA:

openssl x509 -in certs/cacert.pem -noout -dates
notBefore=Jun 14 13:41:58 2015
notAfter=Jun 12 13:41:58 2025 GMT

I think engine-setup should automatically push the CA certificate to the hosts instead of asking the user to manually enroll the certificates, since only the CA has changed and the vdsm certificates are still valid. Can we do this automatically during engine-setup?
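Until the CA is redistributed, stale hosts can be spotted by the missing time zone in the notBefore line. A hedged Python sketch (the helper name is illustrative, not oVirt code; it merely post-processes `openssl x509 -noout -dates` output, using the sample values from this bug):

```python
# Illustrative helper (not oVirt code): given the output of
# `openssl x509 -in <cert> -noout -dates`, flag a certificate whose
# notBefore lacks the trailing "GMT" -- the symptom of the malformed
# UTCTIME that strict DER decoders reject.
def has_malformed_notbefore(dates_output: str) -> bool:
    for line in dates_output.splitlines():
        if line.startswith("notBefore="):
            return not line.strip().endswith("GMT")
    return False

# Sample outputs taken from this bug report:
engine_ca = "notBefore=Mar 24 06:33:24 2017 GMT\nnotAfter=Mar 22 06:33:24 2027 GMT"
host_ca   = "notBefore=Jun 14 13:41:58 2015\nnotAfter=Jun 12 13:41:58 2025 GMT"

print(has_malformed_notbefore(engine_ca))  # False: the renewed engine CA is fine
print(has_malformed_notbefore(host_ca))    # True: the host still has the bad CA
```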
(In reply to nijin ashok from comment #15)
> My customer already has the ca.pem in the engine renewed and corrected long back.
...
> I think the engine-setup should automatically push the CA certificate
> automatically to the hosts instead of asking the user to manually enroll the
> certificate as it's only CA has been changed and the vdsm certificates are
> still valid. Can we do this automatically during engine-setup?

Definitely not during engine-setup: we have to restart libvirt and vdsm to make the new CA effective, so the host has to be in maintenance.
I think we should instead re-enroll host certs (only if needed?) during host upgrades, at least if the upgrade is triggered from the engine.

Martin, do you know if we are already re-enrolling certs in host upgrades led by the engine?

Nijin, do you know if the customer triggered the host upgrade from the engine, as recommended?
(In reply to Simone Tiraboschi from comment #16)
> (In reply to nijin ashok from comment #15)
> > My customer already has the ca.pem in the engine renewed and corrected long back.
> ...
> > I think the engine-setup should automatically push the CA certificate
> > automatically to the hosts instead of asking the user to manually enroll the
> > certificate as it's only CA has been changed and the vdsm certificates are
> > still valid. Can we do this automatically during engine-setup?
>
> For sure not during engine-setup since the host has to be in maintenance
> since we have to restart libvirt and vdsm to make it effective and so the
> host has to be in maintenance.
> I think we should instead re-enroll host certs (just if needed?) during host
> upgrades at least if the upgrade is triggered from the engine.
>
> Martin, do you know if we are already re-enrolling certs in host upgrades
> lead by the engine?
>
> Nijin, do you know if the customer triggered host upgrade from the engine as
> recommended?

We enroll the CA certificate each time the Reinstall or Enroll Certificates actions are executed on the host. We cannot do that automatically, because we need to restart VDSM and libvirt to load the new CA, and that can be done only when the host is in Maintenance. In theory we could also enroll the certificate during host upgrade, but that is an RFE, because we would need to move the certificate enrollment code from host-deploy to Ansible. So if you are really interested in this feature, could you please file an RFE for it?
*** Bug 1649367 has been marked as a duplicate of this bug. ***
Verified on ovirt-engine-4.3.0-0.5.alpha1.el7.noarch
In trying to fix this I downgraded all my libvirt packages to the previous release, 3.9.0-14.el7_5.8.x86_64. This still presented the same problem: libvirt would not start due to the certificate problem.

So I then also downgraded all the gnutls packages to:

gnutls-3.3.26-9.el7.x86_64.rpm
gnutls-dane-3.3.26-9.el7.x86_64.rpm
gnutls-utils-3.3.26-9.el7.x86_64.rpm

After downgrading those packages, I was able to start libvirtd from the 3.9.0-14.el7_5.8.x86_64 series. I did not try upgrading all the libvirt packages while keeping the old gnutls, but it is possible that with the latest libvirt packages and the old gnutls the libvirt daemon will start just as it did for me with 3.9.0. I am already up and running, and I will be getting rid of this RHEV setup next week as I get new servers, so it is not worth it for me to check, but perhaps it can help someone else.
(In reply to dijuremo from comment #21)
> In trying to fix this I downgraded all my libvirt packages to the previous
> release, 3.9.0-14.el7_5.8.x86_64. This still presented the same problem that
> libvirt would not start due to the certificate problem.
>
> So I went ahead and then downgraded all the gnutls packages to:
>
> gnutls-3.3.26-9.el7.x86_64.rpm gnutls-dane-3.3.26-9.el7.x86_64.rpm
> gnutls-utils-3.3.26-9.el7.x86_64.rpm
>
> After the downgrade of those packages, I was able to start the libvirtd
> 3.9.0-14.el7_5.8.x86_64 series. I did not try upgrading all the libvirt
> packages and keeping the old gnutls, but it is possible that with the latest
> libvirt packages and the old gnutls, the libvirt daemon will start just as
> it did for me with 3.9.0.
>
> I am already up and running and I will be getting rid of this RHEV setup
> next week as I get new servers, so not worth it for me to check it out, but
> perhaps it can help someone else.

Just out of curiosity, why are you doing such a complicated downgrade, when it's actually enough to:
1. Run engine-setup and refresh the CA certificate
2. Execute Enroll Certificates on all hosts
3. Execute upgrade on all hosts (if they are not already upgraded)
@Martin Perina, I am about to decommission these servers and have never done this (a CA update) before. I am running a self-hosted engine on hyperconverged gluster storage. It works, but not without its issues (I have not updated from 3.5.9). So the last thing I want is to hit issues I do not know how to fix when the whole virtualization setup is going to be scrapped within a week or two, since we are moving to real hardware. We run only two Windows Server VMs on this infrastructure, and the pain of maintaining this for just two VMs does not justify the complexity and time spent keeping it running. And as you can see, it is highly outdated, and migrating from 3.5 to 4.x requires a whole self-hosted engine update to RHEL 7, etc., so it is too much of a risk.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1085