Bug 1649267 - [downstream clone - 4.2.8] [RHEL76] libvirt is unable to start after upgrade due to malformed UTCTIME values in cacert.pem, because properly renewed CA certificate was not passed to hosts by executing "Enroll certificate" or "Reinstall"
Summary: [downstream clone - 4.2.8] [RHEL76] libvirt is unable to start after upgrade ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.8
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On: 1648190
Blocks:
Reported: 2018-11-13 09:44 UTC by RHV bug bot
Modified: 2021-12-10 18:26 UTC (History)

Fixed In Version: ovirt-engine-4.2.8.1
Doc Type: Enhancement
Doc Text:
Internal CAs generated by old releases (<= 3.5) can contain UTCTIME values without a timezone indication, which up-to-date openssl and gnutls libraries no longer accept. engine-setup already checked for this and proposed a remediation, but the user could postpone it. The prompt is now more evident, since postponing can cause serious issues.
Clone Of: 1648190
Environment:
Last Closed: 2019-01-22 12:44:51 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-44338 0 None None None 2021-12-10 18:26:29 UTC
Red Hat Knowledge Base (Solution) 3682921 0 None None None 2018-11-13 09:46:52 UTC
Red Hat Product Errata RHBA-2019:0121 0 None None None 2019-01-22 12:44:59 UTC
oVirt gerrit 95348 0 master MERGED upgrade: pki: double check on skip of PKI renewal 2021-01-01 03:53:14 UTC
oVirt gerrit 95405 0 ovirt-engine-4.2 MERGED upgrade: pki: double check on skip of PKI renewal 2021-01-01 03:53:14 UTC

Description RHV bug bot 2018-11-13 09:44:54 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1648190 +++
======================================================================

Description of problem:

After upgrading the host to the latest RHVH 4.2.7 (RHEL 7.6), libvirt fails to start because it is unable to open /etc/pki/vdsm/certs/cacert.pem. It's not a permission issue; gnutls is failing to open it.

The customer has an old cacert.pem on the host, generated in 2015, so most likely at the RHV-M 3.5 install. There doesn't seem to be anything special about the customer's cacert.pem, except that it uses sha1 instead of sha256. Still, the error seems to be something else (see additional info).

I could not come to a conclusion. If I place a freshly generated cacert.pem (by a 4.2.7 engine), it works.

Version-Release number of selected component (if applicable):
FAIL:
gnutls-3.3.29-8.el7.x86_64      (7.6)
libvirt-4.5.0-10.el7_6.2.x86_64
OK:
gnutls-3.3.26-9.el7.x86_64      (7.5)
libvirt-4.5.0-10.el7_6.2.x86_64

How reproducible:
Always 

Steps to Reproduce:
1. Place the provided cacert.pem at /etc/pki/vdsm/certs/cacert.pem
2. Or compile and use the provided code to simulate it; it gets the same error.
   # gcc -o test test.c -lgnutls

Actual results:
Libvirt fails to start:
2018-11-09 02:06:45.139+0000: 12963: error : virNetTLSContextLoadCACertListFromFile:550 : Unable to import CA certificate list /etc/pki/vdsm/certs/cacert.pem

Expected results:
Libvirt starts

Additional info:
After upgrading RHVH 4.2.3 to 4.2.7 (RHEL 7.6), libvirtd won't start anymore:

error : virNetTLSContextLoadCACertListFromFile:550 : Unable to import CA certificate list /etc/pki/vdsm/certs/cacert.pem

The error comes from here:

    if (gnutls_x509_crt_list_import(certs, &certMax, &data, GNUTLS_X509_FMT_PEM, 0) < 0) {
        virReportError(VIR_ERR_SYSTEM_ERROR,
                       _("Unable to import CA certificate list %s"),
                       certFile);
        goto cleanup;
    }
 
This means the ca cert /etc/pki/vdsm/certs/cacert.pem cannot be loaded. The related packages in use are:
gnutls-3.3.29-8.el7.x86_64
libvirt-4.5.0-10.el7_6.2.x86_64

But if I downgrade gnutls to RHEL 7.5, it loads the cert just fine:
gnutls-utils-3.3.26-9.el7.x86_64
gnutls-3.3.26-9.el7.x86_64
gnutls-dane-3.3.26-9.el7.x86_64

So gnutls:
RHEL7.5: gnutls-3.3.26-9.el7.x86_64
RHEL7.6: gnutls-3.3.29-8.el7.x86_64 (doesn't load old RHV certs?)

The certificate in question was generated in 2015 by an older engine, but it looks exactly the same as one freshly generated by a 4.2.7 engine.
The only difference seems to be that the 2015 certificate uses signature algorithm sha1WithRSAEncryption instead of sha256WithRSAEncryption.

When enabling debug on gnutls, it seems to stop here on function gnutls_x509_crt_list_import, returning non-zero:

                        ret =
                            gnutls_x509_crt_import(certs[count], &tmp,
                                                   GNUTLS_X509_FMT_PEM);
                        if (ret < 0) {
                                gnutls_assert();       /* <------------ */
                                goto error;
                        }

On further inspection with GDB, it seems to be failing here, in function gnutls_x509_crt_import.

        result =
            _asn1_strict_der_decode(&cert->cert, cert->der.data, cert->der.size, NULL);
        if (result != ASN1_SUCCESS) {
                result = _gnutls_asn2err(result);
                gnutls_assert();
                goto cleanup;
        }

This is an inline function, which I was not able to debug further. Still, it seems to return ASN1_DER_ERROR.

(gdb) finish
Run till exit from #0  gnutls_x509_crt_import (cert=0x61d7c0, data=data@entry=0x7fffffffe190, format=format@entry=GNUTLS_X509_FMT_PEM) at x509.c:252
gnutls_x509_crt_list_import (certs=0x7fffffffe220, cert_max=0x7fffffffe20c, data=0x7fffffffe210, format=<optimized out>, flags=0) at x509.c:3495
3495				if (ret < 0) {
Value returned is $1 = -69

Which means we are hitting this in GnuTLS:
#define GNUTLS_E_ASN1_DER_ERROR -69

Which is translated to this from libtasn1: ASN1_DER_ERROR

And it means: "the der encoding doesn't match the structure ELEMENT."

It doesn't look related to sha1 vs sha256. I could not reach a conclusion. Is there a problem with old RHV CA certs? Or are gnutls/libtasn1 broken?

(Originally by Germano Veit Michel)

Comment 1 RHV bug bot 2018-11-13 09:45:08 UTC
Forgot this; it also points to the same error:

# GNUTLS_DEBUG_LEVEL=9 libvirtd --listen
<...>
gnutls[3]: ASSERT: x509.c:311  <-- ASN1_DER_ERROR
gnutls[3]: ASSERT: x509.c:3496

(Originally by Germano Veit Michel)

Comment 5 RHV bug bot 2018-11-13 09:45:34 UTC
More info:

In function gnutls_x509_crt_import, gnutls uses:

RHEL 7.5 : v3.3.26
   asn1_der_decoding(&cert->cert, cert->der.data, cert->der.size, NULL);
RHEL 7.6 : v3.3.29
   _asn1_strict_der_decode(&cert->cert, cert->der.data, cert->der.size, NULL);

Here: https://gitlab.com/gnutls/gnutls/commit/5691dc3e4331000765c694bc101b445a80c0a9e2

So in RHEL 7.6, it is using libtasn1 "asn1_der_decoding2" function, with some sort of STRICT option, ASN1_DECODE_FLAG_STRICT_DER.

Not sure if related, but:

 * If %ASN1_DECODE_FLAG_STRICT_DER flag is set then the function will
 * not decode any BER-encoded elements.

Tried to 'watch' the result return value in gdb to find where it's returning from, but it's optimized out.

So gnutls is now using a different function from libtasn1, which for some reason does not like that certificate.
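
As a rough illustration of what "strict DER" means compared with BER, the sketch below checks one DER rule: a length must be encoded in the shortest definite form. It is only an example of the kind of rule a strict decoder enforces. It is not taken from libtasn1 and is not necessarily the rule that rejected this certificate. The function name der_length_is_minimal is made up for this example.

```python
def der_length_is_minimal(length_octets: bytes) -> bool:
    """Return True if BER length octets use the single encoding DER allows."""
    first = length_octets[0]
    if first < 0x80:
        # Short form: exactly one octet, valid in both BER and DER.
        return len(length_octets) == 1
    if first == 0x80:
        # Indefinite length is BER only.
        return False
    count = first & 0x7F          # number of length octets that follow
    body = length_octets[1:]
    if len(body) != count:
        return False              # truncated or over-long input
    if body[0] == 0:
        return False              # leading zero octet: not the shortest form
    # Long form is only permitted when the short form cannot hold the value.
    return int.from_bytes(body, "big") >= 0x80


print(der_length_is_minimal(b"\x05"))      # short form
print(der_length_is_minimal(b"\x81\x05"))  # BER-legal long form of the same length
```

Both inputs describe a length of 5, but only the first is acceptable to a decoder that insists on DER.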

(Originally by Germano Veit Michel)

Comment 6 RHV bug bot 2018-11-13 09:45:39 UTC
Ran out of time today... here is some more info I got.

Not sure if related. Breaking down the certs, found 2 other differences. 

1) from date (UTCTIME type) is slightly different:
  110:d=3  hl=2 l=  17 prim:    UTCTIME           :150517165500+0000
   94:d=3  hl=2 l=  13 prim:    UTCTIME           :181022044108Z

2) The working cert uses UTF8_STRING instead of PRINTABLE_STRING.

Maybe I need to recompile libtasn1 to track down more precisely where it returns ASN1_DER_ERROR; with optimizations it is a bit hard.
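
The UTCTIME difference in item 1 can be checked mechanically. The following is a minimal sketch, assuming well-formed input and low tag numbers only: it walks a DER blob, collects every UTCTime element (tag 0x17) and tests it against the YYMMDDHHMMSSZ form that RFC 5280 requires for certificates. The helper names iter_utctime and is_strict_utctime are hypothetical, and this is not the check libtasn1 itself performs.

```python
import re

# RFC 5280 requires certificate UTCTime values in the form YYMMDDHHMMSSZ.
STRICT_UTCTIME = re.compile(r"[0-9]{12}Z")


def iter_utctime(der: bytes):
    """Yield the text of every UTCTime (tag 0x17) element in a DER blob."""
    pos = 0
    while pos < len(der):
        tag = der[pos]
        length = der[pos + 1]
        pos += 2
        if length & 0x80:
            # Long form: the low 7 bits give the number of length octets.
            n = length & 0x7F
            length = int.from_bytes(der[pos:pos + n], "big")
            pos += n
        content = der[pos:pos + length]
        pos += length
        if tag == 0x17:
            yield content.decode("ascii")
        elif tag & 0x20:
            # Constructed element (SEQUENCE, SET, [n]): look inside it.
            yield from iter_utctime(content)


def is_strict_utctime(value: str) -> bool:
    """True only for the 13-character YYMMDDHHMMSSZ form."""
    return STRICT_UTCTIME.fullmatch(value) is not None


# Hand-built SEQUENCE holding the two values from the dump above
# (content lengths 17 and 13, matching the l= columns).
sample = (b"\x30\x22"
          + b"\x17\x11" + b"150517165500+0000"
          + b"\x17\x0d" + b"181022044108Z")

for value in iter_utctime(sample):
    print(value, "ok" if is_strict_utctime(value) else "malformed")
```

For a real file holding a single certificate, the DER bytes could be obtained with ssl.PEM_cert_to_DER_cert() from the Python standard library before calling iter_utctime.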

(Originally by Germano Veit Michel)

Comment 7 RHV bug bot 2018-11-13 09:45:45 UTC
(In reply to Germano Veit Michel from comment #6)
> Ran out of time today... here is some more info I got.
> 
> Not sure if related. Breaking down the certs, found 2 other differences. 
> 
> 1) from date (UTCTIME type) is slightly different:
>   110:d=3  hl=2 l=  17 prim:    UTCTIME           :150517165500+0000
>    94:d=3  hl=2 l=  13 prim:    UTCTIME           :181022044108Z

According to https://bugzilla.redhat.com/show_bug.cgi?id=1636023#c15 and
https://github.com/openssl/openssl/pull/2668 , the issue is due to the malformed UTCTIME value: it was acceptable in the past, but now openssl and gnutls refuse it.

(Originally by Simone Tiraboschi)

Comment 9 RHV bug bot 2018-11-13 09:45:57 UTC
On the valid cert example we see:
        Validity
            Not Before: Oct 22 04:41:08 2018 GMT
            Not After : Oct 20 04:41:08 2028 GMT

while on the bad one:
        Validity
            Not Before: May 17 16:55:00 2015
            Not After : May 15 16:55:00 2025 GMT

so the issue is specific to the 'Not Before:' value.

But in engine-setup we already have code to handle that case:
https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/plugins/ovirt-engine-setup/ovirt-engine/pki/ca.py#L275

Now we have to understand whether it never triggered, or whether it was correctly triggered on the engine side but the CA cert was never redistributed to all the hosts.

(Originally by Simone Tiraboschi)

Comment 14 RHV bug bot 2018-11-13 09:46:25 UTC
According to the engine-setup logs, it seems that engine-setup correctly detected the issue and offered to start the PKI renewal process as per https://access.redhat.com/solutions/1572983

2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human human.queryString:145 query OVESETUP_RENEW_PKI
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 One or more of the certificates should be renewed, because they expire soon, or include an invalid expiry date, or do not include the subjectAltName extension, which can cause them to be rejected by recent browsers.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 If you choose "No", you will be asked again the next time you run Setup.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 See https://access.redhat.com/solutions/1572983 for more details.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew certificates? (Yes, No) [No]:
2018-05-02 10:51:51 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2018-05-02 10:51:51 DEBUG otopi.context context.dumpEnvironment:770 ENV OVESETUP_PKI/renew=bool:'False'


The user rejected it at least 5 times:
[root@t470s setup]# grep -R "Renew PKI"
ovirt-engine-setup-20171213144824-oppz1d.log:2017-12-13 14:49:16 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False
ovirt-engine-setup-20171213144427-y922jr.log:2017-12-13 14:45:36 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False
ovirt-engine-setup-20180102112424-dmayx8.log:2018-01-02 11:26:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False
ovirt-engine-setup-20180102182847-cf5zz9.log:2018-01-02 18:29:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False
ovirt-engine-setup-20180502105103-z9uh1h.log:2018-05-02 10:52:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False

I have to add that the default response is No, while Yes probably makes more sense now, also because of this issue.
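
The tally from the grep output above can also be collected with a short script. This is a sketch only: it assumes the setup logs follow the ovirt-engine-setup-*.log naming seen above and contain the "Renew PKI : False" summary line in the same layout. The function name renew_pki_answers is hypothetical.

```python
import re
from pathlib import Path

# Matches the summary line, e.g. "Renew PKI            : False".
RENEW_LINE = re.compile(r"Renew PKI\s*:\s*(True|False)")


def renew_pki_answers(log_dir):
    """Map each setup log file name to the 'Renew PKI' answers found in it."""
    answers = {}
    for path in sorted(Path(log_dir).glob("ovirt-engine-setup-*.log")):
        found = RENEW_LINE.findall(path.read_text(errors="replace"))
        if found:
            answers[path.name] = found
    return answers
```

Calling renew_pki_answers() on the directory where the grep was run should list one entry per engine-setup run in which the renewal question was answered.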

(Originally by Simone Tiraboschi)

Comment 15 RHV bug bot 2018-11-13 09:46:34 UTC
My customer already renewed and corrected the ca.pem on the engine long ago.

openssl x509 -in ca.pem -noout  -dates
notBefore=Mar 24 06:33:24 2017 GMT
notAfter=Mar 22 06:33:24 2027 GMT

But the host still has the old CA.

openssl x509 -in certs/cacert.pem -noout  -dates
notBefore=Jun 14 13:41:58 2015
notAfter=Jun 12 13:41:58 2025 GMT
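
The difference between the two outputs above (a date printed without the trailing GMT) can be spotted with a small heuristic. This sketch assumes the openssl build prints dates the way shown here; the function name dates_missing_timezone is made up for illustration.

```python
def dates_missing_timezone(openssl_dates_output: str):
    """Return the notBefore/notAfter fields whose value lacks a trailing 'GMT'."""
    missing = []
    for line in openssl_dates_output.splitlines():
        field, sep, value = line.partition("=")
        if not sep or field not in ("notBefore", "notAfter"):
            continue  # ignore anything that is not a date line
        if not value.strip().endswith("GMT"):
            missing.append(field)
    return missing


# The two outputs quoted above.
engine_ca = "notBefore=Mar 24 06:33:24 2017 GMT\nnotAfter=Mar 22 06:33:24 2027 GMT\n"
host_ca = "notBefore=Jun 14 13:41:58 2015\nnotAfter=Jun 12 13:41:58 2025 GMT\n"

print("engine:", dates_missing_timezone(engine_ca))
print("host:  ", dates_missing_timezone(host_ca))
```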

I think engine-setup should push the CA certificate to the hosts automatically instead of asking the user to manually enroll the certificate, as only the CA has changed and the vdsm certificates are still valid. Can we do this automatically during engine-setup?

(Originally by Nijin Ashok)

Comment 16 RHV bug bot 2018-11-13 09:46:39 UTC
(In reply to nijin ashok from comment #15)
> My customer already has the ca.pem in the engine renewed and corrected long back.

...

> I think the engine-setup should automatically push the CA certificate
> automatically to the hosts instead of asking the user to manually enroll the
> certificate as it's only CA has been changed and the vdsm certificates are
> still valid. Can we do this automatically during engine-setup?

For sure not during engine-setup: the host has to be in maintenance, since we have to restart libvirt and vdsm to make it effective.
I think we should instead re-enroll host certs (only if needed?) during host upgrades, at least if the upgrade is triggered from the engine.
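
One way the "just if needed" test could look is a fingerprint comparison between the engine's CA and the copy on the host. This is a hypothetical sketch, not how the engine decides today: how the host's cacert.pem reaches the engine is left open, and both function names are invented for this example. Each input is assumed to be a PEM file with exactly one certificate.

```python
import hashlib
import ssl


def ca_fingerprint(pem_text: str) -> str:
    """SHA-256 hex digest over the DER bytes of a single PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def needs_ca_redistribution(engine_ca_pem: str, host_ca_pem: str) -> bool:
    """True when the host still carries a CA different from the engine's."""
    return ca_fingerprint(engine_ca_pem) != ca_fingerprint(host_ca_pem)
```

With the engine's ca.pem and the host's certs/cacert.pem from comment 15 as inputs, the function would be expected to return True, since the host still holds the 2015 CA.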

Martin, do you know if we are already re-enrolling certs in host upgrades led by the engine?

Nijin, do you know if the customer triggered host upgrade from the engine as recommended?

(Originally by Simone Tiraboschi)

Comment 18 Petr Matyáš 2018-12-04 13:02:48 UTC
What are the reproduction steps or how do I reach the change that is linked to this bug? Should the bad certificate linked to this bug work correctly?

Comment 19 Petr Matyáš 2018-12-06 13:56:49 UTC
Verified on ovirt-engine-4.2.8.1-0.0.master.20181127093701.gite34295b.el7.noarch

Comment 21 errata-xmlrpc 2019-01-22 12:44:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0121

