Bug 1648190 - [RHEL76] libvirt is unable to start after upgrade due to malformed UTCTIME values in cacert.pem, because properly renewed CA certificate was not passed to hosts by executing "Enroll certificate" or "Reinstall"
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Assignee: Simone Tiraboschi
QA Contact: Petr Matyáš
Duplicates: 1649367
Blocks: 1649267

Reported: 2018-11-09 04:07 UTC by Germano Veit Michel
Modified: 2020-01-13 18:28 UTC (History)
CC List: 23 users

Fixed In Version: ovirt-engine-4.3.0_alpha
Doc Type: Enhancement
Doc Text:
Internal CAs generated by older releases (<= 3.5) can contain UTCTIME values without a timezone indication, which up-to-date openssl and gnutls libraries no longer accept. engine-setup already checked for this and proposed a remediation, but the user could postpone it. The check is now more prominent, because postponing the renewal can cause serious issues.
Clones: 1649267
Last Closed: 2019-05-08 12:38:47 UTC
oVirt Team: Integration

Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1636023 | None | CLOSED | Install fails for a cluster imported into RHGS Console | 2019-10-21 04:17:47 UTC
Red Hat Bugzilla | 1649285 | None | CLOSED | [RFE] Re-enroll host certificates during host upgrade | 2019-10-21 04:17:46 UTC
Red Hat Knowledge Base (Solution) | 3682921 | None | None | None | 2018-11-09 04:54:31 UTC
Red Hat Knowledge Base (Solution) | 4731261 | None | None | None | 2020-01-13 18:28:32 UTC
Red Hat Product Errata | RHEA-2019:1085 | None | None | None | 2019-05-08 12:39:02 UTC
oVirt gerrit | 95348 | master | MERGED | upgrade: pki: double check on skip of PKI renewal | 2020-05-20 17:29:40 UTC

Internal Links: 1636023 1649285

Description Germano Veit Michel 2018-11-09 04:07:21 UTC
Description of problem:

After upgrading the host to the latest RHVH 4.2.7 (RHEL 7.6), libvirt fails to start because it cannot load /etc/pki/vdsm/certs/cacert.pem. It's not a permission issue; gnutls fails to import the file.

The customer has an old cacert.pem on the host, generated in 2015, so most likely at the RHV-M 3.5 install. There doesn't seem to be anything special about the customer's cacert.pem, except that it uses SHA-1 instead of SHA-256. Still, the error seems to be something else (see additional info).

I could not reach a conclusion. If I replace it with a cacert.pem freshly generated by a 4.2.7 engine, it works.

Version-Release number of selected component (if applicable):
FAIL:
gnutls-3.3.29-8.el7.x86_64      (7.6)
libvirt-4.5.0-10.el7_6.2.x86_64
OK:
gnutls-3.3.26-9.el7.x86_64      (7.5)
libvirt-4.5.0-10.el7_6.2.x86_64

How reproducible:
Always 

Steps to Reproduce:
1. Place the provided cacert.pem at /etc/pki/vdsm/certs/cacert.pem
2. Or compile and run the provided code to simulate it; it produces the same error.
   # gcc -o test test.c -lgnutls

Actual results:
Libvirt fails to start:
2018-11-09 02:06:45.139+0000: 12963: error : virNetTLSContextLoadCACertListFromFile:550 : Unable to import CA certificate list /etc/pki/vdsm/certs/cacert.pem

Expected results:
Libvirt starts

Additional info:
After upgrading RHVH 4.2.3 to 4.2.7 (to RHEL 7.6), libvirtd won't start anymore:

error : virNetTLSContextLoadCACertListFromFile:550 : Unable to import CA certificate list /etc/pki/vdsm/certs/cacert.pem

The error comes from here:

    if (gnutls_x509_crt_list_import(certs, &certMax, &data, GNUTLS_X509_FMT_PEM, 0) < 0) {
        virReportError(VIR_ERR_SYSTEM_ERROR,
                       _("Unable to import CA certificate list %s"),
                       certFile);
        goto cleanup;
    }
 
This means the ca cert /etc/pki/vdsm/certs/cacert.pem cannot be loaded. The related packages in use are:
gnutls-3.3.29-8.el7.x86_64
libvirt-4.5.0-10.el7_6.2.x86_64

But if I downgrade gnutls to RHEL 7.5, it loads the cert just fine:
gnutls-utils-3.3.26-9.el7.x86_64
gnutls-3.3.26-9.el7.x86_64
gnutls-dane-3.3.26-9.el7.x86_64

So gnutls:
RHEL7.5: gnutls-3.3.26-9.el7.x86_64
RHEL7.6: gnutls-3.3.29-8.el7.x86_64 (doesn't load old RHV certs?)

The certificate in question was generated in 2015 by an older engine, but it looks exactly the same as one freshly generated by a 4.2.7 engine.
The only difference seems to be that the 2015 certificate uses signature algorithm sha1WithRSAEncryption instead of sha256WithRSAEncryption.

When enabling debug on gnutls, it seems to stop here on function gnutls_x509_crt_list_import, returning non-zero:

                        ret =
                            gnutls_x509_crt_import(certs[count], &tmp,
                                                   GNUTLS_X509_FMT_PEM);
                        if (ret < 0) {
                                gnutls_assert();       <------------
                                goto error;

On further inspection with GDB, it seems to be failing here, in function gnutls_x509_crt_import.

        result =
            _asn1_strict_der_decode(&cert->cert, cert->der.data, cert->der.size, NULL);
        if (result != ASN1_SUCCESS) {
                result = _gnutls_asn2err(result);
                gnutls_assert();
                goto cleanup;
        }

This is an inlined function, so I was not able to debug it further. Still, it seems to return ASN1_DER_ERROR.

(gdb) finish
Run till exit from #0  gnutls_x509_crt_import (cert=0x61d7c0, data=data@entry=0x7fffffffe190, format=format@entry=GNUTLS_X509_FMT_PEM) at x509.c:252
gnutls_x509_crt_list_import (certs=0x7fffffffe220, cert_max=0x7fffffffe20c, data=0x7fffffffe210, format=<optimized out>, flags=0) at x509.c:3495
3495				if (ret < 0) {
Value returned is $1 = -69

Which means we are hitting this in GnuTLS:
#define GNUTLS_E_ASN1_DER_ERROR -69

Which is translated to this from libtasn1: ASN1_DER_ERROR

And it means: "the der encoding doesn't match the structure ELEMENT."

It doesn't look related to sha1 vs sha256. I could not reach a conclusion. Is there a problem with old RHV CA certs, or are gnutls/libtasn1 broken?

Comment 1 Germano Veit Michel 2018-11-09 04:08:42 UTC
Forgot this, which also points to the same error:

# GNUTLS_DEBUG_LEVEL=9 libvirtd --listen
<...>
gnutls[3]: ASSERT: x509.c:311  <-- ASN1_DER_ERROR
gnutls[3]: ASSERT: x509.c:3496

Comment 5 Germano Veit Michel 2018-11-09 06:32:16 UTC
More info:

In function gnutls_x509_crt_import, gnutls uses:

RHEL 7.5 : v3.3.26
   asn1_der_decoding(&cert->cert, cert->der.data, cert->der.size, NULL);
RHEL 7.6 : v3.3.29
   _asn1_strict_der_decode(&cert->cert, cert->der.data, cert->der.size, NULL);

Here: https://gitlab.com/gnutls/gnutls/commit/5691dc3e4331000765c694bc101b445a80c0a9e2

So in RHEL 7.6, it is using libtasn1 "asn1_der_decoding2" function, with some sort of STRICT option, ASN1_DECODE_FLAG_STRICT_DER.

Not sure if related, but:

 * If %ASN1_DECODE_FLAG_STRICT_DER flag is set then the function will
 * not decode any BER-encoded elements.

Tried to 'watch' the result return value in gdb to find where it's returning from, but it's optimized out.

So gnutls is using a different function, now from libtasn1, which for some reason does not like that certificate.

Comment 6 Germano Veit Michel 2018-11-09 06:52:59 UTC
Ran out of time today... here is some more info I got.

Not sure if related. Breaking down the certs, found 2 other differences. 

1) The 'from' date (UTCTIME type) is slightly different (failing cert first, working cert second):
  110:d=3  hl=2 l=  17 prim:    UTCTIME           :150517165500+0000
   94:d=3  hl=2 l=  13 prim:    UTCTIME           :181022044108Z

2) The working cert uses UTF8_STRING instead of PRINTABLE_STRING.

Maybe I need to recompile libtasn1 to track down more precisely where it returns ASN1_DER_ERROR; with optimizations enabled that is a bit hard.
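
The UTCTIME difference in item 1 can also be checked without openssl. Below is a minimal Python sketch, not taken from gnutls or libtasn1, that walks a DER buffer and reports UTCTIME values that are not in the 13-byte YYMMDDHHMMSSZ form that DER requires. The function names are hypothetical, the second timestamp in the sample is only an illustrative well-formed value, and the walker assumes single-byte tags and definite lengths, which should hold for typical X.509 certificates.

```python
import re

UTCTIME_TAG = 0x17                        # universal primitive tag 23
STRICT_UTCTIME = re.compile(rb"\d{12}Z")  # YYMMDDHHMMSSZ, as DER requires


def iter_primitives(data, offset=0, end=None):
    """Yield (tag, value) for every primitive element in a DER buffer."""
    if end is None:
        end = len(data)
    while offset < end:
        tag = data[offset]
        length = data[offset + 1]
        offset += 2
        if length & 0x80:  # long form: low 7 bits give the number of length bytes
            count = length & 0x7F
            length = int.from_bytes(data[offset:offset + count], "big")
            offset += count
        value_end = offset + length
        if tag & 0x20:  # constructed element: descend into its contents
            yield from iter_primitives(data, offset, value_end)
        else:
            yield tag, data[offset:value_end]
        offset = value_end


def malformed_utctimes(der):
    """Return UTCTIME values that are not in strict YYMMDDHHMMSSZ form."""
    return [value.decode("ascii", "replace")
            for tag, value in iter_primitives(der)
            if tag == UTCTIME_TAG and not STRICT_UTCTIME.fullmatch(value)]


def tlv(tag, value):
    """Encode one element with a short-form length (value under 128 bytes)."""
    return bytes([tag, len(value)]) + value


# Hand-built Validity SEQUENCE: the 17-byte value reported above, followed
# by an illustrative 13-byte value in strict form.
validity = tlv(0x30, tlv(UTCTIME_TAG, b"150517165500+0000")
                     + tlv(UTCTIME_TAG, b"250515165500Z"))
print(malformed_utctimes(validity))  # ['150517165500+0000']
```

For a file holding a single PEM certificate, the DER bytes could be obtained with ssl.PEM_cert_to_DER_cert() from the Python standard library before calling malformed_utctimes().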

Comment 7 Simone Tiraboschi 2018-11-09 09:07:19 UTC
(In reply to Germano Veit Michel from comment #6)
> Ran out of time today... here is some more info I got.
> 
> Not sure if related. Breaking down the certs, found 2 other differences. 
> 
> 1) from date (UTCTIME type) is slightly different:
>   110:d=3  hl=2 l=  17 prim:    UTCTIME           :150517165500+0000
>    94:d=3  hl=2 l=  13 prim:    UTCTIME           :181022044108Z

According to https://bugzilla.redhat.com/show_bug.cgi?id=1636023#c15 and
 https://github.com/openssl/openssl/pull/2668 , the issue is due to the malformed UTCTIME value: it was accepted in the past, but openssl and gnutls now refuse it.

Comment 9 Simone Tiraboschi 2018-11-09 09:55:25 UTC
On the valid cert example we see:
        Validity
            Not Before: Oct 22 04:41:08 2018 GMT
            Not After : Oct 20 04:41:08 2028 GMT

while on the bad one:
        Validity
            Not Before: May 17 16:55:00 2015
            Not After : May 15 16:55:00 2025 GMT

so the issue is specific to 'Not Before:' value.

But in engine-setup we already have code to handle that case:
https://github.com/oVirt/ovirt-engine/blob/master/packaging/setup/plugins/ovirt-engine-setup/ovirt-engine/pki/ca.py#L275

Now we have to understand if it never triggered or if it got correctly triggered on engine side but the CA cert was never redistributed to all the hosts.
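
The comparison above suggests a quick heuristic for spotting affected certificates in `openssl x509 -text` output: a validity line that does not end in GMT. The following Python sketch encodes only what the two excerpts in this comment show. The function name is hypothetical, and it is an assumption that other openssl versions print the dates the same way.

```python
import re

# Matches the "Not Before:" / "Not After :" lines of the Validity section.
VALIDITY_LINE = re.compile(r"^\s*(Not Before|Not After)\s*:\s*(.+?)\s*$")


def validity_without_timezone(openssl_text):
    """Return the validity fields whose printed date lacks the GMT suffix."""
    suspicious = []
    for line in openssl_text.splitlines():
        match = VALIDITY_LINE.match(line)
        if match and not match.group(2).endswith("GMT"):
            suspicious.append(match.group(1))
    return suspicious


# The two Validity excerpts quoted in this comment.
bad_cert = """
        Validity
            Not Before: May 17 16:55:00 2015
            Not After : May 15 16:55:00 2025 GMT
"""
good_cert = """
        Validity
            Not Before: Oct 22 04:41:08 2018 GMT
            Not After : Oct 20 04:41:08 2028 GMT
"""
print(validity_without_timezone(bad_cert))   # ['Not Before']
print(validity_without_timezone(good_cert))  # []
```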

Comment 14 Simone Tiraboschi 2018-11-09 11:02:31 UTC
According to the engine-setup logs, it seems that engine-setup correctly detected the issue and offered to start the PKI renewal process, as per https://access.redhat.com/solutions/1572983

2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human human.queryString:145 query OVESETUP_RENEW_PKI
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 One or more of the certificates should be renewed, because they expire soon, or include an invalid expiry date, or do not include the subjectAltName extension, which can cause them to be rejected by recent browsers.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 If you choose "No", you will be asked again the next time you run Setup.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 See https://access.redhat.com/solutions/1572983 for more details.
2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew certificates? (Yes, No) [No]:
2018-05-02 10:51:51 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2018-05-02 10:51:51 DEBUG otopi.context context.dumpEnvironment:770 ENV OVESETUP_PKI/renew=bool:'False'


The user rejected it at least 5 times:
[root@t470s setup]# grep -R "Renew PKI"
ovirt-engine-setup-20171213144824-oppz1d.log:2017-12-13 14:49:16 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False
ovirt-engine-setup-20171213144427-y922jr.log:2017-12-13 14:45:36 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False
ovirt-engine-setup-20180102112424-dmayx8.log:2018-01-02 11:26:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False
ovirt-engine-setup-20180102182847-cf5zz9.log:2018-01-02 18:29:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False
ovirt-engine-setup-20180502105103-z9uh1h.log:2018-05-02 10:52:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False

I have to add that the default response is No, while Yes probably makes more sense now, also because of this issue.
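
The grep above can be reproduced in Python to count how often the renewal was declined. This is only a sketch: it assumes the summary line format shown in the excerpts ("Renew PKI : False"), and the function name is hypothetical.

```python
import re

# Summary line written by engine-setup, as seen in the grep output above.
RENEW_LINE = re.compile(r"Renew PKI\s*:\s*(True|False)")


def count_declined(log_lines):
    """Count summary lines in which the PKI renewal was declined."""
    declined = 0
    for line in log_lines:
        match = RENEW_LINE.search(line)
        if match and match.group(1) == "False":
            declined += 1
    return declined


# Lines copied from the log excerpts in this comment.
sample = [
    "2017-12-13 14:49:16 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False",
    "2018-05-02 10:52:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew PKI                               : False",
    "2018-05-02 10:51:51 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND                 Renew certificates? (Yes, No) [No]:",
]
print(count_declined(sample))  # 2
```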

Comment 15 nijin ashok 2018-11-13 04:33:37 UTC
My customer already had the ca.pem on the engine renewed and corrected long ago.

openssl x509 -in ca.pem -noout  -dates
notBefore=Mar 24 06:33:24 2017 GMT
notAfter=Mar 22 06:33:24 2027 GMT

But the host still has the old CA.

openssl x509 -in certs/cacert.pem -noout  -dates
notBefore=Jun 14 13:41:58 2015
notAfter=Jun 12 13:41:58 2025 GMT

I think engine-setup should push the CA certificate to the hosts automatically instead of asking the user to enroll the certificate manually, as only the CA has changed and the vdsm certificates are still valid. Can we do this automatically during engine-setup?
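
The mismatch described here (a renewed CA on the engine, the old CA still on the host) can be detected by comparing certificate fingerprints. A minimal Python sketch follows. It uses only the standard library, the function names are hypothetical, the sample data are placeholders rather than real certificates, and how the two PEM files are collected from the engine and the host is left out.

```python
import base64
import hashlib
import re

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL)


def pem_fingerprint(pem_text):
    """SHA-256 over the DER bytes of the first certificate in a PEM string."""
    match = PEM_BLOCK.search(pem_text)
    if match is None:
        raise ValueError("no PEM certificate found")
    der = base64.b64decode("".join(match.group(1).split()))
    return hashlib.sha256(der).hexdigest()


def host_ca_is_current(engine_ca_pem, host_ca_pem):
    """True when the host holds the same CA certificate as the engine."""
    return pem_fingerprint(engine_ca_pem) == pem_fingerprint(host_ca_pem)


def fake_pem(payload):
    """Wrap arbitrary bytes in PEM armor; placeholder data, not a real certificate."""
    body = base64.b64encode(payload).decode("ascii")
    return f"-----BEGIN CERTIFICATE-----\n{body}\n-----END CERTIFICATE-----\n"


engine_ca = fake_pem(b"renewed CA placeholder")
host_ca = fake_pem(b"old CA placeholder")
print(host_ca_is_current(engine_ca, engine_ca))  # True
print(host_ca_is_current(engine_ca, host_ca))    # False
```

On a real setup the second argument would be the contents of /etc/pki/vdsm/certs/cacert.pem from the host.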

Comment 16 Simone Tiraboschi 2018-11-13 09:43:17 UTC
(In reply to nijin ashok from comment #15)
> My customer already has the ca.pem in the engine renewed and corrected long back.

...

> I think the engine-setup should automatically push the CA certificate
> automatically to the hosts instead of asking the user to manually enroll the
> certificate as it's only CA has been changed and the vdsm certificates are
> still valid. Can we do this automatically during engine-setup?

Definitely not during engine-setup: we have to restart libvirt and vdsm to make the new CA effective, so the host has to be in maintenance.
I think we should instead re-enroll host certs (just if needed?) during host upgrades, at least if the upgrade is triggered from the engine.

Martin, do you know if we are already re-enrolling certs in host upgrades led by the engine?

Nijin, do you know if the customer triggered host upgrade from the engine as recommended?

Comment 18 Martin Perina 2018-11-13 09:51:11 UTC
(In reply to Simone Tiraboschi from comment #16)

We enroll the CA certificate each time the Reinstall or Enroll Certificate action is executed on the host. We cannot do that automatically, because we need to restart VDSM and libvirt to load the new CA, and that can be done only when the host is in Maintenance.

In theory we could also enroll the certificate during host upgrade, but that's an RFE, because we would need to move the certificate enrollment code from host-deploy to Ansible. So if you are really interested in this feature, could you please file an RFE for that?

Comment 19 Martin Perina 2018-11-14 11:50:41 UTC
*** Bug 1649367 has been marked as a duplicate of this bug. ***

Comment 20 Petr Matyáš 2018-12-06 13:59:39 UTC
Verified on ovirt-engine-4.3.0-0.5.alpha1.el7.noarch

Comment 21 dijuremo 2018-12-15 06:59:00 UTC
In trying to fix this I downgraded all my libvirt packages to the previous release, 3.9.0-14.el7_5.8.x86_64. This still presented the same problem that libvirt would not start due to the certificate problem.

So I went ahead and then downgraded all the gnutls packages to:

gnutls-3.3.26-9.el7.x86_64.rpm  gnutls-dane-3.3.26-9.el7.x86_64.rpm  gnutls-utils-3.3.26-9.el7.x86_64.rpm

After the downgrade of those packages, I was able to start the libvirtd 3.9.0-14.el7_5.8.x86_64 series. I did not try upgrading all the libvirt packages and keeping the old gnutls, but it is possible that with the latest libvirt packages and the old gnutls, the libvirt daemon will start just as it did for me with 3.9.0. 

I am already up and running and I will be getting rid of this RHEV setup next week as I get new servers, so not worth it for me to check it out, but perhaps it can help someone else.

Comment 22 Martin Perina 2018-12-15 11:29:57 UTC
(In reply to dijuremo from comment #21)

Just out of curiosity, why are you doing such a complicated downgrade, when it's actually enough to:

1. Run engine-setup and refresh CA certificate
2. Execute enroll certificate on all hosts
3. Execute upgrade on all hosts (if they are not already upgraded)

Comment 23 dijuremo 2018-12-17 12:03:15 UTC
@Martin Perina, I am about to decommission these servers and have never done a CA update before. I am running a self-hosted engine on hyperconverged gluster storage. It works, but not without its issues (I have not updated from 3.5.9). So the last thing I want is to hit issues I do not know how to fix when the whole virtualization setup is going to be scrapped within a week or two, since we are moving to real hardware. We run only two Windows Server VMs in this infrastructure, and the pain of maintaining and running this for just two VMs does not justify the complexity and time spent keeping it running. And as you can see, it is highly outdated, and migrating from 3.5 to 4.x requires a whole self-hosted engine update to RHEL 7, etc., so it is too much of a risk.

Comment 25 errata-xmlrpc 2019-05-08 12:38:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085

