Bug 2056126 - [RFE] Extend time to warn of upcoming certificate expiration
Summary: [RFE] Extend time to warn of upcoming certificate expiration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.10
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ovirt-4.5.0
Target Release: 4.5.0
Assignee: Martin Perina
QA Contact: Pavol Brilla
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-18 22:05 UTC by Greg Scott
Modified: 2022-08-28 10:23 UTC

Fixed In Version: ovirt-engine-4.5.0
Doc Type: Release Note
Doc Text:
With this release, the Red Hat Virtualization Manager 4.4 SP1 certificate expiration check warns of upcoming certificate expiration earlier:
1. If a certificate is about to expire in the upcoming 120 days, a WARNING event is raised in the audit log.
2. If a certificate is about to expire in the upcoming 30 days, an ALERT event is raised in the audit log.
This check covers internal RHV certificates (for example, the certificate for RHVM <-> hypervisor communication), but it doesn't check custom certificates configured for HTTPS access to RHVM according to https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/administration_guide/index#Replacing_the_Manager_CA_Certificate
Clone Of:
Environment:
Last Closed: 2022-05-26 16:23:53 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | oVirt ovirt-engine pull 117 | 0 | None | open | core: Extend time to warn of upcoming certificate expiration | 2022-03-04 07:58:15 UTC
Red Hat Issue Tracker | RHV-44722 | 0 | None | None | None | 2022-02-18 22:06:35 UTC
Red Hat Knowledge Base (Solution) | 3532921 | 0 | None | None | None | 2022-03-02 16:19:21 UTC
Red Hat Knowledge Base (Solution) | 6865861 | 0 | None | None | None | 2022-03-31 00:34:22 UTC
Red Hat Product Errata | RHSA-2022:4711 | 0 | None | None | None | 2022-05-26 16:24:10 UTC

Description Greg Scott 2022-02-18 22:05:02 UTC
Description of problem:
The new standard for certificates apparently wants certificates to expire roughly annually. The days of certificates lasting five years are over. Since certificates expire quickly now, RHV needs an easy way to manage them without disrupting entire environments every few months.

Version-Release number of selected component (if applicable):
all

How reproducible:
At will

Steps to Reproduce:
1. Install a RHV environment.
2. Run it steady-state for around 13 months.
3. The certificates expire.

Actual results:
VM consoles stop working. Live migrations also stop. Hypervisors go non-operational. Admins lose all ability to manage anything.

Expected results:
The simple passage of time should not trash any environment.

Additional info:

RHV does notify about expiring certificates, months in advance. But the notifications are buried with all other events and at least one RHV admin (me) did not notice them until it was too late. One idea to improve notification for expiring certificates might be to set a threshold of so many days before expiration, and then generate an email until they renew.
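
The threshold idea above can be sketched in a few lines of Python. This is only an illustration, not anything RHV ships: the function names and the threshold value are hypothetical, and the input is assumed to be the notAfter string printed by `openssl x509 -noout -enddate` (without the "notAfter=" prefix).

```python
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter timestamp.

    `not_after` is expected in the OpenSSL text format,
    for example "Mar 15 12:00:00 2023 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch
    return int((expires - now.timestamp()) // SECONDS_PER_DAY)


def needs_reminder(not_after: str, now: datetime, threshold_days: int) -> bool:
    """True once the certificate is within `threshold_days` of expiring.

    A daily job could call this and keep notifying until the
    certificate is renewed (hypothetical workflow).
    """
    return days_until_expiry(not_after, now) <= threshold_days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_reminder("Mar 15 12:00:00 2023 GMT", now, threshold_days=120))
```

How the reminder is delivered (email, login banner, something else) is left open here, since that is exactly what this request is about.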

Renewals right now are disruptive because the admin must migrate all VMs away from a hypervisor and then put it in maintenance mode and update the certificates. It's also labor intensive. And then to update RHVM certificates, admins must run engine-setup again.

RHV needs an "easy button" to make all this work without the associated hassles. Or even better, a way to automate certificate renewal before they expire.

Comment 1 Martin Perina 2022-02-28 11:07:39 UTC
RHV Manager already contains quite sophisticated ways to check for certificate expiration.
RHVM performs certificate checks every day (can be configured using the
engine-config option CertificationValidityCheckTimeInHours) and it checks
not only host certificates, but also the engine certificate and the engine
CA certificate. This check produces the following records in the ovirt-engine audit
log:

1. If the certificate has already expired, then the audit log ALERT below is
created, depending on the type of certificate:
    - Host ${VdsName} certification has expired at ${ExpirationDate}. Please renew the host's certification.
    - Engine's certification has expired at ${ExpirationDate}. Please renew the engine's certification.
    - Engine's CA certification has expired at ${ExpirationDate}.

2. If the certificate is going to expire in less than 7 days, then the audit log ALERT below is created, depending on the type of certificate:
    - Host ${VdsName} certification is about to expire at ${ExpirationDate}. Please renew the host's certification.
    - Engine's certification is about to expire at ${ExpirationDate}. Please renew the engine's certification.
    - Engine's CA certification is about to expire at ${ExpirationDate}.

3. If the certificate is going to expire in less than 30 days, then the audit log WARNING below is created, depending on the type of certificate:
    - Host ${VdsName} certification is about to expire at ${ExpirationDate}. Please renew the host's certification.
    - Engine's certification is about to expire at ${ExpirationDate}. Please renew the engine's certification.
    - Engine's CA certification is about to expire at ${ExpirationDate}.
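
The three rules above can be read as a small decision function. The Python below is a sketch of that reading, not the engine's actual code: the function names are hypothetical, the handling of the exact boundary instants is an assumption, and so is the date format used for ${ExpirationDate}.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

ALERT_DAYS = 7     # "less than 7 days"  -> ALERT
WARNING_DAYS = 30  # "less than 30 days" -> WARNING


def classify(expires_at: datetime, now: datetime) -> Optional[Tuple[str, str]]:
    """Return (severity, state) for a certificate, or None if no event is due."""
    remaining = expires_at - now
    if remaining <= timedelta(0):
        return ("ALERT", "has expired")
    if remaining < timedelta(days=ALERT_DAYS):
        return ("ALERT", "is about to expire")
    if remaining < timedelta(days=WARNING_DAYS):
        return ("WARNING", "is about to expire")
    return None


def host_message(vds_name: str, expires_at: datetime, now: datetime) -> Optional[str]:
    """Fill in the host message template quoted above, prefixed with the severity."""
    result = classify(expires_at, now)
    if result is None:
        return None
    severity, state = result
    # The date format is an assumption; the templates do not show
    # how ${ExpirationDate} is rendered.
    return (f"{severity}: Host {vds_name} certification {state} at "
            f"{expires_at:%Y-%m-%d}. Please renew the host's certification.")
```

The checks run from most to least urgent, so a certificate with 5 days left is reported once as an ALERT rather than also matching the 30-day WARNING rule.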

So from my point of view there are enough warnings about upcoming certificate expiration, so now let's take a look at certificate renewal:

1. There's an Enroll Certificate action available from both the UI and the REST API, which renews host certificates that are going to expire
2. During each Host upgrade action (again available from both the UI and the REST API and also part of the cluster_upgrade Ansible role) we check whether the certificate is going to expire soon or has already expired, and if so, certificates are renewed during the host upgrade
3. We check engine certificate validity during each engine-setup execution, and if the certificate is going to expire soon or has expired, we renew that certificate. The same check along with renewal also happens for the engine CA certificate

For certificate renewal the host needs to be in Maintenance, because loading the renewed certificate requires restart of services (VDSM, libvirt, ...), so it cannot be performed when the host is Up.
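
As an illustration of the REST API route, the sketch below only builds the HTTP request for the host's enrollcertificate action; it does not send anything. The engine URL, host ID, and token are placeholders, and the use of a bearer token and an empty `<action/>` XML body are assumptions to check against the REST API guide for your version. The host is expected to already be in Maintenance, as described above.

```python
import urllib.request


def build_enroll_request(engine_url: str, host_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a host's enrollcertificate action.

    All three arguments are placeholders supplied by the caller.
    """
    url = f"{engine_url.rstrip('/')}/ovirt-engine/api/hosts/{host_id}/enrollcertificate"
    return urllib.request.Request(
        url,
        data=b"<action/>",  # empty action body (assumption)
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
    )


if __name__ == "__main__":
    req = build_enroll_request("https://engine.example.com", "123", "TOKEN")
    print(req.get_method(), req.full_url)
    # Sending would be urllib.request.urlopen(req), with a TLS context
    # that trusts the engine CA.
```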


Now regarding the UI: after logging in to webadmin you are redirected to the Dashboard, and on the right you can see a box called Events. Here you have 3 links: Alerts, Errors and all Events. So it is very easy to click on Alerts, which will redirect you to Events where only Alerts are shown.

Regarding email notification, you can easily set notifier to send for example Alert events via SMTP to defined email address:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/administration_guide/chap-event_notifications


So from my personal point of view I believe that we have enough options for an administrator to be alerted that hypervisor certificates are going to expire soon.

Comment 2 Greg Scott 2022-03-01 18:33:40 UTC
RHV does a great job checking for certificates. But RHV needs to improve on notifying about them. 

First up is email notifications - many major customers prohibit rogue SMTP servers, and so email notification cannot happen with these customers. But even if it could, we don't document that 13 month certificate lifetime anywhere, and so why would anyone set up any notification for it?

And that leads to events and warnings. With every login, there are a number of events and warnings, and the plain truth is, many admins don't check them. Until one day, 13 months after the last upgrade, when the entire RHV environment dies for no apparent reason, and we consume a support team for hours or days while a whole customer company shuts down. Try telling the CEO of a major organization how an outage like this is their fault because their admin didn't recognize that a pesky security warning they've seen for the past few weeks, buried with lots of other pesky warnings, shut them down.

This should be easy to address. Ideally, the whole certificate renewal process should be automated. But if automated renewals are not feasible, then at least, instead of blaming the victim, why not warn them? 

When, say, half a certificate's lifetime is gone, take new admin portal logins to a screen with a warning that says the certificates will expire on [date], this will break your RHV environment, and here is what you must do to avoid all that bad stuff. Click here to continue onto the admin portal.

One simple warning like that could save lots of grief.

Comment 3 Michal Skrivanek 2022-03-02 16:13:33 UTC
let's renew certificates automatically that are bound to expire 4 months in advance.
let's alert a month before expiration

that should be doable by 4.5
nack on any other changes, too late for that

Comment 15 Greg Scott 2022-03-31 00:34:22 UTC
See https://access.redhat.com/solutions/6865861

Comment 21 Pavol Brilla 2022-05-04 12:19:39 UTC
Engine/Host/CA certs raise warnings in the event log when less than 120 days remain before expiration, and alerts when less than 30 days remain.

Verified on Software Version:4.5.0.5-0.7.el8ev
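
For planning purposes, the verified intervals translate into calendar dates as in the Python sketch below. It is only arithmetic on the numbers above; the function name is hypothetical and the behavior at exactly 120 or 30 days remaining is not specified in this bug.

```python
from datetime import date, timedelta
from typing import Dict


def notification_dates(not_after: date,
                       warning_days: int = 120,
                       alert_days: int = 30) -> Dict[str, date]:
    """Dates on which the WARNING and ALERT windows open for a certificate."""
    return {
        "warning_from": not_after - timedelta(days=warning_days),
        "alert_from": not_after - timedelta(days=alert_days),
    }


if __name__ == "__main__":
    for name, day in notification_dates(date(2023, 6, 1)).items():
        print(name, day.isoformat())
```

For a certificate expiring on 2023-06-01 this gives a warning window opening on 2023-02-01 and an alert window opening on 2023-05-02. Passing a different `warning_days` models other intervals.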

Comment 22 Michal Skrivanek 2022-05-10 12:37:44 UTC
note that bug 2079890 in ovirt-engine-4.5.0.8 is changing the warning (and renew) interval to 365 days

Comment 27 errata-xmlrpc 2022-05-26 16:23:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711

