Description of problem:
The new standard for certificates apparently wants certificates to expire roughly annually. The days of certificates lasting five years are over. Since certificates expire quickly now, RHV needs an easy way to manage them without disrupting entire environments every few months.

Version-Release number of selected component (if applicable):
all

How reproducible:
At will

Steps to Reproduce:
1. Install a RHV environment.
2. Run it steady-state for around 13 months.
3. The certificates expire.

Actual results:
VM consoles stop working. Live migrations also stop. Hypervisors go non-operational. Admins lose all ability to manage anything.

Expected results:
The simple passage of time should not trash any environment.

Additional info:
RHV does notify about expiring certificates, months in advance. But the notifications are buried with all other events, and at least one RHV admin (me) did not notice them until it was too late. One idea to improve notification for expiring certificates might be to set a threshold of so many days before expiration, and then send emails until the certificates are renewed.

Renewals right now are disruptive because the admin must migrate all VMs away from a hypervisor, put it in maintenance mode, and then update the certificates. It's also labor intensive. And then to update RHVM certificates, admins must run engine-setup again.

RHV needs an "easy button" to make all this work without the associated hassles. Or even better, a way to automate certificate renewal before the certificates expire.
RHV Manager already contains quite sophisticated ways to check for certificate expiration. RHVM performs certificate checks every day (the interval can be configured using the engine-config option CertificationValidityCheckTimeInHours), and it checks not only host certificates, but also the engine certificate and the engine CA certificate. This check produces the following records in the ovirt-engine audit log:

1. If the certificate has already expired, one of the following audit log ALERTs is created, depending on the type of certificate:
   - Host ${VdsName} certification has expired at ${ExpirationDate}. Please renew the host's certification.
   - Engine's certification has expired at ${ExpirationDate}. Please renew the engine's certification.
   - Engine's CA certification has expired at ${ExpirationDate}.

2. If the certificate is going to expire in less than 7 days, one of the following audit log ALERTs is created, depending on the type of certificate:
   - Host ${VdsName} certification is about to expire at ${ExpirationDate}. Please renew the host's certification.
   - Engine's certification is about to expire at ${ExpirationDate}. Please renew the engine's certification.
   - Engine's CA certification is about to expire at ${ExpirationDate}.

3. If the certificate is going to expire in less than 30 days, one of the following audit log WARNINGs is created, depending on the type of certificate:
   - Host ${VdsName} certification is about to expire at ${ExpirationDate}. Please renew the host's certification.
   - Engine's certification is about to expire at ${ExpirationDate}. Please renew the engine's certification.
   - Engine's CA certification is about to expire at ${ExpirationDate}.

So from my point of view there are enough warnings about upcoming certificate expiration. Now let's take a look at certificate renewal:

1. There's an Enroll Certificate action available from both the UI and the REST API, which renews host certificates that are going to expire.
2. During each Host upgrade action (again available from both the UI and the REST API, and also part of the cluster_upgrade Ansible role) we check whether the certificate is going to expire soon or is already expired, and if so, the certificates are renewed during the host upgrade.
3. We check engine certificate validity during each engine-setup execution, and if the certificate is going to expire soon or is already expired, we renew it. The same check and renewal also happen for the engine CA certificate.

For certificate renewal the host needs to be in Maintenance, because loading the renewed certificate requires a restart of services (VDSM, libvirt, ...), so it cannot be performed while the host is Up.

Now regarding the UI: after logging in to webadmin you are redirected to the Dashboard, and on the right you can see a box called Events. It contains 3 links: Alerts, Errors and all Events. So it is very easy to click on Alerts, which redirects you to Events with only Alerts shown.

Regarding email notification, you can easily set up the notifier to send, for example, Alert events via SMTP to a defined email address:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/administration_guide/chap-event_notifications

So from my personal point of view, I believe that we have enough options for an administrator to be alerted that hypervisor certificates are going to expire soon.
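To make the thresholds described in this comment easier to follow, here is a minimal Python sketch of the decision logic. It is only an illustration of the rules as stated above (expired or less than 7 days gives ALERT, less than 30 days gives WARNING); it is not the actual ovirt-engine code, and the function and constant names are made up for this example.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Thresholds as described in the comment above; the names are hypothetical.
ALERT_DAYS = 7
WARNING_DAYS = 30


def classify_expiry(expiration: datetime, now: datetime) -> Optional[Tuple[str, str]]:
    """Return (severity, wording) for the audit log event, or None if no event is due."""
    remaining = expiration - now
    if remaining <= timedelta(0):
        # Certificate is already past its expiration date.
        return ("ALERT", "has expired")
    if remaining < timedelta(days=ALERT_DAYS):
        return ("ALERT", "is about to expire")
    if remaining < timedelta(days=WARNING_DAYS):
        return ("WARNING", "is about to expire")
    return None


if __name__ == "__main__":
    now = datetime(2022, 1, 1)
    for days in (-1, 3, 20, 90):
        print(days, classify_expiry(now + timedelta(days=days), now))
```

Running the daily check then amounts to calling something like this once per certificate (host, engine, engine CA) and writing the matching message to the audit log.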
RHV does a great job checking for certificates. But RHV needs to improve on notifying about them.

First up is email notifications - many major customers prohibit rogue SMTP servers, and so email notification cannot happen with these customers. But even if it could, we don't document that 13-month certificate lifetime anywhere, and so why would anyone set up any notification for it?

And that leads to events and warnings. With every login, there are a number of events and warnings, and the plain truth is, many admins don't check them. Until one day, 13 months after the last upgrade, when the entire RHV environment dies for no apparent reason, and we consume a support team for hours or days while a whole customer company shuts down. Try telling the CEO of a major organization how an outage like this is their fault because their admin didn't recognize that a pesky security warning they've seen for the past few weeks, buried with lots of other pesky warnings, shut them down.

This should be easy to address. Ideally, the whole certificate renewal process should be automated. But if automated renewals are not feasible, then at least, instead of blaming the victim, why not warn them? When, say, half a certificate's lifetime is gone, take new admin portal logins to a screen with a warning that says the certificates will expire on [date], this will break your RHV environment, and here is what you must do to avoid all that bad stuff. Click here to continue onto the admin portal. One simple warning like that could save lots of grief.
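As a rough sketch of the "half a certificate's lifetime is gone" rule proposed here, the check could look something like the Python below. This is purely illustrative: the function names, the banner wording, and the idea of feeding it the certificate's validity start and end dates are assumptions for this example, not anything RHV implements.

```python
from datetime import datetime
from typing import Optional


def past_half_lifetime(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """True once at least half of the certificate's validity period has elapsed."""
    midpoint = not_before + (not_after - not_before) / 2
    return now >= midpoint


def login_banner(not_before: datetime, not_after: datetime, now: datetime) -> Optional[str]:
    """Return the warning text to show at login, or None if no warning is due yet."""
    if not past_half_lifetime(not_before, not_after, now):
        return None
    # Hypothetical wording; the real text would come from the product.
    return (
        f"Certificates will expire on {not_after:%Y-%m-%d}. "
        "This will break your RHV environment unless they are renewed."
    )


if __name__ == "__main__":
    issued = datetime(2021, 1, 1)
    expires = datetime(2022, 2, 1)  # roughly 13 months later
    print(login_banner(issued, expires, datetime(2021, 3, 1)))
    print(login_banner(issued, expires, datetime(2021, 9, 1)))
```

The point of keying off the midpoint rather than a fixed number of days is that the warning window scales with whatever lifetime the certificates happen to have.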
Let's automatically renew certificates that are due to expire within 4 months. Let's alert a month before expiration. That should be doable by 4.5. NACK on any other changes; it's too late for that.
See https://access.redhat.com/solutions/6865861
Engine, host, and CA certificates now produce warnings in the event log when fewer than 120 days remain before expiration, and alerts when fewer than 30 days remain. Verified on Software Version: 4.5.0.5-0.7.el8ev
Note that bug 2079890, in ovirt-engine-4.5.0.8, changes the warning (and renewal) interval to 365 days.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4711