When certificates expire, there is no way to re-enroll them without shutting down all running VMs. Enroll Certificate is possible only in the Maintenance state, and the only way to get to Maintenance from Non Responsive with running VMs is through "confirm host has been rebooted", i.e. either reboot the host or knowingly override the actual state, which triggers a restart of HA VMs elsewhere and can cause split brain.

We cannot fully re-enroll all certificates on a running system, but we can install new certificates and restart the subset of services that allows VMs to be migrated away, so that the host can then be moved to a proper Maintenance state.

Let's
- allow the Enroll Certificate operation when the host is in the Non Responsive state
- run the same enrollment code as in Maintenance
- restart/reload libvirt (which in turn restarts vdsm), imageio, and OVN

This introduces a small risk of losing track of ongoing actions (e.g. live merge completion), but we should mostly deal with those gracefully. Not all operations will work (e.g. the VNC certificate cannot be reloaded, so console connections will not be possible), but being able to control the host and the lifecycle of its VMs has higher priority.

Still, it shall be documented that this is a "desperate measure" for cases where certificates suddenly expire and there is no other way out. Such an action must be followed by another re-enroll during Maintenance, and the VMs that were running *must* be restarted before they are fully functional again.
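The proposed rule can be illustrated with a minimal Python sketch. This is not the engine's actual code: the names `HostStatus`, `can_enroll_certificate`, and `services_to_restart` are hypothetical and exist only to show the state check and the restart list described above.

```python
from enum import Enum
from typing import List


class HostStatus(Enum):
    # Hypothetical subset of host states, for illustration only.
    UP = "Up"
    MAINTENANCE = "Maintenance"
    NON_RESPONSIVE = "NonResponsive"


# States in which Enroll Certificate would be allowed under the proposal.
ENROLL_ALLOWED = {HostStatus.MAINTENANCE, HostStatus.NON_RESPONSIVE}


def can_enroll_certificate(status: HostStatus) -> bool:
    """Return True if enrollment is permitted in the given host state."""
    return status in ENROLL_ALLOWED


def services_to_restart(status: HostStatus) -> List[str]:
    """Services restarted/reloaded after enrolling on a Non Responsive host.

    Restarting libvirt in turn restarts vdsm. This sketch only models the
    Non Responsive case and returns an empty list for every other state.
    """
    if status is HostStatus.NON_RESPONSIVE:
        return ["libvirt", "imageio", "OVN"]
    return []
```

In this sketch a host in the Up state is still rejected, which matches the intent that the operation is an escape hatch for Non Responsive hosts rather than a general relaxation.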
Verified in ovirt-engine-4.5.0.7-0.9.el8ev.noarch

Verification steps:
- The host was put into the NonResponsive state by killing the vdsmd service (with and without running VMs on top of the host).
- Enroll Certificate was triggered on the host and completed successfully.