Created attachment 1068715 [details]
engine.log, vdsm.log

Description of problem:

[root@dell-r210ii-13 ~]# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -enddate -noout ; date
notAfter=Aug 29 16:36:25 2020 GMT
Tue Aug 31 19:21:48 CEST 2021

Then I tried to activate the host from maintenance (the engine had the same OS time). I expected a message about an expired host certificate but got:

2021-Aug-31, 19:22 VDSM dell-r210ii-13.rhev.lab.eng.brq.redhat.com command failed: General SSLEngine problem

and...

2021-08-31 19:23:17,264 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-83) [] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: General SSLEngine problem
2021-08-31 19:23:17,264 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-83) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:16) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
    at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
    at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:634) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:119) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:226) [vdsbroker.jar:]
    at sun.reflect.GeneratedMethodAccessor111.invoke(Unknown Source) [:1.8.0_51]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_51]
    at java.lang.reflect.Method.invoke(Method.java:497) [rt.jar:1.8.0_51]
    at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
    at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
2021-08-31 19:23:20,276 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to dell-r210ii-13.rhev.lab.eng.brq.redhat.com/10.34.62.205
2021-08-31 19:23:20,286 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages
2021-08-31 19:23:20,289 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-42) [] Command 'GetAllVmStatsVDSCommand(HostName = dell-r210ii-13.rhev.lab.eng.brq.redhat.com, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='eb400a81-e668-42ff-9752-59bef822f253', vds='Host[dell-r210ii-13.rhev.lab.eng.brq.redhat.com,eb400a81-e668-42ff-9752-59bef822f253]'})' execution failed: VDSGenericException: VDSNetworkException: General SSLEngine problem
2021-08-31 19:23:20,289 INFO [org.ovirt.engine.core.vdsbroker.PollVmStatsRefresher] (DefaultQuartzScheduler_Worker-42) [] Failed to fetch vms info for host 'dell-r210ii-13.rhev.lab.eng.brq.redhat.com' - skipping VMs monitoring.
Reactor thread::ERROR::2021-08-31 19:19:13,869::sslutils::336::ProtocolDetector.SSLHandshakeDispatcher::(handle_read) Error during handshake: sslv3 alert certificate expired
Reactor thread::INFO::2021-08-31 19:19:24,956::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 10.34.60.185:57685
Reactor thread::ERROR::2021-08-31 19:19:25,259::sslutils::336::ProtocolDetector.SSLHandshakeDispatcher::(handle_read) Error during handshake: unexpected eof

Version-Release number of selected component (if applicable):
vdsm-4.17.3-1.el7ev.noarch
rhevm-backend-3.6.0-0.12.master.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Have an expired host certificate while the host is in maintenance, then activate it.

Actual results:
Unexpected event message.

Expected results:
host ... certification has expired at $date. Please renew the host's certification.

Additional info:
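The manual check at the top of the description (comparing the `notAfter` value printed by `openssl x509 -enddate` with the output of `date`) can be sketched in Python as follows. This is only an illustration of that comparison; the function names `parse_not_after` and `is_expired` are hypothetical and do not come from vdsm or the engine.

```python
from datetime import datetime, timezone


def parse_not_after(line: str) -> datetime:
    """Parse a line such as 'notAfter=Aug 29 16:36:25 2020 GMT'.

    Hypothetical helper. Assumes the English month abbreviations that
    openssl prints and a C/English locale for strptime's %b.
    """
    prefix = "notAfter="
    if not line.startswith(prefix):
        raise ValueError("unexpected openssl output: %r" % line)
    value = line[len(prefix):].strip()
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    # openssl reports the end date in GMT, so attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(not_after: datetime, now: datetime) -> bool:
    """Return True when 'now' is later than the certificate end date."""
    return now > not_after


if __name__ == "__main__":
    end = parse_not_after("notAfter=Aug 29 16:36:25 2020 GMT")
    # 19:21:48 CEST on 2021-08-31 is 17:21:48 UTC.
    check_time = datetime(2021, 8, 31, 17, 21, 48, tzinfo=timezone.utc)
    print(is_expired(end, check_time))
```

With the values from this report, the certificate end date lies about a year before the time of the activation attempt.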
After discussing it with Moti: this is a low-priority issue, as it will only happen if the certificate has expired while the host isn't up. Re-installing the host or re-enrolling the certificate will help in such a case, and you'll probably have warnings about certificate expiration from other hosts in the system. Targeting to 4.0, as this will require API changes to learn the exact reason for the failure.
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Jiri - I don't think this should block the feature. Anyhow, this won't be addressed in 3.6.
We perform periodic checks of certificate validity for all hosts in status Up or NonOperational. Unfortunately we cannot monitor hosts in status Maintenance, because we have a "contract" with the user that hosts in status Maintenance won't be monitored by the engine at all, so users are able to do maintenance without any interruptions from the engine. So when a host has been in status Maintenance for a long time, its certificate has expired during that time, and the user activates the host, we try to open an SSL connection to the host to start communication, but an SSL exception is raised when the connection is opened. Unfortunately at this moment we have no way to distinguish between an expired certificate and other SSL errors, and without an established SSL connection we are not able to check certificate validity. We could implement a solution that checks certificate validity over an SSH connection prior to host activation, but this would add unwanted complexity to the whole activation process. Also, having a host in maintenance for a long time is a bit of a corner case, and even when the certificate has expired and the host has become Non Responsive, the user can still change the status back to Maintenance and reinstall the host to fix the certificate issue. So closing this as wontfix.
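For illustration only, this is what telling an expired certificate apart from other SSL errors can look like in a stack that exposes the verification result. The sketch uses Python's `ssl` module, not the engine's Java code, and it makes no claim that the engine's SSL layer offers an equivalent hook. The function name `classify_ssl_error` and the returned labels are hypothetical; `ssl.SSLCertVerificationError` and its `verify_code` attribute are part of Python 3.7+, and 10 is OpenSSL's `X509_V_ERR_CERT_HAS_EXPIRED` code.

```python
import ssl

# OpenSSL verification result code for "certificate has expired".
X509_V_ERR_CERT_HAS_EXPIRED = 10


def classify_ssl_error(exc: BaseException) -> str:
    """Walk an exception chain and report whether it was caused by an
    expired peer certificate. Hypothetical helper, labels are made up."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cyclic exception chains
        if (
            isinstance(exc, ssl.SSLCertVerificationError)
            and getattr(exc, "verify_code", None) == X509_V_ERR_CERT_HAS_EXPIRED
        ):
            return "certificate_expired"
        # Follow explicit chaining first, then implicit context.
        exc = exc.__cause__ or exc.__context__
    return "other_ssl_error"
```

A caller would wrap the connection attempt in `try`/`except ssl.SSLError` and pass the caught exception to this function. This only covers the case where the local side rejects the peer's certificate; an alert sent by the remote side surfaces differently and is not handled here.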