Description of problem: Since upgrading to the latest 4.4.6 rc release the engine cannot connect to the hosts anymore via SSH: 2021-02-22 14:41:30,337+01 INFO [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] Opening SSH reboot session on host x.x.x.x 2021-02-22 14:41:30,405+01 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] SSH reboot command failed on host 'x.x.x.x': SSH session closed during connection 'root.x.x' Stdout: Stderr: 2021-02-22 14:41:30,406+01 DEBUG [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] Exception: java.io.IOException: SSH session closed during connection 'root.x.x' at org.ovirt.engine.core.uutils//org.ovirt.engine.core.uutils.ssh.SSHClient.connect(SSHClient.java:248) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeSshReboot(SshHostRebootCommand.java:61) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeCommand(SshHostRebootCommand.java:26) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1174) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1332) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2008) at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140) at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:79) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:424) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182) at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) at org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227) Version-Release number of selected component (if applicable): ovirt-engine-4.4.5.6-1.el8.noarch How reproducible: Try to restart a host via the UI with SSH. This fails.
Can you please attach the full engine.log and audit.log?
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.
2021-02-22 15:44:31,030+01 INFO [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Running command: SshHostRebootCommand(WaitOnRebootSynchronous = false, RunSilent = false, VdsId = 52703c6b-31ab-4766-a4a5-2fff91bdeea1) internal: false. Entities affected : ID: 52703c6b-31ab-4766-a4a5-2fff91bdeea1 Type: VDSAction group MANIPULATE_HOST with role type ADMIN 2021-02-22 15:44:31,032+01 INFO [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Opening SSH reboot session on host x.x.x.x 2021-02-22 15:44:31,107+01 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] SSH reboot command failed on host 'x.x.x.x': SSH session closed during connection 'root.x.x' Stdout: Stderr: 2021-02-22 15:44:31,107+01 DEBUG [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Exception: java.io.IOException: SSH session closed during connection 'root.x.x' at org.ovirt.engine.core.uutils//org.ovirt.engine.core.uutils.ssh.SSHClient.connect(SSHClient.java:248) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeSshReboot(SshHostRebootCommand.java:61) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeCommand(SshHostRebootCommand.java:26) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1174) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1332) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2008) at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140) at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:79) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:424) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182) at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) at org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227) 2021-02-22 15:44:31,113+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host ovirt-ovn001 has failed. type=CRYPTO_KEY_USER msg=audit(1614005072.668:467): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:65:9d:74:0e:4e:ae:af:d7:7b:e4:f1:07:64:d7:d2:3b:a1:1d:a5:d3:7a:31:c3:8c:c2:0a:55:bb:62:27:83:5e direction=? spid=12228 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root" type=CRYPTO_KEY_USER msg=audit(1614005072.669:468): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:d6:c7:82:fd:b1:73:dd:57:8a:d4:da:98:29:70:4a:35:e5:23:b5:b8:7c:b9:53:a9:a4:40:b7:5a:8c:a3:77 direction=? spid=12228 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root" type=CRYPTO_KEY_USER msg=audit(1614005072.669:469): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12228 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root" type=CRYPTO_SESSION msg=audit(1614005072.674:470): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-server cipher=aes128-ctr ksize=128 mac=hmac-sha2-256-etm pfs=ecdh-sha2-nistp521 spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22 exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd" type=CRYPTO_SESSION msg=audit(1614005072.675:471): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-client cipher=aes128-ctr ksize=128 mac=hmac-sha2-256-etm pfs=ecdh-sha2-nistp521 spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22 exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd" type=CRYPTO_KEY_USER msg=audit(1614005072.726:472): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22 exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd" type=CRYPTO_KEY_USER msg=audit(1614005072.726:473): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12228 suid=74 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="sshd" type=CRYPTO_KEY_USER msg=audit(1614005072.726:474): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:65:9d:74:0e:4e:ae:af:d7:7b:e4:f1:07:64:d7:d2:3b:a1:1d:a5:d3:7a:31:c3:8c:c2:0a:55:bb:62:27:83:5e direction=? spid=12227 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root" type=CRYPTO_KEY_USER msg=audit(1614005072.726:475): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:d6:c7:82:fd:b1:73:dd:57:8a:d4:da:98:29:70:4a:35:e5:23:b5:b8:7c:b9:53:a9:a4:40:b7:5a:8c:a3:77 direction=? spid=12227 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root" type=CRYPTO_KEY_USER msg=audit(1614005072.726:476): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12227 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root" type=USER_LOGIN msg=audit(1614005072.726:477): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login acct="(unknown)" exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=ssh res=failed'UID="root" AUID="unset"
Please execute the following: 1. vim /etc/ssh/sshd_config 2. add the following line: LogLevel DEBUG3 then save the file 3. restart sshd: service sshd restart then restart the host again via UI, and upload (preferred as files, if possible) the output of /var/log/secure and /var/log/messages
Created attachment 1758770 [details] secure.log Attached a secure.log from the failed login. Nothing in messages, so not attached :)
Did also some more debugging now. When doing: engine=# update vds_static set sshkeyfingerprint = NULL where vds_name='ovirt-ovn001'; UPDATE 1 It works again! Now while the value in sshkeyfingerprint stays the same, after clearing the value, and doing a reboot via the CP. The value of 'ssh_public_key' gets inserted into the database. So it seems like when there is no 'ssh_public_key' value, the ssh connection fails. And this is the case for all the existing hypervisors!
Hi, Current (expected)behavior is that after upgrade it is possible to use ssh keys other than RSA for newly installed hosts. In this case, instead of fingerprints, the public keys are compared during verification process. Fingerprints are used only for 'older' hosts (installed before engine is upgraded) and it is supposed that RSA public keys were used on the hosts. Is you setup somehow different ie. have you installed manually non-rsa keys on hosts and placed already calculated fingerprints into vds_static?
Hi Artur, Did not do anything manual on my setup. All the hosts did have a sshkeyfingerprint set in the database etc. It just seems like the fallback with fingerprint verification is broken. If you just do: engine=# update vds_static set ssh_public_key = NULL where vds_name = 'ovirt-ovn001'; UPDATE 1 So you are in the situation like when just upgraded the CP. Then SSH connection is broken again.
*** Bug 1930560 has been marked as a duplicate of this bug. ***
*** Bug 1931869 has been marked as a duplicate of this bug. ***
Tested on: ovirt-engine-4.4.5.8 try to restart a host via the UI with SSH. Actual result: success without error
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV 4.4.4, then updated to RHV 4.4.5.
(In reply to SATHEESARAN from comment #13) > I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged > hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV > 4.4.4, > then updated to RHV 4.4.5. Could you please create a new bug for that, describe the flow where it's failing and attach relevant logs?
ad. comment #13: Resolved: An upgrade from 4.3 to 4.4 involved bare metal host's reprovisioning (new host's ssh keys were generated in the process) Later on ovirt-engine has been restore from db and vds_static.sshkeyfingeprint values has been overwritten with the old one therefore not matching the ones on the hosts. There are to options to mitigate this scenario: 1) before reprovisioning backup host's keys from /etc/ssh/ and restore them after an upgrade 2) after restoring ovirt-engine database for each host: 2.1) clear vds_static.sshkeyfingerprint (update vds_static set sshkeyfingerprint='' where vds_name = '<HOST_NAME>' ; ) 2.2) update manually vds_static.sshkeyfingerprint with the output of ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub update vds_static set sshkeyfingerprint='<GENERATED_SSH_KEYFINGERPRINT>' where vds_name = '<HOST_NAME>' ;
(In reply to Martin Perina from comment #14) > (In reply to SATHEESARAN from comment #13) > > I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged > > hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV > > 4.4.4, > > then updated to RHV 4.4.5. > > Could you please create a new bug for that, describe the flow where it's > failing and attach relevant logs? As Artur mentioned in comment15, the issue is resolved