Bug 1931474

Summary: SSH connection fails
Product: [oVirt] ovirt-engine Reporter: Jean-Louis Dupond <jean-louis>
Component: uutilsAssignee: Artur Socha <asocha>
Status: CLOSED CURRENTRELEASE QA Contact: Yaning Wang <yaniwang>
Severity: high Docs Contact:
Priority: high    
Version: 4.4.5.6CC: aoconnor, asocha, bugs, delfassy, mmartinv, mperina, sasundar
Target Milestone: ovirt-4.4.5Flags: pm-rhel: ovirt-4.4+
aoconnor: blocker+
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.4.5.8 Doc Type: Bug Fix
Doc Text:
Cause: Adding support for SSH keys other than RSA [BZ-1837221] introduced a bug not allowing to reboot 'legacy' hosts using SSH fingerprints. The fix applies when there are legacy hosts with fingerprints calculated out of RSA key. Unfortunately with the restrictions to RSA only family already being lifted the host's SSH server is always serving the strongest available key type ie. ecdsa. This makes impossible to authenticate using fingerprints which had been previously calculated out of RSA. Consequence: Unable to reboot hosts using SSH with only RSA fingerprints being stored Fix: For the hosts registered with legacy RSA keys use fingerprints for initial verification and store public key in engine's database so that following calls' verification would be conducted comparing SSH public keys only. For newly installed hosts use whatever the key is given by SSH server (the strongest) Result: It is possible to reboot hosts with SSH using both legacy RSA fingerprints and newly stored ssh public keys (in any available type).
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-03-18 15:15:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1837221, 1930115    
Attachments:
Description Flags
secure.log none

Description Jean-Louis Dupond 2021-02-22 13:54:52 UTC
Description of problem:
Since upgrading to the latest 4.4.6 rc release the engine cannot connect to the hosts anymore via SSH:

2021-02-22 14:41:30,337+01 INFO  [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] Opening SSH reboot session on host x.x.x.x
2021-02-22 14:41:30,405+01 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] SSH reboot command failed on host 'x.x.x.x': SSH session closed during connection 'root.x.x'
Stdout: 
Stderr: 
2021-02-22 14:41:30,406+01 DEBUG [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] Exception: java.io.IOException: SSH session closed during connection 'root.x.x'
	at org.ovirt.engine.core.uutils//org.ovirt.engine.core.uutils.ssh.SSHClient.connect(SSHClient.java:248)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeSshReboot(SshHostRebootCommand.java:61)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeCommand(SshHostRebootCommand.java:26)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1174)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1332)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2008)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:79)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:424)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
	at org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227)


Version-Release number of selected component (if applicable):
ovirt-engine-4.4.5.6-1.el8.noarch

How reproducible:
Try to restart a host via the UI with SSH. This fails.

Comment 1 Dana 2021-02-22 14:42:45 UTC
Can you please attach the full engine.log and audit.log?

Comment 2 RHEL Program Management 2021-02-22 14:42:48 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 Jean-Louis Dupond 2021-02-22 14:47:39 UTC
2021-02-22 15:44:31,030+01 INFO  [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Running command: SshHostRebootCommand(WaitOnRebootSynchronous = false, RunSilent = false, VdsId = 52703c6b-31ab-4766-a4a5-2fff91bdeea1) internal: false. Entities affected :  ID: 52703c6b-31ab-4766-a4a5-2fff91bdeea1 Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2021-02-22 15:44:31,032+01 INFO  [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Opening SSH reboot session on host x.x.x.x

2021-02-22 15:44:31,107+01 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] SSH reboot command failed on host 'x.x.x.x': SSH session closed during connection 'root.x.x'
Stdout: 
Stderr: 
2021-02-22 15:44:31,107+01 DEBUG [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Exception: java.io.IOException: SSH session closed during connection 'root.x.x'
	at org.ovirt.engine.core.uutils//org.ovirt.engine.core.uutils.ssh.SSHClient.connect(SSHClient.java:248)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeSshReboot(SshHostRebootCommand.java:61)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeCommand(SshHostRebootCommand.java:26)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1174)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1332)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2008)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:79)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:424)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
	at org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227)

2021-02-22 15:44:31,113+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host ovirt-ovn001 has failed.



type=CRYPTO_KEY_USER msg=audit(1614005072.668:467): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:65:9d:74:0e:4e:ae:af:d7:7b:e4:f1:07:64:d7:d2:3b:a1:1d:a5:d3:7a:31:c3:8c:c2:0a:55:bb:62:27:83:5e direction=? spid=12228 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1614005072.669:468): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:d6:c7:82:fd:b1:73:dd:57:8a:d4:da:98:29:70:4a:35:e5:23:b5:b8:7c:b9:53:a9:a4:40:b7:5a:8c:a3:77 direction=? spid=12228 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1614005072.669:469): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12228 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_SESSION msg=audit(1614005072.674:470): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-server cipher=aes128-ctr ksize=128 mac=hmac-sha2-256-etm pfs=ecdh-sha2-nistp521 spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22  exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_SESSION msg=audit(1614005072.675:471): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-client cipher=aes128-ctr ksize=128 mac=hmac-sha2-256-etm pfs=ecdh-sha2-nistp521 spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22  exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:472): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22  exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:473): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12228 suid=74  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:474): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:65:9d:74:0e:4e:ae:af:d7:7b:e4:f1:07:64:d7:d2:3b:a1:1d:a5:d3:7a:31:c3:8c:c2:0a:55:bb:62:27:83:5e direction=? spid=12227 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:475): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:d6:c7:82:fd:b1:73:dd:57:8a:d4:da:98:29:70:4a:35:e5:23:b5:b8:7c:b9:53:a9:a4:40:b7:5a:8c:a3:77 direction=? spid=12227 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:476): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12227 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=USER_LOGIN msg=audit(1614005072.726:477): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login acct="(unknown)" exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=ssh res=failed'UID="root" AUID="unset"

Comment 4 Dana 2021-02-22 15:32:55 UTC
Please execute the following:
1. vim /etc/ssh/sshd_config
2. add the following line: LogLevel DEBUG3
then save the file
3. restart sshd: service sshd restart
then restart the host again via UI, and upload (preferred as files, if possible) the output of /var/log/secure and /var/log/messages

Comment 5 Jean-Louis Dupond 2021-02-23 07:24:20 UTC
Created attachment 1758770 [details]
secure.log

Attached a secure.log from the failed login.
Nothing in messages, so not attached :)

Comment 6 Jean-Louis Dupond 2021-02-23 08:01:09 UTC
Did also some more debugging now.

When doing:
engine=# update vds_static set sshkeyfingerprint = NULL where vds_name='ovirt-ovn001';
UPDATE 1

It works again!

Now while the value in sshkeyfingerprint stays the same, after clearing the value, and doing a reboot via the CP. The value of 'ssh_public_key' gets inserted into the database.
So it seems like when there is no 'ssh_public_key' value, the ssh connection fails. And this is the case for all the existing hypervisors!

Comment 7 Artur Socha 2021-02-23 08:25:22 UTC
Hi, 
Current (expected)behavior is that after upgrade it is possible to use ssh keys other than RSA for newly installed hosts. In this case, instead of fingerprints, the public keys are compared during verification process.

Fingerprints are used only for 'older' hosts (installed before engine is upgraded) and it is supposed that RSA public keys were used on the hosts.

Is you setup somehow different ie. have you installed manually non-rsa keys on hosts and placed already calculated fingerprints into vds_static?

Comment 8 Jean-Louis Dupond 2021-02-23 08:30:49 UTC
Hi Artur,

Did not do anything manual on my setup.
All the hosts did have a sshkeyfingerprint set in the database etc.

It just seems like the fallback with fingerprint verification is broken.

If you just do:
engine=# update vds_static set ssh_public_key = NULL where vds_name = 'ovirt-ovn001';
UPDATE 1

So you are in the situation like when just upgraded the CP. Then SSH connection is broken again.

Comment 9 SATHEESARAN 2021-03-01 11:43:19 UTC
*** Bug 1930560 has been marked as a duplicate of this bug. ***

Comment 10 Artur Socha 2021-03-04 10:03:51 UTC
*** Bug 1931869 has been marked as a duplicate of this bug. ***

Comment 11 Yaning Wang 2021-03-08 04:29:32 UTC
Tested on:

ovirt-engine-4.4.5.8

try to restart a host via the UI with SSH.

Actual result:
success without error

Comment 12 Sandro Bonazzola 2021-03-18 15:15:58 UTC
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 13 SATHEESARAN 2021-03-22 07:57:34 UTC
I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV 4.4.4,
then updated to RHV 4.4.5.

Comment 14 Martin Perina 2021-03-22 08:22:25 UTC
(In reply to SATHEESARAN from comment #13)
> I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged
> hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV
> 4.4.4,
> then updated to RHV 4.4.5.

Could you please create a new bug for that, describe the flow where it's failing and attach relevant logs?

Comment 15 Artur Socha 2021-03-22 09:55:08 UTC
ad. comment #13:

Resolved: 
An upgrade from 4.3 to 4.4 involved bare metal host's reprovisioning (new host's ssh keys were generated in the process)
Later on ovirt-engine has been restore from db and vds_static.sshkeyfingeprint values has been overwritten with the old one therefore not matching the ones on the hosts.

There are to options to mitigate this scenario:
1) before reprovisioning backup host's keys from /etc/ssh/ and restore them after an upgrade
2) after restoring ovirt-engine database for each host:
 2.1) clear vds_static.sshkeyfingerprint  (update vds_static set sshkeyfingerprint='' where vds_name = '<HOST_NAME>' ; ) 
 2.2) update manually vds_static.sshkeyfingerprint with the output of
      ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub
      update vds_static set sshkeyfingerprint='<GENERATED_SSH_KEYFINGERPRINT>' where vds_name = '<HOST_NAME>' ;

Comment 16 SATHEESARAN 2021-05-03 23:56:04 UTC
(In reply to Martin Perina from comment #14)
> (In reply to SATHEESARAN from comment #13)
> > I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged
> > hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV
> > 4.4.4,
> > then updated to RHV 4.4.5.
> 
> Could you please create a new bug for that, describe the flow where it's
> failing and attach relevant logs?

As Artur mentioned in comment15, the issue is resolved