Bug 1931474 - SSH connection fails
Summary: SSH connection fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: uutils
Version: 4.4.5.6
Hardware: x86_64
OS: Unspecified
high
high
Target Milestone: ovirt-4.4.5
: ---
Assignee: Artur Socha
QA Contact: Yaning Wang
URL:
Whiteboard:
: 1930560 1931869 (view as bug list)
Depends On:
Blocks: 1837221 1930115
TreeView+ depends on / blocked
 
Reported: 2021-02-22 13:54 UTC by Jean-Louis Dupond
Modified: 2021-05-03 23:56 UTC (History)
7 users (show)

Fixed In Version: ovirt-engine-4.4.5.8
Clone Of:
Environment:
Last Closed: 2021-03-18 15:15:58 UTC
oVirt Team: Infra
Embargoed:
pm-rhel: ovirt-4.4+
aoconnor: blocker+


Attachments (Terms of Use)
secure.log (11.33 KB, text/plain)
2021-02-23 07:24 UTC, Jean-Louis Dupond
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 113666 0 master MERGED engine: legacy RSA fingerprints authentication 2021-03-02 09:09:33 UTC

Description Jean-Louis Dupond 2021-02-22 13:54:52 UTC
Description of problem:
Since upgrading to the latest 4.4.6 rc release the engine cannot connect to the hosts anymore via SSH:

2021-02-22 14:41:30,337+01 INFO  [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] Opening SSH reboot session on host x.x.x.x
2021-02-22 14:41:30,405+01 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] SSH reboot command failed on host 'x.x.x.x': SSH session closed during connection 'root.x.x'
Stdout: 
Stderr: 
2021-02-22 14:41:30,406+01 DEBUG [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-1931) [ee97347f-91d5-4227-adba-9a2efb970d93] Exception: java.io.IOException: SSH session closed during connection 'root.x.x'
	at org.ovirt.engine.core.uutils//org.ovirt.engine.core.uutils.ssh.SSHClient.connect(SSHClient.java:248)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeSshReboot(SshHostRebootCommand.java:61)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeCommand(SshHostRebootCommand.java:26)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1174)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1332)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2008)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:79)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:424)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
	at org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227)


Version-Release number of selected component (if applicable):
ovirt-engine-4.4.5.6-1.el8.noarch

How reproducible:
Try to restart a host via the UI with SSH. This fails.

Comment 1 Dana 2021-02-22 14:42:45 UTC
Can you please attach the full engine.log and audit.log?

Comment 2 RHEL Program Management 2021-02-22 14:42:48 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 Jean-Louis Dupond 2021-02-22 14:47:39 UTC
2021-02-22 15:44:31,030+01 INFO  [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Running command: SshHostRebootCommand(WaitOnRebootSynchronous = false, RunSilent = false, VdsId = 52703c6b-31ab-4766-a4a5-2fff91bdeea1) internal: false. Entities affected :  ID: 52703c6b-31ab-4766-a4a5-2fff91bdeea1 Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2021-02-22 15:44:31,032+01 INFO  [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Opening SSH reboot session on host x.x.x.x

2021-02-22 15:44:31,107+01 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] SSH reboot command failed on host 'x.x.x.x': SSH session closed during connection 'root.x.x'
Stdout: 
Stderr: 
2021-02-22 15:44:31,107+01 DEBUG [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] Exception: java.io.IOException: SSH session closed during connection 'root.x.x'
	at org.ovirt.engine.core.uutils//org.ovirt.engine.core.uutils.ssh.SSHClient.connect(SSHClient.java:248)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeSshReboot(SshHostRebootCommand.java:61)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SshHostRebootCommand.executeCommand(SshHostRebootCommand.java:26)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1174)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1332)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2008)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:79)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1392)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:424)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.executeValidatedCommand(PrevalidatingMultipleActionsRunner.java:204)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.runCommands(PrevalidatingMultipleActionsRunner.java:176)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner.lambda$invokeCommands$3(PrevalidatingMultipleActionsRunner.java:182)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:96)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
	at org.glassfish.javax.enterprise.concurrent//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:227)

2021-02-22 15:44:31,113+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-2941) [09b415de-191c-4a15-815d-1f8de0911635] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host ovirt-ovn001 has failed.



type=CRYPTO_KEY_USER msg=audit(1614005072.668:467): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:65:9d:74:0e:4e:ae:af:d7:7b:e4:f1:07:64:d7:d2:3b:a1:1d:a5:d3:7a:31:c3:8c:c2:0a:55:bb:62:27:83:5e direction=? spid=12228 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1614005072.669:468): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:d6:c7:82:fd:b1:73:dd:57:8a:d4:da:98:29:70:4a:35:e5:23:b5:b8:7c:b9:53:a9:a4:40:b7:5a:8c:a3:77 direction=? spid=12228 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1614005072.669:469): pid=12228 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12228 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_SESSION msg=audit(1614005072.674:470): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-server cipher=aes128-ctr ksize=128 mac=hmac-sha2-256-etm pfs=ecdh-sha2-nistp521 spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22  exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_SESSION msg=audit(1614005072.675:471): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=start direction=from-client cipher=aes128-ctr ksize=128 mac=hmac-sha2-256-etm pfs=ecdh-sha2-nistp521 spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22  exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:472): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=12228 suid=74 rport=56132 laddr=x.x.x.x lport=22  exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:473): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12228 suid=74  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="sshd"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:474): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:65:9d:74:0e:4e:ae:af:d7:7b:e4:f1:07:64:d7:d2:3b:a1:1d:a5:d3:7a:31:c3:8c:c2:0a:55:bb:62:27:83:5e direction=? spid=12227 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:475): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:32:d6:c7:82:fd:b1:73:dd:57:8a:d4:da:98:29:70:4a:35:e5:23:b5:b8:7c:b9:53:a9:a4:40:b7:5a:8c:a3:77 direction=? spid=12227 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=CRYPTO_KEY_USER msg=audit(1614005072.726:476): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:b7:11:58:bc:0d:9f:e3:b2:53:68:5d:61:dd:1e:4c:83:ff:42:31:98:fd:76:65:2d:61:0f:b3:57:3d:93:94:83 direction=? spid=12227 suid=0  exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset" SUID="root"
type=USER_LOGIN msg=audit(1614005072.726:477): pid=12227 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login acct="(unknown)" exe="/usr/sbin/sshd" hostname=? addr=x.x.x.x terminal=ssh res=failed'UID="root" AUID="unset"

Comment 4 Dana 2021-02-22 15:32:55 UTC
Please execute the following:
1. vim /etc/ssh/sshd_config
2. add the following line: LogLevel DEBUG3
then save the file
3. restart sshd: service sshd restart
then restart the host again via UI, and upload (preferred as files, if possible) the output of /var/log/secure and /var/log/messages

Comment 5 Jean-Louis Dupond 2021-02-23 07:24:20 UTC
Created attachment 1758770 [details]
secure.log

Attached a secure.log from the failed login.
Nothing in messages, so not attached :)

Comment 6 Jean-Louis Dupond 2021-02-23 08:01:09 UTC
Did also some more debugging now.

When doing:
engine=# update vds_static set sshkeyfingerprint = NULL where vds_name='ovirt-ovn001';
UPDATE 1

It works again!

Now while the value in sshkeyfingerprint stays the same, after clearing the value, and doing a reboot via the CP. The value of 'ssh_public_key' gets inserted into the database.
So it seems like when there is no 'ssh_public_key' value, the ssh connection fails. And this is the case for all the existing hypervisors!

Comment 7 Artur Socha 2021-02-23 08:25:22 UTC
Hi, 
Current (expected)behavior is that after upgrade it is possible to use ssh keys other than RSA for newly installed hosts. In this case, instead of fingerprints, the public keys are compared during verification process.

Fingerprints are used only for 'older' hosts (installed before engine is upgraded) and it is supposed that RSA public keys were used on the hosts.

Is you setup somehow different ie. have you installed manually non-rsa keys on hosts and placed already calculated fingerprints into vds_static?

Comment 8 Jean-Louis Dupond 2021-02-23 08:30:49 UTC
Hi Artur,

Did not do anything manual on my setup.
All the hosts did have a sshkeyfingerprint set in the database etc.

It just seems like the fallback with fingerprint verification is broken.

If you just do:
engine=# update vds_static set ssh_public_key = NULL where vds_name = 'ovirt-ovn001';
UPDATE 1

So you are in the situation like when just upgraded the CP. Then SSH connection is broken again.

Comment 9 SATHEESARAN 2021-03-01 11:43:19 UTC
*** Bug 1930560 has been marked as a duplicate of this bug. ***

Comment 10 Artur Socha 2021-03-04 10:03:51 UTC
*** Bug 1931869 has been marked as a duplicate of this bug. ***

Comment 11 Yaning Wang 2021-03-08 04:29:32 UTC
Tested on:

ovirt-engine-4.4.5.8

try to restart a host via the UI with SSH.

Actual result:
success without error

Comment 12 Sandro Bonazzola 2021-03-18 15:15:58 UTC
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 13 SATHEESARAN 2021-03-22 07:57:34 UTC
I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV 4.4.4,
then updated to RHV 4.4.5.

Comment 14 Martin Perina 2021-03-22 08:22:25 UTC
(In reply to SATHEESARAN from comment #13)
> I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged
> hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV
> 4.4.4,
> then updated to RHV 4.4.5.

Could you please create a new bug for that, describe the flow where it's failing and attach relevant logs?

Comment 15 Artur Socha 2021-03-22 09:55:08 UTC
ad. comment #13:

Resolved: 
An upgrade from 4.3 to 4.4 involved bare metal host's reprovisioning (new host's ssh keys were generated in the process)
Later on ovirt-engine has been restore from db and vds_static.sshkeyfingeprint values has been overwritten with the old one therefore not matching the ones on the hosts.

There are to options to mitigate this scenario:
1) before reprovisioning backup host's keys from /etc/ssh/ and restore them after an upgrade
2) after restoring ovirt-engine database for each host:
 2.1) clear vds_static.sshkeyfingerprint  (update vds_static set sshkeyfingerprint='' where vds_name = '<HOST_NAME>' ; ) 
 2.2) update manually vds_static.sshkeyfingerprint with the output of
      ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub
      update vds_static set sshkeyfingerprint='<GENERATED_SSH_KEYFINGERPRINT>' where vds_name = '<HOST_NAME>' ;

Comment 16 SATHEESARAN 2021-05-03 23:56:04 UTC
(In reply to Martin Perina from comment #14)
> (In reply to SATHEESARAN from comment #13)
> > I still see this issue of SSH reboot failing, in my 3 node 'hyperconverged
> > hosted-engine gluster-oVirt' setup which was upgraded from RHV 4.3.z to RHV
> > 4.4.4,
> > then updated to RHV 4.4.5.
> 
> Could you please create a new bug for that, describe the flow where it's
> failing and attach relevant logs?

As Artur mentioned in comment15, the issue is resolved


Note You need to log in before you can comment on or make changes to this bug.