Description of problem: If we add additional options to [vars] section of /etc/vdsm/vdsm.conf (example can be found at [1]), then the engine (which was upgraded from 4.1 to 4.2/master), cannot add new host, because host-deploy process hangs when it receives last option to be added to [vars] section. [1] https://gerrit.ovirt.org/#/c/91809/1/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/hostdeploy/VdsDeployVdsmUnit.java Version-Release number of selected component (if applicable): How reproducible: 100% Steps to Reproduce: 1. Run upgrade-from-release-suite-master with patch [1] 2. 3. Actual results: Expected results: Additional info:
Created attachment 1446261 [details] hanged process thread dump
Created attachment 1446262 [details] host-deploy log
Created attachment 1446263 [details] host-deploy log
According to host-deploy logs: 2018-05-31 07:44:57,672-0400 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND 2018-05-31 08:04:54,449-0400 DEBUG otopi.context context._executeMethod:143 method exception Traceback (most recent call last): File "/tmp/ovirt-goPsgFOlPK/pythonlib/otopi/context.py", line 133, in _executeMethod method['method']() File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/cli.py", line 137, in _customize "\nCustomization phase, use 'install' to proceed\nCOMMAND> " File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/cli.py", line 92, in _runCommandPrompt prompt=True, File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/machine.py", line 281, in queryString value = self._readline(hidden) File "/tmp/ovirt-goPsgFOlPK/pythonlib/otopi/dialog.py", line 248, in _readline raise IOError(_('End of file')) IOError: End of file 2018-05-31 08:04:54,455-0400 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Environment customization': End of file 2018-05-31 08:04:54,555-0400 DEBUG otopi.plugins.otopi.debug.debug_failure.debug_failure debug_failure._notification:100 tcp connections: id uid local foreign state pid exe 0: 0 0.0.0.0:22 0.0.0.0:0 LISTEN 1079 /usr/sbin/sshd 1: 0 192.168.201.3:53984 212.69.166.138:80 CLOSE_WAIT 1418 /usr/bin/python2.7 2: 0 192.168.201.3:22 192.168.201.1:36154 ESTABLISHED 1460 /usr/sbin/sshd 3: 0 192.168.201.3:57190 147.32.127.196:80 CLOSE_WAIT 1418 /usr/bin/python2.7 The host sent a "*%QEnd: CUSTOMIZATION_COMMAND" at 2018-05-31 07:44:57,672-0400 and the engine never moved to the next step and so at 2018-05-31 08:04:54,449-0400 the ssh connection got closed. host-deploy was still running.
It's correctly working on bare metal: 2018-06-01 13:53:08,029+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,029+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE env-query -k VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Please specify value for 'VDSM_CONFIG/vars/ssl': 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Response is VALUE VDSM_CONFIG/vars/ssl=type:value or ABORT VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:VALUE VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE VALUE VDSM_CONFIG/vars/ssl=str:true 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Customization phase, use 'install' to proceed 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### COMMAND> 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QHidden: FALSE 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE env-query -k VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Please specify value for 'VDSM_CONFIG/vars/ssl_excludes': 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Response is VALUE VDSM_CONFIG/vars/ssl_excludes=type:value or ABORT VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:VALUE VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE VALUE VDSM_CONFIG/vars/ssl_excludes=str:OP_NO_TLSv1,OP_NO_TLSv1_1 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Customization phase, use 'install' to proceed 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### COMMAND> 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QHidden: FALSE 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND
Created attachment 1446705 [details] engine.log in debug mode
Attaching engine.log in debug mode. The last interaction between the host and the engine is 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### Customization phase, use 'install' to proceed 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### COMMAND> 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QHidden: FALSE 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] nextEvent: QueryString CUSTOMIZATION_COMMAND null 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [7696e2a8] Installation of lago-upgrade-from-release-suite-master-host-0: Event QueryString CUSTOMIZATION_COMMAND null 2018-06-01 09:59:31,541-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QEnd: CUSTOMIZATION_COMMAND then nothing more happens. And the SSL session dies by timeout 20 minutes later. 2018-06-01 10:19:28,751-04 DEBUG [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [7696e2a8] Exception: javax.naming.TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0' at org.ovirt.engine.core.uutils.ssh.SSHClient.executeCommand(SSHClient.java:477) [uutils.jar:] at org.ovirt.engine.core.uutils.ssh.SSHDialog.executeCommand(SSHDialog.java:316) [uutils.jar:] at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.execute(VdsDeployBase.java:557) [bll.jar:]
On a successfully execution the engine correctly moves to the next value: 2018-06-01 17:40:17,200+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] env-query VDSM_CONFIG/vars/ssl_excludes=java.lang.String:OP_NO_TLSv1,OP_NO_TLSv1_1 2018-06-01 17:40:17,201+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,202+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QStart: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Please specify value for 'VDSM_CONFIG/vars/ssl_excludes': 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Response is VALUE VDSM_CONFIG/vars/ssl_excludes=type:value or ABORT VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ***Q:VALUE VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] nextEvent: QueryValue VDSM_CONFIG/vars/ssl_excludes=null abort=false 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QEnd: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Customization phase, use 'install' to proceed 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### COMMAND> 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QHidden: FALSE 2018-06-01 17:40:17,205+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,205+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] nextEvent: QueryString CUSTOMIZATION_COMMAND null but when it hangs it doesn't. I think it's something on engine say, maybe a kind of deadlock with the LockManager ??? Please note that also https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/2773/artifact/exported-artifacts/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/lago-upgrade-from-release-suite-master-engine/_var_log/ovirt-engine/server.log got abruptly truncated on 2018-06-01 09:59:28,562-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-1) Server version string: SSH-2.0-OpenSSH_7.4 2018-06-01 09:59:28,578-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-2) Kex: server->client aes128-ctr hmac-sha2-256 none 2018-06-01 09:59:28,578-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-2) Kex: client->server aes128-ctr hmac-sha2-256 none 2018-06-01 09:59:28,886-04 WARN [org.apache.sshd.client.session.ClientConnectionService] (sshd-SshClient[43bd9c58]-nio2-thread-2) Unknown global request: hostkeys-00 almost about 09:59:31 when we have the latest activity on the ssh connection.
Created attachment 1446734 [details] JMX dump
Attaching a JMX dump from an hanged engine (reproduced once also on bare metal!!!), Martin, anything relevant?
Moving to engine side, since we understood that it hangs there during host-deploy interaction.
(In reply to Simone Tiraboschi from comment #10) > Attaching a JMX dump from an hanged engine (reproduced once also on bare > metal!!!), Martin, anything relevant? Thanks a lot Simone! Ravi, could you please debug the issue on the engine side?
updated the patch in comment #1 to fix the issue
Closing as NOTABUG, because the issue was caused by improper usage of host-deploy interface.