Bug 1584669
| Summary: | engine hangs on host-deploy communication when multiple parameters are passed to [vars] section of vdsm.conf | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Martin Perina <mperina> | ||||||||||||
| Component: | Host-Deploy | Assignee: | Ravi Nori <rnori> | ||||||||||||
| Status: | CLOSED NOTABUG | QA Contact: | Pavel Stehlik <pstehlik> | ||||||||||||
| Severity: | high | Docs Contact: | |||||||||||||
| Priority: | urgent | ||||||||||||||
| Version: | --- | CC: | bugs, danken, mperina, rnori, sbonazzo, stirabos | ||||||||||||
| Target Milestone: | ovirt-4.2.4 | Flags: | rule-engine:
ovirt-4.2+
rule-engine: blocker+ |
||||||||||||
| Target Release: | --- | ||||||||||||||
| Hardware: | Unspecified | ||||||||||||||
| OS: | Unspecified | ||||||||||||||
| Whiteboard: | |||||||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||||||
| Doc Text: | Story Points: | --- | |||||||||||||
| Clone Of: | Environment: | ||||||||||||||
| Last Closed: | 2018-06-05 16:15:53 UTC | Type: | Bug | ||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||
| Documentation: | --- | CRM: | |||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
| Embargoed: | |||||||||||||||
| Bug Depends On: | |||||||||||||||
| Bug Blocks: | 1577593, 1582527 | ||||||||||||||
| Attachments: |
|
||||||||||||||
|
Description
Martin Perina
2018-05-31 12:29:28 UTC
Created attachment 1446261 [details]
hanged process thread dump
Created attachment 1446262 [details]
host-deploy log
Created attachment 1446263 [details]
host-deploy log
According to host-deploy logs:
2018-05-31 07:44:57,672-0400 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND
2018-05-31 08:04:54,449-0400 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
File "/tmp/ovirt-goPsgFOlPK/pythonlib/otopi/context.py", line 133, in _executeMethod
method['method']()
File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/cli.py", line 137, in _customize
"\nCustomization phase, use 'install' to proceed\nCOMMAND> "
File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/cli.py", line 92, in _runCommandPrompt
prompt=True,
File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/machine.py", line 281, in queryString
value = self._readline(hidden)
File "/tmp/ovirt-goPsgFOlPK/pythonlib/otopi/dialog.py", line 248, in _readline
raise IOError(_('End of file'))
IOError: End of file
2018-05-31 08:04:54,455-0400 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Environment customization': End of file
2018-05-31 08:04:54,555-0400 DEBUG otopi.plugins.otopi.debug.debug_failure.debug_failure debug_failure._notification:100 tcp connections:
id uid local foreign state pid exe
0: 0 0.0.0.0:22 0.0.0.0:0 LISTEN 1079 /usr/sbin/sshd
1: 0 192.168.201.3:53984 212.69.166.138:80 CLOSE_WAIT 1418 /usr/bin/python2.7
2: 0 192.168.201.3:22 192.168.201.1:36154 ESTABLISHED 1460 /usr/sbin/sshd
3: 0 192.168.201.3:57190 147.32.127.196:80 CLOSE_WAIT 1418 /usr/bin/python2.7
The host sent a "*%QEnd: CUSTOMIZATION_COMMAND" at 2018-05-31 07:44:57,672-0400 and the engine never moved to the next step and so at 2018-05-31 08:04:54,449-0400 the ssh connection got closed.
host-deploy was still running.
It's correctly working on bare metal: 2018-06-01 13:53:08,029+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,029+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE env-query -k VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Please specify value for 'VDSM_CONFIG/vars/ssl': 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Response is VALUE VDSM_CONFIG/vars/ssl=type:value or ABORT VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:VALUE VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE VALUE VDSM_CONFIG/vars/ssl=str:true 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Customization phase, use 'install' to proceed 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### COMMAND> 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QHidden: FALSE 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE env-query -k VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Please specify value for 'VDSM_CONFIG/vars/ssl_excludes': 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Response is VALUE VDSM_CONFIG/vars/ssl_excludes=type:value or ABORT VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:VALUE VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE VALUE VDSM_CONFIG/vars/ssl_excludes=str:OP_NO_TLSv1,OP_NO_TLSv1_1 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Customization phase, use 'install' to proceed 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### COMMAND> 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QHidden: FALSE 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND Created attachment 1446705 [details]
engine.log in debug mode
Attaching engine.log in debug mode. The last interaction between the host and the engine is 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### Customization phase, use 'install' to proceed 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### COMMAND> 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QHidden: FALSE 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] nextEvent: QueryString CUSTOMIZATION_COMMAND null 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [7696e2a8] Installation of lago-upgrade-from-release-suite-master-host-0: Event QueryString CUSTOMIZATION_COMMAND null 2018-06-01 09:59:31,541-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QEnd: CUSTOMIZATION_COMMAND then nothing more happens. And the SSL session dies by timeout 20 minutes later. 2018-06-01 10:19:28,751-04 DEBUG [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [7696e2a8] Exception: javax.naming.TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0' at org.ovirt.engine.core.uutils.ssh.SSHClient.executeCommand(SSHClient.java:477) [uutils.jar:] at org.ovirt.engine.core.uutils.ssh.SSHDialog.executeCommand(SSHDialog.java:316) [uutils.jar:] at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.execute(VdsDeployBase.java:557) [bll.jar:] On a successfully execution the engine correctly moves to the next value: 2018-06-01 17:40:17,200+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] env-query VDSM_CONFIG/vars/ssl_excludes=java.lang.String:OP_NO_TLSv1,OP_NO_TLSv1_1 2018-06-01 17:40:17,201+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,202+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QStart: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Please specify value for 'VDSM_CONFIG/vars/ssl_excludes': 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Response is VALUE VDSM_CONFIG/vars/ssl_excludes=type:value or ABORT VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ***Q:VALUE VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] nextEvent: QueryValue VDSM_CONFIG/vars/ssl_excludes=null abort=false 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QEnd: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Customization phase, use 'install' to proceed 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### COMMAND> 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QHidden: FALSE 2018-06-01 17:40:17,205+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,205+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] nextEvent: QueryString CUSTOMIZATION_COMMAND null but when it hangs it doesn't. I think it's something on engine say, maybe a kind of deadlock with the LockManager ??? Please note that also https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/2773/artifact/exported-artifacts/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/lago-upgrade-from-release-suite-master-engine/_var_log/ovirt-engine/server.log got abruptly truncated on 2018-06-01 09:59:28,562-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-1) Server version string: SSH-2.0-OpenSSH_7.4 2018-06-01 09:59:28,578-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-2) Kex: server->client aes128-ctr hmac-sha2-256 none 2018-06-01 09:59:28,578-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-2) Kex: client->server aes128-ctr hmac-sha2-256 none 2018-06-01 09:59:28,886-04 WARN [org.apache.sshd.client.session.ClientConnectionService] (sshd-SshClient[43bd9c58]-nio2-thread-2) Unknown global request: hostkeys-00 almost about 09:59:31 when we have the latest activity on the ssh connection. Created attachment 1446734 [details]
JMX dump
Attaching a JMX dump from an hanged engine (reproduced once also on bare metal!!!), Martin, anything relevant? Moving to engine side, since we understood that it hangs there during host-deploy interaction. (In reply to Simone Tiraboschi from comment #10) > Attaching a JMX dump from an hanged engine (reproduced once also on bare > metal!!!), Martin, anything relevant? Thanks a lot Simone! Ravi, could you please debug the issue on the engine side? updated the patch in comment #1 to fix the issue Closing as NOTABUG, because the issue was caused by improper usage of host-deploy interface. |