Bug 1584669
Summary: | engine hangs on host-deploy communication when multiple parameters are passed to [vars] section of vdsm.conf | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [oVirt] ovirt-engine | Reporter: | Martin Perina <mperina> | ||||||||||||
Component: | Host-Deploy | Assignee: | Ravi Nori <rnori> | ||||||||||||
Status: | CLOSED NOTABUG | QA Contact: | Pavel Stehlik <pstehlik> | ||||||||||||
Severity: | high | Docs Contact: | |||||||||||||
Priority: | urgent | ||||||||||||||
Version: | --- | CC: | bugs, danken, mperina, rnori, sbonazzo, stirabos | ||||||||||||
Target Milestone: | ovirt-4.2.4 | Flags: | rule-engine:
ovirt-4.2+
rule-engine: blocker+ |
||||||||||||
Target Release: | --- | ||||||||||||||
Hardware: | Unspecified | ||||||||||||||
OS: | Unspecified | ||||||||||||||
Whiteboard: | |||||||||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||
Clone Of: | Environment: | ||||||||||||||
Last Closed: | 2018-06-05 16:15:53 UTC | Type: | Bug | ||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||
Documentation: | --- | CRM: | |||||||||||||
Verified Versions: | Category: | --- | |||||||||||||
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
Embargoed: | |||||||||||||||
Bug Depends On: | |||||||||||||||
Bug Blocks: | 1577593, 1582527 | ||||||||||||||
Attachments: |
|
Description
Martin Perina
2018-05-31 12:29:28 UTC
Created attachment 1446261 [details]
hanged process thread dump
Created attachment 1446262 [details]
host-deploy log
Created attachment 1446263 [details]
host-deploy log
According to host-deploy logs: 2018-05-31 07:44:57,672-0400 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND 2018-05-31 08:04:54,449-0400 DEBUG otopi.context context._executeMethod:143 method exception Traceback (most recent call last): File "/tmp/ovirt-goPsgFOlPK/pythonlib/otopi/context.py", line 133, in _executeMethod method['method']() File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/cli.py", line 137, in _customize "\nCustomization phase, use 'install' to proceed\nCOMMAND> " File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/cli.py", line 92, in _runCommandPrompt prompt=True, File "/tmp/ovirt-goPsgFOlPK/otopi-plugins/otopi/dialog/machine.py", line 281, in queryString value = self._readline(hidden) File "/tmp/ovirt-goPsgFOlPK/pythonlib/otopi/dialog.py", line 248, in _readline raise IOError(_('End of file')) IOError: End of file 2018-05-31 08:04:54,455-0400 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Environment customization': End of file 2018-05-31 08:04:54,555-0400 DEBUG otopi.plugins.otopi.debug.debug_failure.debug_failure debug_failure._notification:100 tcp connections: id uid local foreign state pid exe 0: 0 0.0.0.0:22 0.0.0.0:0 LISTEN 1079 /usr/sbin/sshd 1: 0 192.168.201.3:53984 212.69.166.138:80 CLOSE_WAIT 1418 /usr/bin/python2.7 2: 0 192.168.201.3:22 192.168.201.1:36154 ESTABLISHED 1460 /usr/sbin/sshd 3: 0 192.168.201.3:57190 147.32.127.196:80 CLOSE_WAIT 1418 /usr/bin/python2.7 The host sent a "*%QEnd: CUSTOMIZATION_COMMAND" at 2018-05-31 07:44:57,672-0400 and the engine never moved to the next step and so at 2018-05-31 08:04:54,449-0400 the ssh connection got closed. host-deploy was still running. It's correctly working on bare metal: 2018-06-01 13:53:08,029+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,029+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE env-query -k VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,056+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Please specify value for 'VDSM_CONFIG/vars/ssl': 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Response is VALUE VDSM_CONFIG/vars/ssl=type:value or ABORT VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:VALUE VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: VDSM_CONFIG/vars/ssl 2018-06-01 13:53:08,057+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE VALUE VDSM_CONFIG/vars/ssl=str:true 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Customization phase, use 'install' to proceed 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### COMMAND> 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QHidden: FALSE 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,058+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE env-query -k VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Please specify value for 'VDSM_CONFIG/vars/ssl_excludes': 2018-06-01 13:53:08,059+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Response is VALUE VDSM_CONFIG/vars/ssl_excludes=type:value or ABORT VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:VALUE VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE VALUE VDSM_CONFIG/vars/ssl_excludes=str:OP_NO_TLSv1,OP_NO_TLSv1_1 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,060+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Customization phase, use 'install' to proceed 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### COMMAND> 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QHidden: FALSE 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 13:53:08,061+0200 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND Created attachment 1446705 [details]
engine.log in debug mode
Attaching engine.log in debug mode. The last interaction between the host and the engine is 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### Customization phase, use 'install' to proceed 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ### COMMAND> 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QHidden: FALSE 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] nextEvent: QueryString CUSTOMIZATION_COMMAND null 2018-06-01 09:59:31,540-04 DEBUG [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [7696e2a8] Installation of lago-upgrade-from-release-suite-master-host-0: Event QueryString CUSTOMIZATION_COMMAND null 2018-06-01 09:59:31,541-04 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [7696e2a8] Got: **%QEnd: CUSTOMIZATION_COMMAND then nothing more happens. And the SSL session dies by timeout 20 minutes later. 2018-06-01 10:19:28,751-04 DEBUG [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [7696e2a8] Exception: javax.naming.TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0' at org.ovirt.engine.core.uutils.ssh.SSHClient.executeCommand(SSHClient.java:477) [uutils.jar:] at org.ovirt.engine.core.uutils.ssh.SSHDialog.executeCommand(SSHDialog.java:316) [uutils.jar:] at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.execute(VdsDeployBase.java:557) [bll.jar:] On a successfully execution the engine correctly moves to the next value: 2018-06-01 17:40:17,200+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] env-query VDSM_CONFIG/vars/ssl_excludes=java.lang.String:OP_NO_TLSv1,OP_NO_TLSv1_1 2018-06-01 17:40:17,201+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QEnd: CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,202+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QStart: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Please specify value for 'VDSM_CONFIG/vars/ssl_excludes': 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Response is VALUE VDSM_CONFIG/vars/ssl_excludes=type:value or ABORT VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ***Q:VALUE VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] nextEvent: QueryValue VDSM_CONFIG/vars/ssl_excludes=null abort=false 2018-06-01 17:40:17,203+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QEnd: VDSM_CONFIG/vars/ssl_excludes 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QStart: CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### Customization phase, use 'install' to proceed 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ### COMMAND> 2018-06-01 17:40:17,204+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: **%QHidden: FALSE 2018-06-01 17:40:17,205+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] Got: ***Q:STRING CUSTOMIZATION_COMMAND 2018-06-01 17:40:17,205+02 DEBUG [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [f9f6b461-33b5-4841-b42f-79e5542fed73] nextEvent: QueryString CUSTOMIZATION_COMMAND null but when it hangs it doesn't. I think it's something on engine say, maybe a kind of deadlock with the LockManager ??? Please note that also https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/2773/artifact/exported-artifacts/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/lago-upgrade-from-release-suite-master-engine/_var_log/ovirt-engine/server.log got abruptly truncated on 2018-06-01 09:59:28,562-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-1) Server version string: SSH-2.0-OpenSSH_7.4 2018-06-01 09:59:28,578-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-2) Kex: server->client aes128-ctr hmac-sha2-256 none 2018-06-01 09:59:28,578-04 INFO [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[43bd9c58]-nio2-thread-2) Kex: client->server aes128-ctr hmac-sha2-256 none 2018-06-01 09:59:28,886-04 WARN [org.apache.sshd.client.session.ClientConnectionService] (sshd-SshClient[43bd9c58]-nio2-thread-2) Unknown global request: hostkeys-00 almost about 09:59:31 when we have the latest activity on the ssh connection. Created attachment 1446734 [details]
JMX dump
Attaching a JMX dump from an hanged engine (reproduced once also on bare metal!!!), Martin, anything relevant? Moving to engine side, since we understood that it hangs there during host-deploy interaction. (In reply to Simone Tiraboschi from comment #10) > Attaching a JMX dump from an hanged engine (reproduced once also on bare > metal!!!), Martin, anything relevant? Thanks a lot Simone! Ravi, could you please debug the issue on the engine side? updated the patch in comment #1 to fix the issue Closing as NOTABUG, because the issue was caused by improper usage of host-deploy interface. |