Bug 1618984
| Summary: | Host deploy from fc28 engine on fc28 host fails, ssh connection terminated | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Gal Zaidman <gzaidman> | ||||
| Component: | General | Assignee: | Yuval Turgeman <yturgema> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lukas Svaty <lsvaty> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.3.0 | CC: | bugs, mperina | ||||
| Target Milestone: | ovirt-4.3.0 | Keywords: | Reopened | ||||
| Target Release: | --- | Flags: | rule-engine: ovirt-4.3+ | ||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | ovirt-engine-4.3.0_rc | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-01-23 10:54:38 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1460625 | ||||||
| Attachments: | ||||||
Created attachment 1476864
engine/host/strace log files
A few corrections to the description:
1. The bug could be on either the engine side or the host side, so here is the relevant engine log:
2018-08-19 10:13:39,422+03 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler6) [] Rescheduling DEFAULT.org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshLightWeightData#-9223372036854775801 as there is no unfired trigger.
2018-08-19 10:13:39,559+03 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [193f8c23] Error during deploy dialog
2018-08-19 10:13:39,559+03 DEBUG [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [193f8c23] Exception: java.io.IOException: Unexpected connection termination
at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:390) [otopi.jar:]
at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:407) [otopi.jar:]
at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.threadMain(VdsDeployBase.java:302) [bll.jar:]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_162]
2018-08-19 10:13:39,561+03 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-8) [193f8c23] SSH error running command root.17.42:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root.17.42'
2018-08-19 10:13:39,561+03 DEBUG [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-8) [193f8c23] Exception: javax.naming.TimeLimitExceededException: SSH session timeout host 'root.17.42'
2018-08-19 10:13:39,569+03 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-8) [193f8c23] Timeout during host 10.35.17.42 install: SSH session timeout host 'root.17.42'
2018-08-19 10:13:39,569+03 DEBUG [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-8) [193f8c23] Exception: javax.naming.TimeLimitExceededException: SSH session timeout host 'root.17.42'
2018-08-19 10:13:39,591+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-8) [193f8c23] EVENT_ID: VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has occurred during installation of Host temp: Processing stopped due to timeout.
2018-08-19 10:13:39,591+03 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-8) [193f8c23] Host installation failed for host 'be1302ef-fc7e-4851-8baf-1dfa2e16ab5d', 'temp': SSH session timeout host 'root.17.42'
2018-08-19 10:13:39,591+03 DEBUG [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-8) [193f8c23] Exception: javax.naming.TimeLimitExceededException: SSH session timeout host 'root.17.42'
2018-08-19 10:13:39,609+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-8) [193f8c23] EVENT_ID: VDS_INSTALL_FAILED(505), Host temp installation failed. SSH session timeout host 'root.17.42'.
2. In "Steps to Reproduce", step 3 needs to install python3-ovirt-host-deploy.
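To pull the records for a single deploy attempt out of a longer engine.log, the lines can be filtered on the bracketed token that follows the thread name (193f8c23 in the excerpt above, assumed here to be the flow's correlation ID). The following is only a sketch: the regular expression is inferred from the excerpt rather than from any engine logging specification, and `filter_records` is a hypothetical helper name.

```python
import re

# Record header as seen in the excerpt above (an inferred layout, not an official format):
#   <timestamp> <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s?"
    r"(?P<msg>.*)$"
)


def filter_records(lines, correlation_id, level="ERROR"):
    """Return (timestamp, short logger name, message) for matching records.

    Lines that do not start with a record header (for example Java stack
    frames beginning with "at ") are skipped.
    """
    matches = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue
        if m.group("corr") == correlation_id and m.group("level") == level:
            short_logger = m.group("logger").rsplit(".", 1)[-1]
            matches.append((m.group("ts"), short_logger, m.group("msg")))
    return matches
```

For example, `filter_records(open("engine.log"), "193f8c23")` would list only the ERROR records of this deploy attempt; the file name is a placeholder for wherever the attached engine log is saved.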
Fixed in: https://gerrit.ovirt.org/#/c/94106/
The fix in https://gerrit.ovirt.org/#/c/94106/ was reverted, so a different patch is needed. Discussion: https://lists.ovirt.org/archives/list/infra@ovirt.org/thread/YFXG2TVBXC4ZNTAYYBIUOFXNO33IGIYU/#QMRM2INTCRDPT7GPF24EEPNJAZRP4CUQ
Description of problem:
Adding an fc28 host to an fc28 engine fails with:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 976, in flush
    self.stream.flush()
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/__init__.py", line 1943, in shutdown
    h.flush()
  File "/usr/lib64/python3.6/logging/__init__.py", line 976, in flush
    self.stream.flush()
  File "/tmp/ovirt-iPUTtKf1PF/pythonlib/otopi/main.py", line 53, in _signal
    raise RuntimeError("SIG%s" % signum)
RuntimeError: SIG13

When a host is added, the engine starts the host deploy process and sends a tar archive to the host over ssh. While testing this error, I ran strace on the host to trace the connection and found that the tar command never finishes; after about 10-20 minutes the installation fails with an unexpected connection termination. I am not sure why this happens. One idea is that on Fedora we use apache-sshd 0.14, while on CentOS we bundle version 0.12 ourselves. I tried replacing the Fedora package with the CentOS 0.12 version, but the error remains the same.

Steps to Reproduce:
1. Install the engine on fc28 with python2 (python2-otopi) and default settings.
2. Remove the line "HostKey /etc/ssh/ssh_host_ecdsa_key" from /etc/ssh/sshd_config on both the engine and the host (both fc28). This is a workaround for bug https://bugzilla.redhat.com/show_bug.cgi?id=1591801
3. Fix the broken links in the host-deploy pythonlib, in /usr/share/ovirt-host-deploy/interface-3/pythonlib/. The links are broken because python3-otopi/ovirt_host_mgmt/ovirt_host_deploy are not installed: dnf install python3-otopi, python-ovirt-host-deploy on the engine side.
4. Log into the engine, click Compute -> Hosts -> New, and add an fc28 host.

Actual results:
The installation task runs for about 10-20 minutes, then fails.

Expected results:
The host is added and the host deploy installation finishes.
Additional info:
Attaching the engine log, host deploy log and strace file. In the strace file, search for 10:13:39 to reach the line where the read is stopped/resumed.
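Before and after installing the python3 packages in step 3 of the reproduction steps, it can help to confirm which links under the pythonlib directory still dangle. A minimal sketch using only the Python standard library; `find_broken_symlinks` is a hypothetical helper, not part of ovirt-host-deploy or otopi:

```python
import os


def find_broken_symlinks(root):
    """Return sorted (link path, link target) pairs for dangling symlinks under root."""
    broken = []
    # os.walk does not follow symlinked directories by default, so each
    # link is inspected once, where it appears in its parent directory.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it and
            # returns False when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append((path, os.readlink(path)))
    return sorted(broken)


if __name__ == "__main__":
    # Directory taken from step 3 of the reproduction steps.
    for link, target in find_broken_symlinks(
        "/usr/share/ovirt-host-deploy/interface-3/pythonlib/"
    ):
        print("%s -> %s (missing)" % (link, target))
```

An empty result after the dnf install would suggest the links now resolve; it says nothing about whether the deploy itself will succeed.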