Bug 1559029
| Summary: | nodectl motd breaks host-deploy |
|---|---|
| Product: | Red Hat Enterprise Virtualization Manager |
| Reporter: | Igor Netkachev <inetkach> |
| Component: | rhev-hypervisor-ng |
| Assignee: | Ryan Barry <rbarry> |
| Status: | CLOSED INSUFFICIENT_DATA |
| QA Contact: | Yihui Zhao <yzhao> |
| Severity: | high |
| Priority: | high |
| Version: | 4.1.9 |
| CC: | bjarolim, cshao, dfediuck, didi, huzhao, inetkach, jiaczhan, lsurette, qiyuan, rbalakri, redhat-bugzilla, Rhev-m-bugs, sbonazzo, srevivo, stirabos, weiwang, yaniwang, ycui, ykaul, ylavi, yzhao |
| Target Milestone: | ovirt-4.2.4 |
| Flags: | lsvaty: testing_plan_complete- |
| Target Release: | --- |
| Hardware: | All |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2018-05-28 13:42:30 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | Integration |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
Description
Igor Netkachev
2018-03-21 14:34:39 UTC
Hi, according to the logs, adding the host to the engine failed on:
2018-03-13 11:52:18 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2018-03-13 11:52:19 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2018-03-13 11:52:21 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in install_failed state
2018-03-13 11:52:21 ERROR otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:97 The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
2018-03-13 11:52:21 ERROR otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:666 Unable to add rhev-h1.testumgebung.local to the manager
However, the error you are reporting has a different timestamp:
2018-03-13 12:17:09,168+01 ERROR [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [ab095588-5de1-4907-a3b9-f83abca9a8b1] Cannot parse input: java.lang.RuntimeException: Invalid data received during bootstrap
at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:358) [otopi.jar:]
at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:393) [otopi.jar:]
at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.threadMain(VdsDeployBase.java:307) [bll.jar:]
at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.lambda$new$0(VdsDeployBase.java:395) [bll.jar:]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_161]
The relevant one for hosted engine deployment is:
2018-03-13 11:52:18,577+01 INFO [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (org.ovirt.thread.pool-7-thread-1) [d8adaca] Installation of rhev-h1.testumgebung.local. Executing command via SSH umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True < /var/cache/ovirt-engine/ovirt-host-deploy.tar
2018-03-13 11:52:18,577+01 INFO [org.ovirt.engine.core.utils.archivers.tar.CachedTar] (org.ovirt.thread.pool-7-thread-1) [d8adaca] Tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar' refresh
2018-03-13 11:52:19,056+01 INFO [org.ovirt.engine.core.uutils.ssh.SSHDialog] (org.ovirt.thread.pool-7-thread-1) [d8adaca] SSH execute 'root.local' 'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True'
2018-03-13 11:52:19,401+01 ERROR [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [d8adaca] Cannot parse input: java.lang.RuntimeException: Invalid data received during bootstrap
at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:358) [otopi.jar:]
at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:393) [otopi.jar:]
at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.threadMain(VdsDeployBase.java:307) [bll.jar:]
at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.lambda$new$0(VdsDeployBase.java:395) [bll.jar:]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_161]
2018-03-13 11:52:19,407+01 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [d8adaca] Error during deploy dialog
However, for both errors, due to the exception that occurred we are missing the ovirt-host-deploy log.
Since the issue seems to be reproducible in your environment, can you please reproduce it and attach the ovirt-host-deploy log you'll find in /tmp/ on the host?
Hello Sandro, one more update from the customer:

~~~
I have tried today to take another hardware, unfortunately I got the same error. This way I believe we can exclude a cause on the hardware side.
~~~

--
Regards,
Igor Netkachev
Technical Support Engineer, RHCE, RHCVA
Red Hat Global Support Services

nodectl motd prints:

'''
node status: OK
See `nodectl check` for more information

Admin Console: https://10.12.12.100:9090/ or https://10.11.11.160:9090/
'''

So the first line received by the engine is just an empty line, which is invalid in machine dialog syntax, and so the otopi Java code throws a "RuntimeException("Invalid data received during bootstrap")". Unfortunately, the breaking line is logged only at DEBUG level and is therefore normally discarded, see:
https://github.com/oVirt/otopi/blob/otopi-1.6/src/java/MachineDialogParser.java#L343

As a workaround, comment out the content of:
/etc/profile.d/nodectl-run-banner.sh
/etc/profile.d/nodectl-motd.sh
before deploying the RHV-H host.

After that, the deployment could still fail due to:
https://bugzilla.redhat.com/show_bug.cgi?id=1571994
which also currently needs a workaround:
chown vdsm:kvm /var/log/ovirt-imageio-daemon/daemon.log

With those workarounds the deployment should work.

I'm not able to reproduce this with any of the versions from the report, and `nodectl motd` has been present since the introduction of 4.1. Were any customizations done in the customer environment? SSH jump hosts from RHVM? Changes to /etc/motd or /etc/issue on RHVH?

Filed bug 1574433 for better logging.

We added better logging of issues similar to this one in bug #1574433 in order to gather more info on this kind of issue. We didn't succeed in reproducing the issue in our labs, and the issue was fixed on the customer's system. I'm closing this bug with the INSUFFICIENT_DATA resolution since we lack a reproduction of the issue. Please reopen if you can provide more information on how to reproduce it.
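The empty-line failure can be illustrated with a minimal sketch. This is NOT the real otopi MachineDialogParser (which is the Java code linked above), and the `***`/`#` line prefixes here are illustrative assumptions, not the actual machine-dialect syntax; the point is only that a reader expecting prefixed protocol lines dies on the blank line a login banner injects ahead of the bootstrap stream:

```shell
# Minimal illustration, not the real otopi parser: accept only lines with
# a known protocol prefix (the '***' and '#' prefixes are made up for this
# sketch) and reject anything else, including the empty line emitted
# before the `nodectl motd` banner text.
parse_stream() {
    while IFS= read -r line; do
        case "$line" in
            '***'*|'#'*) printf 'event: %s\n' "$line" ;;  # valid protocol line
            *) echo 'Invalid data received during bootstrap' >&2; return 1 ;;
        esac
    done
}

# A clean bootstrap stream parses:
printf '***L:INFO starting\n#progress 10\n' | parse_stream

# A stream polluted by the MOTD fails on the leading blank line:
printf '\nnode status: OK\n' | parse_stream || echo 'parse failed'
```

On the real system the same effect means host-deploy aborts before its log is shipped back to the engine, which is why the report asks for the ovirt-host-deploy log left in /tmp/ on the host.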
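The banner workaround above can be scripted. Commenting out the real /etc/profile.d files requires root on an RHV-H host, so this sketch stages dummy copies in a temporary directory to stay safe to run anywhere; on a real host the same `sed` would target the two paths named in the comment, and the `.orig` backups make it easy to restore the banner after the deployment succeeds:

```shell
# Sketch of the banner workaround. On a real RHV-H host the sed would
# target:
#   /etc/profile.d/nodectl-run-banner.sh
#   /etc/profile.d/nodectl-motd.sh
# (the separate imageio workaround from the report is:
#   chown vdsm:kvm /var/log/ovirt-imageio-daemon/daemon.log)
tmp=$(mktemp -d)
printf 'nodectl motd\n' > "$tmp/nodectl-motd.sh"
printf 'nodectl banner\n' > "$tmp/nodectl-run-banner.sh"

# Comment out every line instead of deleting the files, keeping a .orig
# backup so the banner can be restored once the host is deployed.
for f in "$tmp"/nodectl-*.sh; do
    sed -i.orig 's/^/#/' "$f"
done

cat "$tmp/nodectl-motd.sh"   # -> prints "#nodectl motd"
rm -rf "$tmp"
```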
Could you please provide information on which RHV versions exactly contain the fixes, so that the customer can try to reproduce and figure out whether the issue still exists?

(In reply to Robert Scheck from comment #20)
> Could you please provide information which RHV versions exactly contain the
> fixes so that the customer can try to reproduce and figure out if the issue
> still exists?

See bug 1574433 - it will be in 4.2.4. It will not fix the problem, as we do not know what that is, but it might help debug it further.

BZ<2>Jira Resync

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.