Bug 1559029 - nodectl motd breaks host-deploy
Summary: nodectl motd breaks host-deploy
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: rhev-hypervisor-ng
Version: 4.1.9
Hardware: All
OS: Unspecified
high
high
Target Milestone: ovirt-4.2.4
: ---
Assignee: Ryan Barry
QA Contact: Yihui Zhao
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-03-21 14:34 UTC by Igor Netkachev
Modified: 2023-09-15 00:07 UTC (History)
21 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-28 13:42:30 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1571994 0 unspecified CLOSED ovirt-image-daemon fails to start due to permissions on ovirt-image-daemon log file causing host deployment to fail 2021-12-10 16:16:22 UTC
Red Hat Bugzilla 1574433 0 medium CLOSED "Invalid data received during bootstrap" errors do not log the invalid data 2021-02-22 00:41:40 UTC

Internal Links: 1571994 1574433

Description Igor Netkachev 2018-03-21 14:34:39 UTC
Description of problem:

After installing the HEVM and logging in to Manager, the HE host cannot be installed/activated with the following error:
	
Failed to install Host $(HOSTNAME). Invalid data received during bootstrap.

Version-Release number of selected component (if applicable):
RHV 4.1.9
rhevm-4.1.9.1-0.1.el7.noarch
vdsm-4.19.45-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install HE host following https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/red_hat_virtualization_hosts
2. Install HEVM following https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/initiating_self-hosted_engine_deployment_on_rhvh
3. Login to manager GUI and try to install HE host

Actual results:
Failed to install Host $(HOSTNAME). Invalid data received during bootstrap.


Expected results:
Host installation finishes successfully


Additional info:
The following log sequence is visible in error.log:

2018-03-13 12:17:08,037+01 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (org.ovirt.thread.pool-7-thread-1) [ab095588-5de1-4907-a3b9-f83abca9a8b1] Installation of $(HOSTNAME). Executing command via SSH umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True < /var/cache/ovirt-engine/ovirt-host-deploy.tar
2018-03-13 12:17:08,037+01 INFO  [org.ovirt.engine.core.utils.archivers.tar.CachedTar] (org.ovirt.thread.pool-7-thread-1) [ab095588-5de1-4907-a3b9-f83abca9a8b1] Tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar' refresh
2018-03-13 12:17:08,686+01 INFO  [org.ovirt.engine.core.uutils.ssh.SSHDialog] (org.ovirt.thread.pool-7-thread-1) [ab095588-5de1-4907-a3b9-f83abca9a8b1] SSH execute 'root@$(HOSTNAME)' 'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True'
2018-03-13 12:17:09,168+01 ERROR [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [ab095588-5de1-4907-a3b9-f83abca9a8b1] Cannot parse input: java.lang.RuntimeException: Invalid data received during bootstrap
    at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:358) [otopi.jar:]
    at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:393) [otopi.jar:]
    at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.threadMain(VdsDeployBase.java:307) [bll.jar:]
    at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.lambda$new$0(VdsDeployBase.java:395) [bll.jar:]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_161]

Comment 3 Sandro Bonazzola 2018-04-06 08:35:43 UTC
Hi, according to the logs, adding host to engine failed on:

2018-03-13 11:52:18 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2018-03-13 11:52:19 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2018-03-13 11:52:21 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in install_failed state
2018-03-13 11:52:21 ERROR otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:97 The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
2018-03-13 11:52:21 ERROR otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:666 Unable to add rhev-h1.testumgebung.local to the manager

However, the error you are reporting has a different timestamp:
2018-03-13 12:17:09,168+01 ERROR [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [ab095588-5de1-4907-a3b9-f83abca9a8b1] Cannot parse input: java.lang.RuntimeException: Invalid data received during boots
trap
        at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:358) [otopi.jar:]
        at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:393) [otopi.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.threadMain(VdsDeployBase.java:307) [bll.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.lambda$new$0(VdsDeployBase.java:395) [bll.jar:]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_161]


The relevant one for hosted engine deployment is:
2018-03-13 11:52:18,577+01 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (org.ovirt.thread.pool-7-thread-1) [d8adaca] Installation of rhev-h1.testumgebung.local. Executing command via SSH umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True < /var/cache/ovirt-engine/ovirt-host-deploy.tar
2018-03-13 11:52:18,577+01 INFO  [org.ovirt.engine.core.utils.archivers.tar.CachedTar] (org.ovirt.thread.pool-7-thread-1) [d8adaca] Tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar' refresh
2018-03-13 11:52:19,056+01 INFO  [org.ovirt.engine.core.uutils.ssh.SSHDialog] (org.ovirt.thread.pool-7-thread-1) [d8adaca] SSH execute 'root.local' 'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True'
2018-03-13 11:52:19,401+01 ERROR [org.ovirt.otopi.dialog.MachineDialogParser] (VdsDeploy) [d8adaca] Cannot parse input: java.lang.RuntimeException: Invalid data received during bootstrap
        at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:358) [otopi.jar:]
        at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:393) [otopi.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.threadMain(VdsDeployBase.java:307) [bll.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.lambda$new$0(VdsDeployBase.java:395) [bll.jar:]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_161]

2018-03-13 11:52:19,407+01 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [d8adaca] Error during deploy dialog


but for both the errors, due to the execption occurred we are missing the ovirt-host-deploy log.

Since it seems to be reproducible on your environment, can you please reproduce it and attach the ovirt-host-deploy log you'll find on /tmp/ on the host?

Comment 5 Igor Netkachev 2018-04-12 13:34:35 UTC
Hello Sandro,

One more update from the customer:

~~~
I have tried today to take another Hardware, unfortunately I got the same error. This way I believe we can exclude a cause on the hardware side.
~~~


--
regards,
Igor Netkachev
Technical Support Engineer, RHCE, RHCVA
Red Hat Global Support Services

Comment 15 Simone Tiraboschi 2018-04-27 12:49:03 UTC
nodectl motd prints:
'''

  node status: OK
  See `nodectl check` for more information

Admin Console: https://10.12.12.100:9090/ or https://10.11.11.160:9090/
'''

So the first line got by the engine is just an empty line which is invalid in machine dialog syntax.

And so otopi java code throws a "RuntimeException("Invalid data received during bootstrap")".
and unfortunately the breaking like is logged just at DEBUG level and so it's normally discarded, see:
https://github.com/oVirt/otopi/blob/otopi-1.6/src/java/MachineDialogParser.java#L343

As a workaround, comment the content of:
 /etc/profile.d/nodectl-run-banner.sh
 /etc/profile.d/nodectl-motd.sh

Before deploying the rhv-h host.


After that the deployment could still fail due to:
https://bugzilla.redhat.com/show_bug.cgi?id=1571994
and this also currently needs a workaround:
 chown vdsm:kvm /var/log/ovirt-imageio-daemon/daemon.log

With that workarounds the deployment should work.

Comment 16 Ryan Barry 2018-04-27 18:02:43 UTC
I'm not able to reproduce this with any of the versions from the report, and `nodectl motd` has been present since the introduction of 4.1

Were any customizations done in the customer environment?

SSH jump hosts from RHVM? Changes to /etc/motd or /etc/issue on RHVH?

Comment 18 Yedidyah Bar David 2018-05-03 10:33:34 UTC
Filed bug 1574433 for better logging.

Comment 19 Sandro Bonazzola 2018-05-28 13:42:30 UTC
We added better logging of issues similar to this one in bug #1574433 in order to gather more info on this kind of issues. We didn't succeed in reproducing the issue in our labs and fixed the issue on the customer's system.
I'm closing this bug with insufficient data resolution since we miss a reproduction of the issue. Please reopen if you can provide more information on how to reproduce the issue.

Comment 20 Robert Scheck 2018-05-28 13:51:48 UTC
Could you please provide information which RHV versions exactly contain the
fixes so that the customer can try to reproduce and figure out if the issue
still exists?

Comment 21 Yedidyah Bar David 2018-05-30 07:23:16 UTC
(In reply to Robert Scheck from comment #20)
> Could you please provide information which RHV versions exactly contain the
> fixes so that the customer can try to reproduce and figure out if the issue
> still exists?

See bug 1574433 - will be in 4.2.4.

It will not fix the problem, as we do not know what that is. It might help debug it further.

Comment 22 Franta Kust 2019-05-16 13:06:05 UTC
BZ<2>Jira Resync

Comment 25 Red Hat Bugzilla 2023-09-15 00:07:03 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days


Note You need to log in before you can comment on or make changes to this bug.