Created attachment 926039
logs from engine and host

Description of problem:
Added a rhel7 host to my setup. The installation failed due to an error in vdsm:

OSError: [Errno 2] No such file or directory: '/var/run/vdsm/client.log'

The error wasn't caught properly by engine, as seen in engine.log:

2014-08-12 12:05:26,862 ERROR [org.ovirt.engine.core.utils.ssh.SSHDialog] (org.ovirt.thread.pool-4-thread-49) SSH error running command root.102.11:'umask 0077; MYTMP="$(mktemp -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; rm -fr "${MYTMP}" && mkdir "${MYTMP}" && tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/setup DIALOG/dialect=str:machine DIALOG/customization=bool:True': java.io.IOException: Command returned failure code 1 during SSH session 'root.102.11'
        at org.ovirt.engine.core.utils.ssh.SSHClient.executeCommand(SSHClient.java:527) [utils.jar:]
        at org.ovirt.engine.core.utils.ssh.SSHDialog.executeCommand(SSHDialog.java:318) [utils.jar:]

Version-Release number of selected component (if applicable):
rhev-3.4.2-av11
engine: rhevm-3.4.2-0.1.el6ev.noarch
host: Red Hat Enterprise Linux Server release 7.0 (Maipo)
vdsm-4.14.13-1.el7ev.x86_64

How reproducible:
while https://bugzilla.redhat.com/show_bug.cgi?id=1129232 is reproduced

Steps to Reproduce:
1. Create a DC and cluster in rhevm
2. Attach a rhel7 host to the setup

Actual results:
vdsm fails with:

OSError: [Errno 2] No such file or directory: '/var/run/vdsm/client.log'

and the error isn't treated properly by engine. As far as I understand, this is a log issue, didn't see any other undesirable behavior.
Host becomes non-operational and the following message is shown in webadmin:

Failed to install Host green-b. Failed to execute stage 'Closing up': Command '/bin/systemctl' failed to execute.

Expected results:
The error from vdsm should be treated and reported nicely in the logs

Additional info:
logs from engine and host
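
For illustration only, here is a minimal Python sketch of how a missing parent directory produces an `[Errno 2]` error like the one above, and how it could be turned into a readable log line. The helper name `open_client_log`, the logger name and the message wording are hypothetical; this is not vdsm's actual code, and it runs against a temporary directory rather than /var/run/vdsm.

```python
import errno
import logging
import os
import tempfile


def open_client_log(path, log=logging.getLogger("sketch")):
    """Hypothetical helper: open a log file for appending.

    If the parent directory is missing (ENOENT), log a readable message
    and return None instead of letting a bare traceback escape.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    except OSError as e:
        if e.errno == errno.ENOENT:
            log.error("Cannot open %s: directory %s does not exist",
                      path, os.path.dirname(path))
            return None
        raise  # any other OS error is not handled by this sketch
    return os.fdopen(fd, "a")


# Demonstration in a scratch directory (assumed layout, not the real one).
base = tempfile.mkdtemp()
missing = os.path.join(base, "vdsm", "client.log")

# The parent directory does not exist yet, so this hits the ENOENT branch.
result_missing = open_client_log(missing)

# Once the directory exists, the same call succeeds.
os.makedirs(os.path.dirname(missing))
handle = open_client_log(missing)
handle.write("ok\n")
handle.close()
```

Whether the directory should be created at that point or the failure only reported is a separate design question that this sketch does not try to answer.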
The bug was opened against 3.4.1, but it occurs in 3.4.2; there is no such option to select.
I'm not sure the error you mention is related to the same exception in vdsm. Anyhow, vdsm stopped responding after this exception, IIUC, so the SSH connection might have dropped, and engine reports an installation failure or that the service could not start properly. What else do we expect to see if vdsm doesn't respond or cannot start? I think the current behavior and report are fine in such cases from engine's perspective.
As I see it, the issue here is the ugly message in the log
But at the end you've got:

2014-08-12 12:05:26,870 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (org.ovirt.thread.pool-4-thread-49) [7db12cab] Host installation failed for host db0b69b9-5b0e-4e88-9e18-934e24580492, green-b.: java.io.IOException: Command returned failure code 1 during SSH session 'root.102.11'
        at org.ovirt.engine.core.utils.ssh.SSHClient.executeCommand(SSHClient.java:527) [utils.jar:]

which is the exact problem. It leads you to check vdsm.log or the host-deploy log and understand what went wrong. I don't see how else we can handle that.
oved, what do you think?
I agree. The message here seems good to me under these circumstances. We are explaining exactly what happened. Closing it as wontfix, although I think it is really notabug, but I can't argue about the niceness of error messages...