Bug 1129238

Summary: [engine-backend] bad handling of OSError
Product: Red Hat Enterprise Virtualization Manager
Reporter: Elad <ebenahar>
Component: ovirt-engine
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Version: 3.4.1-1
CC: acathrow, bazulay, ebenahar, ecohen, gklein, iheim, lpeer, oourfali, pstehlik, Rhev-m-bugs, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.5.0
Hardware: x86_64
OS: Unspecified
Whiteboard: infra
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2014-08-25 05:50:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Infra
Cloudforms Team: ---
Attachments: logs from engine and host (flags: none)

Description Elad 2014-08-12 11:12:29 UTC
Created attachment 926039
logs from engine and host

Description of problem:
I added a RHEL 7 host to my setup. The installation failed due to an error in vdsm:
OSError: [Errno 2] No such file or directory: '/var/run/vdsm/client.log'

The error wasn't handled properly by the engine, as seen in engine.log:

2014-08-12 12:05:26,862 ERROR [org.ovirt.engine.core.utils.ssh.SSHDialog] (org.ovirt.thread.pool-4-thread-49) SSH error running command root.102.11:'umask 0077; MYTMP="$(mktemp -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; rm -fr "${MYTMP}" && mkdir "${MYTMP}" && tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/setup DIALOG/dialect=str:machine DIALOG/customization=bool:True': java.io.IOException: Command returned failure code 1 during SSH session 'root.102.11'
        at org.ovirt.engine.core.utils.ssh.SSHClient.executeCommand(SSHClient.java:527) [utils.jar:]
        at org.ovirt.engine.core.utils.ssh.SSHDialog.executeCommand(SSHDialog.java:318) [utils.jar:]


Version-Release number of selected component (if applicable):
rhev-3.4.2-av11

engine:
rhevm-3.4.2-0.1.el6ev.noarch

host:
Red Hat Enterprise Linux Server release 7.0 (Maipo)
vdsm-4.14.13-1.el7ev.x86_64


How reproducible:
Whenever https://bugzilla.redhat.com/show_bug.cgi?id=1129232 reproduces

Steps to Reproduce:
1. Create a data center (DC) and a cluster in RHEV-M
2. Add a RHEL 7 host to the setup

Actual results:
vdsm fails with:
OSError: [Errno 2] No such file or directory: '/var/run/vdsm/client.log'

and the engine does not handle the error properly. As far as I understand, this is a logging issue; I didn't see any other undesirable behavior. The host becomes non-operational and the following message is shown in the webadmin portal:

Failed to install Host green-b. Failed to execute stage 'Closing up': Command '/bin/systemctl' failed to execute.

Expected results:
The error from vdsm should be handled and reported clearly in the logs.
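As a rough illustration of what "reported clearly" could mean on the host side, here is a minimal Python sketch (vdsm itself is written in Python). It is not vdsm's actual code: the helper name open_client_log and the logger name are hypothetical. It assumes the log file's parent directory may be missing, and turns the resulting OSError (errno ENOENT) into one readable log line instead of an unhandled traceback.

```python
import errno
import logging
import os

# Hypothetical logger name, for illustration only.
log = logging.getLogger("client_log_example")


def open_client_log(path):
    """Hypothetical helper: open a log file for appending.

    If the file or its parent directory does not exist (errno ENOENT),
    log a single explanatory error and return None instead of letting
    the raw OSError propagate. Any other error is re-raised unchanged.
    """
    try:
        return open(path, "a")
    except (IOError, OSError) as e:  # IOError covers Python 2's open()
        if e.errno == errno.ENOENT:
            log.error(
                "Cannot open log file %s: directory %s does not exist",
                path,
                os.path.dirname(path),
            )
            return None
        raise
```

Only the missing-path case is softened; permission errors and other failures still raise, so genuinely unexpected problems remain visible.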


Additional info: logs from the engine and host are attached.

Comment 1 Elad 2014-08-12 11:17:46 UTC
The bug was filed against 3.4.1, but it occurs in 3.4.2. There is no 3.4.2 option in the Version field.

Comment 2 Yaniv Bronhaim 2014-08-20 00:30:46 UTC
I'm not sure the error you mention is related to the same exception in vdsm. In any case, if I understand correctly, vdsm stopped responding after this exception, so the SSH communication may have dropped, and the engine reports an installation failure or that the service could not start properly. What else would we expect to see if vdsm doesn't respond or cannot start? I think the current behavior and report are fine in such cases from the engine's perspective.

Comment 3 Elad 2014-08-20 07:39:33 UTC
As I see it, the issue here is the ugly message in the log.

Comment 4 Yaniv Bronhaim 2014-08-22 13:44:56 UTC
But at the end you get:
2014-08-12 12:05:26,870 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (org.ovirt.thread.pool-4-thread-49) [7db12cab] Host installation failed for host db0b69b9-5b0e-4e88-9e18-934e24580492, green-b.: java.io.IOException: Command returned failure code 1 during SSH session 'root.102.11'
        at org.ovirt.engine.core.utils.ssh.SSHClient.executeCommand(SSHClient.java:527) [utils.jar:]


which states exactly what the problem is. It points you to vdsm.log or the host-deploy log to understand what went wrong.

I don't see how else we can handle that.

Oved, what do you think?

Comment 5 Oved Ourfali 2014-08-25 05:50:26 UTC
I agree. Under these circumstances the message seems good to me; it explains exactly what happened. Closing it as WONTFIX, although I think it is NOTABUG, but I can't argue with the niceness of error messages...