Bug 1438640

Summary: Host added to virt+gluster cluster displays a failed event message when checking for available updates for host with error message 'Command returned failure code 1 during SSH session'
Product: [oVirt] ovirt-engine Reporter: RamaKasturi <knarra>
Component: Frontend.WebAdmin Assignee: bugs <bugs>
Status: CLOSED WONTFIX QA Contact: RamaKasturi <knarra>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.1.1.2 CC: bugs, knarra, mperina, oourfali, sabose
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-30 11:46:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1411323    

Description RamaKasturi 2017-04-04 02:09:58 UTC
Description of problem:
A host added to a virt+gluster cluster raises the error "Failed to check for available updates on host <host_name> with message 'Command returned failure code 1 during SSH session'".

Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.1.2-0.1.el7

How reproducible:
Always

Steps to Reproduce:
1. Install HC with three hosts
2. 
3.

Actual results:
An event is logged in the Events tab which reads "Failed to check for available updates on host <host_name> with message 'Command returned failure code 1 during SSH session'".

Expected results:
There should not be any failure message while checking for updates.

Additional info:
The following errors are seen in engine.log:
2017-04-03 06:47:50,865-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (pool-7-thread-3) [77d49287] SSH error running command root.eng.blr.redhat.com:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-mgmt DIALOG/dialect=str:machine DIALOG/customization=bool:True': Command returned failure code 1 during SSH session 'root.eng.blr.redhat.com'
2017-04-03 06:47:50,865-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (pool-7-thread-3) [77d49287] Exception: java.io.IOException: Command returned failure code 1 during SSH session 'root.eng.blr.redhat.com'
        at org.ovirt.engine.core.uutils.ssh.SSHClient.executeCommand(SSHClient.java:503) [uutils.jar:]
        at org.ovirt.engine.core.uutils.ssh.SSHDialog.executeCommand(SSHDialog.java:317) [uutils.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.execute(VdsDeployBase.java:563) [bll.jar:]
        at org.ovirt.engine.core.bll.host.HostUpgradeManager.checkForUpdates(HostUpgradeManager.java:48) [bll.jar:]
        at org.ovirt.engine.core.bll.host.AvailableUpdatesFinder.checkForUpdates(AvailableUpdatesFinder.java:40) [bll.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.HostUpdatesChecker.checkForUpdates(HostUpdatesChecker.java:49) [bll.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.HostUpdatesCheckerService.lambda$submitCheckUpdatesForHost$1(HostUpdatesCheckerService.java:67) [bll.jar:]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]

2017-04-03 06:47:50,865-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (pool-7-thread-3) [77d49287] Error during host rhsqa-grafton2.lab.eng.blr.redhat.com install: java.io.IOException: Command returned failure code 1 during SSH session 'root.eng.blr.redhat.com'
        at org.ovirt.engine.core.uutils.ssh.SSHClient.executeCommand(SSHClient.java:503) [uutils.jar:]
        at org.ovirt.engine.core.uutils.ssh.SSHDialog.executeCommand(SSHDialog.java:317) [uutils.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.execute(VdsDeployBase.java:563) [bll.jar:]
        at org.ovirt.engine.core.bll.host.HostUpgradeManager.checkForUpdates(HostUpgradeManager.java:48) [bll.jar:]
        at org.ovirt.engine.core.bll.host.AvailableUpdatesFinder.checkForUpdates(AvailableUpdatesFinder.java:40) [bll.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.HostUpdatesChecker.checkForUpdates(HostUpdatesChecker.java:49) [bll.jar:]
        at org.ovirt.engine.core.bll.hostdeploy.HostUpdatesCheckerService.lambda$submitCheckUpdatesForHost$1(HostUpdatesCheckerService.java:67) [bll.jar:]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]

2017-04-03 06:47:50,866-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpdatesChecker] (pool-7-thread-3) [77d49287] Failed to check if updates are available for host 'rhsqa-grafton2.lab.eng.blr.redhat.com' with error message 'Command returned failure code 1 during SSH session 'root.eng.blr.redhat.com''
2017-04-03 06:47:50,869-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-7-thread-3) [77d49287] EVENT_ID: HOST_AVAILABLE_UPDATES_FAILED(839), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Failed to check for available updates on host rhsqa-grafton2.lab.eng.blr.redhat.com with message 'Command returned failure code 1 during SSH session 'root.eng.blr.redhat.com''.
2017-04-03 06:47:51,201-04 INFO  [

Comment 2 RamaKasturi 2017-04-04 09:13:41 UTC
Hi Yaniv,
  
  I have copied engine and vdsm logs to the link below.

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1438640/

Thanks
kasturi

Comment 3 Yaniv Kaul 2017-04-04 09:37:50 UTC
(In reply to RamaKasturi from comment #2)
> Hi yaniv,
>   
>   I have copied engine and vdsm logs to the link below.
> 
> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1438640/
> 
> Thanks
> kasturi

Excellent, since now we can see in Engine the following:
 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [77d49287] Yum: Cannot queue package ovirt-node-ng-image-update: Package ovirt-node-ng-image-update cannot be found
2017-04-03 06:47:50,520-04 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [77d49287] Yum: Performing yum transaction rollback
2017-04-03 06:47:50,521-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [77d49287] Failed to execute stage 'Package installation': Package ovirt-node-ng-image-update cannot be found

Is that the case?
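
The lines quoted above carry the same correlation ID, [77d49287], as the SSH failure in the description, which is how the root cause can be tied to the event. A minimal sketch of that filtering step follows, assuming the line layout seen in this report (timestamp, level, logger, thread, correlation ID, message). The function name entries_for and the regular expression are illustrative and not part of ovirt-engine; the sample lines are taken from this bug, with wrapped lines re-joined.

```python
import re

# Layout assumed from the excerpts in this report:
# "<timestamp> <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[0-9a-f]+)\]\s+"
    r"(?P<msg>.*)$"
)


def entries_for(lines, correlation_id, level="ERROR"):
    """Return (timestamp, thread, message) for lines matching the ID and level."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("corr") == correlation_id and m.group("level") == level:
            found.append((m.group("ts"), m.group("thread"), m.group("msg")))
    return found


# Sample lines copied from this bug report.
sample = [
    "2017-04-03 06:47:50,520-04 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [77d49287] Yum: Performing yum transaction rollback",
    "2017-04-03 06:47:50,521-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [77d49287] Failed to execute stage 'Package installation': Package ovirt-node-ng-image-update cannot be found",
    "2017-04-03 06:47:50,866-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpdatesChecker] (pool-7-thread-3) [77d49287] Failed to check if updates are available for host 'rhsqa-grafton2.lab.eng.blr.redhat.com' with error message 'Command returned failure code 1 during SSH session 'root.eng.blr.redhat.com''",
]

errors = entries_for(sample, "77d49287")
for ts, thread, msg in errors:
    print(ts, thread, msg)
```

Read in timestamp order, the VdsDeploy entry (missing package) precedes the generic SSH failure logged by the pool thread, so the earlier entry is the one worth reading first.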

Comment 4 RamaKasturi 2017-04-04 09:59:59 UTC
(In reply to Yaniv Kaul from comment #3)
> (In reply to RamaKasturi from comment #2)
> > Hi yaniv,
> >   
> >   I have copied engine and vdsm logs to the link below.
> > 
> > http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1438640/
> > 
> > Thanks
> > kasturi
> 
> Excellent, since now we can see in Engine the following:
>  ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy)
> [77d49287] Yum: Cannot queue package ovirt-node-ng-image-update: Package
> ovirt-node-ng-image-update cannot be found
> 2017-04-03 06:47:50,520-04 INFO 
> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [77d49287]
> Yum: Performing yum transaction rollback
> 2017-04-03 06:47:50,521-04 ERROR
> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [77d49287]
> Failed to execute stage 'Package installation': Package
> ovirt-node-ng-image-update cannot be found
> 
> Is that the case?

Hi Yaniv,

  I changed the title of the bug to reflect the full error message, in which the command returns a failure code during the SSH session.

Thanks
kasturi

Comment 5 RamaKasturi 2017-04-04 10:47:22 UTC
Hi Yaniv,

  If the event message were just "failed to check updates", I would be fine with it: no repos are enabled on the node, which is why the update check failed. But the event message makes it appear that the check failed because 'Command returned failure code 1 during SSH session'.


Thanks
kasturi

Comment 7 Martin Perina 2017-04-19 07:59:12 UTC
Every host (whether CentOS/Fedora or NGN) needs to have the oVirt repositories installed. If they are missing, or if one of the required packages is not available, the check for upgrades fails. The error message itself is shown in the Events tab, and the details are in the specific host-deploy log (the exact name is shown in Events). On the engine side, however, we don't know exactly why the host-deploy process failed, which is why we show a stack trace on the engine for all premature host-deploy SSH session exits.
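
The behaviour described above, where the caller sees only a non-zero exit status while the reason stays in the child process's own output, can be illustrated with a small sketch. This is not ovirt-engine code: run_and_check is a hypothetical helper, and the child process is a stand-in that merely writes a reason to stderr and exits with status 1.

```python
import subprocess
import sys


def run_and_check(command):
    """Run a command; on a non-zero exit, raise an error carrying only the exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # The child's stderr (the real reason) is deliberately not propagated here,
        # mirroring a caller that only learns that the session ended prematurely.
        raise IOError(f"Command returned failure code {result.returncode}")
    return result.stdout


# Hypothetical stand-in for the remote process: reports its reason on stderr, exits 1.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('Package ovirt-node-ng-image-update cannot be found\\n'); sys.exit(1)",
]

try:
    run_and_check(child)
    message = None
except IOError as exc:
    message = str(exc)

print(message)
```

In this sketch the specific reason is recoverable only from the child's own output, which corresponds to consulting the host-deploy log named in the event.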

Comment 8 Oved Ourfali 2017-04-30 11:46:02 UTC
As Martin stated, if everything is configured properly, we won't have any error.
For other cases, the events and the host-deploy logs are the place to look when troubleshooting.
Closing as wontfix.