Bug 869653

Summary: [ovirt-engine-backend] rhevh upgrade fails on java.io.IOException (Command returned failure code 1 during SSH session)
Product: Red Hat Enterprise Virtualization Manager
Reporter: Martin Pavlik <mpavlik>
Component: ovirt-engine
Assignee: Alon Bar-Lev <alonbl>
Status: CLOSED WONTFIX
QA Contact: Martin Pavlik <mpavlik>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.1.0
CC: dyasny, gklein, iheim, lpeer, Rhev-m-bugs, sgordon, yeylon, ykaul
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version:
Doc Type: Release Note
Doc Text:
When a Hypervisor is upgraded for Red Hat Enterprise Virtualization 3.1 compatibility, it will initially be listed in the Administration Portal as unreachable or in maintenance mode. This will be the case even where the upgrade was successful. To resolve this issue, use the Administration Portal to put the Hypervisor into maintenance mode (if necessary) and then activate it manually to resume normal operation. This issue only exists when upgrading from Hypervisors that include vdsm-4.9 to Hypervisors that include vdsm-4.9.6. The issue will not occur on subsequent upgrades.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-10-25 16:34:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  log_collector (flags: none)
  screenshot 1 (flags: none)
  vds_bootstrap_upgrade.20121024_122039 (flags: none)
  vdsm-upgrade script (flags: none)

Description Martin Pavlik 2012-10-24 13:43:34 UTC
Created attachment 632793 [details]
log_collector

Description of problem:
The RHEVH upgrade is performed, but the RHEV-M GUI reports that the installation failed. The following error is shown in the GUI:
Host dell-r210ii-08.rhev.lab.eng.brq.redhat.com installation failed. SSH command failed while executing at host '10.34.66.81', refer to logs for further information.
The host is marked as "install failed". However, if the user puts the host into maintenance and then activates it, the host works properly.
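For reference, the manual workaround (put the host into maintenance, then activate it) can also be driven through the RHEV-M REST API. The following Python sketch is illustrative only: the /deactivate and /activate host actions are assumed from the RHEV 3.x REST API, and the engine URL, credentials and host UUID are placeholders.

# Illustrative sketch of the manual workaround (maintenance -> activate).
# Assumes RHEV-M 3.x REST API host actions /deactivate and /activate;
# engine URL, credentials and host UUID below are placeholders.
import requests

ENGINE = "https://rhevm.example.com/api"          # hypothetical engine URL
AUTH = ("admin@internal", "password")             # placeholder credentials
HOST_ID = "00000000-0000-0000-0000-000000000000"  # placeholder host UUID

def post_action(action):
    # POST an empty <action/> body to the host action sub-resource.
    url = "%s/hosts/%s/%s" % (ENGINE, HOST_ID, action)
    resp = requests.post(url, data="<action/>", auth=AUTH,
                         headers={"Content-Type": "application/xml"},
                         verify=False)
    resp.raise_for_status()

post_action("deactivate")  # move the host to maintenance (skip if already there)
post_action("activate")    # bring the host back up after the upgrade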


Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Manager Version: '3.1.0-22.el6ev' 
both RHEVH images: vdsm-4.9-113.4.el6_3.x86_64

How reproducible: 100%

Steps to Reproduce:
1. Add RHEVH 20121012.0.el6_3 host into setup (cluster 3.0)
2. Upgrade the host to RHEVH (20121023.0.el6_3) via rhevm GUI

  
Actual results:
Host is upgraded, GUI reports that install failed

Expected results:
Host is upgraded, GUI correctly reports about the upgrade

Additional info:
log-collector files attached

engine.log

2012-10-24 14:20:46,677 ERROR [org.ovirt.engine.core.utils.hostinstall.VdsInstallerSSH] (pool-4-thread-50) SSH error running command 10.34.66.81:'/usr/share/vdsm-reg/vdsm-upgrade': java.io.IOException: Command returned failure code 1 during SSH session 10.34.66.81:22' '/usr/share/vdsm-reg/vdsm-upgrade'
        at org.ovirt.engine.core.utils.ssh.SSHClient.executeCommand(SSHClient.java:442) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.hostinstall.VdsInstallerSSH.executeCommand(VdsInstallerSSH.java:387) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.hostinstall.VdsInstallerSSH.executeCommand(VdsInstallerSSH.java:426) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.OVirtUpgrader.RunStage(OVirtUpgrader.java:54) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VdsInstaller.Install(VdsInstaller.java:280) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:110) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:825) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:916) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1300) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:168) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:107) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:931) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:285) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommands(MultipleActionsRunner.java:182) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.RunCommands(MultipleActionsRunner.java:162) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner$1.run(MultipleActionsRunner.java:84) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_09-icedtea]
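The exception above is raised because the engine treats any non-zero exit status of the remote '/usr/share/vdsm-reg/vdsm-upgrade' command as a failed installation, even though the upgrade on the node completed. The engine code is the Java SSHClient shown in the trace; the following Python/paramiko sketch only illustrates the equivalent check, with the user and key path as placeholders.

# Illustrative only: mimics the "non-zero exit status == failure" handling
# that surfaces as the IOException above. User and key path are placeholders.
import paramiko

HOST = "10.34.66.81"             # host from the log above
USER = "root"                    # placeholder user
KEY = "/path/to/engine_id_rsa"   # placeholder private key

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, key_filename=KEY)

stdin, stdout, stderr = client.exec_command("/usr/share/vdsm-reg/vdsm-upgrade")
stdout.read()                               # drain the upgrade output
rc = stdout.channel.recv_exit_status()      # blocks until the command exits
client.close()

if rc != 0:
    # This is the condition reported as "Command returned failure code 1
    # during SSH session", even when the node upgrade actually succeeded.
    raise IOError("Command returned failure code %d during SSH session" % rc)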

Comment 1 Martin Pavlik 2012-10-24 13:44:35 UTC
Created attachment 632794 [details]
screenshot 1

Comment 2 Alon Bar-Lev 2012-10-24 14:02:00 UTC
dup of bug#849315?

To confirm please attach logs at /var/log/vdsm-reg/vds_bootstrap_upgrade.*

Thanks.

Comment 3 Martin Pavlik 2012-10-24 14:07:32 UTC
Created attachment 632813 [details]
vds_bootstrap_upgrade.20121024_122039

Comment 4 Alon Bar-Lev 2012-10-24 14:45:57 UTC
Looks like bug#849315.

Comment 5 Martin Pavlik 2012-10-24 15:06:18 UTC
Created attachment 632837 [details]
vdsm-upgrade script

Comment 6 Alon Bar-Lev 2012-10-24 15:10:16 UTC
Confirmed.

*** This bug has been marked as a duplicate of bug 849315 ***

Comment 7 Martin Pavlik 2012-10-25 08:54:48 UTC
This bug is not really a duplicate of 849315, since BZ 849315 solves this problem for RHEVH hosts with vdsm 4.9.6.x. I encountered this problem with a RHEVH host which contained vdsm 4.9-113.4.

Please push the fixes used to solve bug 849315 also to the z-stream of 4.9.

Please also consider pushing:

http://gerrit.ovirt.org/7301
http://gerrit.ovirt.org/7279

Comment 8 Alon Bar-Lev 2012-10-25 09:14:30 UTC
Every bug in one release is likely to be in the previous one... :)

Barak, if you think we should backport this, please flag.

Comment 9 Itamar Heim 2012-10-25 10:49:52 UTC
(In reply to comment #8)
> Every bug in one release is likely to be in the previous one... :)
> 
> Barak, if you think we should backport this, please flag.

wouldn't backporting this only cause upgrade from an older rhev-h to fail?
the next 3.0 update of vdsm is most likely to be 3.1 already, so unless there is a very good reason, we wouldn't backport this.

Comment 10 Alon Bar-Lev 2012-10-25 10:54:14 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Every bug in one release is likely to be in the previous one... :)
> > 
> > Barak, if you think we should backport this, please flag.
> 
> wouldn't backporting this only cause upgrade from an older rhev-h to fail?
> the next 3.0 update of vdsm is most likely to be 3.1 already, so unless
> there is a very good reason, we wouldn't backport this.

Currently upgrading from older rhev-h does fail.
rhevm-3.0 does not check for status... so it is natural for this fix.

Our options:

1. Document in release notes that this is expected behaviour when upgrading older rhev-h.

2. Fix and push z stream, hoping that people will upgrade rhev-h to this version before upgrading to rhevm-3.1.

Comment 11 Itamar Heim 2012-10-25 10:59:57 UTC
(In reply to comment #10)

> Currently upgrading from older rhev-h does fail.

does it fail rhev-m 3.0 customers or 3.1 customers as well?

> rhevm-3.0 does not check for status... so it is natural for this fix.
> 
> Our options:
> 
> 1. Document in release notes that this is expected behaviour when upgrading
> older rhev-h.

if it fails for the first upgrade, then ok it is one thing.
if it always fails for a 3.0 customer with a newer rhev-h, it is a problem which we should consider fixing asap in 3.0.8.
we actually try very hard to avoid such regressions.
is there a workaround?

> 
> 2. Fix and push z stream, hoping that people will upgrade rhev-h to this
> version before upgrading to rhevm-3.1.

Comment 12 Alon Bar-Lev 2012-10-25 11:13:02 UTC
(In reply to comment #11)
> (In reply to comment #10)
> 
> > Currently upgrading from older rhev-h does fail.
> 
> does it fail rhev-m 3.0 customers or 3.1 customers as well?
> 
> > rhevm-3.0 does not check for status... so it is natural for this fix.
> > 

^^^^^^^^^^^^^^^^^^^^^^^

> > Our options:
> > 
> > 1. Document in release notes that this is expected behaviour when upgrading
> > older rhev-h.
> 
> if it fails for the first upgrade, then ok it is one thing.

This is what happens.

> if it always fails for a 3.0 customer with a newer rhev-h, it is a problem
> which we should consider fixing asap in 3.0.8.
> we actually try very hard to avoid such regressions.

Should not. I guess Martin can confirm.

> is there a workaround?

Yes, ignore the error as in comment#0, and then activate the host.

> 
> > 
> > 2. Fix and push z stream, hoping that people will upgrade rhev-h to this
> > version before upgrading to rhevm-3.1.

Comment 13 Itamar Heim 2012-10-25 11:20:23 UTC
if this only happens on the first upgrade to a new rhev-h, and the same happens for 3.1 users, then what exactly is there to fix? isn't this the same behavior we'd see for a 3.1 user on the first upgrade from an older rhev-h?

Comment 14 Martin Pavlik 2012-10-25 11:32:56 UTC
What I described in this BZ is an upgrade from
RHEVH 20121012.0.el6_3 to 20121023.0.el6_3; both of them have vdsm-4.9-113.4.el6_3.x86_64.

Comment 15 Alon Bar-Lev 2012-10-25 11:39:01 UTC
(In reply to comment #13)
> if this only happens on the first upgrade to a new rhev-h, and the same
> happens for 3.1 users, then what exactly is there to fix? isn't this the same
> behavior we'd see for a 3.1 user on the first upgrade from an older rhev-h?

I honestly don't understand the question.

Comment 16 Alon Bar-Lev 2012-10-25 16:34:19 UTC
Martin,

We cannot avoid this error: even if we upgrade from node1 to node2, node1 has this bug... and as there are no planned nodes without vdsm-4.9.6 (maybe one), it is not worth distributing a z-stream of the node.

I am closing this for now.

Thank you.