Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1149384

Summary: Host upgrade failed with "Host xxxxxx installation failed. SSH copy failed, invalid localDigest" error message in the rhevm GUI.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Udayendu Sekhar Kar <ukar>
Component: ovirt-engine
Assignee: Alon Bar-Lev <alonbl>
Status: CLOSED NOTABUG
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.4.1-1
CC: alonbl, ecohen, iheim, lpeer, lsurette, oourfali, rbalakri, Rhev-m-bugs, rpai, yeylon
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-07 09:25:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
engine logs after failure | none

Description Udayendu Sekhar Kar 2014-10-04 04:08:57 UTC
Description of problem:
The rhev-h upgrade from rhev-hypervisor6-6.5-20140821.1.el6 to rhev-hypervisor6-6.5-20140930.1.el6 fails when started from the rhevm GUI.

Version-Release number of selected component (if applicable):
rhevm 3.4
from: rhev-hypervisor6-6.5-20140821.1.el6
to: rhev-hypervisor6-6.5-20140930.1.el

How reproducible:
Not sure

Steps to Reproduce:
1. Put the host into maintenance mode in the rhevm GUI
2. Select the host and start the upgrade in the rhevm GUI
3. The upgrade starts successfully, and after some time it fails with the "Host xxxxxx installation failed. SSH copy failed, invalid localDigest" message in the rhevm GUI.

Actual results:
The upgrade fails with the "Host xxxxxx installation failed. SSH copy failed, invalid localDigest" message.

Expected results:
The upgrade should complete successfully.

Additional info:
A second upgrade attempt works properly.

Comment 1 Udayendu Sekhar Kar 2014-10-04 04:12:37 UTC
Created attachment 943824
engine logs after failure

Comment 5 Alon Bar-Lev 2014-10-04 19:06:18 UTC
This suggests one of the following:
1. The file transfer is corrupted due to a network failure.
2. The gunzip or md5sum utilities on the host are not working properly.
3. A banner in /etc/profile or similar is printed when ssh accepts new sessions.

The description does not say whether this happens on every host or only a specific one, whether it is reproducible in a lab, etc.
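
To illustrate causes 1 and 3, here is a minimal Python sketch of a digest check that compares a locally computed MD5 with whatever a remote `md5sum` session printed. It is not the engine's actual code, which is not shown in this thread; the function names and the naive first-token parsing are assumptions made only to show how a login banner or a damaged transfer leads to a mismatch.

```python
import hashlib


def local_md5(payload: bytes) -> str:
    """Hex MD5 of the bytes that were meant to be sent."""
    return hashlib.md5(payload).hexdigest()


def parse_md5sum_output(stdout: str) -> str:
    """Naively take the first token of what the remote session printed.

    `md5sum` reading from stdin prints "<hex digest>  -", so the first
    token is the digest only if nothing else was printed before it.
    """
    tokens = stdout.split()
    return tokens[0] if tokens else ""


def digests_match(payload: bytes, remote_stdout: str) -> bool:
    """True when the remote output starts with the expected digest."""
    return local_md5(payload) == parse_md5sum_output(remote_stdout)


if __name__ == "__main__":
    payload = b"example upgrade image bytes"
    clean = f"{local_md5(payload)}  -\n"
    # Digest of a payload that lost its last byte in transit (hypothetical).
    damaged = f"{local_md5(payload[:-1])}  -\n"

    print(digests_match(payload, clean))                         # clean session
    print(digests_match(payload, "Welcome to node1\n" + clean))  # banner printed first
    print(digests_match(payload, damaged))                       # corrupted transfer
```

Only the clean session matches. In the banner case the transfer itself may be intact, which is why checking the host's login scripts is worthwhile before suspecting the network.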

Comment 6 Udayendu Sekhar Kar 2014-10-05 02:05:01 UTC
Hi Alon,

This happened on most of the hosts.

The first attempt fails to complete the upgrade, but the second attempt works. On one host, however, the host did not reboot after the upgrade completed, even though all the values in the General tab (host release, vdsm version, kvm version, etc.) had changed. I then rebooted it manually.

For other hosts, the second upgrade attempt went well and the host rebooted automatically.

So I am sometimes confused about what the exact host upgrade process is, and about the difference between host upgrade and re-install in the rhevm GUI.

Thanks,
Uday

Comment 7 Alon Bar-Lev 2014-10-05 05:53:37 UTC
And is it still happening after the upgrade? Move the host to maintenance and try to upgrade again to the same version.

Comment 8 Udayendu Sekhar Kar 2014-10-05 06:18:10 UTC
All the hosts are now on the latest rhev-h release (30 September), and I have not tried to upgrade again. I will try as per comment #7 and let you know.

Comment 13 Alon Bar-Lev 2014-10-07 07:46:49 UTC
(In reply to Udayendu Sekhar Kar from comment #12)
> This is the production setup. But here the ISO reached successfully from
> engine to node.

It does not happen often in production.
A retry may help.
If it does not, the workaround until 3.5 is to put the apache-sshd-0.11.0 jar[1] in the engine module path:

1. Copy the /usr/share/ovirt-engine/modules/org/apache/sshd directory to /usr/share/ovirt-engine-workaround/modules/org/apache/sshd

2. Replace /usr/share/ovirt-engine-workaround/modules/org/apache/sshd/sshd-core.jar with [1].

3. Add /etc/ovirt-engine/engine.conf.d/80-sshd-core-workaround.conf with the following content:
---
ENGINE_JAVA_MODULEPATH="/usr/share/ovirt-engine-workaround/modules:${ENGINE_JAVA_MODULEPATH}"
---

4. Restart the engine.

[1] http://search.maven.org/remotecontent?filepath=org/apache/sshd/sshd-core/0.11.0/sshd-core-0.11.0.jar
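
Steps 1 to 3 could be scripted roughly as follows. This is only a sketch in Python, assuming the jar from [1] has already been downloaded to a local path. The function name and the `root` parameter (which allows a dry run against a scratch directory) are hypothetical, and step 4, restarting the engine, is still done by hand.

```python
import shutil
from pathlib import Path

# Content of the configuration file from step 3, copied from the comment above.
CONF_LINE = (
    'ENGINE_JAVA_MODULEPATH="/usr/share/ovirt-engine-workaround/modules:'
    '${ENGINE_JAVA_MODULEPATH}"\n'
)


def apply_sshd_workaround(new_jar, root="/"):
    """Apply steps 1-3 under `root` and return (workaround_dir, conf_file).

    `new_jar` is the path of the downloaded sshd-core 0.11.0 jar.
    `root` is "/" on a real engine; pass a scratch directory to rehearse.
    """
    root = Path(root)
    src = root / "usr/share/ovirt-engine/modules/org/apache/sshd"
    dst = root / "usr/share/ovirt-engine-workaround/modules/org/apache/sshd"
    conf = root / "etc/ovirt-engine/engine.conf.d/80-sshd-core-workaround.conf"

    # Step 1: copy the stock module directory.
    # copytree raises FileExistsError if the workaround directory already exists.
    shutil.copytree(src, dst)

    # Step 2: replace the jar in the copy only; the stock module is untouched.
    shutil.copyfile(new_jar, dst / "sshd-core.jar")

    # Step 3: prepend the workaround directory to the engine module path.
    conf.parent.mkdir(parents=True, exist_ok=True)
    conf.write_text(CONF_LINE)

    return dst, conf
```

Because the stock directory under /usr/share/ovirt-engine is never modified, removing the configuration file and the workaround directory and restarting the engine again should undo the change.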

Comment 14 Udayendu Sekhar Kar 2014-10-07 08:39:05 UTC
Alon,

After 2-3 tries we were able to complete the upgrade properly, so I believe we can wait until rhevm 3.5.

Thanks,
Uday

Comment 15 Alon Bar-Lev 2014-10-07 08:48:43 UTC
OK, so the issue was probably in the previous node image. I am unsure what the issue was or which bug covers it, so I cannot mark this as a duplicate.

I suggest closing it as insufficient data.

Comment 16 Udayendu Sekhar Kar 2014-10-07 08:54:24 UTC
But have you got any clue from the latest engine.log that I collected after the upgrade of node1 failed?

Comment 17 Alon Bar-Lev 2014-10-07 09:08:36 UTC
(In reply to Udayendu Sekhar Kar from comment #16)
> But have you got any clue with the latest engine.log that I have collected
> after the failure of the upgrade of node1.

I see that the upgrade succeeded:
---
2014-10-06 09:45:14,616 INFO  [org.ovirt.engine.core.bll.OVirtNodeUpgrade] (OVirtNodeUpgrade) update from host 192.168.145.3: <BSTRAP component='RHEV_INSTALL' status='OK'/>
2014-10-06 09:45:14,616 INFO  [org.ovirt.engine.core.bll.InstallerMessages] (OVirtNodeUpgrade) Installation 192.168.145.3: Step: RHEV_INSTALL
2014-10-06 09:45:14,639 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (OVirtNodeUpgrade) Correlation ID: 4d734ac9, Call Stack: null, Custom Event ID: -1, Message: Installing Host node1. Step: RHEV_INSTALL.
2014-10-06 09:45:14,650 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [4d734ac9] START, SetVdsStatusVDSCommand(HostName = node1, HostId = cc6e75d1-e447-43c9-a902-1edc7806ca49, status=Reboot, nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 20215ac9
2014-10-06 09:45:14,663 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [4d734ac9] FINISH, SetVdsStatusVDSCommand, log id: 20215ac9
---

Then the following appears, which is not a host-deploy issue, so I guess a new bug should be filed, probably against engine core.
---
2014-10-06 09:51:02,595 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-96) [42e7e03] HostName = node1
2014-10-06 09:51:02,595 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-96) [42e7e03] Command GetCapabilitiesVDSCommand(HostName = node1, HostId = cc6e75d1-e447-43c9-a902-1edc7806ca49, vds=Host[node1,cc6e75d1-e447-43c9-a902-1edc7806ca49]) execution failed. Exception: VDSRecoveringException: Recovering from crash or Initializing
2014-10-06 09:51:02,651 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-96) [42e7e03] Correlation ID: null, Call Stack: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSRecoveringException: Recovering from crash or Initializing
	at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:42)
	at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:16)
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:96)
	at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56)
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31)
	at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:511)
	at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo(VdsUpdateRunTimeInfo.java:486)
	at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:342)
	at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:231)
	at sun.reflect.GeneratedMethodAccessor127.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
, Custom Event ID: -1, Message: Host node1 is initializing. Message: Recovering from crash or Initializing
2014-10-06 09:51:02,651 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-96) [42e7e03] Failed to refresh VDS , vds = cc6e75d1-e447-43c9-a902-1edc7806ca49 : node1, error = Recovering from crash or Initializing, continuing.
2014-10-06 09:51:05,678 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-93) Command 
---

Comment 18 Udayendu Sekhar Kar 2014-10-07 09:17:38 UTC
OK, I am closing this and will file a new one.

Thanks for your help, Alon!