Bug 1149384
| Summary: | Host upgrade failed with "Host xxxxxx installation failed. SSH copy failed, invalid localDigest" error message in the rhvem GUI. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Udayendu Sekhar Kar <ukar> |
| Component: | ovirt-engine | Assignee: | Alon Bar-Lev <alonbl> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.4.1-1 | CC: | alonbl, ecohen, iheim, lpeer, lsurette, oourfali, rbalakri, Rhev-m-bugs, rpai, yeylon |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-10-07 09:25:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Udayendu Sekhar Kar
2014-10-04 04:08:57 UTC
Created attachment 943824
engine logs after failure

This suggests one of the following:
1. The file transfer was corrupted by a network failure.
2. The gunzip or md5sum utilities on the host are not working properly.
3. A banner in /etc/profile or similar is printed when ssh accepts new sessions.
The description does not say whether this happens on every host or only a specific one, whether it is reproducible in a lab, etc.

Hi Alon,
This happened on most of the hosts. The first attempt fails to complete the upgrade process, but the second attempt works well. However, I observed that on one host the host did not reboot after the upgrade completed, yet all the parameters in the General tab (host release, vdsm version, kvm version, etc.) had changed. I then rebooted it manually. On another host, the second upgrade attempt went well and the host rebooted automatically. So I am sometimes confused about what exactly the host upgrade process is, and what the difference is between host upgrade and re-install in the rhevm GUI.
Thanks,
Uday

And after the upgrade, is it still happening? Move the host to maintenance and try to upgrade again to the same version.

All the hosts are now on the latest rhev-h release of 30 September, and I have not tried to upgrade again. I will try as per comment #7 and let you know.

(In reply to Udayendu Sekhar Kar from comment #12)
> This is the production setup. But here the ISO reached successfully from
> engine to node.
It is not happening a lot in production. A retry may help. If it does not, the workaround until 3.5 is to put the apache-sshd-0.11.0 jar [1] in the engine module path:
1. Copy the /usr/share/ovirt-engine/modules/org/apache/sshd directory to /usr/share/ovirt-engine-workaround/modules/org/apache/sshd.
2. Replace /usr/share/ovirt-engine-workaround/modules/org/apache/sshd/sshd-core.jar with [1].
3. Add /etc/ovirt-engine/engine.conf.d/80-sshd-core-workaround.conf:
---
ENGINE_JAVA_MODULEPATH="/usr/share/ovirt-engine-workaround/modules:${ENGINE_JAVA_MODULEPATH}"
---
4. Restart the engine.
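
Steps 1 to 3 of the workaround above could be scripted roughly as follows. This is only a sketch, not a supported tool: the function name `stage_workaround` and the `root` parameter are hypothetical, added here so that the script can be rehearsed against a scratch directory before touching a real engine machine. The paths and the configuration line are the ones quoted in the steps. Step 4, restarting the engine, is left to the operator because the thread does not name the command.

```python
import shutil
from pathlib import Path

# Paths quoted in the workaround steps, written relative to the filesystem root.
STOCK_SSHD = Path("usr/share/ovirt-engine/modules/org/apache/sshd")
OVERRIDE_MODULES = Path("usr/share/ovirt-engine-workaround/modules")
CONF_FILE = Path("etc/ovirt-engine/engine.conf.d/80-sshd-core-workaround.conf")
CONF_LINE = (
    "ENGINE_JAVA_MODULEPATH="
    '"/usr/share/ovirt-engine-workaround/modules:${ENGINE_JAVA_MODULEPATH}"\n'
)


def stage_workaround(root: Path, replacement_jar: Path) -> Path:
    """Perform steps 1-3 under `root` and return the conf file that was written.

    `root` is a hypothetical parameter: pass Path("/") on a real engine host,
    or a scratch directory to rehearse the procedure.
    """
    override_sshd = root / OVERRIDE_MODULES / "org/apache/sshd"

    # Step 1: copy the stock module directory into the override tree.
    shutil.copytree(root / STOCK_SSHD, override_sshd, dirs_exist_ok=True)

    # Step 2: replace sshd-core.jar with the downloaded 0.11.0 jar.
    shutil.copyfile(replacement_jar, override_sshd / "sshd-core.jar")

    # Step 3: put the override tree first on the engine module path.
    conf = root / CONF_FILE
    conf.parent.mkdir(parents=True, exist_ok=True)
    conf.write_text(CONF_LINE)
    return conf
```

The stock module directory is never modified, so removing the conf file and the override tree undoes the change. The sketch needs Python 3.8 or later for `dirs_exist_ok`.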
[1] http://search.maven.org/remotecontent?filepath=org/apache/sshd/sshd-core/0.11.0/sshd-core-0.11.0.jar

Alon,
After 2-3 tries we were able to do the upgrade properly. So I believe we can wait until rhevm 3.5.
Thanks,
Uday

OK, so the issue was probably in the previous node image. I am not sure what the issue or the bug was, so I cannot mark this as a duplicate. I suggest closing it as insufficient data.

But did you get any clue from the latest engine.log that I collected after the upgrade of node1 failed?

(In reply to Udayendu Sekhar Kar from comment #16)
> But have you got any clue with the latest engine.log that I have collected
> after the failure of the upgrade of node1.
I see upgrade success:
---
2014-10-06 09:45:14,616 INFO [org.ovirt.engine.core.bll.OVirtNodeUpgrade] (OVirtNodeUpgrade) update from host 192.168.145.3: <BSTRAP component='RHEV_INSTALL' status='OK'/>
2014-10-06 09:45:14,616 INFO [org.ovirt.engine.core.bll.InstallerMessages] (OVirtNodeUpgrade) Installation 192.168.145.3: Step: RHEV_INSTALL
2014-10-06 09:45:14,639 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (OVirtNodeUpgrade) Correlation ID: 4d734ac9, Call Stack: null, Custom Event ID: -1, Message: Installing Host node1. Step: RHEV_INSTALL.
2014-10-06 09:45:14,650 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [4d734ac9] START, SetVdsStatusVDSCommand(HostName = node1, HostId = cc6e75d1-e447-43c9-a902-1edc7806ca49, status=Reboot, nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 20215ac9
2014-10-06 09:45:14,663 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-4-thread-11) [4d734ac9] FINISH, SetVdsStatusVDSCommand, log id: 20215ac9
---
Then comes the following, which is not a host-deploy issue, so I guess a new bug should be filed, probably against engine core.
---
2014-10-06 09:51:02,595 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-96) [42e7e03] HostName = node1
2014-10-06 09:51:02,595 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-96) [42e7e03] Command GetCapabilitiesVDSCommand(HostName = node1, HostId = cc6e75d1-e447-43c9-a902-1edc7806ca49, vds=Host[node1,cc6e75d1-e447-43c9-a902-1edc7806ca49]) execution failed. Exception: VDSRecoveringException: Recovering from crash or Initializing
2014-10-06 09:51:02,651 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-96) [42e7e03] Correlation ID: null, Call Stack: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSRecoveringException: Recovering from crash or Initializing
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:42)
    at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:16)
    at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:96)
    at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56)
    at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31)
    at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:511)
    at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo(VdsUpdateRunTimeInfo.java:486)
    at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:342)
    at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:231)
    at sun.reflect.GeneratedMethodAccessor127.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
, Custom Event ID: -1, Message: Host node1 is initializing. Message: Recovering from crash or Initializing
2014-10-06 09:51:02,651 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-96) [42e7e03] Failed to refresh VDS , vds = cc6e75d1-e447-43c9-a902-1edc7806ca49 : node1, error = Recovering from crash or Initializing, continuing.
2014-10-06 09:51:05,678 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-93) Command
---

OK, I am closing this and will file a new one. Thanks for your help, Alon!
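
For readers landing on this bug with the same "invalid localDigest" message: the first and third causes listed in the first reply (a corrupted transfer, or a login banner printed on new ssh sessions) can both be illustrated with a small digest comparison. The sketch below is not the ovirt-engine implementation. It assumes, for illustration only, that the sender computes an MD5 locally and compares it with the first field of `md5sum`-style output read back over ssh; the function names are hypothetical.

```python
import hashlib


def md5_hex(payload: bytes) -> str:
    """MD5 of the payload as a lowercase hex string, as md5sum prints it."""
    return hashlib.md5(payload).hexdigest()


def naive_remote_digest(stdout: str) -> str:
    """Assume stdout is exactly md5sum output ('<digest>  -') and take field one."""
    fields = stdout.split()
    return fields[0] if fields else ""


def digest_matches(payload: bytes, remote_stdout: str) -> bool:
    """True when the digest read from the remote side equals the local digest."""
    return naive_remote_digest(remote_stdout) == md5_hex(payload)


payload = b"example image bytes"

# Clean session: the remote side printed only the md5sum line.
clean = md5_hex(payload) + "  -\n"

# Banner case: a login banner was printed before the md5sum line.
with_banner = "Welcome to node1\n" + clean

# Corruption case: the remote side hashed different bytes than were sent.
corrupted = md5_hex(payload + b"\x00") + "  -\n"

print(digest_matches(payload, clean))        # digests agree
print(digest_matches(payload, with_banner))  # banner text is read as the digest
print(digest_matches(payload, corrupted))    # bytes changed in transit
```

Only the clean case compares equal. In the banner case the file itself arrived intact, which is consistent with the report that the ISO reached the node even though the copy was flagged as failed.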