Description of problem: Installed latest released version of 4.1.2 ovirt node. I see that there exists an upgrade button next to the host in the UI. Clicking on the upgrade button results in leaving the state of host to installed_failed. Below are the events which i see during the upgrade process. Attached the screenshot which shows the event messages. Version-Release number of selected component (if applicable): RHVM - 4.1.2.3-0.1.el7 RHVH - 4.1-0.20170522.0+1 How reproducible: Always Steps to Reproduce: 1. Install HC using latest released bits of RHV-H which is 4.1.2 2. An upgrade button is shown next to the host in the UI 3. Click on upgrade. Actual results: Clicking on upgrade leaves the host into 'Install Failed' state. Expected results: 1) An upgrade icon next to the host should not be shown since there is no upgrade available on the host. 2) clicking on upgrade icon should either install the updated versions if any or should say there is nothing to upgrade and the icon should disappear. 3) Clicking on upgrade should not leave the host in installed failed state. Additional info:
Created attachment 1293752 [details] Attaching screenshot for the event messages in UI
copied /tmp/imgbased.log, sosreports from the machine where the issue occured and engine.log into the location below. http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1467185/
Following exception is seen in the engine logs: ================================================== 2017-07-03 05:53:59,401-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [cfe76218-65da-4f0b-bc2c-343d3735f646] Error during deploy dialog: java.io.IOException: Unexpected connection termination at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:376) [otopi.jar:] at org.ovirt.otopi.dialog.MachineDialogParser.nextEvent(MachineDialogParser.java:393) [otopi.jar:] at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.threadMain(VdsDeployBase.java:304) [bll.jar:] at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.lambda$new$0(VdsDeployBase.java:383) [bll.jar:] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131] 2017-07-03 05:53:59,404-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (pool-5-thread-1) [cfe76218-65da-4f0b-bc2c-343d3735f646] SSH error running command root.eng.blr.redhat.com:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-mgmt DIALOG/dialect=str:machine DIALOG/customization=bool:True': SSH session hard timeout host 'root.eng.blr.redhat.com' 2017-07-03 05:53:59,404-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (pool-5-thread-1) [cfe76218-65da-4f0b-bc2c-343d3735f646] Exception: javax.naming.TimeLimitExceededException: SSH session hard timeout host 'root.eng.blr.redhat.com' at org.ovirt.engine.core.uutils.ssh.SSHClient.executeCommand(SSHClient.java:475) [uutils.jar:] at org.ovirt.engine.core.uutils.ssh.SSHDialog.executeCommand(SSHDialog.java:317) [uutils.jar:] at org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase.execute(VdsDeployBase.java:563) [bll.jar:] at org.ovirt.engine.core.bll.host.HostUpgradeManager.update(HostUpgradeManager.java:99) [bll.jar:] at org.ovirt.engine.core.bll.hostdeploy.UpgradeHostInternalCommand.executeCommand(UpgradeHostInternalCommand.java:72) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) [bll.jar:] at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:511) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:756) [bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_131] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_131] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_131] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_131] at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340) at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437) at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:70) [wildfly-weld-7.0.0.GA-redhat-2.jar:7.0.0.GA-redhat-2] at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:80) [wildfly-weld-7.0.0.GA-redhat-2.jar:7.0.0.GA-redhat-2] at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:93) [wildfly-weld-7.0.0.GA-redhat-2.jar:7.0.0.GA-redhat-2] at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
There was an upgrade available, and it was successfully pulled and installed. The host was upgraded to 4.1.2 async (for the stackguard CVE) Since the 4.1.3 upgrade performs additional tasks and takes longer, we also requested https://bugzilla.redhat.com/show_bug.cgi?id=1455667 There are a couple of possible reasons for this: * The disk was too slow * The upgrade RPM took too long to retrieve * RHV-M is not 4.1.3 Using a RHV-H 4.1.3 repo will also add timestamps to imgbased.log, which will give a better idea of what is taking longer, but for now, this is either a DUPLICATE or NOTABUG. Please upgrade RHV-M to 4.1.3 and retest.
Thanks Ryan for the update. Let me try the same with RHV-M 4.1.3 and retest this. so for RHHI customers who are on RHV-H 4.1.2, do we recommend them to upgrade to RHV-M to 4.1.3 first and upgrade the nodes next due to the timeout issue which is being fixed with 4.1.3 ?
RHV-H 4.1.3 also threads all of the upgrade operations instead of running them sequentially, so it will upgrade much faster in general. Upgrading RHV-M 4.1.3 first is probably a good idea, but not strictly necessary
I have updated the engine to latest Red Hat Virtualization Manager Version: 4.1.3.5-0.1.el7. I had three nodes, first node it succeeded, second one i saw that node HA status was in 'Local Maintenance" and upgrading third node gives me the error "Now i tried to upgrade the RHV-H node from UI i still see the issue where "processing stopped due to timeout." i have copied the /tmp/imgbased.log, vdsm and engine logs in the location below. http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1467185/
(In reply to RamaKasturi from comment #7) > I had three nodes, first node it succeeded, second one i saw that node HA > status was in 'Local Maintenance" and upgrading third node gives me the > error "Now i tried to upgrade the RHV-H node from UI i still see the issue > where "processing stopped due to timeout." > > i have copied the /tmp/imgbased.log, vdsm and engine logs in the location > below. > > http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1467185/ I'm not sure whether going into "Local Maintenance" for HA is normal or not, since this is handled by ovirt-hosted-engine-agent/broker. Unless you are also upgrading to RHVH 4.1.3 (which adds timestamps to the logs), it's not possible to say what's happening here. It's likely that the disks on the other host were simply too slow. The last reports from virt QE show that mkfs is taking ~4 minutes on systems with slow disks (or a large number of disks in the VG), which is not something under our control.
Is this still reproducible?
Hi Ryan, I could not get time to try again to see if this is still reproducible. But i think there should be a note added in the guide asking the user to upgrade the engine to 4.1.3 before proceeding with the upgrade of RHV-H. Thanks kasturi