Bug 1040663

Summary: RHEV-H update fails to upgrade from portal and can not be activated
Product: Red Hat Enterprise Virtualization Manager
Reporter: wdaniel
Component: rhev-hypervisor
Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED NEXTRELEASE
QA Contact: Pavel Stehlik <pstehlik>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.0
CC: alonbl, benglish, dfediuck, fdeutsch, iheim, wdaniel, yeylon
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: node
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-03-13 18:16:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description          Flags
vds_bootstrap logs   none
vdsm logs            none

Description wdaniel 2013-12-11 20:43:34 UTC
Created attachment 835471 [details]
vdsm-reg.log from hypervisor

Description of problem:

Customer is currently running:
Red Hat Enterprise Virtualization Hypervisor release 6.3 (20130129.0.el6_3)
vdsm-4.10.2-1.13.el6ev.x86_64

I gave him the workaround of "clicking activate" but RHEV comes back with the error message:
Cannot switch Host to Maintenance mode, Host is not operational.
I had him check the admin TUI post-upgrade to see if the upgrade was in fact successful, but it was not.

My customer has the ability to manually upgrade all his hosts via DVD, but obviously he would prefer not to. There was a ticket that dealt with something very similar, Bug 920671, but we haven't found a resolution yet.

Comment 3 Fabian Deutsch 2013-12-16 19:13:45 UTC
I'm seeing many SQL related IO errors in the logs.

Itamar, who could help here?

Comment 4 Fabian Deutsch 2013-12-16 19:17:56 UTC
Also HTTPS related errors

Comment 9 Fabian Deutsch 2013-12-19 09:58:21 UTC
(In reply to wdaniel from comment #0)
> I gave him the workaround of "clicking activate" but RHEV comes back with
> the error message:
> Cannot switch Host to Maintenance mode, Host is not operational.
> I had him check the admin TUI post-upgrade to see if the upgrade was in fact
> successful, but it was not.

Hey Wallace,

when does this message appear?

Before the update? (The host can not be set to maintenance mode to do the upgrade).

Or after the update?

Comment 11 wdaniel 2013-12-19 14:40:08 UTC
(In reply to Fabian Deutsch from comment #9)
> (In reply to wdaniel from comment #0)
> > I gave him the workaround of "clicking activate" but RHEV comes back with
> > the error message:
> > Cannot switch Host to Maintenance mode, Host is not operational.
> > I had him check the admin TUI post-upgrade to see if the upgrade was in fact
> > successful, but it was not.
> 
> Hey Wallace,
> 
> when does this message appear?
> 
> Before the update? (The host can not be set to maintenance mode to do the
> upgrade).
> 
> Or after the update?

Fabian,

In this order of events, the customer went through the regular upgrade process and hit the failure message. It is after the upgrade process that the customer attempts to click "activate" to work around this, and that is when he encounters the "Host is non-op" message.

Also, I wanted to repeat that the customer checked the admin TUI on the RHEV-H and the version reported there is still the old 6.3 version, not the expected upgraded version. As far as I can tell the failure that he initially runs into (not being able to activate) fits the same criteria as mentioned in this article:
https://access.redhat.com/site/solutions/380313

This being the case, I would imagine that the hypervisor would still be updated. Is the hypervisor supposed to reboot somewhere in the upgrade process? Is there any way to tell on the "non-op" hypervisor where in the upgrade process it has stopped?

Comment 12 Fabian Deutsch 2013-12-20 10:52:06 UTC
Wallace,

yes - maybe Alon can help us here to tell where the logs reside when the update got initiated through RHEV-M.

Does the customer see a backup entry in grub when he reboots the machine?
And yes - a reboot is happening after the update was pushed to the machine.

Comment 13 Alon Bar-Lev 2013-12-20 12:59:02 UTC
Upgrade messages are written to:
- engine:/var/log/ovirt-engine/engine.log
- host:/var/log/vdsm-reg/vds_bootstrap_upgrade*.log
- host: ovirt-node logs.
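
When checking the host-side location above, a small script can save some manual grepping. The following is only a minimal sketch: the function names and the `root` parameter (so that it can be pointed at an unpacked sosreport instead of a live host) are assumptions made for illustration and are not part of vdsm or ovirt-node.

```python
import glob
import os


def find_upgrade_logs(root="/"):
    """List vds_bootstrap_upgrade*.log files under <root>/var/log/vdsm-reg.

    `root` is a convenience assumption: "/" on a live host, or the top
    directory of an unpacked sosreport.
    """
    pattern = os.path.join(root, "var/log/vdsm-reg", "vds_bootstrap_upgrade*.log")
    return sorted(glob.glob(pattern))


def error_lines(path):
    """Return the lines of one log file that are logged at ERROR level."""
    with open(path, errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if " ERROR " in line]
```

Calling `error_lines()` on each path returned by `find_upgrade_logs()` gives a quick first view of where the upgrade stopped.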

Comment 14 wdaniel 2013-12-27 17:45:42 UTC
(In reply to Fabian Deutsch from comment #12)
> Wallace,
> 
> yes - maybe Alon can help us here to tell where the logs reside when the
> update got initiated through RHEV-M.
> 
> Does the customer see a backup entry in grub when he reboots the machine?
> And yes - a reboot is happening after the update was pushed to the machine.

Fabian,

The customer has confirmed that the GRUB menu does not show any backup entries post-reboot, and the only entry is the currently installed 6.3 image.

Comment 15 Fabian Deutsch 2014-01-03 11:50:17 UTC
(In reply to wdaniel from comment #14)
> The customer has confirmed that the GRUB menu does not show any backup
> entries post-reboot, and the only entry is the currently installed 6.3 image.

Hey Wallace,

right - that means the update did not happen. Please ask the customer to provide the logs named by Alon in comment 13.

Comment 16 wdaniel 2014-01-03 21:09:26 UTC
Created attachment 845083 [details]
vds_bootstrap logs

Comment 17 wdaniel 2014-01-03 21:12:16 UTC
Created attachment 845085 [details]
vdsm logs

Comment 18 wdaniel 2014-01-03 21:14:08 UTC
(In reply to Fabian Deutsch from comment #15)
> (In reply to wdaniel from comment #14)
> > The customer has confirmed that the GRUB menu does not show any backup
> > entries post-reboot, and the only entry is the currently installed 6.3 image.
> 
> Hey Wallace,
> 
> right - that means the update did not happen. Please ask the customer to
> provide the logs named by Alon in comment 13.

Fabian, Alon,

The requested files have been attached to the bug, please let me know if there is anything else I can provide to you guys. Thanks!

Comment 19 Alon Bar-Lev 2014-01-04 06:11:33 UTC
Comment on attachment 845083 [details]
vds_bootstrap logs

I guess in ovirt-node log there will be more information

Wed, 27 Nov 2013 18:35:56 DEBUG    <BSTRAP component='setMountPoint' status='OK' message='Mount succeeded.'/>
Wed, 27 Nov 2013 18:35:56 INFO     Using default value for: BOOT_SIZE
Wed, 27 Nov 2013 18:35:56 INFO     Using default value for: ROOT_SIZE
Wed, 27 Nov 2013 18:35:56 INFO     Using default value for: CONFIG_SIZE
Wed, 27 Nov 2013 18:35:56 INFO     Using default value for: LOGGING_SIZE
Wed, 27 Nov 2013 18:35:56 INFO     Using default value for: DATA_SIZE
Wed, 27 Nov 2013 18:35:56 INFO     Using default value for: SWAP2_SIZE
Wed, 27 Nov 2013 18:35:56 INFO     Using default value for: DATA2_SIZE
Wed, 27 Nov 2013 18:35:56 ERROR    <BSTRAP component='RHEV_INSTALL' status='FAIL'/>
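
For longer bootstrap logs, the <BSTRAP .../> status elements shown in the excerpt above can be pulled out mechanically. This is a rough sketch based only on the line format visible in this attachment; the helper names are hypothetical, and it assumes attribute values are single-quoted and contain no '>' character.

```python
import re

# Matches one self-closing <BSTRAP .../> element inside a log line.
BSTRAP_RE = re.compile(r"<BSTRAP\s+(?P<attrs>[^>]*?)/>")
# Matches name='value' pairs inside the element.
ATTR_RE = re.compile(r"(\w+)='([^']*)'")


def parse_bstrap(line):
    """Return the attributes of a <BSTRAP .../> element as a dict, or None."""
    match = BSTRAP_RE.search(line)
    if match is None:
        return None
    return dict(ATTR_RE.findall(match.group("attrs")))


def failed_components(lines):
    """Return the component names whose BSTRAP status is FAIL."""
    failed = []
    for line in lines:
        attrs = parse_bstrap(line)
        if attrs and attrs.get("status") == "FAIL":
            failed.append(attrs.get("component"))
    return failed
```

Run against the excerpt above, this would single out the RHEV_INSTALL component, which matches the only ERROR line in the log.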

Comment 20 Fabian Deutsch 2014-01-06 07:10:57 UTC
Wallace, could you please also attach /var/log/ovirt.log

Comment 21 wdaniel 2014-01-08 15:39:23 UTC
Fabian, Alon,

Apologies for the file name mix up, it's rare that I have anyone request those specific files. Right now we have sosreports from 2 hosts in the environment where this is happening, and for one host (rhev5) there is no data in that 'ovirt.log' file. On the other (rhev6), there are only the following 3 lines:

2013-11-27 18:35:56,427 - DEBUG - ovirtfunctions - Translating: 
2013-11-27 18:35:56,442 - DEBUG - ovirtfunctions - Translating: 
2013-11-27 18:35:56,465 - INFO - install - Installing the image.

Is there anything else I can get to you guys?

Comment 23 Fabian Deutsch 2014-01-24 08:36:46 UTC
Hey Wallace,

could you please attach the sosreports from the nodes and the engine?

Comment 24 wdaniel 2014-01-27 21:50:31 UTC
(In reply to Fabian Deutsch from comment #23)
> Hey Wallace,
> 
> could you please attach the sosreports from the nodes and the engine?

Fabian, 

I'm happy to get those to you, however each sosreport is ~400MB, which seems to exceed Bugzilla's size limit. Are there any particular folders or files I can pick out and compress to get them to you?
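
One possible way to get under the size limit is to repack only the relevant log files into a smaller archive. The sketch below is an assumption about how that could be done, not an agreed procedure: the `WANTED` fragments are simply the log locations named in comment 13 and comment 20, and the function name is hypothetical.

```python
import tarfile

# Path fragments taken from the logs requested earlier in this bug; adjust
# to whatever the developers actually ask for.
WANTED = (
    "var/log/ovirt-engine/engine.log",
    "var/log/vdsm-reg/",
    "var/log/ovirt.log",
)


def extract_subset(src_path, dst_path, wanted=WANTED):
    """Copy regular files whose path contains a wanted fragment into a new
    xz-compressed tarball and return the names that were kept."""
    kept = []
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:xz") as dst:
        for member in src:
            if member.isfile() and any(fragment in member.name for fragment in wanted):
                dst.addfile(member, src.extractfile(member))
                kept.append(member.name)
    return kept
```

The resulting archive contains only the selected logs, so it should be far smaller than the full sosreport.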

Comment 25 wdaniel 2014-02-07 19:46:07 UTC
Fabian,

I hadn't heard anything back, but now have the log collector available to download at the following location:

http://file.rdu.redhat.com/~wdaniel/wdaniel/00988528-sosreport-LogCollector-20131202112026.tar.xz

It's 700MB but should have everything you need. Let me know if there is anything else I can get to you.

Comment 26 Fabian Deutsch 2014-02-09 23:01:12 UTC
Hey Daniel,

The RHEV-H version is quite old. I need to see how we can progress with debugging here.

Comment 27 Doron Fediuck 2014-03-13 18:16:45 UTC
This was opened on 3.2 / 6.3, while we already have 3.3 / 6.5 in support.
As mentioned in comment 26 this is too old, and we're not aware of this issue
in recent versions.