Bug 1403735 - [z-stream clone - 4.0.7] modify output of the hosted engine CLI to show info on auto import process
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 3.6.9
Hardware: Unspecified    OS: Unspecified
Priority: high    Severity: urgent
Target Milestone: ovirt-4.0.7
Target Release: ---
Assigned To: Simone Tiraboschi
QA Contact: Artyom
URL: https://www.ovirt.org/documentation/h...
Whiteboard: integration
Keywords: Triaged, ZStream
Depends On: 1396672
Blocks: 1403750
Reported: 2016-12-12 05:04 EST by rhev-integ
Modified: 2017-03-20 07:52 EDT
CC List: 13 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
With this update, the output of hosted-engine --vm-status has been modified to show whether the configuration and the virtual machine specification have been correctly read from the shared storage on each reported host. Since Red Hat Enterprise Virtualization 3.6, ovirt-ha-agent reads the configuration and the virtual machine specification from the shared storage, whereas up to Red Hat Enterprise Virtualization 3.5 they were local files replicated on each involved host.
Story Points: ---
Clone Of: 1396672
Clones: 1403750
Environment:
Last Closed: 2017-03-16 11:28:53 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 67893 None None None 2016-12-12 05:06 EST
oVirt gerrit 68052 None None None 2016-12-12 05:06 EST
oVirt gerrit 68053 None None None 2016-12-12 05:06 EST
Red Hat Product Errata RHBA-2017:0541 normal SHIPPED_LIVE ovirt-hosted-engine-ha bug fix update for 4.0.7 2017-03-16 15:24:59 EDT

Description rhev-integ 2016-12-12 05:04:02 EST
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1396672 +++
======================================================================

Together with bz#1394448, we need to fix our documentation ASAP on the recommended HE upgrade process from 3.5 to 3.6.

In this bug we need to fix the 3.5 to 3.6 upgrade with RHEL 7 hosts section [1].

Procedure 6.5. Updating the RHEV-H Self-Hosted Engine Host
Step 3:
Need to explicitly explain why this step is there and what its importance is.
This step is required to trigger the upgrade of HE SD from 3.5 to 3.6.
It is an essential part of the upgrade process and if it fails, the user should not proceed.
How to verify the upgrade succeeded? The Hosted Engine Storage Domain (HE SD) should appear in the UI under the Storage tab. Until this happens, the upgrade is not complete or has failed.

[1]
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Self-Hosted_Engine_Guide/Upgrading_the_Self-Hosted_Engine.html

(Originally by Marina Kalinin)
Comment 1 rhev-integ 2016-12-12 05:04:10 EST
Simone, can you please review?

(Originally by Marina Kalinin)
Comment 2 rhev-integ 2016-12-12 05:04:14 EST
I'm re-checking https://access.redhat.com/solutions/2351141

The central point is how the user can be sure that the upgrade procedure really ran, since it is not interactive but is simply triggered by the upgrade of the RHEV-H 3.5/el7 host to RHEV-H 3.6/el7.

The best strategy is to grep /var/log/ovirt-hosted-engine-ha/agent.log on that host for '(upgrade_35_36) Successfully upgraded'.
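For instance, a check along these lines (a minimal sketch; the exact wording of the log line may vary between releases) should return the success message once the upgrade has completed:

# grep 'upgrade_35_36' /var/log/ovirt-hosted-engine-ha/agent.log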

The upgrade procedure should be pretty stable but it requires some attention to be sure that it worked as expected. For instance, it will work if, and only if, that host is in maintenance mode from the engine's point of view.

So, if the user finds something like:

(upgrade_35_36) Unable to upgrade while not in maintenance mode: please put this host into maintenance mode from the engine, and manually restart this service when ready

in /var/log/ovirt-hosted-engine-ha/agent.log, he has to put that host into maintenance mode from the engine and then, if needed, manually restart ovirt-ha-agent on that host (systemd will retry just 10 times in a row, so the user has to restart it manually if he wasn't fast enough).

At the end he should see:
'(upgrade_35_36) Successfully upgraded'.
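A minimal sketch of that recovery path on the host, assuming it has already been put into maintenance mode from the engine UI:

# systemctl restart ovirt-ha-agent
# grep 'Successfully upgraded' /var/log/ovirt-hosted-engine-ha/agent.log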

That host should now score 3400 points and the hosted-engine VM should automatically migrate there.
In order to check it:

[root@rhevh72 admin]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : rh68he20161115h1.localdomain
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 579062
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=579062 (Tue Nov 22 15:23:59 2016)
	host-id=1
	score=2400
	maintenance=False
	state=EngineDown


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : rh68he20161115h2.localdomain
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 578990
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=578990 (Tue Nov 22 15:24:01 2016)
	host-id=2
	score=2400
	maintenance=False
	state=EngineDown


--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : rhevh72.localdomain
Host ID                            : 3
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 09ed71ab
Host timestamp                     : 1245

Another sign that the upgrade was successful is that in /etc/ovirt-hosted-engine/hosted-engine.conf we should find:
spUUID=00000000-0000-0000-0000-000000000000
and
conf_volume_UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
conf_image_UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
where 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' means any value.
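As a quick sanity check on the host (a sketch; it simply matches the keys mentioned above):

# grep -E '^(spUUID|conf_volume_UUID|conf_image_UUID)=' /etc/ovirt-hosted-engine/hosted-engine.conf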

If something went wrong, for whatever reason, the user can retrigger the upgrade procedure by restarting ovirt-ha-agent on the affected host.

At this point the user can reinstall the other hosts (one at a time) with el7, add the RHEV 3.6 management agent repo there, and redeploy hosted-engine on each of them.
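A rough sketch of the per-host steps; the repository name below is an assumption and may differ depending on the subscription setup:

# subscription-manager repos --enable=rhel-7-server-rhev-mgmt-agent-rpms   # repo name is illustrative
# yum install ovirt-hosted-engine-setup
# hosted-engine --deploy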

After that (it's really important that the user moves to the next step only when the previous one is OK!), on each host he has to find '(upgrade_35_36) Successfully upgraded' in /var/log/ovirt-hosted-engine-ha/agent.log

At the end all the HE hosts should reach a score of 3400 points.
Only at this point the user has to:
- upgrade the engine to 3.6
- move the cluster compatibility level to 3.6.
The engine should trigger the import of the hosted-engine storage domain.
If successful, the user should see the hosted-engine storage domain in the engine as active.
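A hypothetical way to check this from the command line via the REST API (the engine FQDN, credentials, and storage domain name below are placeholders; substitute your own):

# curl -s -k -u admin@internal:PASSWORD \
    'https://engine.example.com/ovirt-engine/api/storagedomains?search=name%3Dhosted_engine'

Alternatively, the hosted-engine storage domain can be checked in the Administration Portal under the Storage tab, as mentioned in the description.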

It is really important that the user moves to the next action if and only if all the previous steps are OK.

(Originally by Simone Tiraboschi)
Comment 5 rhev-integ 2016-12-12 05:04:27 EST
Simone,
Thank you.

I will update the article with this very valuable information!

However, we still need to find the right wording for the official docs that cover the el7 hosts 3.5 to 3.6 upgrade, and this is what this bug is about.
I think for the official documentation it would be enough to say that the user should check the UI, and if the HE SD does not show up, they should contact support.

(Originally by Marina Kalinin)
Comment 6 rhev-integ 2016-12-12 05:04:32 EST
Other than properly documenting this, we can also modify, for 3.6.10, the output of
 hosted-engine --vm-status
to report, for each host, if everything was OK with the upgrade process.

(Originally by Simone Tiraboschi)
Comment 7 rhev-integ 2016-12-12 05:04:36 EST
Simone, is it also correct that if there is no other Data Domain in the DC, auto-import would not happen?
This is probably only a theoretical scenario, but worth mentioning.

(Originally by Marina Kalinin)
Comment 8 rhev-integ 2016-12-12 05:04:40 EST
(In reply to Simone Tiraboschi from comment #6)
> Other than properly documenting this, we can also modify, for 3.6.10, the
> output of
>  hosted-engine --vm-status
> to report, for each host, if everything was OK with the upgrade process.

This would be wonderful.
Do you want me to open a separate bug on this?

(Originally by Marina Kalinin)
Comment 10 rhev-integ 2016-12-12 05:04:49 EST
(In reply to Marina from comment #8)
> (In reply to Simone Tiraboschi from comment #6)
> > Other than properly documenting this, we can also modify, for 3.6.10, the
> > output of
> >  hosted-engine --vm-status
> > to report, for each host, if everything was OK with the upgrade process.
> 
> This would be wonderful.
> Do you want me to open a separate bug on this?

Yes, please

(Originally by Simone Tiraboschi)
Comment 17 rhev-integ 2016-12-12 05:05:21 EST
Oh, another relevant piece of information:
the auto-import procedure in the engine just looks for a storage domain called 'hosted_engine', but in 3.4 and early 3.5 days the user could customize that name at setup time.

In that case he also has to run, on the engine VM:

engine-config -s HostedEngineStorageDomainName={my_custom_name}
and then restart the engine, otherwise the engine will never find and import the hosted-engine storage domain.
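For example (the storage domain name below is a placeholder; restarting via systemd assumes an el7 engine VM, on el6 use 'service ovirt-engine restart' instead):

# engine-config -s HostedEngineStorageDomainName=my_custom_name
# systemctl restart ovirt-engine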

(Originally by Simone Tiraboschi)
Comment 18 rhev-integ 2016-12-12 05:05:26 EST
(In reply to Simone Tiraboschi from comment #17)
> Oh, another relevant piece of information:
> the auto-import procedure in the engine just looks for a storage domain
> called 'hosted_engine', but in 3.4 and early 3.5 days the user could
> customize that name at setup time.
> 
> In that case he also has to run, on the engine VM:
> 
> engine-config -s HostedEngineStorageDomainName={my_custom_name}
> and then restart the engine, otherwise the engine will never find and import
> the hosted-engine storage domain.

Thanks! I assume it's because BZ1301105 was never backported to 3.6.

(Originally by Germano Veit Michel)
Comment 19 rhev-integ 2016-12-12 05:05:31 EST
(In reply to Germano Veit Michel from comment #18)
> > engine-config -s HostedEngineStorageDomainName={my_custom_name}
> > and then restart the engine, otherwise the engine will never find and import
> > the hosted-engine storage domain.
> 
> Thanks! I assume it's because BZ1301105 was never backported to 3.6.

Yes, exactly, and in order to upgrade the engine VM to 4.0/el7, the hosted-engine storage domain should already be correctly imported while still on 3.6.

(Originally by Simone Tiraboschi)
Comment 20 rhev-integ 2016-12-12 05:05:35 EST
Can we please get a short clear list of the requested changes?

(Originally by Yaniv Dary)
Comment 21 rhev-integ 2016-12-12 05:05:39 EST
(In reply to Yaniv Dary from comment #20)
> Can we please get a short clear list of the requested changes?

* Steps to Confirm HE SD was Imported
* Steps to Confirm HE SD was upgraded to 3.6 (ha 1.3.xx, conf volume...)

Down the road, if the 3.5 to 3.6 upgrade is not done properly, we get quite troubled 3.6 to 4.0 upgrades. See BZ #1400800.

(Originally by Germano Veit Michel)
Comment 22 rhev-integ 2016-12-12 05:05:44 EST
(In reply to Germano Veit Michel from comment #21)
> (In reply to Yaniv Dary from comment #20)
> > Can we please get a short clear list of the requested changes?
> 
> * Steps to Confirm HE SD was Imported

This is quite/too complex from the ovirt-ha-agent point of view, since a proper fix would require checking the status of the hosted-engine storage domain in the engine over the API; but the engine could be down, and we currently don't store any API credentials on the ovirt-ha-agent side.

> * Steps to Confirm HE SD was upgraded to 3.6 (ha 1.3.xx, conf volume...)

for each host, we could add a couple of additional lines under the Extra metadata section in the output of hosted-engine --vm-status

(Originally by Simone Tiraboschi)
Comment 23 rhev-integ 2016-12-12 05:05:49 EST
(In reply to Simone Tiraboschi from comment #22)
> (In reply to Germano Veit Michel from comment #21)
> > (In reply to Yaniv Dary from comment #20)
> > > Can we please get a short clear list of the requested changes?
> > 
> > * Steps to Confirm HE SD was Imported
> 
> This is quite/too complex from the ovirt-ha-agent point of view, since a proper
> fix would require checking the status of the hosted-engine storage domain in
> the engine over the API; but the engine could be down, and we currently don't
> store any API credentials on the ovirt-ha-agent side.

Why don't we check the OVFs? If it's imported, the OVFs will be there. And we already do something very similar when extracting vm.conf.

> 
> > * Steps to Confirm HE SD was upgraded to 3.6 (ha 1.3.xx, conf volume...)
> 
> for each host, we could add a couple of additional lines under the Extra
> metadata section in the output of hosted-engine --vm-status

Nice!

(Originally by Germano Veit Michel)
Comment 24 rhev-integ 2016-12-12 05:05:54 EST
Simone, I don't see this getting into 3.6.10. Postpone to 3.6.11?

(Originally by Yaniv Kaul)
Comment 25 rhev-integ 2016-12-12 05:05:59 EST
The relevant patch has already been merged on master (not sure why the gerrit hook didn't trigger); it's just a matter of back-porting and verifying it.

(Originally by Simone Tiraboschi)
Comment 28 Artyom 2017-02-28 07:48:06 EST
Verified on:
# rpm -qa | grep hosted
ovirt-hosted-engine-ha-2.1.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.2-1.el7ev.noarch

# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : cyan-vdsf.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 5f945a94
local_conf_timestamp               : 3030979
Host timestamp                     : 3030961
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=3030961 (Tue Feb 28 14:46:00 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=3030979 (Tue Feb 28 14:46:17 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
Comment 29 Artyom 2017-02-28 14:41:07 EST
Verified on the correct version:
# rpm -qa | grep hosted
ovirt-hosted-engine-setup-2.0.4.3-2.el7ev.noarch
ovirt-hosted-engine-ha-2.0.7-2.el7ev.noarch


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : cyan-vdsf.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : ab52e2b8
local_conf_timestamp               : 0
Host timestamp                     : 3055736
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=3055736 (Tue Feb 28 21:38:55 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=0 (Thu Jan  1 02:00:00 1970)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStart
        stopped=False
Comment 31 errata-xmlrpc 2017-03-16 11:28:53 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0541.html
