| Summary: | [z-stream clone - 4.0.7] modify output of the hosted engine CLI to show info on auto import process | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | rhev-integ | |
| Component: | ovirt-hosted-engine-ha | Assignee: | Simone Tiraboschi <stirabos> | |
| Status: | CLOSED ERRATA | QA Contact: | Artyom <alukiano> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.6.9 | CC: | alukiano, didi, gklein, gveitmic, lsurette, melewis, mkalinin, molasaga, rbalakri, srevivo, stirabos, ykaul, ylavi | |
| Target Milestone: | ovirt-4.0.7 | Keywords: | Triaged, ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| URL: | https://www.ovirt.org/documentation/how-to/hosted-engine-host-OS-upgrade/ | |||
| Whiteboard: | integration | |||
| Fixed In Version: | | Doc Type: | Enhancement | |
| Doc Text: | With this update, the output of hosted-engine --vm-status has been modified to show whether the configuration and the virtual machine specification have been correctly read from the shared storage on each reported host. Since Red Hat Enterprise Virtualization 3.6, ovirt-ha-agent reads the configuration and the virtual machine specification from the shared storage, whereas up to Red Hat Enterprise Virtualization 3.5 they were local files replicated on each involved host. | Story Points: | --- | |
| Clone Of: | 1396672 | |||
| : | 1403750 | Environment: | |
| Last Closed: | 2017-03-16 15:28:53 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | 1396672 | |||
| Bug Blocks: | 1403750 | |||
Description
rhev-integ
2016-12-12 10:04:02 UTC
Simone, can you please review? (Originally by Marina Kalinin)

I'm re-checking https://access.redhat.com/solutions/2351141

The central point is how the user can be sure that the upgrade procedure really ran, since it is not interactive but simply triggered by upgrading the RHEV-H 3.5/el7 host to RHEV-H 3.6/el7. The best strategy is to grep /var/log/ovirt-hosted-engine-ha/agent.log on that host for '(upgrade_35_36) Successfully upgraded'.

The upgrade procedure should be pretty stable, but it requires some attention to be sure that it worked as expected. For instance, it will work if, and only if, the host is in maintenance mode in the engine's eyes. So, if the user finds something like:

(upgrade_35_36) Unable to upgrade while not in maintenance mode: please put this host into maintenance mode from the engine, and manually restart this service when ready

in /var/log/ovirt-hosted-engine-ha/agent.log, he has to put that host into maintenance mode from the engine and then manually restart ovirt-ha-agent on that host (systemd will try just 10 times in a row, so the user has to restart it manually if he wasn't fast enough). At the end he should see '(upgrade_35_36) Successfully upgraded'.

That host should now score 3400 points and the hosted-engine VM should automatically migrate there. In order to check it:

[root@rhevh72 admin]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date : True
Hostname : rh68he20161115h1.localdomain
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 2400
Local maintenance : False
Host timestamp : 579062
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=579062 (Tue Nov 22 15:23:59 2016)
host-id=1
score=2400
maintenance=False
state=EngineDown

--== Host 2 status ==--

Status up-to-date : True
Hostname : rh68he20161115h2.localdomain
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 2400
Local maintenance : False
Host timestamp : 578990
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=578990 (Tue Nov 22 15:24:01 2016)
host-id=2
score=2400
maintenance=False
state=EngineDown

--== Host 3 status ==--

Status up-to-date : True
Hostname : rhevh72.localdomain
Host ID : 3
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 09ed71ab
Host timestamp : 1245

Another sign that the upgrade was successful is that /etc/ovirt-hosted-engine/hosted-engine.conf should now contain:

spUUID=00000000-0000-0000-0000-000000000000

and

conf_volume_UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
conf_image_UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

where 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' means any value.

If something went wrong, for any reason, the user can retrigger the upgrade procedure by restarting ovirt-ha-agent on the affected host.

At this point the user can reinstall the other hosts (one at a time) with el7, add the RHEV agent 3.6 repo there and redeploy hosted-engine on each of them. After that (it's really important that the user moves to the next step only when the previous one is OK!), on each host, he has to find '(upgrade_35_36) Successfully upgraded' in /var/log/ovirt-hosted-engine-ha/agent.log. At the end all the HE hosts should reach a score of 3400 points.
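Taken together, the checks described above boil down to a few shell commands on each upgraded host. The following is a minimal sketch, assuming a systemd-based el7 host; the log path, configuration path and service name are taken from the steps above:

# confirm the upgrade ran (look for '(upgrade_35_36) Successfully upgraded')
grep 'upgrade_35_36' /var/log/ovirt-hosted-engine-ha/agent.log
# confirm the configuration was moved to the shared storage
grep -E '^(spUUID|conf_volume_UUID|conf_image_UUID)=' /etc/ovirt-hosted-engine/hosted-engine.conf
# if the agent reported that the host is not in maintenance mode, put it into
# maintenance from the engine first, then retrigger the upgrade by restarting the agent
systemctl restart ovirt-ha-agent
# re-check the host score (it should reach 3400 after a successful upgrade)
hosted-engine --vm-status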
Only at this point the user has to:
- upgrade the engine to 3.6
- move the cluster compatibility level to 3.6.

The engine should then trigger the import of the hosted-engine storage domain. If successful, the user should see the hosted-engine storage domain in the engine as active. It is really important that the user moves to the next action if and only if all the previous steps are OK. (Originally by Simone Tiraboschi)

Simone, thank you. I will update the article with this very valuable information! However, we still need to find the right wording for the official docs that cover the el7 hosts 3.5 to 3.6 upgrade, and this is what this bug is about. I think for the official documentation it would be enough to say that the user should check the UI, and if the HE SD does not show up, they should contact support. (Originally by Marina Kalinin)

Other than properly documenting this, we can also modify, for 3.6.10, the output of hosted-engine --vm-status to report, for each host, whether everything was OK with the upgrade process. (Originally by Simone Tiraboschi)

Simone, is it also correct that if there is no other Data Domain in the DC, auto import would not happen? This is probably only a theoretical scenario, but worth mentioning. (Originally by Marina Kalinin)

(In reply to Simone Tiraboschi from comment #6)
> Other than properly documenting this, we can also modify, for 3.6.10, the
> output of hosted-engine --vm-status to report, for each host, whether
> everything was OK with the upgrade process.

This would be wonderful. Do you want me to open a separate bug on this? (Originally by Marina Kalinin)

(In reply to Marina from comment #8)
> This would be wonderful.
> Do you want me to open a separate bug on this?

Yes, please. (Originally by Simone Tiraboschi)

Oh, another relevant piece of info:
the auto-import procedure in the engine just looks for a storage domain called 'hosted_engine', but in 3.4 and in the early 3.5 days the user could customize that name at setup time.
In that case he also has to run, on the engine VM:
engine-config -s HostedEngineStorageDomainName={my_custom_name}
and then restart the engine, otherwise the engine will never find and import the hosted-engine storage domain.
(Originally by Simone Tiraboschi)
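For example, if the storage domain was given a custom name at setup time, the sequence on the engine VM would look roughly like the following. This is a sketch only: 'my_custom_name' is a placeholder, and the systemctl call assumes an el7 engine VM (on an el6 engine VM the equivalent would be 'service ovirt-engine restart'):

# tell the engine which storage domain name to look for during auto-import
engine-config -s HostedEngineStorageDomainName=my_custom_name
# restart the engine so the new value is picked up
systemctl restart ovirt-engine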
(In reply to Simone Tiraboschi from comment #17)
> In that case he also has to run, on the engine VM:
> engine-config -s HostedEngineStorageDomainName={my_custom_name}
> and then restart the engine, otherwise the engine will never find and import
> the hosted-engine storage domain.

Thanks! I assume it's because BZ1301105 was never backported to 3.6. (Originally by Germano Veit Michel)

(In reply to Germano Veit Michel from comment #18)
> Thanks! I assume it's because BZ1301105 was never backported to 3.6.

Yes, exactly, and in order to upgrade the engine VM to 4.0/el7, the hosted-engine storage domain should be correctly imported while still on 3.6. (Originally by Simone Tiraboschi)

Can we please get a short clear list of the requested changes? (Originally by Yaniv Dary)

(In reply to Yaniv Dary from comment #20)
> Can we please get a short clear list of the requested changes?

* Steps to confirm the HE SD was imported
* Steps to confirm the HE SD was upgraded to 3.6 (ha 1.3.xx, conf volume...)

Down the road, if the 3.5 to 3.6 upgrade is not done properly, we get quite troubled 3.6 to 4.0 upgrades. See BZ #1400800. (Originally by Germano Veit Michel)

(In reply to Germano Veit Michel from comment #21)
> * Steps to confirm the HE SD was imported

This is quite/too complex from the ovirt-ha-agent point of view, since a proper fix would require checking the status of the hosted-engine storage domain in the engine over the API, but the engine could be down and we currently don't store any API credentials on the ovirt-ha-agent side.

> * Steps to confirm the HE SD was upgraded to 3.6 (ha 1.3.xx, conf volume...)

For each host, we could add a couple of additional lines under the Extra metadata section in the output of hosted-engine --vm-status. (Originally by Simone Tiraboschi)

(In reply to Simone Tiraboschi from comment #22)
> This is quite/too complex from the ovirt-ha-agent point of view...

Why don't we check the OVFs? If it's imported, the OVFs will be there. And we already do something very similar when extracting vm.conf.

> For each host, we could add a couple of additional lines under the Extra
> metadata section in the output of hosted-engine --vm-status.

Nice! (Originally by Germano Veit Michel)

Simone, I don't see this getting into 3.6.10. Postpone to 3.6.11? (Originally by Yaniv Kaul)

The relevant patch has already been merged on master (not sure why the gerrit hook didn't trigger), it's just about back-porting and verifying it. (Originally by Simone Tiraboschi)

Verified on
# rpm -qa | grep hosted
ovirt-hosted-engine-ha-2.1.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.2-1.el7ev.noarch
# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : cyan-vdsf.qa.lab.tlv.redhat.com
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 5f945a94
local_conf_timestamp : 3030979
Host timestamp : 3030961
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3030961 (Tue Feb 28 14:46:00 2017)
host-id=1
score=3400
vm_conf_refresh_time=3030979 (Tue Feb 28 14:46:17 2017)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
Verified on the correct version:
# rpm -qa | grep hosted
ovirt-hosted-engine-setup-2.0.4.3-2.el7ev.noarch
ovirt-hosted-engine-ha-2.0.7-2.el7ev.noarch
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : cyan-vdsf.qa.lab.tlv.redhat.com
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : ab52e2b8
local_conf_timestamp : 0
Host timestamp : 3055736
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3055736 (Tue Feb 28 21:38:55 2017)
host-id=1
score=3400
vm_conf_refresh_time=0 (Thu Jan 1 02:00:00 1970)
conf_on_shared_storage=True
maintenance=False
state=EngineStart
stopped=False
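For reference, a quick way to pull out just the new auto-import-related fields from the status output on any host (a minimal one-liner sketch using the field names shown above):

hosted-engine --vm-status | grep -E 'conf_on_shared_storage|local_conf_timestamp|vm_conf_refresh_time'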
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0541.html