Bug 1481680
Summary: | hosted-engine --upgrade-appliance fails with KeyError: 'stopped' if the metadata area contains references to 3.5 decommissioned hosts | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Marian Jankular <mjankula> | |
Component: | ovirt-hosted-engine-setup | Assignee: | Simone Tiraboschi <stirabos> | |
Status: | CLOSED ERRATA | QA Contact: | Artyom <alukiano> | |
Severity: | medium | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 4.1.5 | CC: | bburmest, lsurette, pstehlik, ykaul, ylavi | |
Target Milestone: | ovirt-4.2.0 | Keywords: | Triaged, ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Upgrading a hosted engine to 4.0 would fail if references to version 3.5 hosts still existed in the metadata volume of the engine. The user is now warned when this is the case. |
Story Points: | --- | |
Clone Of: | ||||
: | 1486579 | Environment: | |
Last Closed: | 2018-05-15 17:32:28 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1458709, 1486579 |
Description
Marian Jankular
2017-08-15 12:45:19 UTC
Let's try to recap:

    2017-08-14 16:17:45 DEBUG otopi.ovirt_hosted_engine_setup.domains domains.check_available_space:116 Available space on /var/tmp is 42784Mb
    2017-08-14 16:17:45 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
    2017-08-14 16:17:45 DEBUG otopi.context context.dumpEnvironment:770 ENV OVEHOSTED_STORAGE/ovfSizeGB=int:'50'
    2017-08-14 16:17:45 DEBUG otopi.context context.dumpEnvironment:770 ENV OVEHOSTED_STORAGE/qcowSizeGB=int:'4'

The OVF image is 4 GB and /var/tmp has 42784 MB free, so there is enough space to extract the image from the OVA archive there.

On the hosted-engine storage domain, instead, you have:

    2017-08-14 16:18:07 DEBUG otopi.plugins.gr_he_upgradeappliance.engine.misc misc._check_sd_and_disk_space:202 Successfully connected to the engine
    2017-08-14 16:18:07 DEBUG otopi.plugins.gr_he_upgradeappliance.engine.misc misc._check_sd_and_disk_space:211 availalbe: 141733920768

That is 132 GiB, so there is no issue there, and indeed:

    2017-08-14 16:18:07 INFO otopi.plugins.gr_he_upgradeappliance.engine.misc misc._check_sd_and_disk_space:236 The hosted-engine storage domain has enough free space to contain a new backup disk.

The appliance disk is now sized at 50 GB; we can grow it on the fly but we cannot shrink it.
The warning concerns the disk of the 3.6 engine VM, which is 40 GB, while the new 4.0 appliance requires 50 GB. Since the setup can grow the disk, this is only a warning:

    2017-08-14 16:18:07 WARNING otopi.plugins.gr_he_upgradeappliance.engine.misc misc._check_sd_and_disk_space:252 On the hosted-engine disk there is not enough available space to fit the new appliance disk: required 50GiB - available 40GiB.
    2017-08-14 16:18:07 DEBUG otopi.plugins.otopi.dialog.human human.queryString:145 query UPGRADE_DISK_RESIZE_PROCEED
    2017-08-14 16:18:07 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND This upgrade tool can resize the hosted-engine VM disk; before resizing a backup will be created.
    2017-08-14 16:18:07 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Are you sure you want to continue? (Yes, No)[Yes]:

The user agreed to let the setup grow the VM disk automatically. No issue up to this point.

The actual failure happens later, while validating the status of the hosted-engine hosts from the metadata area on the shared storage:

    2017-08-14 16:19:15 DEBUG otopi.context context._executeMethod:142 method exception
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
        method['method']()
      File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-upgradeappliance/core/misc.py", line 303, in _validata_lm_volumes
        stopped = status['all_host_stats'][h]['stopped']
    KeyError: 'stopped'

And indeed, in the logs we can see this entry for host 2:

    2: {'engine-status': '{"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}', 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=2515837 (Fri Mar 10 13:46:56 2017)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'host-id': 2, 'host-ts': 2515837, 'hostname': '****02.******.**', 'live-data': False, 'maintenance': False, 'score': 2400},

while for 3.6 hosts we have something like this:

    9: {'conf_on_shared_storage': False, 'crc32': 'd96e718b', 'engine-status': '{"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}', 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=10106154 (Mon Aug 14 16:19:08 2017)\nhost-id=9\nscore=3400\nvm_conf_refresh_time=10106160 (Mon Aug 14 16:19:15 2017)\nconf_on_shared_storage=False\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n', 'host-id': 9, 'host-ts': 10106154, 'hostname': '***09.******.**', 'live-data': True, 'local_conf_timestamp': 10106160, 'maintenance': False, 'score': 3400, 'stopped': False},

So the metadata area on the shared storage still contained a reference to host 02, and its structure was still in the 3.5 format, which lacks the 'stopped' attribute. That missing attribute is what raises the KeyError.

We also check the datacenter and cluster levels from the engine, but everything was fine there:

    2017-08-14 16:18:11 DEBUG otopi.plugins.gr_he_upgradeappliance.engine.misc misc._check_upgrade_requirements:315 Successfully connected to the engine
    2017-08-14 16:18:11 INFO otopi.plugins.gr_he_upgradeappliance.engine.misc misc._check_upgrade_requirements:344 All the datacenters and clusters are at a compatible level

The 3.5 hosts have probably been removed from the engine, but they are still present in the metadata area on the shared storage, which causes this failure.

Workaround: run hosted-engine --vm-status and, one by one, remove all the decommissioned hosts with:

    hosted-engine --clean-metadata --host-id=<id>

Verified on ovirt-hosted-engine-setup-2.2.1-1.el7ev.noarch

    [ INFO ] The hosted-engine storage domain has enough free space to contain a new backup disk.
    [ INFO ] Checking version requirements
    [ INFO ] Checking metadata area
    [ ERROR ] Metadata for host alma05.qa.lab.tlv.redhat.com is incompatible with this tool. Before proceeding with this upgrade, please correctly upgrade it to 3.6 or clean its metadata area with 'hosted-engine --clean-metadata --host-id=2' if decommissioned or not anymore involved in HE.
    [ ERROR ] Failed to execute stage 'Environment customization': Host with unsupported metadata area
    [ INFO ] Stage: Clean up
    [ INFO ] Stage: Pre-termination
    [ INFO ] Stage: Termination
    [ ERROR ] Hosted Engine upgrade failed
    Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20171214101416-syfq5n.log

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1471
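
For readers who want to see the failure mode in isolation, here is a minimal sketch of the kind of check that avoids the KeyError described in this report. It is not the actual patch shipped in ovirt-hosted-engine-setup: the function name find_legacy_hosts is hypothetical, and the sample dictionary is a trimmed-down copy of the two host entries quoted above. The idea is simply to test for the 'stopped' key before reading it, and to report hosts whose metadata lacks it.

```python
def find_legacy_hosts(all_host_stats):
    """Return (host_id, hostname) pairs whose metadata lacks 'stopped'.

    Hypothetical helper for illustration only; the real tool may
    structure this check differently.
    """
    legacy = []
    for host_id, stats in sorted(all_host_stats.items()):
        # Old-format entries (as for host 2 in the report) have no
        # 'stopped' key, so stats['stopped'] would raise KeyError.
        if 'stopped' not in stats:
            legacy.append((host_id, stats.get('hostname', 'unknown')))
    return legacy


# Trimmed-down sample data based on the two entries quoted in the report.
all_host_stats = {
    2: {'host-id': 2, 'hostname': '****02.******.**',
        'live-data': False, 'maintenance': False, 'score': 2400},
    9: {'host-id': 9, 'hostname': '***09.******.**',
        'live-data': True, 'maintenance': False, 'score': 3400,
        'stopped': False},
}

legacy_hosts = find_legacy_hosts(all_host_stats)
for host_id, hostname in legacy_hosts:
    print("Host %s (id %d) has old-format metadata; if decommissioned, "
          "clean it with: hosted-engine --clean-metadata --host-id=%d"
          % (hostname, host_id, host_id))
```

With the sample data, only host 2 is reported, which matches the workaround given in the report: clean the metadata of each decommissioned host before retrying the upgrade.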