Bug 1361511
| Summary: | During host upgrade Upgrade process terminated info message shown | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Lukas Svaty <lsvaty> |
| Component: | Backend.Core | Assignee: | Ravi Nori <rnori> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.0.2.1 | CC: | bugs, mperina, oourfali, rnori |
| Target Milestone: | ovirt-4.1.0-alpha | Keywords: | Triaged |
| Target Release: | 4.1.0.2 | Flags: | rule-engine: ovirt-4.1+, rule-engine: planning_ack+, mperina: devel_ack+, mavital: testing_ack+ |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-16 14:50:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1398443, 1403956, 1406001, 1406527, 1406778, 1409203, 1416023 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Ravi, could you please investigate? Moving to 4.1 for now; if needed it can be backported to 4.0.z.

The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If not included, please move back to MODIFIED.

Dec 27, 2016 4:19:29 PM
Failed to upgrade Host puma18.scl.lab.tlv.redhat.com (User: admin@internal-authz).
After the upgrade, the host stays in local maintenance, although it is shown in the UI as down.
puma19 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma18.scl.lab.tlv.redhat.com
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : True
crc32 : 97891d50
local_conf_timestamp : 6309
Host timestamp : 6297
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=6297 (Tue Dec 27 16:28:30 2016)
host-id=1
score=0
vm_conf_refresh_time=6309 (Tue Dec 27 16:28:42 2016)
conf_on_shared_storage=True
maintenance=True
state=LocalMaintenance
stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma19.scl.lab.tlv.redhat.com
Host ID : 2
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 81fbb545
local_conf_timestamp : 4828
Host timestamp : 4815
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=4815 (Tue Dec 27 16:28:19 2016)
host-id=2
score=3400
vm_conf_refresh_time=4828 (Tue Dec 27 16:28:31 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
Manual intervention on host puma18 with "hosted-engine --set-maintenance --mode=none" successfully removed it from local maintenance and it went up in the CLI, but it still appeared as down in WEBADMIN.
puma19 ~]# hosted-engine --vm-status
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma18.scl.lab.tlv.redhat.com
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 17058747
local_conf_timestamp : 6574
Host timestamp : 6562
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=6562 (Tue Dec 27 16:32:55 2016)
host-id=1
score=3400
vm_conf_refresh_time=6574 (Tue Dec 27 16:33:07 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma19.scl.lab.tlv.redhat.com
Host ID : 2
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 0c4593ad
local_conf_timestamp : 5098
Host timestamp : 5085
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=5085 (Tue Dec 27 16:32:49 2016)
host-id=2
score=3400
vm_conf_refresh_time=5098 (Tue Dec 27 16:33:01 2016)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
Clearing the cache and history in the web browser did not help.
I set the host into maintenance via WEBADMIN and then activated it again; only then did the host come back to active in the UI. The "upgrade is available" symbol was still shown next to the host.
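For reference, the manual recovery described above as CLI commands; this is a sketch based on this report, and the webadmin maintenance/activate step that was ultimately needed has no CLI equivalent here:

```
# On puma18: clear the local maintenance mode left over from the failed upgrade
hosted-engine --set-maintenance --mode=none

# On any hosted-engine host (puma19 here): confirm the score and state recovered
hosted-engine --vm-status
```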
Components on engine:
ovirt-engine-setup-plugin-ovirt-engine-common-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-imageio-proxy-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-iso-uploader-4.1.0-0.0.master.20160909154152.git14502bd.el7.centos.noarch
ovirt-engine-userportal-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-dbscripts-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-extensions-api-impl-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
python-ovirt-engine-sdk4-4.1.0-0.1.a0.20161215git77fce51.el7.centos.x86_64
ovirt-host-deploy-java-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.1.2-1.el7.noarch
ovirt-engine-dwh-setup-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-imageio-proxy-setup-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-engine-tools-backup-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-websocket-proxy-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-backend-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-tools-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-webadmin-portal-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-restapi-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-vmconsole-proxy-helper-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-wildfly-overlay-10.0.0-1.el7.noarch
ovirt-engine-cli-3.6.9.2-1.el7.centos.noarch
ovirt-web-ui-0.1.1-2.el7.centos.x86_64
ovirt-engine-setup-base-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7.centos.noarch
ovirt-engine-dwh-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-hosts-ansible-inventory-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-dashboard-1.1.0-0.4.20161128git5ed6f96.el7.centos.noarch
ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-guest-agent-common-1.0.13-1.20161220085008.git165fff1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.centos.noarch
ovirt-engine-wildfly-10.1.0-1.el7.x86_64
ovirt-engine-lib-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (builder.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Dec 6 23:06:41 UTC 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.3.1611 (Core)
Components on puma18 (the host I tried to upgrade via the engine):
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161221071755.git46cacd3.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161221070856.20161221070854.git387fa53.el7.centos.noarch
ovirt-engine-appliance-4.1-20161222.1.el7.centos.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1218.gitd36143e.el7.centos.x86_64
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Components on puma19 (the host on which the engine was running; it was upgraded to the latest bits and appeared without the "Upgrade is available" symbol):
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161221071755.git46cacd3.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161221070856.20161221070854.git387fa53.el7.centos.noarch
rhevm-appliance-20161116.0-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1218.gitd36143e.el7.centos.x86_64
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Created attachment 1235523 [details]
puma18 sosreport
Created attachment 1235524 [details]
puma19 sosreport
Created attachment 1235525 [details]
logs from the engine
Nikolai, according to the logs you have installed oVirt on RHEL hosts with RHV repositories enabled; that is why the upgrade failed, due to conflicts between qemu-kvm-ev (oVirt) and qemu-kvm-rhev (RHV). If you want to install oVirt on RHEL, you need a clean RHEL without any RHV-related repositories (just the base and optional channels enabled). In any case, raising the error and failing the host upgrade in the engine is valid. This bug is only about fixing the host upgrade code, where we improperly detected a failure when moving the host to maintenance while the host was already in maintenance status (that is why we displayed the confusing message "Host xxx failed to move to maintenance mode. Upgrade process is terminated." although the upgrade process continued as normal). So I'm moving the bug back to ON_QA.

No error detected in engine.log; the upgrade finished successfully.

Works for me with these components on the hosts:
libvirt-client-2.0.0-10.el7_3.4.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
rhevm-appliance-20160721.0-2.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.19.6-1.el7ev.x86_64
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.3-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On the engine:
rhev-guest-tools-iso-4.1-3.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-doc-4.1.0-2.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.1.2-0.1.el7.noarch
Linux version 3.10.0-514.6.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Feb 17 19:21:31 EST 2017
Linux 3.10.0-514.6.2.el7.x86_64 #1 SMP Fri Feb 17 19:21:31 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
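As a side note to the repository-conflict explanation earlier in this comment, a quick way to list the repositories enabled on a host before attempting the upgrade (a minimal sketch, assuming a yum-based RHEL/CentOS 7 host):

```
# List enabled repositories; RHV repositories enabled alongside the oVirt ones
# can cause qemu-kvm-ev vs qemu-kvm-rhev conflicts during host upgrade.
yum repolist enabled
```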
Created attachment 1185418 [details]
See line 31

Description of problem:
During the upgrade process of a host via the engine, this log message appears:
2016-07-29 08:49:35,653 INFO [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (DefaultQuartzScheduler2) [2a46ccda] Host 'red' failed to move to maintenance mode. Upgrade process is terminated.
This happens during the upgrade of packages, before Yum reports "Downloading Packages".

Version-Release number of selected component (if applicable):
ovirt-engine-backend-4.0.2.1-0.1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Have a host on an old version
2. In the webadmin portal, click upgrade
3. See engine.log

Actual results:
An INFO message about a failed upgrade is present on a successful try.

Expected results:
If this message appears, interrupt the upgrade process, do not continue, and log it as an ERROR message; however, I believe this message should not be present at all.

Additional info:
adding log, see line 31
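For step 3 of the reproduction steps above, a quick way to check whether the misleading message was logged; a minimal sketch assuming the default oVirt engine log location:

```
# Search the engine log for the misleading termination message
# while the host upgrade is running (default oVirt engine log path).
grep 'Upgrade process is terminated' /var/log/ovirt-engine/engine.log
```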