Bug 1361511 - During host upgrade, "Upgrade process terminated" INFO message shown
Summary: During host upgrade, "Upgrade process terminated" INFO message shown
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.0.2.1
Hardware: All
OS: All
Priority: unspecified
Severity: low
Target Milestone: ovirt-4.1.0-alpha
Target Release: 4.1.0.2
Assignee: Ravi Nori
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1398443 1403956 1406001 1406527 1406778 1409203 1416023
Blocks:
 
Reported: 2016-07-29 08:56 UTC by Lukas Svaty
Modified: 2017-03-16 14:50 UTC
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-03-16 14:50:04 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: planning_ack+
mperina: devel_ack+
mavital: testing_ack+


Attachments
See line 31 (51.39 KB, text/plain)
2016-07-29 08:56 UTC, Lukas Svaty
no flags Details
puma18 sosreport (9.09 MB, application/x-xz)
2016-12-27 14:44 UTC, Nikolai Sednev
no flags Details
puma19 sosreport (8.59 MB, application/x-xz)
2016-12-27 14:45 UTC, Nikolai Sednev
no flags Details
logs from the engine (565.35 KB, application/x-gzip)
2016-12-27 14:50 UTC, Nikolai Sednev
no flags Details


Links
oVirt gerrit 61990 (master, MERGED): engine : Host upgrade shows process terminated message (last updated 2020-02-09 09:34:17 UTC)

Description Lukas Svaty 2016-07-29 08:56:50 UTC
Created attachment 1185418 [details]
See line 31

Description of problem:
During the upgrade of a host via the engine, the following log message appears:

2016-07-29 08:49:35,653 INFO  [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (DefaultQuartzScheduler2) [2a46ccda] Host 'red' failed to move to maintenance mode. Upgrade process is terminated.

This happens during the package upgrade, before yum reports "Downloading Packages".

Version-Release number of selected component (if applicable):
ovirt-engine-backend-4.0.2.1-0.1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Have a host running an older version
2. In the webadmin portal, click Upgrade (a hedged API-based sketch of this step follows the list)
3. Watch engine.log
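
For completeness, step 2 can also be driven through the REST API instead of the webadmin portal. The following is a minimal sketch using the python-ovirt-engine-sdk4 package; the engine URL, credentials, and CA path are placeholders, and the upgrade_check()/upgrade() calls are assumed to be available on the host service in the 4.1 API:

import ovirtsdk4 as sdk

# Placeholder connection details; adjust to the actual engine.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)
try:
    hosts_service = connection.system_service().hosts_service()
    # 'red' is the host name used in the log message above.
    host = hosts_service.list(search='name=red')[0]
    host_service = hosts_service.host_service(host.id)
    host_service.upgrade_check()  # refresh the "upgrade available" information
    host_service.upgrade()        # start the upgrade, then watch engine.log
finally:
    connection.close()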

Actual results:
An INFO message about a failed upgrade is logged even though the upgrade attempt succeeds.

Expected results:
If this message is justified, the upgrade process should actually be interrupted rather than continued, and the message should be logged as ERROR; however, I believe the message should not appear at all.
Additional info:
Adding a log; see line 31.

Comment 1 Martin Perina 2016-08-02 10:20:20 UTC
Ravi, could you please investigate?

Comment 2 Martin Perina 2016-08-08 12:12:46 UTC
Moving to 4.1 for now; if needed, it can be backported to 4.0.z.

Comment 3 Sandro Bonazzola 2016-12-12 14:02:03 UTC
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.

Comment 4 Nikolai Sednev 2016-12-27 14:43:03 UTC
Dec 27, 2016 4:19:29 PM	
Failed to upgrade Host puma18.scl.lab.tlv.redhat.com (User: admin@internal-authz).

After the upgrade, the host stays in local maintenance, although it is shown as down in the UI.
puma19 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : puma18.scl.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : 97891d50
local_conf_timestamp               : 6309
Host timestamp                     : 6297
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=6297 (Tue Dec 27 16:28:30 2016)
        host-id=1
        score=0
        vm_conf_refresh_time=6309 (Tue Dec 27 16:28:42 2016)
        conf_on_shared_storage=True
        maintenance=True
        state=LocalMaintenance
        stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : puma19.scl.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 81fbb545
local_conf_timestamp               : 4828
Host timestamp                     : 4815
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=4815 (Tue Dec 27 16:28:19 2016)
        host-id=2
        score=3400
        vm_conf_refresh_time=4828 (Tue Dec 27 16:28:31 2016)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False

Manual intervention on host puma18 with "hosted-engine --set-maintenance --mode=none" successfully removed it from local maintenance and it went up in the CLI, but it still appeared as down in WEBADMIN.
puma19 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : puma18.scl.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 17058747
local_conf_timestamp               : 6574
Host timestamp                     : 6562
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=6562 (Tue Dec 27 16:32:55 2016)
        host-id=1
        score=3400
        vm_conf_refresh_time=6574 (Tue Dec 27 16:33:07 2016)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : puma19.scl.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 0c4593ad
local_conf_timestamp               : 5098
Host timestamp                     : 5085
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=5085 (Tue Dec 27 16:32:49 2016)
        host-id=2
        score=3400
        vm_conf_refresh_time=5098 (Tue Dec 27 16:33:01 2016)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
Clearing the cache and history in the web browser did not help.

I set the host into maintenance via WEBADMIN and then activated it again; only then did the host come back to active in the UI. The host's "upgrade is available" symbol was still shown next to the host.
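
As an aside, the local-maintenance state described above can be checked from a script instead of reading the full status dump. A minimal sketch, assuming hosted-engine supports the --json flag for --vm-status and that the per-host entries carry "hostname" and "maintenance" keys (key names may differ between versions):

import json
import subprocess

def hosts_in_local_maintenance():
    # Ask the HA agent for machine-readable status (assumes --json is supported).
    out = subprocess.check_output(['hosted-engine', '--vm-status', '--json'])
    status = json.loads(out)
    # status is assumed to map host ids to per-host dictionaries;
    # non-dict entries (e.g. a global_maintenance flag) are skipped.
    return [
        entry.get('hostname', host_id)
        for host_id, entry in status.items()
        if isinstance(entry, dict) and entry.get('maintenance')
    ]

if __name__ == '__main__':
    stuck = hosts_in_local_maintenance()
    if stuck:
        print('Still in local maintenance: %s' % ', '.join(stuck))
    else:
        print('No host is in local maintenance.')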

Components on engine:
ovirt-engine-setup-plugin-ovirt-engine-common-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-imageio-proxy-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-iso-uploader-4.1.0-0.0.master.20160909154152.git14502bd.el7.centos.noarch
ovirt-engine-userportal-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-dbscripts-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-extensions-api-impl-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
python-ovirt-engine-sdk4-4.1.0-0.1.a0.20161215git77fce51.el7.centos.x86_64
ovirt-host-deploy-java-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.1.2-1.el7.noarch
ovirt-engine-dwh-setup-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-imageio-proxy-setup-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-engine-tools-backup-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-websocket-proxy-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-backend-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-tools-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-webadmin-portal-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-restapi-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-vmconsole-proxy-helper-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-wildfly-overlay-10.0.0-1.el7.noarch
ovirt-engine-cli-3.6.9.2-1.el7.centos.noarch
ovirt-web-ui-0.1.1-2.el7.centos.x86_64
ovirt-engine-setup-base-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7.centos.noarch
ovirt-engine-dwh-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-hosts-ansible-inventory-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-dashboard-1.1.0-0.4.20161128git5ed6f96.el7.centos.noarch
ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-guest-agent-common-1.0.13-1.20161220085008.git165fff1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.centos.noarch
ovirt-engine-wildfly-10.1.0-1.el7.x86_64
ovirt-engine-lib-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (builder.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Dec 6 23:06:41 UTC 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.3.1611 (Core) 

Components on puma18 (the host I tried to upgrade via the engine):
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161221071755.git46cacd3.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161221070856.20161221070854.git387fa53.el7.centos.noarch
ovirt-engine-appliance-4.1-20161222.1.el7.centos.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1218.gitd36143e.el7.centos.x86_64
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Components on puma19 (the host on which the engine was running; it was upgraded to the latest bits and appeared without the "Upgrade is available" symbol):
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161221071755.git46cacd3.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161221070856.20161221070854.git387fa53.el7.centos.noarch
rhevm-appliance-20161116.0-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1218.gitd36143e.el7.centos.x86_64
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Comment 5 Red Hat Bugzilla Rules Engine 2016-12-27 14:43:09 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 6 Nikolai Sednev 2016-12-27 14:44:08 UTC
Created attachment 1235523 [details]
puma18 sosreport

Comment 7 Nikolai Sednev 2016-12-27 14:45:02 UTC
Created attachment 1235524 [details]
puma19 sosreport

Comment 8 Nikolai Sednev 2016-12-27 14:50:34 UTC
Created attachment 1235525 [details]
logs from the engine

Comment 9 Martin Perina 2017-01-05 06:00:45 UTC
Nikolai, according to the logs you installed oVirt on RHEL hosts with RHV repositories enabled; that is why the upgrade failed, due to conflicts between qemu-kvm-ev (oVirt) and qemu-kvm-rhev (RHV). If you want to install oVirt on RHEL, you need a clean RHEL without any RHV-related repositories (only the base and optional channels enabled).

In any case, raising the error and failing the host upgrade in the engine is valid here. This bug is only about fixing the host upgrade code, which improperly detected a failure to move the host to maintenance when the host was already in maintenance status (that is why the confusing message "Host xxx failed to move to maintenance mode. Upgrade process is terminated." was displayed although the upgrade process continued as normal). So I'm moving the bug back to ON_QA.
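
To illustrate the corrected logic, here is a minimal, hypothetical sketch (written in Python for readability only; the actual fix lives in the Java HostUpgradeCallback referenced in the log message, and the real status names and control flow differ):

from enum import Enum

class HostStatus(Enum):
    UP = 'Up'
    PREPARING_FOR_MAINTENANCE = 'PreparingForMaintenance'
    MAINTENANCE = 'Maintenance'

def maintenance_step_failed(current_status, move_to_maintenance_succeeded):
    """Decide whether the upgrade flow should report the
    'failed to move to maintenance mode' condition.

    Buggy behaviour: report failure whenever the move-to-maintenance
    step did not succeed, even if the host was already in Maintenance,
    which produced the spurious INFO message while the upgrade
    continued normally.

    Fixed behaviour: a host that is already in Maintenance satisfies
    the precondition, so it is never treated as a failure.
    """
    if current_status == HostStatus.MAINTENANCE:
        return False  # already in maintenance: the precondition is met
    return not move_to_maintenance_succeeded

# A host that reached maintenance out of band must not trigger the
# "Upgrade process is terminated" message; a host stuck in Up should.
assert maintenance_step_failed(HostStatus.MAINTENANCE, False) is False
assert maintenance_step_failed(HostStatus.UP, False) is True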

Comment 10 Nikolai Sednev 2017-03-05 20:26:46 UTC
No error detected in engine.log; the upgrade finished successfully.
Works for me with these components on the hosts:
libvirt-client-2.0.0-10.el7_3.4.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
rhevm-appliance-20160721.0-2.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.19.6-1.el7ev.x86_64
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.3-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
rhev-guest-tools-iso-4.1-3.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-doc-4.1.0-2.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.1.2-0.1.el7.noarch
Linux version 3.10.0-514.6.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Feb 17 19:21:31 EST 2017
Linux 3.10.0-514.6.2.el7.x86_64 #1 SMP Fri Feb 17 19:21:31 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

