Description of problem:
The HE 3.5 -> 3.6 upgrade procedure is designed to be executed on 3.6, but it has specific requirements to trigger (host in maintenance mode for the engine...), so nothing really ensures that it was correctly executed before the user upgrades the host to 4.0 or even 4.1. In 4.1 we completed the move to jsonrpc; ensure that the upgrade procedure still works if executed on 4.1 with jsonrpc.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.1.0.3

How reproducible:
100%

Steps to Reproduce:
The simplest path is:
1. Install upstream HE 3.5 by running hosted-engine-setup on an el7 host.
2. Put the host into maintenance mode from the engine.
3. Upgrade it with rpms from the 4.1 upstream repo.

Actual results:
It fails.

Expected results:
It successfully upgrades the HE cluster to the 3.6 level: check for '(upgrade_35_36) Successfully upgraded' in the ovirt-ha-agent logs.

Additional info:
The scenario is pretty uncommon but we got reports from an upstream user on the #ovirt IRC channel.
If upgrading an el7.3 host 3.5->4.1 and then trying to attach it back to a 3.5-compatible host cluster running under a 3.6 engine, it fails with: "Host puma18 is compatible with versions (3.6,4.0,4.1) and cannot join Cluster Default which is set to version 3.5." A 4.1 host should not join a 3.5 host cluster.
(In reply to Nikolai Sednev from comment #2)
> If upgrading el7.3 host 3.5->4.1 and then trying to attach it back to 3.5
> compatible host cluster running under 3.6 engine, then it fails with:
> "Host puma18 is compatible with versions (3.6,4.0,4.1) and cannot join
> Cluster Default which is set to version 3.5."
> 4.1 host should not join 3.5 host cluster.

Yes, this is now expected, so this is definitely not an allowed path. The user has to correctly complete the 3.5 -> 3.6 upgrade before moving to 4.y. Unfortunately we don't have an easy way to enforce it at the host level.
If the user directly upgrades a host from 3.5/el7 to 4.1, the host cannot be active in the engine, but nothing stops ovirt-ha-agent on the host, so it should still be able to upgrade the hosted-engine storage domain to the 3.6 structure, also over jsonrpc. When the user is later able to raise the cluster compatibility level to 3.6, 4.0 or 4.1, the host can be activated as well. This is not the recommended upgrade path.
Following Procedure 6.6. Updating the RHEL-Based Self-Hosted Engine Host:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Self-Hosted_Engine_Guide/Upgrading_the_Self-Hosted_Engine.html
I've got the vdsmd service down with:

puma19 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Thu 2017-04-13 19:19:58 IDT; 9s ago
  Process: 17175 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=1/FAILURE)
 Main PID: 8832 (code=exited, status=0/SUCCESS)

Apr 13 19:19:58 puma19.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Apr 13 19:19:58 puma19.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Apr 13 19:19:58 puma19.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
Apr 13 19:19:58 puma19.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Apr 13 19:19:58 puma19.scl.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for vdsmd.service
Apr 13 19:19:58 puma19.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Apr 13 19:19:58 puma19.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Apr 13 19:19:58 puma19.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.

I've rebooted puma19 while the HE VM was still running on puma18, which was already upgraded 3.5->3.6; meanwhile the host cluster was still in 3.5 compatibility and in global maintenance. puma19 was put into maintenance from the UI prior to upgrading it 3.5->4.1.
After host puma19 was restarted, the vdsmd service still failed to start:

puma19 ~]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Thu 2017-04-13 19:24:45 IDT; 50s ago
  Process: 9615 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=1/FAILURE)

Apr 13 19:24:45 puma19.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Apr 13 19:24:45 puma19.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Apr 13 19:24:45 puma19.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
Apr 13 19:24:45 puma19.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Apr 13 19:24:45 puma19.scl.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for vdsmd.service
Apr 13 19:24:45 puma19.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Apr 13 19:24:45 puma19.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Apr 13 19:24:45 puma19.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.

A sosreport from puma19 is being attached.
Further to the upgrade failure on the 4.1 host from the previous comment, moving this bug back to assigned.

Components from host puma19:
mom-0.5.9-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
vdsm-4.19.10.1-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.1.0.5-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-host-deploy-1.6.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.5-1.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Components from puma18 (3.6 upgraded host):
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
vdsm-4.17.39-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
ovirt-hosted-engine-ha-1.3.5.10-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.6-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.4-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
Linux version 3.10.0-327.53.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Tue Mar 14 10:49:09 EDT 2017
Linux 3.10.0-327.53.1.el7.x86_64 #1 SMP Tue Mar 14 10:49:09 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Components from engine:
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.11-0.2.el6.noarch
rhevm-sdk-python-3.6.9.1-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-backend-3.6.11-0.2.el6.noarch
rhevm-setup-base-3.6.11-0.2.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.11-0.2.el6.noarch
rhevm-doc-3.6.10-1.el6ev.noarch
rhevm-tools-backup-3.6.11-0.2.el6.noarch
rhevm-webadmin-portal-3.6.11-0.2.el6.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
rhevm-setup-3.6.11-0.2.el6.noarch
rhevm-cli-3.6.9.0-1.el6ev.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-dwh-3.6.8-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.7-2.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.11-0.2.el6.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-dependencies-3.6.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-vmconsole-proxy-helper-3.6.11-0.2.el6.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-userportal-3.6.11-0.2.el6.noarch
rhevm-3.6.11-0.2.el6.noarch
rhevm-lib-3.6.11-0.2.el6.noarch
rhevm-dwh-setup-3.6.8-1.el6ev.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-websocket-proxy-3.6.11-0.2.el6.noarch
rhevm-extensions-api-impl-3.6.11-0.2.el6.noarch
rhevm-dbscripts-3.6.11-0.2.el6.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-image-uploader-3.6.1-2.el6ev.noarch
ovirt-vmconsole-1.0.4-1.el6ev.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhev-guest-tools-iso-3.6-6.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.11-0.2.el6.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
rhevm-tools-3.6.11-0.2.el6.noarch
rhevm-restapi-3.6.11-0.2.el6.noarch
Linux version 2.6.32-573.41.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Thu Mar 2 11:08:17 EST 2017
Linux 2.6.32-573.41.1.el6.x86_64 #1 SMP Thu Mar 2 11:08:17 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)
Target release should be set once a package build is known to fix an issue. Since this bug is not in the MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Created attachment 1271450 [details] engine
Created attachment 1271451 [details] puma18 (3.6 ha-host) normally running the engine
Sosreport from problematic 4.1 puma19: https://drive.google.com/open?id=0B85BEaDBcF88bjZWS0VUeWZZSEk
On the vdsm side the issue is here:

Apr 13 19:29:26 puma19 systemd: Starting Virtual Desktop Server Manager...
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: vdsm: Running mkdirs
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: vdsm: Running configure_coredump
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: vdsm: Running configure_vdsm_logs
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: vdsm: Running wait_for_network
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: vdsm: Running run_init_hooks
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: vdsm: Running upgraded_version_check
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: vdsm: Running check_is_configured
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: Error:
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: One of the modules is not configured to work with VDSM.
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: To configure the module use the following:
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: 'vdsm-tool configure [--module module-name]'.
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: If all modules are not configured try to use:
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: 'vdsm-tool configure --force'
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: (The force flag will stop the module's service and start it
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: afterwards automatically to load the new configuration.)
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: Current revision of multipath.conf detected, preserving
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: lvm requires configuration
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: libvirt is already configured for vdsm
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: Modules lvm are not configured
Apr 13 19:29:26 puma19 vdsmd_init_common.sh: vdsm: stopped during execute check_is_configured task (task returned with error code 1).
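The init script's own hint can be wrapped in a small check-then-fix script. This is only a sketch, not part of the product: `needs_reconfigure` is a hypothetical helper that pattern-matches the messages seen in the log above, and the demo runs against a sample of that output rather than against a real host.

```shell
#!/bin/bash
# Sketch: decide from `vdsm-tool is-configured`-style output whether a
# `vdsm-tool configure --force` run is needed. The message patterns are
# taken from the log in this comment; this is an assumption, not an API.
needs_reconfigure() {
    # $1: captured output of the configuration check
    grep -Eq 'Modules .* are not configured|requires configuration' <<<"$1"
}

# Demo with the sample output from the failing host:
sample='libvirt is already configured for vdsm
lvm requires configuration
Modules lvm are not configured'

if needs_reconfigure "$sample"; then
    echo "reconfigure needed"
    # On a real host one would then run (stops and restarts services):
    #   vdsm-tool configure --force
    #   systemctl restart vdsmd
fi
```

Running the demo prints "reconfigure needed"; the commented-out commands are the ones the init script itself suggests.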
So it seems that simply updating the vdsm rpms from the 3.5 ones to the 4.1 ones is not enough to have it restart. This is not hosted-engine specific. A direct update from 3.5 to 4.1 is not a supported flow, so I'm not sure what to do about this.
Moving back to ON_QA since the direct upgrade 3.5 to 4.1 is not a supported scenario. The upgrade test should be 3.5 -> 3.6 -> 4.1.
Created attachment 1272389 [details] sosreport-puma19
Created attachment 1272393 [details] sosreport-nsednev-he-4
Created attachment 1272395 [details] sosreport-puma18
The issue with the upgrade of vdsm seems due to:

Apr 18 20:27:57 puma19 vdsm-tool: module dump_bonding_defaults could not load to vdsm-tool: Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 91, in load_modules
    mod_absp, mod_desc)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/dump_bonding_defaults.py", line 25, in <module>
    from ..netinfo import (BONDING_MASTERS, BONDING_OPT, BONDING_DEFAULTS,
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 38, in <module>
    from .ipwrapper import drv_name
  File "/usr/lib/python2.7/site-packages/vdsm/ipwrapper.py", line 38, in <module>
    from .utils import execCmd
ImportError: cannot import name execCmd

Apr 18 20:27:57 puma19 vdsm-tool: module upgrade_300_networks could not load to vdsm-tool: Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 91, in load_modules
    mod_absp, mod_desc)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/upgrade_300_networks.py", line 24, in <module>
    from vdsm import netinfo
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 38, in <module>
    from .ipwrapper import drv_name
  File "/usr/lib/python2.7/site-packages/vdsm/ipwrapper.py", line 38, in <module>
    from .utils import execCmd
ImportError: cannot import name execCmd

Apr 18 20:27:57 puma19 vdsm-tool: module validate_ovirt_certs could not load to vdsm-tool: Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 91, in load_modules
    mod_absp, mod_desc)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/validate_ovirt_certs.py", line 25, in <module>
    from ..utils import execCmd
ImportError: cannot import name execCmd
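One plausible reading of these tracebacks (an assumption, not confirmed in this bug) is that modules from the 3.5 vdsm were left behind in site-packages after the rpm upgrade and no longer import against the 4.1 tree. A hypothetical diagnostic for that theory is to look for orphaned compiled files whose .py source no longer exists; the sketch below demonstrates it on a temporary directory rather than on the real /usr/lib/python2.7/site-packages/vdsm.

```shell
#!/bin/bash
# Sketch: list .pyc files that have no matching .py source under a
# directory. Such orphans are typical leftovers of an rpm upgrade that
# removed or renamed modules.
find_orphaned_pyc() {
    local dir=$1
    find "$dir" -name '*.pyc' | while read -r pyc; do
        [ -e "${pyc%.pyc}.py" ] || echo "$pyc"
    done
}

# Demo on a throwaway tree instead of the real site-packages:
demo=$(mktemp -d)
touch "$demo/netinfo.py" "$demo/netinfo.pyc" "$demo/ipwrapper.pyc"
find_orphaned_pyc "$demo"   # only ipwrapper.pyc is reported
rm -rf "$demo"
```

On a real host one would point `find_orphaned_pyc` at the vdsm package directory and cross-check any hits with `rpm -qf` before removing anything.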
Moving to verified, with the exception of a required workaround for 4.1 hosts.

Once you have upgraded your host to 4.1 components, while the host is still in maintenance from the engine, run this on the host:

vdsm-tool configure --force && systemctl restart vdsmd && systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent

Wait for a while (it can take several minutes), then run on the 4.1 host:

less /var/log/ovirt-hosted-engine-ha/agent.log | grep "Successfully upgraded"

You should see:
"MainThread::INFO::2017-04-19 10:25:22,099::upgrade::1035::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Successfully upgraded"

This worked for me with these components on the 4.1 host after the workaround:
mom-0.5.9-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
vdsm-4.19.10.1-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.1.0.5-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-host-deploy-1.6.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.5-1.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
Linux version 3.10.0-327.53.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Tue Mar 14 10:49:09 EDT 2017
Linux 3.10.0-327.53.1.el7.x86_64 #1 SMP Tue Mar 14 10:49:09 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Engine:
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.11.1-0.1.el6.noarch
rhevm-sdk-python-3.6.9.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-setup-base-3.6.11.1-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.11.1-0.1.el6.noarch
rhevm-doc-3.6.10-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-backend-3.6.11.1-0.1.el6.noarch
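The final log check in the workaround can be wrapped in a tiny helper. This is a sketch only: `he_upgrade_done` is a hypothetical function name, and the demo greps a sample line copied from this comment rather than a real agent.log.

```shell
#!/bin/bash
# Sketch: succeed only when the hosted-engine agent log records the
# completed 3.5 -> 3.6 storage upgrade.
he_upgrade_done() {
    # $1: path to agent.log
    # (normally /var/log/ovirt-hosted-engine-ha/agent.log)
    grep -q '(upgrade_35_36) Successfully upgraded' "$1"
}

# Demo against a sample line from this bug instead of the real log:
log=$(mktemp)
echo 'MainThread::INFO::2017-04-19 10:25:22,099::upgrade::1035::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Successfully upgraded' > "$log"
he_upgrade_done "$log" && echo "upgrade completed"
rm -f "$log"
```

The demo prints "upgrade completed"; on a real 4.1 host the same grep against the actual agent.log is the verification step described above.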
ovirt-setup-lib-1.0.1-1.el6ev.noarch
rhevm-setup-3.6.11.1-0.1.el6.noarch
rhevm-cli-3.6.9.0-1.el6ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el6ev.noarch
rhevm-restapi-3.6.11.1-0.1.el6.noarch
rhevm-dwh-3.6.8-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.7-2.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.11.1-0.1.el6.noarch
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
rhevm-vmconsole-proxy-helper-3.6.11.1-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-tools-3.6.11.1-0.1.el6.noarch
rhevm-3.6.11.1-0.1.el6.noarch
rhevm-lib-3.6.11.1-0.1.el6.noarch
rhevm-dwh-setup-3.6.8-1.el6ev.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-websocket-proxy-3.6.11.1-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.11.1-0.1.el6.noarch
rhevm-webadmin-portal-3.6.11.1-0.1.el6.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-image-uploader-3.6.1-2.el6ev.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
ovirt-vmconsole-1.0.4-1.el6ev.noarch
rhevm-userportal-3.6.11.1-0.1.el6.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhev-guest-tools-iso-3.6-6.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.11.1-0.1.el6.noarch
rhevm-dependencies-3.6.1-1.el6ev.noarch
rhevm-tools-backup-3.6.11.1-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-dbscripts-3.6.11.1-0.1.el6.noarch
Linux version 2.6.32-573.41.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Thu Mar 2 11:08:17 EST 2017
Linux 2.6.32-573.41.1.el6.x86_64 #1 SMP Thu Mar 2 11:08:17 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)