Description of problem:
Can't raise the DC compatibility version 3.4->3.5->3.6 on an environment that is already running a 3.6 HE with a 3.6 cluster and 3.6 hosts. I ran into this during a 3.4->3.5->3.6 upgrade scenario. I see that the HE SD was successfully auto-imported after I had tried to raise the DC compatibility version 3.4->3.5->3.6 several times and had also restarted the HE with "hosted-engine --vm-poweroff", then waited for the engine to be started by ovirt-ha-agent.

2016-03-17 17:16:34,009 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-33) [fb387f6] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command failed: Upgrading a pool while an upgrade is in process is unsupported (pool: `00000002-0002-0002-0002-0000000001ae`): ''
2016-03-17 17:16:34,009 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (org.ovirt.thread.pool-6-thread-33) [fb387f6] Command 'UpgradeStoragePoolVDSCommand( UpgradeStoragePoolVDSCommandParameters:{runAsync='true', storagePoolId='00000002-0002-0002-0002-0000000001ae', ignoreFailoverLimit='false', storagePoolId='00000002-0002-0002-0002-0000000001ae', poolVersion='3'})' execution failed: IRSGenericException: IRSErrorException: Upgrading a pool while an upgrade is in process is unsupported (pool: `00000002-0002-0002-0002-0000000001ae`): ''
2016-03-17 17:16:34,009 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (org.ovirt.thread.pool-6-thread-33) [fb387f6] FINISH, UpgradeStoragePoolVDSCommand, log id: 702d2d8e

Version-Release number of selected component (if applicable):

Host:
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.4.3-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.8.x86_64
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
libvirt-client-1.2.17-13.el7_2.4.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.3.4-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.23-0.el7ev.noarch
Linux version 3.10.0-327.13.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Feb 29 13:22:02 EST 2016

Engine:
rhevm-sdk-python-3.6.3.0-1.el6ev.noarch
rhevm-extensions-api-impl-3.6.3.4-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.3.4-0.1.el6.noarch
rhevm-restapi-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.3.4-0.1.el6.noarch
rhevm-websocket-proxy-3.6.3.4-0.1.el6.noarch
rhevm-spice-client-x86-msi-3.6-6.el6.noarch
rhevm-tools-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-3.6.3.4-0.1.el6.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-branding-rhev-3.6.0-8.el6ev.noarch
rhevm-doc-3.6.0-4.el6eng.noarch
rhevm-dbscripts-3.6.3.4-0.1.el6.noarch
rhevm-reports-3.6.3-1.el6ev.noarch
rhevm-reports-setup-3.6.3-1.el6ev.noarch
rhevm-setup-base-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugins-3.6.3-1.el6ev.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-6.el6.noarch
rhevm-spice-client-x64-cab-3.6-6.el6.noarch
rhevm-userportal-3.6.3.4-0.1.el6.noarch
rhevm-3.6.3.4-0.1.el6.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-setup-3.6.3.4-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-6.el6.noarch
rhevm-backend-3.6.3.4-0.1.el6.noarch
rhevm-lib-3.6.3.4-0.1.el6.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
rhevm-vmconsole-proxy-helper-3.6.3.4-0.1.el6.noarch
rhevm-webadmin-portal-3.6.3.4-0.1.el6.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
ovirt-engine-extension-aaa-ldap-1.1.2-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.6-1.el6ev.noarch
Linux version 2.6.32-573.18.1.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Wed Jan 6 11:20:49 EST 2016

How reproducible:
Happened during the upgrade from 3.4->3.5->3.6.

Steps to Reproduce:
1. Upgrade from 3.4->3.5->3.6.
2. Get to a 3.6 engine with 3.6 hosts and a 3.6 host cluster.
3. Try to change the compatibility version of the DC from 3.4 to 3.5 or to 3.6.

Actual results:
The DC version jumps back from 3.5 or 3.6 right after the customer changes it.

Expected results:
The DC compatibility version should change and stay as the customer selected.

Additional info:
Sosreports from engine and host attached.
Sosreport from the engine: https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88NEIzX0d0RHByUms/view?usp=sharing
Sosreport from the host: https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88RkJDNC10a1dYWDA/view?usp=sharing
According to the vdsm logs it seems that there is a variable in vdsm which isn't getting cleaned properly, and therefore it can't upgrade the DC version. Specifically, vdsm/storage/sp.py: _domainsToUpgrade.

Traceback from the vdsm log:
"""
jsonrpc.Executor/2::DEBUG::2016-03-17 17:16:22,434::resourceManager::652::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.upgrade_00000002-0002-0002-0002-0000000001ae', Clearing records.
jsonrpc.Executor/2::ERROR::2016-03-17 17:16:22,434::task::866::Storage.TaskManager.Task::(_setError) Task=`a1b1e088-83ea-4e28-89c9-96b9b26da363`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3539, in upgradeStoragePool
    pool._upgradePool(targetDomVersion)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 442, in _upgradePool
    raise se.PoolUpgradeInProgress(self.spUUID)
PoolUpgradeInProgress: Upgrading a pool while an upgrade is in process is unsupported (pool: `00000002-0002-0002-0002-0000000001ae`): ''
"""
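To illustrate the failure mode above: a minimal sketch of the guard that the traceback hits, assuming a heavily simplified stand-in for vdsm's StoragePool. `_domainsToUpgrade` and `PoolUpgradeInProgress` are the real vdsm names; everything else here is illustrative, not vdsm's actual code:

```python
class PoolUpgradeInProgress(Exception):
    """Stand-in for vdsm's storage exception of the same name."""
    def __init__(self, sp_uuid):
        super().__init__(
            "Upgrading a pool while an upgrade is in process is "
            "unsupported (pool: `%s`)" % sp_uuid)


class StoragePool:
    """Illustrative stand-in for vdsm/storage/sp.py:StoragePool."""

    def __init__(self, sp_uuid):
        self.spUUID = sp_uuid
        # Domains still queued for upgrade. If a previous upgrade
        # attempt fails to empty this list, the pool looks
        # mid-upgrade forever and every later attempt is rejected.
        self._domainsToUpgrade = []

    def _upgradePool(self, target_version, domains):
        if self._domainsToUpgrade:
            # This is the branch from the traceback above: the stale
            # list makes the upgrade unconditionally fail.
            raise PoolUpgradeInProgress(self.spUUID)
        self._domainsToUpgrade = list(domains)
        # ... the real code would now upgrade each domain, removing
        # it from the list as it completes ...
        self._domainsToUpgrade = []
```

On this model, the bug reported here corresponds to `_domainsToUpgrade` retaining an entry across upgrade attempts instead of being cleared.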
@Nir - what do you think? Is it a known issue?
Does this reproduce in a clean env, without the HE mess around it?
I've tried to reproduce this without the HE mess but everything is working properly.
Created attachment 1138760 [details]
Debug Patch

Please try to reproduce the issue with the following patch applied and post the vdsm.log.
After looking at the logs I see no evidence as to how the _domainsToUpgrade list could have an element in it. Please try to reproduce this with the supplied debug patch and post the vdsm.log. That should shed some light on what is going on here.
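Since the debug patch itself is in the attachment, here is only a rough sketch of the kind of instrumentation such a patch might add: tracing every mutation of `_domainsToUpgrade` so the log shows which code path left a stale entry behind. `_domainsToUpgrade` is the real vdsm attribute name; the wrapper class and logger usage below are hypothetical, not the attached patch:

```python
import logging

log = logging.getLogger("Storage.StoragePool")


class TracedDomainList(list):
    """Hypothetical list wrapper: logs every mutation of the
    upgrade-pending domain list.

    Swapping StoragePool._domainsToUpgrade for an instance of this
    class would make the vdsm.log show exactly where a stale domain
    entry gets added and why it is never removed.
    """

    def append(self, item):
        log.debug("_domainsToUpgrade: appending %r to %r", item, list(self))
        super().append(item)

    def remove(self, item):
        log.debug("_domainsToUpgrade: removing %r from %r", item, list(self))
        super().remove(item)
```

The same effect could be had with plain log lines sprinkled around each site that touches the list; a wrapper just guarantees no mutation site is missed.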
Nikolai and I took a look at the problematic env. After Simone applied the fix for bug 1316143 on this env, the upgrade passed without issue. Pushing out to 3.6.6 as a placeholder to double-verify that there isn't any additional action item on the storage upgrade flow, but for now it does not seem like there is.
Why not test this as test only in 3.6.4? Shouldn't this be ON_QA?
(In reply to Yaniv Dary from comment #9)
> Why not test this as test only in 3.6.4? Shouldn't this be ON_QA?

Fair enough.
Works for me on these components:
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux alma04.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux