Bug 1318725 - Can't raise DC compatibility mode 3.4->3.5->3.6 on an environment already running a 3.6 HE with a 3.6 CL and 3.6 hosts.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-3.6.4
Assignee: Adam Litke
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-17 15:29 UTC by Nikolai Sednev
Modified: 2017-05-11 11:06 UTC
CC: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-05 13:53:33 UTC
oVirt Team: Storage
tnisan: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments
Debug Patch (912 bytes, patch), 2016-03-21 18:43 UTC, Adam Litke

Description Nikolai Sednev 2016-03-17 15:29:50 UTC
Description of problem:
Can't raise the DC compatibility mode 3.4->3.5->3.6 on an environment that is already running a 3.6 HE with a 3.6 CL and 3.6 hosts. I ran into this during the 3.4->3.5->3.6 upgrade scenario.
I see that the HE-SD was successfully auto-imported after I tried to raise the DC compatibility mode 3.4->3.5->3.6 several times and also restarted the HE with "hosted-engine --vm-poweroff", then waited for the engine to be started by ovirt-ha-agent.

2016-03-17 17:16:34,009 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-33) [fb387f6] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command failed: Upgrading a pool while an upgrade is in process is unsupported (pool: `00000002-0002-0002-0002-0000000001ae`): ''
2016-03-17 17:16:34,009 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (org.ovirt.thread.pool-6-thread-33) [fb387f6] Command 'UpgradeStoragePoolVDSCommand( UpgradeStoragePoolVDSCommandParameters:{runAsync='true', storagePoolId='00000002-0002-0002-0002-0000000001ae', ignoreFailoverLimit='false', storagePoolId='00000002-0002-0002-0002-0000000001ae', poolVersion ='3'})' execution failed: IRSGenericException: IRSErrorException: Upgrading a pool while an upgrade is in process is unsupported (pool: `00000002-0002-0002-0002-0000000001ae`): ''
2016-03-17 17:16:34,009 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (org.ovirt.thread.pool-6-thread-33) [fb387f6] FINISH, UpgradeStoragePoolVDSCommand, log id: 702d2d8e


Version-Release number of selected component (if applicable):

Host:
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.4.3-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.8.x86_64
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
libvirt-client-1.2.17-13.el7_2.4.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.3.4-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.23-0.el7ev.noarch
Linux version 3.10.0-327.13.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Feb 29 13:22:02 EST 2016

Engine:
rhevm-sdk-python-3.6.3.0-1.el6ev.noarch
rhevm-extensions-api-impl-3.6.3.4-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.3.4-0.1.el6.noarch
rhevm-restapi-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.3.4-0.1.el6.noarch
rhevm-websocket-proxy-3.6.3.4-0.1.el6.noarch
rhevm-spice-client-x86-msi-3.6-6.el6.noarch
rhevm-tools-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-3.6.3.4-0.1.el6.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-branding-rhev-3.6.0-8.el6ev.noarch
rhevm-doc-3.6.0-4.el6eng.noarch
rhevm-dbscripts-3.6.3.4-0.1.el6.noarch
rhevm-reports-3.6.3-1.el6ev.noarch
rhevm-reports-setup-3.6.3-1.el6ev.noarch
rhevm-setup-base-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugins-3.6.3-1.el6ev.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-6.el6.noarch
rhevm-spice-client-x64-cab-3.6-6.el6.noarch
rhevm-userportal-3.6.3.4-0.1.el6.noarch
rhevm-3.6.3.4-0.1.el6.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-setup-3.6.3.4-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-6.el6.noarch
rhevm-backend-3.6.3.4-0.1.el6.noarch
rhevm-lib-3.6.3.4-0.1.el6.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
rhevm-vmconsole-proxy-helper-3.6.3.4-0.1.el6.noarch
rhevm-webadmin-portal-3.6.3.4-0.1.el6.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
ovirt-engine-extension-aaa-ldap-1.1.2-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.6-1.el6ev.noarch
Linux version 2.6.32-573.18.1.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Wed Jan 6 11:20:49 EST 2016

How reproducible:
Happened during an upgrade from 3.4->3.5->3.6.

Steps to Reproduce:
1. Upgrade from 3.4->3.5->3.6.
2. Get to a 3.6 engine with 3.6 hosts and a 3.6 host cluster.
3. Try to change the compatibility mode of the DC from 3.4 to 3.5 or to 3.6 (one way to drive this step is sketched below).
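A minimal sketch of step 3 using the Python SDK already listed in this environment (rhevm-sdk-python, oVirt SDK 3.x); the engine URL, credentials, and datacenter name below are placeholders, not values from this setup:
"""
from ovirtsdk.api import API
from ovirtsdk.xml import params

# Placeholder connection details; adjust for the engine under test.
api = API(url='https://engine.example.com/api',
          username='admin@internal',
          password='password',
          insecure=True)

# Raise the DC compatibility version one step, e.g. 3.4 -> 3.5.
dc = api.datacenters.get(name='Default')
dc.set_version(params.Version(major=3, minor=5))
dc.update()

api.disconnect()
"""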

Actual results:
The DC version jumps back from 3.5 or 3.6 to 3.4 right after the customer changes it.

Expected results:
The DC compatibility version should change and remain at the value the customer selected.

Additional info:
Sosreports from engine and host attached.

Comment 2 Tomer Saban 2016-03-20 12:43:28 UTC
According to the vdsm logs, it seems that there is a variable in vdsm which isn't getting cleaned properly, and therefore it can't upgrade the DC version.
Specifically, _domainsToUpgrade in vdsm/storage/sp.py.

traceback from the vdsm log:
"""
jsonrpc.Executor/2::DEBUG::2016-03-17 17:16:22,434::resourceManager::652::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.upgrade_00000002-0002-0002-0002-0000000001ae', Clearing records.
jsonrpc.Executor/2::ERROR::2016-03-17 17:16:22,434::task::866::Storage.TaskManager.Task::(_setError) Task=`a1b1e088-83ea-4e28-89c9-96b9b26da363`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3539, in upgradeStoragePool
    pool._upgradePool(targetDomVersion)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 442, in _upgradePool
    raise se.PoolUpgradeInProgress(self.spUUID)
PoolUpgradeInProgress: Upgrading a pool while an upgrade is in process is unsupported (pool: `00000002-0002-0002-0002-0000000001ae`): ''
"""

Comment 3 Daniel Erez 2016-03-20 13:13:32 UTC
@Nir - what do you think? Is it a known issue?

Comment 4 Allon Mureinik 2016-03-21 13:06:09 UTC
Does this reproduce in a clean env, without the HE mess around it?

Comment 5 Adam Litke 2016-03-21 15:05:48 UTC
I've tried to reproduce this without the HE mess, but everything works properly.

Comment 6 Adam Litke 2016-03-21 18:43:15 UTC
Created attachment 1138760 [details]
Debug Patch

Please try to reproduce the issue with the following patch applied and post the vdsm.log.

Comment 7 Adam Litke 2016-03-21 18:45:40 UTC
After looking at the logs, I see no evidence as to how the _domainsToUpgrade list could have an element in it. Please try to reproduce this with the supplied debug patch and post the vdsm.log. That should shed some light on what is going on here.
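The patch itself is attached rather than inlined; a debug patch for this purpose would plausibly just log the offending list before the guard rejects the request, along these lines (a hypothetical sketch, not the actual attachment):
"""
import logging

log = logging.getLogger("Storage.StoragePool")

def reject_if_upgrade_pending(domains_to_upgrade, sp_uuid):
    # Hypothetical debug helper, NOT the attached patch: record what is
    # still sitting in _domainsToUpgrade when a new upgrade is refused.
    if domains_to_upgrade:
        log.error("Rejecting upgrade of pool %s; domains still pending: %s",
                  sp_uuid, domains_to_upgrade)
        raise RuntimeError("Pool upgrade already in progress: %s" % sp_uuid)
"""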

Comment 8 Allon Mureinik 2016-03-22 10:19:16 UTC
Nikolai and I took a look at the problematic env. After Simone applied the fix for bug 1316143 on this env, the upgrade passes without issue.

Pushing this out to 3.6.6 as a placeholder to double-verify that there isn't any additional action item on the storage upgrade flow, but for now it does not seem like there is.

Comment 9 Yaniv Lavi 2016-03-23 14:16:58 UTC
Why not verify this as test-only in 3.6.4? Shouldn't this be ON_QA?

Comment 10 Allon Mureinik 2016-03-23 14:49:59 UTC
(In reply to Yaniv Dary from comment #9)
> Why not verify this as test-only in 3.6.4? Shouldn't this be ON_QA?
Fair enough.

Comment 11 Nikolai Sednev 2016-03-24 07:38:20 UTC
Works for me on these components:
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux alma04.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

