Bug 1242092 - [vdsm] SpmStart fails after "offline" upgrade of DC with V1 master domain to 3.5
Summary: [vdsm] SpmStart fails after "offline" upgrade of DC with V1 master domain to...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Assignee: Liron Aravot
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: 1118349 1120712
Blocks: 1269411
 
Reported: 2015-07-10 21:53 UTC by Allie DeVolder
Modified: 2019-10-10 09:56 UTC
CC: 28 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1118349
Clones: 1260428 1269411
Environment:
Last Closed: 2016-03-09 19:42:40 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0362 0 normal SHIPPED_LIVE vdsm 3.6.0 bug fix and enhancement update 2016-03-09 23:49:32 UTC
oVirt gerrit 45680 0 None DRAFT core: fix spm election after pool upgrade Never
oVirt gerrit 45763 0 master MERGED core: moving InquireNotSupportedError to storage_exception.py Never
oVirt gerrit 45764 0 master MERGED sp: startSpm - clusterlock inquire leads to failure Never
oVirt gerrit 45929 0 master MERGED JsonRpcVdsServer: spmStart() to pass the needed domain version Never
oVirt gerrit 45930 0 master MERGED core: spmStart failure when clusterlock inquire isn't supported Never
oVirt gerrit 45965 0 ovirt-3.6 MERGED core: moving InquireNotSupportedError to storage_exception.py Never
oVirt gerrit 45966 0 ovirt-3.6 MERGED sp: startSpm - clusterlock inquire leads to failure Never
oVirt gerrit 45975 0 ovirt-3.5 MERGED core: moving InquireNotSupportedError to storage_exception.py Never
oVirt gerrit 45976 0 ovirt-3.5 MERGED sp: startSpm - clusterlock inquire leads to failure Never
oVirt gerrit 46019 0 ovirt-engine-3.6 MERGED JsonRpcVdsServer: spmStart() to pass the needed domain version Never
oVirt gerrit 46020 0 ovirt-engine-3.6 MERGED core: spmStart failure when clusterlock inquire isn't supported Never
oVirt gerrit 46027 0 ovirt-engine-3.5 MERGED JsonRpcVdsServer: spmStart() to pass the needed domain version Never
oVirt gerrit 46028 0 ovirt-engine-3.5 MERGED core: spmStart failure when clusterlock inquire isn't supported Never
oVirt gerrit 47120 0 ovirt-engine-3.6.0 MERGED JsonRpcVdsServer: spmStart() to pass the needed domain version Never
oVirt gerrit 47121 0 ovirt-engine-3.6.0 MERGED core: spmStart failure when clusterlock inquire isn't supported Never
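
For context on the gerrit patches above: a master domain still at format V1/V2 uses safelease for its cluster lock, and safelease cannot be inquired, so after the DC upgrade SpmStart failed on the inquire call. The sketch below only illustrates that failure mode and the kind of fallback the patch subjects describe; InquireNotSupportedError is the exception name taken from the patches, while the SafeLease stub, the start_spm signature and the fallback logic are simplified assumptions, not the actual vdsm code.

class InquireNotSupportedError(Exception):
    """The lock backend cannot report the current lease owner/version."""


class SafeLease(object):
    """Cluster lock used by V1/V2 domains; it has no inquire support."""

    def inquire(self):
        raise InquireNotSupportedError()


def start_spm(cluster_lock, prev_id, prev_lver):
    # Before the fix, InquireNotSupportedError propagated out of here and
    # SpmStart failed whenever the master domain was still safelease-backed.
    try:
        owner_id, lver = cluster_lock.inquire()
    except InquireNotSupportedError:
        # Fall back to the caller-supplied id/version instead of aborting.
        owner_id, lver = prev_id, prev_lver
    # ... acquire the SPM lease and continue starting the SPM role ...
    return owner_id, lver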

Comment 3 Allon Mureinik 2015-07-12 14:18:42 UTC
Although the error seems similar, I don't think the flow is - here you have a pre-existing DC; you are not creating a new DC on an old domain.

Liron, can you take a look please?

Comment 4 Liron Aravot 2015-07-16 09:32:08 UTC
Allan, can you elaborate on the reported issue here?
I assume that you wanted to verify that having both domains at V2 is legit?

What is your DC level? Is it a DC that was upgraded?

thanks,
Liron.

Comment 6 Liron Aravot 2015-07-30 08:38:17 UTC
Allan,
can you please clarify what the reported issue here is and update the bug description? IIUC your concern is that the domains' version is V2?

Additionally, can you please respond to the rest of my questions in https://bugzilla.redhat.com/show_bug.cgi?id=1242092#c4? What is the DC level? (You wrote about the cluster version.) And please attach the relevant logs to this bug (I'd like logs that include startSpm and connectStoragePool()).

thanks, Liron.

Comment 11 Liron Aravot 2015-08-30 16:09:00 UTC
There are multiple issues in the upgrade flow.
I'm working on fixing them - the root causes of the upgrade-related issues are:

1. When startSpm is called and there is an unreachable domain, it starts an upgrade thread for it (to be executed when the domain becomes available). If you try to execute upgradeStoragePool() before that domain has been upgraded, it will fail because an upgrade is already in progress.

2. The engine's handling of the flow is wrong and will leave incorrect DB values.

3. We need to look at the deactivateSd/upgrade process handling.
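
A minimal sketch of the race described in issue 1 above. This is illustrative pseudocode, not the actual vdsm implementation; the class and method names (StoragePool, start_spm, upgrade_storage_pool, the _upgrading set) are assumptions chosen only to mirror the description.

import threading

class PoolUpgradeError(Exception):
    pass

class StoragePool(object):
    def __init__(self, domains):
        # domains: dict of sdUUID -> bool (True if the domain is reachable)
        self.domains = domains
        self._upgrading = set()   # domains with an upgrade already queued
        self._lock = threading.Lock()

    def start_spm(self, target_version):
        # Issue 1: for every unreachable domain, queue an upgrade thread
        # that will run once the domain becomes available again.
        for sd_uuid, reachable in self.domains.items():
            if not reachable:
                with self._lock:
                    self._upgrading.add(sd_uuid)
                t = threading.Thread(
                    target=self._upgrade_when_available,
                    args=(sd_uuid, target_version))
                t.daemon = True
                t.start()

    def upgrade_storage_pool(self, sd_uuid, target_version):
        # A later explicit upgrade request fails if start_spm already
        # queued an upgrade for the same domain.
        with self._lock:
            if sd_uuid in self._upgrading:
                raise PoolUpgradeError(
                    "upgrade already in progress for %s" % sd_uuid)
            self._upgrading.add(sd_uuid)
        # ... perform the actual domain metadata upgrade here ...

    def _upgrade_when_available(self, sd_uuid, target_version):
        # Placeholder: wait for the domain to become reachable, upgrade it,
        # then discard sd_uuid from self._upgrading.
        pass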

Comment 12 Roman Hodain 2015-08-31 06:12:57 UTC
(In reply to Liron Aravot from comment #11)
> There are multiple issues in the upgrade flow.
> I'm working on fixing them - the root causes of the upgrade-related issues
> are:
> 
> 1. When startSpm is called and there is an unreachable domain, it starts an
> upgrade thread for it (to be executed when the domain becomes available). If
> you try to execute upgradeStoragePool() before that domain has been
> upgraded, it will fail because an upgrade is already in progress.
> 
> 2. The engine's handling of the flow is wrong and will leave incorrect DB
> values.
> 
> 3. We need to look at the deactivateSd/upgrade process handling.

Hi,

thanks for the clarification. We have changed the DC compatibility version to 3.0 and then restarted vdsm, which brought the DC up again. I suppose that if we fix the issue with the missing SD, we can then trigger the upgrade again, right?

Comment 13 Liron Aravot 2015-08-31 08:04:22 UTC
Yes, in the current situation you can safely trigger the upgrade only when all domains are available. Another option would be to put the domain into maintenance, stop the SPM, and start it again.

Comment 15 Liron Aravot 2015-08-31 08:12:55 UTC
For other readers of this issue: note that the solution differs for each scenario and depends on the DC version, which hosts were added/refreshed, and so on, so don't use it as a general solution for the problem.

Comment 17 Yaniv Lavi 2015-10-07 13:49:15 UTC
Can you please check if this should be on MODIFIED?

Comment 19 Kevin Alon Goldblatt 2015-11-15 16:45:36 UTC
Hi,

Can you please confirm the following steps to reproduce this manually:


1. Create a DC with compatibility version 3.0
2. Add a cluster with version 3.0
3. Add a host
4. Add a new iSCSI domain to the DC >>> Upgrade should be initiated and the domain added successfully.

Comment 20 Liron Aravot 2015-11-16 11:51:33 UTC
1. Create a DC with compatibility version 3.0.
2. Add a cluster with version 3.0.
3. Add a host.
4. Add a storage domain, wait for the domain to become master and the SPM to start on the host.
5. Move the domain to maintenance.
6. Upgrade the DC to 3.5.
7. Activate the domain.
8. Verify that the SPM starts successfully.
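
For step 6, a scripted alternative to the Administration Portal is to raise the DC compatibility version through the engine REST API. The snippet below is only a sketch: the engine URL, credentials and data center id are placeholders, and the exact API path can differ between engine versions.

import requests

ENGINE = 'https://engine.example.com/ovirt-engine/api'   # placeholder
AUTH = ('admin@internal', 'password')                     # placeholder
DC_ID = '00000000-0000-0000-0000-000000000000'            # placeholder DC id

# PUT the new compatibility version onto the data center (step 6).
body = '<data_center><version major="3" minor="5"/></data_center>'
resp = requests.put(
    '%s/datacenters/%s' % (ENGINE, DC_ID),
    data=body,
    headers={'Content-Type': 'application/xml'},
    auth=AUTH,
    verify=False,  # acceptable only on a throwaway test setup
)
resp.raise_for_status()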

Comment 21 Kevin Alon Goldblatt 2015-11-16 14:34:48 UTC
Verified with the following code:
------------------------------------------
vdsm-4.17.10.1-0.el7ev.noarch
rhevm-3.6.0.3-0.1.el6.noarch

Verified with the following scenario:
-----------------------------------------
Steps to reproduce:
-------------------
1. Create a DC with compatibility version 3.0.
2. Add a cluster with version 3.5.
3. Add a host.
4. Add a storage domain, wait for the domain to become master and the SPM to start on the host.
5. Move the domain to maintenance.
6. Upgrade the DC to 3.5.
7. Activate the domain.
8. Verify that the SPM starts successfully.


Moving to VERIFIED!

Comment 23 errata-xmlrpc 2016-03-09 19:42:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html

