Bug 1242092

Summary: [vdsm] SpmStart fails after "offline" upgrade of DC with V1 master domain to 3.5
Product: Red Hat Enterprise Virtualization Manager
Reporter: Allie DeVolder <adevolder>
Component: vdsm
Assignee: Liron Aravot <laravot>
Status: CLOSED ERRATA
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: urgent
Priority: urgent
Version: 3.5.1
CC: acanan, adevolder, ahino, alitke, amureini, bazulay, bugs, ebenahar, eedri, fromani, fsimonce, gickowic, laravot, lpeer, lsurette, mgoldboi, nlevinki, pkliczew, ratamir, rbalakri, rhodain, sbonazzo, s.kieske, tnisan, ycui, yeylon, ykaul, ylavi
Target Milestone: ovirt-3.6.0-rc3
Keywords: Regression, ZStream
Target Release: 3.6.0
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of: 1118349
Clones: 1260428, 1269411
Last Closed: 2016-03-09 19:42:40 UTC
Type: Bug
oVirt Team: Storage
Bug Depends On: 1118349, 1120712
Bug Blocks: 1269411

Comment 3 Allon Mureinik 2015-07-12 14:18:42 UTC
Although the error seems similar, I don't think the flow is - here you have a pre-existing DC; you are not creating a new DC on an old domain.

Liron, can you take a look please?

Comment 4 Liron Aravot 2015-07-16 09:32:08 UTC
Allan, can you elaborate on what the reported issue here is?
I assume you wanted to verify that having both domains in V2 is legitimate?

What is your DC level? Is it a DC that was upgraded?

thanks,
Liron.

Comment 6 Liron Aravot 2015-07-30 08:38:17 UTC
Allan,
can you please clarify what the reported issue here is and update the bug description? IIUC, your concern is that the domains' version is V2?

Additionally, can you please answer the rest of my questions in https://bugzilla.redhat.com/show_bug.cgi?id=1242092#c4? What is the DC level? (You wrote about the cluster version.) Please also attach the relevant logs to this bug (I'd like to get logs covering startSpm and connectStoragePool()).

thanks, Liron.

Comment 11 Liron Aravot 2015-08-30 16:09:00 UTC
There are multiple issues in the upgrade flow.
I'm working on fixing them; the root causes of the upgrade-related issues are:

1. When startSpm is called and there is an unreachable domain, it starts an upgrade thread for that domain (to be executed once the domain becomes available). If you try to execute upgradeStoragePool() before that domain has been upgraded, the call fails because an upgrade is already in progress (see the sketch after this list).

2. The engine's handling of the flow is wrong and leaves incorrect values in the database.

3. The deactivateSd/upgrade process handling needs to be looked at.
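
To make item 1 concrete, here is a minimal Python sketch of that kind of guard, assuming a simplified pool object; the names (PoolUpgradeGuard, _upgradeInProgress, _startDeferredUpgrade) are illustrative only and are not the actual vdsm identifiers.

import threading


class PoolUpgradeGuard(object):
    """Illustrative sketch only, not the real vdsm StoragePool code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._upgradeInProgress = False

    def _startDeferredUpgrade(self, sdUUID):
        # startSpm finds a domain that is currently unreachable and registers
        # a deferred upgrade for it, to run once the domain becomes available.
        with self._lock:
            self._upgradeInProgress = True
        # ... register a domain-monitor callback that performs the upgrade ...

    def upgradeStoragePool(self, targetVersion):
        # An explicit upgrade request arriving while the deferred upgrade is
        # still pending fails; this is the situation described in item 1.
        with self._lock:
            if self._upgradeInProgress:
                raise RuntimeError("upgrade is already in progress")
            self._upgradeInProgress = True
        # ... perform the actual upgrade to targetVersion ...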

Comment 12 Roman Hodain 2015-08-31 06:12:57 UTC
(In reply to Liron Aravot from comment #11)
> There are multiple issues in the upgrade flow.
> I'm working on fixing them; the root causes of the upgrade-related issues
> are:
> 
> 1. When startSpm is called and there is an unreachable domain, it starts an
> upgrade thread for that domain (to be executed once the domain becomes
> available). If you try to execute upgradeStoragePool() before that domain
> has been upgraded, the call fails because an upgrade is already in progress.
> 
> 2. The engine's handling of the flow is wrong and leaves incorrect values in
> the database.
> 
> 3. The deactivateSd/upgrade process handling needs to be looked at.

Hi,

thanks for the clarification. We changed the DC compatibility version to 3.0 and then restarted vdsm, which brought the DC up again. I suppose that if we fix the issue with the missing SD we can trigger the upgrade again, right?

Comment 13 Liron Aravot 2015-08-31 08:04:22 UTC
Yes, in the current situation you can safely trigger an upgrade only when all domains are available (a quick way to check this from the SPM host is sketched below). Another option is to put the domain into maintenance, stop the SPM, and start it again.
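
For readers who want to perform that check, a rough Python sketch follows, assuming the vdsClient tool from vdsm is installed on the host; the pool UUID is a placeholder and the exact output format can differ between vdsm versions.

import subprocess

# Placeholder pool UUID; substitute the real one from the engine or the logs.
SP_UUID = "00000002-0002-0002-0002-000000000000"

# "-s 0" talks to the local vdsm over SSL; getStoragePoolInfo lists the
# pool's domains and their statuses.
out = subprocess.check_output(
    ["vdsClient", "-s", "0", "getStoragePoolInfo", SP_UUID]).decode()

# Every domain listed in the output should be reported as Active before
# upgradeStoragePool is triggered.
print(out)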

Comment 15 Liron Aravot 2015-08-31 08:12:55 UTC
For other readers of this issue: note that the solution for each scenario is different and depends on the DC version, which hosts were added/refreshed, and so on, so do not use the steps above as a general solution for the problem.

Comment 17 Yaniv Lavi 2015-10-07 13:49:15 UTC
Can you please check if this should be on MODIFIED?

Comment 19 Kevin Alon Goldblatt 2015-11-15 16:45:36 UTC
Hi,

Can you please confirm the following steps to reproduce this manually:


1. Create a DC with compatibility version 3.0.
2. Add a cluster with version 3.0.
3. Add a host.
4. Add a new iSCSI domain to the DC >>> the upgrade should be initiated and the domain added successfully.

Comment 20 Liron Aravot 2015-11-16 11:51:33 UTC
1. Create a DC with compatibility version 3.0.
2. Add a cluster with version 3.0.
3. Add a host.
4. Add a storage domain, and wait for the domain to become master and for the SPM to start on the host.
5. Move the domain to maintenance.
6. Upgrade the DC to 3.5.
7. Activate the domain.
8. Verify that the SPM starts successfully (a quick check from the host is sketched below).
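
A rough way to perform the check in step 8 from the host, assuming the vdsClient tool from vdsm is available; the UUIDs are placeholders and the verbs/output may differ slightly between vdsm versions.

import subprocess

# Placeholder UUIDs; substitute the real pool and master-domain UUIDs.
SP_UUID = "00000002-0002-0002-0002-000000000000"
SD_UUID = "11111111-1111-1111-1111-111111111111"


def vds(*args):
    """Run a vdsClient verb against the local vdsm and return its output."""
    return subprocess.check_output(("vdsClient", "-s", "0") + args).decode()

# getSpmStatus should report the host as SPM once startSpm has succeeded, and
# getStorageDomainInfo should show the domain at the upgraded version.
print(vds("getSpmStatus", SP_UUID))
print(vds("getStorageDomainInfo", SD_UUID))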

Comment 21 Kevin Alon Goldblatt 2015-11-16 14:34:48 UTC
Verified with the following code:
------------------------------------------
vdsm-4.17.10.1-0.el7ev.noarch
rhevm-3.6.0.3-0.1.el6.noarch

Verified with the following scenario:
-----------------------------------------
Steps to reproduce:
-------------------
1. Create a DC with compatibility version 3.0.
2. Add a cluster with version 3.5.
3. Add a host.
4. Add a storage domain, and wait for the domain to become master and for the SPM to start on the host.
5. Move the domain to maintenance.
6. Upgrade the DC to 3.5.
7. Activate the domain.
8. Verify that the SPM starts successfully.


Moving to VERIFIED!

Comment 23 errata-xmlrpc 2016-03-09 19:42:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html