Bug 789381

Summary: safelease broken for FS Storage Domains
Product: [Retired] oVirt
Component: vdsm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Reporter: Adam Litke <alitke>
Assignee: Federico Simoncelli <fsimonce>
QA Contact: yeylon <yeylon>
CC: abaron, acathrow, bazulay, fsimonce, iheim, srevivo, ykaul
Doc Type: Bug Fix
Last Closed: 2012-02-13 16:56:52 UTC
Attachments:
  Test case (flags: none)
  vdsm.log snippet when the error occurs (flags: none)

Description Adam Litke 2012-02-10 16:18:52 UTC
Created attachment 560931 [details]
Test case

Description of problem:

Since commit 1676396f18cf5c300d87e18169eba66cd39f0267, vdsm is not able to correctly acquire the SPM role when working with LOCALFS storage.


Version-Release number of selected component (if applicable):
1676396f18cf5c300d87e18169eba66cd39f0267

How reproducible: Always


Steps to Reproduce:
1. Run the attached python script on the host.
  
Actual results:
deactivateStorageDomain fails with the following error:

Exception: {'status': {'message': 'Not SPM', 'code': 654}}


Expected results:
The script completes without errors.

Additional info:
See attached script.

Comment 1 Federico Simoncelli 2012-02-12 15:47:12 UTC
Hi Adam, thanks for the bug report and the test case. In the future, if you also attach the relevant parts of the log, you'll speed up bug triage (I might be able to understand the problem without even reproducing it).

Comment 2 Adam Litke 2012-02-12 23:07:15 UTC
Created attachment 561333 [details]
vdsm.log snippet when the error occurs

Hi Federico.  After further investigation, this problem seems to impact only the master SD.  In another test, I was able to create/attach/activate/deactivate/detach/format a secondary storage domain without issue, but as soon as I tried doing the same to the master SD, I got the 'Not SPM' error.

Please find attached a section of vdsm.log that shows the problem.  In this session I issued the following XML-RPC commands:

spmStart(sp, -1,-1,-1,0,0)
getTaskInfo('d73642da-e524-4c51-aa97-a2f90b7ad68b')
getTaskStatus('d73642da-e524-4c51-aa97-a2f90b7ad68b')
deactivateStorageDomain('def32ac7-1e12-4823-8e8c-8c887333fe16',
                        '6e4d6a96-d3da-419c-8905-b5eec55c44e2',
                        '00000000-0000-0000-0000-000000000000', 1)
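
For reference, a minimal sketch of how these commands can be replayed from Python over vdsm's XML-RPC interface; the endpoint and the inline status checks are assumptions here, not taken from the attached test case, while the UUIDs and arguments are the ones shown in the log:

import xmlrpclib

BLANK_UUID = '00000000-0000-0000-0000-000000000000'
sp = '6e4d6a96-d3da-419c-8905-b5eec55c44e2'  # storage pool UUID from the log
sd = 'def32ac7-1e12-4823-8e8c-8c887333fe16'  # master domain UUID from the log

# Assumed endpoint; vdsm normally serves XML-RPC on port 54321.
s = xmlrpclib.ServerProxy('https://localhost:54321')

res = s.spmStart(sp, -1, -1, -1, 0, 0)
assert res['status']['code'] == 0, res
task = res['uuid']

print s.getTaskInfo(task)
print s.getTaskStatus(task)

# Per the description above, this call returns
# {'status': {'message': 'Not SPM', 'code': 654}}
print s.deactivateStorageDomain(sd, sp, BLANK_UUID, 1)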

Comment 3 Federico Simoncelli 2012-02-13 11:24:55 UTC
vdsOK(s.deactivateStorageDomain(sd, sp, BLANK_UUID, masterVersion))
vdsOK(s.detachStorageDomain(sd, sp, BLANK_UUID, masterVersion))
vdsOK(s.formatStorageDomain(sd))
vdsOK(s.spmStop(sp))

The flow above looks wrong to me when it comes to the master domain.
For the sake of completeness, let's assume that we are using sanlock as the lock manager (but the idea applies to safelease too).
Since we can have attached domains that are not active (e.g. storage in maintenance), acquiring/releasing the host id (lockspace) was wired into activateStorageDomain/deactivateStorageDomain.
Without a lockspace (host id) you cannot hold a resource (the SPM lease).
Since the pool cluster lock is kept in the master domain, you either migrate it somewhere else (to maintain the SPM status) before deactivating/detaching/formatting, or you just destroy the storage pool and format the domain:

tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0))['uuid']
waitTask(s, tid)

vdsOK(s.destroyStoragePool(sp, hostID, sp_key))
vdsOK(s.formatStorageDomain(sd))
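
For completeness, minimal versions of the vdsOK and waitTask helpers used in these snippets might look as follows; this is a sketch, and the exact layout of the getTaskStatus reply ('taskStatus'/'taskState') is an assumption:

import time

def vdsOK(response):
    # Every vdsm verb returns a dict with a 'status' sub-dict; code 0 means success.
    if response['status']['code'] != 0:
        raise Exception(response)
    return response

def waitTask(server, taskid):
    # Poll the asynchronous task until it leaves the 'running' state.
    while vdsOK(server.getTaskStatus(taskid))['taskStatus']['taskState'] == 'running':
        time.sleep(1)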

Comment 4 Ayal Baron 2012-02-13 12:11:35 UTC
(In reply to comment #3)
> vdsOK(s.deactivateStorageDomain(sd, sp, BLANK_UUID, masterVersion))
> vdsOK(s.detachStorageDomain(sd, sp, BLANK_UUID, masterVersion))
> vdsOK(s.formatStorageDomain(sd))
> vdsOK(s.spmStop(sp))
> 
> The flow above looks wrong to me when it comes to the master domain.
> For sake of completeness let's assume that we are using sanlock as lock manager
> (but the idea applies to safelease too).
> Since we can have attached domains that are not active (eg: storage in
> maintenance) the acquire/release host id (lockspace) was wired into
> activate/deactivate storage domain.
> Without a lockspace (host id) you cannot hold a resource (SPM).
> Since the pool cluster lock is kept into the master domain either you migrate
> it somewhere else (to maintain the SPM status) before
> deactivating/detaching/formatting or you just destroy the storage domain:
> 
> tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0))['uuid']
> waitTask(s, tid)
> 
> vdsOK(s.destroyStoragePool(sp, hostID, sp_key))
> vdsOK(s.formatStorageDomain(sd))

I agree that the flow is wrong as far as the master domain is concerned, but that is not the issue.
The failing flow as I understand it is:
createSD
createPool
connectPool
spmStart
*any* spm command

The last part fails with 'Not SPM' (see the sketch below).
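
A sketch of that flow in terms of vdsm verbs, reusing the helpers and UUIDs sketched earlier in this report; the setup calls are listed by name only, since their argument lists live in the attached test case:

# Setup (arguments omitted; see the attached test case):
#   s.createStorageDomain(...)   # create the LOCALFS domain
#   s.createStoragePool(...)     # create the pool with that domain as master
#   s.connectStoragePool(...)    # connect this host to the pool

tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0, 0))['uuid']
waitTask(s, tid)

# According to this comment, any SPM verb issued at this point, e.g. a
# deactivateStorageDomain on the master domain, already fails with
# {'status': {'message': 'Not SPM', 'code': 654}}.
vdsOK(s.deactivateStorageDomain(sd, sp, BLANK_UUID, 1))  # masterVersion=1 as in the log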

Comment 5 Federico Simoncelli 2012-02-13 12:36:02 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > vdsOK(s.deactivateStorageDomain(sd, sp, BLANK_UUID, masterVersion))
> > vdsOK(s.detachStorageDomain(sd, sp, BLANK_UUID, masterVersion))
> > vdsOK(s.formatStorageDomain(sd))
> > vdsOK(s.spmStop(sp))
> > 
> > The flow above looks wrong to me when it comes to the master domain.
> > For sake of completeness let's assume that we are using sanlock as lock manager
> > (but the idea applies to safelease too).
> > Since we can have attached domains that are not active (eg: storage in
> > maintenance) the acquire/release host id (lockspace) was wired into
> > activate/deactivate storage domain.
> > Without a lockspace (host id) you cannot hold a resource (SPM).
> > Since the pool cluster lock is kept into the master domain either you migrate
> > it somewhere else (to maintain the SPM status) before
> > deactivating/detaching/formatting or you just destroy the storage domain:
> > 
> > tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0))['uuid']
> > waitTask(s, tid)
> > 
> > vdsOK(s.destroyStoragePool(sp, hostID, sp_key))
> > vdsOK(s.formatStorageDomain(sd))
> 
> I agree that the flow is wrong as far as master domain is concerned, but that
> is not the issue.
> The failing flow as I understand it is:
> createSD
> createPool
> connectPool
> spmStart
> *any* spm command
> 
> last part fails on 'Not SPM'

That is because, for the master storage domain, deactivateStorageDomain contains a "hidden" spmStop (if you deactivate the domain where you hold the SPM resource, you lose it).
The subsequent detachStorageDomain then fails with 'Not SPM'.
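
One way to see this from the client side is to query the SPM state between the two calls, reusing the names from the sketches above. getSpmStatus is an existing vdsm verb, but the key names assumed for its reply ('spm_st'/'spmStatus') may differ; treat this as a sketch:

# Deactivating the master domain implicitly drops the SPM role...
vdsOK(s.deactivateStorageDomain(sd, sp, BLANK_UUID, 1))

# ...so the host is no longer reported as SPM (assumed reply layout):
print s.getSpmStatus(sp)['spm_st']['spmStatus']

# ...and the next SPM verb fails with 'Not SPM':
vdsOK(s.detachStorageDomain(sd, sp, BLANK_UUID, 1))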

Comment 6 Federico Simoncelli 2012-02-13 12:43:52 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0))['uuid']
> > > waitTask(s, tid)
> > > 
> > > vdsOK(s.destroyStoragePool(sp, hostID, sp_key))
> > > vdsOK(s.formatStorageDomain(sd))
> > 
> > I agree that the flow is wrong as far as master domain is concerned, but that
> > is not the issue.
> > The failing flow as I understand it is:
> > createSD
> > createPool
> > connectPool
> > spmStart
> > *any* spm command
> > 
> > last part fails on 'Not SPM'
> 
> That is because inside deactivateStorageDomain (for the msd) there is an
> "hidden" stopSpm (if you deactivate the domain where you hold the SPM resource
> you lose it).
> Then detachStorageDomain fails on 'Not SPM'.

Wait, let me rephrase. Either way we need a patch: either we explicitly forbid deactivating/detaching the master storage domain, or we make deactivateStorageDomain succeed and accept losing the SPM status (there is a minor code issue to fix for that).
I'd go for the first solution (even though it's a broader change that involves the manager too).

Comment 7 Federico Simoncelli 2012-02-13 16:56:52 UTC
Closing as a duplicate of bug 790014. Adam, if you feel that your issue is different, feel free to reopen.

*** This bug has been marked as a duplicate of bug 790014 ***