Created attachment 560931 [details]
Test case

Description of problem:
Since commit 1676396f18cf5c300d87e18169eba66cd39f0267, vdsm is not able to correctly acquire the SPM role when working with LOCALFS storage.

Version-Release number of selected component (if applicable):
1676396f18cf5c300d87e18169eba66cd39f0267

How reproducible:
Always

Steps to Reproduce:
1. Run the attached python script on the host.

Actual results:
deactivateStorageDomain fails with the following error:
Exception: {'status': {'message': 'Not SPM', 'code': 654}}

Expected results:
The script completes without errors.

Additional info:
See attached script.
Hi Adam, thanks for the bug report and the test case. In the future, if you also attach the relevant parts of the log, you will speed up bug triage (I might be able to understand the problem without even reproducing it).
Created attachment 561333 [details]
vdsm.log snippet when the error occurs

Hi Federico. After further investigation, this problem only seems to impact the master SD. In another test, I was able to create/attach/activate/deactivate/detach/format a secondary storage domain without issue, but as soon as I tried doing the same to the master SD, I got the Not SPM error.

Please find attached a section of vdsm.log that shows the problem. In this session I issued the following xmlrpc commands:

spmStart(sp, -1,-1,-1,0,0)
getTaskInfo('d73642da-e524-4c51-aa97-a2f90b7ad68b')
getTaskStatus('d73642da-e524-4c51-aa97-a2f90b7ad68b')
deactivateStorageDomain('def32ac7-1e12-4823-8e8c-8c887333fe16', '6e4d6a96-d3da-419c-8905-b5eec55c44e2', '00000000-0000-0000-0000-000000000000', 1)
vdsOK(s.deactivateStorageDomain(sd, sp, BLANK_UUID, masterVersion))
vdsOK(s.detachStorageDomain(sd, sp, BLANK_UUID, masterVersion))
vdsOK(s.formatStorageDomain(sd))
vdsOK(s.spmStop(sp))

The flow above looks wrong to me when it comes to the master domain.
For the sake of completeness, let's assume that we are using sanlock as the lock manager (but the idea applies to safelease too).
Since we can have attached domains that are not active (e.g. storage in maintenance), acquiring and releasing the host id (lockspace) was wired into activating and deactivating the storage domain.
Without a lockspace (host id) you cannot hold a resource (SPM).
Since the pool cluster lock is kept in the master domain, either you migrate it somewhere else (to maintain the SPM status) before deactivating/detaching/formatting, or you just destroy the storage domain:

tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0))['uuid']
waitTask(s, tid)

vdsOK(s.destroyStoragePool(sp, hostID, sp_key))
vdsOK(s.formatStorageDomain(sd))
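The snippets in this comment call two helpers, vdsOK and waitTask, that come from the attached test script, which is not reproduced in this thread. The following is only a minimal sketch of what such helpers might look like. It assumes that every reply carries a 'status' dict in which code 0 means success (consistent with the error shown in the description), and that the task state is reported under 'taskStatus'/'taskState'; that reply layout and the poll_interval parameter are assumptions, not taken from the attachment.

```python
import time


def vdsOK(response):
    """Return the reply unchanged on success, raise on any error status."""
    # Assumption: replies look like {'status': {'code': int, 'message': str}, ...}
    # and code 0 means success.
    if response['status']['code'] != 0:
        raise Exception(str(response))
    return response


def waitTask(server, task_id, poll_interval=3.0):
    """Poll an asynchronous task until it reports that it has finished."""
    while True:
        status = vdsOK(server.getTaskStatus(task_id))
        # Assumption: the state is found under taskStatus/taskState.
        if status['taskStatus']['taskState'] == 'finished':
            return status
        time.sleep(poll_interval)
```

With helpers like these, a failing verb surfaces exactly as in the report: the whole reply dict becomes the exception message.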
(In reply to comment #3)
> vdsOK(s.deactivateStorageDomain(sd, sp, BLANK_UUID, masterVersion))
> vdsOK(s.detachStorageDomain(sd, sp, BLANK_UUID, masterVersion))
> vdsOK(s.formatStorageDomain(sd))
> vdsOK(s.spmStop(sp))
>
> The flow above looks wrong to me when it comes to the master domain.
> For sake of completeness let's assume that we are using sanlock as lock manager
> (but the idea applies to safelease too).
> Since we can have attached domains that are not active (eg: storage in
> maintenance) the acquire/release host id (lockspace) was wired into
> activate/deactivate storage domain.
> Without a lockspace (host id) you cannot hold a resource (SPM).
> Since the pool cluster lock is kept into the master domain either you migrate
> it somewhere else (to maintain the SPM status) before
> deactivating/detaching/formatting or you just destroy the storage domain:
>
> tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0))['uuid']
> waitTask(s, tid)
>
> vdsOK(s.destroyStoragePool(sp, hostID, sp_key))
> vdsOK(s.formatStorageDomain(sd))

I agree that the flow is wrong as far as the master domain is concerned, but that is not the issue.
The failing flow, as I understand it, is:
createSD
createPool
connectPool
spmStart
*any* spm command

The last part fails on 'Not SPM'.
(In reply to comment #4)
> (In reply to comment #3)
> > vdsOK(s.deactivateStorageDomain(sd, sp, BLANK_UUID, masterVersion))
> > vdsOK(s.detachStorageDomain(sd, sp, BLANK_UUID, masterVersion))
> > vdsOK(s.formatStorageDomain(sd))
> > vdsOK(s.spmStop(sp))
> >
> > The flow above looks wrong to me when it comes to the master domain.
> > For sake of completeness let's assume that we are using sanlock as lock manager
> > (but the idea applies to safelease too).
> > Since we can have attached domains that are not active (eg: storage in
> > maintenance) the acquire/release host id (lockspace) was wired into
> > activate/deactivate storage domain.
> > Without a lockspace (host id) you cannot hold a resource (SPM).
> > Since the pool cluster lock is kept into the master domain either you migrate
> > it somewhere else (to maintain the SPM status) before
> > deactivating/detaching/formatting or you just destroy the storage domain:
> >
> > tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0))['uuid']
> > waitTask(s, tid)
> >
> > vdsOK(s.destroyStoragePool(sp, hostID, sp_key))
> > vdsOK(s.formatStorageDomain(sd))
>
> I agree that the flow is wrong as far as master domain is concerned, but that
> is not the issue.
> The failing flow as I understand it is:
> createSD
> createPool
> connectPool
> spmStart
> *any* spm command
>
> last part fails on 'Not SPM'

That is because inside deactivateStorageDomain (for the master storage domain, msd) there is a "hidden" stopSpm: if you deactivate the domain where you hold the SPM resource, you lose it.
Then detachStorageDomain fails on 'Not SPM'.
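To make the "hidden" stopSpm easier to follow, here is a toy model of the behaviour described so far. It is not vdsm code: the ToyPool class, its method names and the NotSPM exception are hypothetical and only mirror the reasoning in this thread (the host id / lockspace is acquired on activate and released on deactivate, and the SPM resource can only be held while the master domain's lockspace is held). The only value taken from the report is error code 654.

```python
class NotSPM(Exception):
    """Hypothetical stand-in for the 'Not SPM' failure."""
    code = 654  # error code quoted in the bug description


class ToyPool:
    """Toy model of a storage pool with one master domain; not vdsm code."""

    def __init__(self, master_sd):
        self.master_sd = master_sd
        self.lockspaces = set()  # domains on which this host holds a host id
        self.is_spm = False

    def activate(self, sd):
        # Activating a domain acquires the host id (lockspace) on it.
        self.lockspaces.add(sd)

    def spm_start(self):
        # The SPM resource lives in the master domain, so its lockspace
        # must already be held.
        if self.master_sd not in self.lockspaces:
            raise RuntimeError("no lockspace held on the master domain")
        self.is_spm = True

    def _require_spm(self):
        if not self.is_spm:
            raise NotSPM("Not SPM")

    def deactivate(self, sd):
        self._require_spm()
        if sd == self.master_sd:
            # The "hidden" stopSpm: releasing the master lockspace
            # implies giving up the SPM resource.
            self.is_spm = False
        self.lockspaces.discard(sd)

    def detach(self, sd):
        self._require_spm()
```

In this model, deactivating and detaching a secondary domain works while the host is SPM, whereas deactivating the master domain silently drops the SPM role, so the following detach raises NotSPM, which matches the sequence reported above.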
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > tid = vdsOK(s.spmStart(sp, -1, -1, -1, 0))['uuid']
> > > waitTask(s, tid)
> > >
> > > vdsOK(s.destroyStoragePool(sp, hostID, sp_key))
> > > vdsOK(s.formatStorageDomain(sd))
> >
> > I agree that the flow is wrong as far as master domain is concerned, but that
> > is not the issue.
> > The failing flow as I understand it is:
> > createSD
> > createPool
> > connectPool
> > spmStart
> > *any* spm command
> >
> > last part fails on 'Not SPM'
>
> That is because inside deactivateStorageDomain (for the msd) there is an
> "hidden" stopSpm (if you deactivate the domain where you hold the SPM resource
> you lose it).
> Then detachStorageDomain fails on 'Not SPM'.

Wait, let me rephrase. Either way we need a patch: either we explicitly forbid deactivating/detaching the msd, or we make deactivateStorageDomain succeed and lose the SPM status (there is a minor code issue to fix for that).
I'd go for the first solution (even though it's a broader change that involves the manager too).
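The two options can be contrasted with a small sketch. This is purely illustrative and does not show the actual patch: PoolState, deactivate, the forbid_master flag and the CannotDeactivateMaster exception are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


class CannotDeactivateMaster(Exception):
    """Hypothetical error for the first option; not a real vdsm exception."""


@dataclass
class PoolState:
    master_sd: str
    active: set = field(default_factory=set)
    is_spm: bool = False


def deactivate(pool, sd, forbid_master=True):
    """Deactivate sd under one of the two policies discussed above."""
    if sd == pool.master_sd:
        if forbid_master:
            # Option 1: refuse outright, so the caller has to migrate the
            # master role or destroy the pool first.
            raise CannotDeactivateMaster(sd)
        # Option 2: let the call succeed and give up the SPM status
        # explicitly instead of as a hidden side effect.
        pool.is_spm = False
    pool.active.discard(sd)
    return pool
```

Under the first policy the caller gets an immediate, explicit error instead of a later 'Not SPM'; under the second the call succeeds and the caller has to expect that the host is no longer SPM afterwards.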
Closing as clone of bug 790014. Adam, if you feel that your issue is different, feel free to reopen.

*** This bug has been marked as a duplicate of bug 790014 ***