Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1585460

Summary: [4.0 DC] DeactivateStorageDomain failed on SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC
Product: [oVirt] ovirt-engine
Reporter: Elad <ebenahar>
Component: BLL.Storage
Assignee: Tal Nisan <tnisan>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Elad <ebenahar>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.2.4.1
CC: bugs, ebenahar, frolland
Target Milestone: ---
Flags: ebenahar: needinfo-
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-07 12:22:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description          Flags
logs                 none
correct engine.log   none

Description Elad 2018-06-03 08:50:51 UTC
Created attachment 1447145
logs

Description of problem:
On a 4.0 data center, storage domain deactivation failed with SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.4-0.1.el7.noarch
vdsm-4.20.28-1.el7ev.x86_64

How reproducible:
Happened once

Steps to Reproduce:
1. Have a 4.0 DC with 1 host and 1 storage domain 
2. Deactivate the domain


Actual results:
Storage domain deactivation failed:

2018-05-29 07:50:14,685+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-46264) [] Master domain version is not in sync between DB and VDSM. Domain sd_TestCase18932_2907145929 marked as master, but the version in DB: 2 and in VDSM: 1
2018-05-29 07:50:14,692+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-46264) [] EVENT_ID: SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC(990), Sync Error on Master Domain between Host host_mixed_2 and oVirt Engine. Domain: sd_TestCase18932_2907145929 is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.



2018-05-29 07:50:14,832+0300 INFO  (jsonrpc/1) [vdsm.api] START disconnectStoragePool(spUUID='a44cbb18-1c52-4488-8460-d19a4d885ed8', hostID=1, remove=False, options=None) from=::ffff:10.35.161.182,60500, flow_id=6ec15182, task_id=efde5705-a71e-4732-b54e-941a0b995052 (api:46)
2018-05-29 07:50:14,833+0300 INFO  (jsonrpc/1) [vdsm.api] FINISH disconnectStoragePool error=Operation not allowed while SPM is active: ('a44cbb18-1c52-4488-8460-d19a4d885ed8',) from=::ffff:10.35.161.182,60500, flow_id=6ec15182, task_id=efde5705-a71e-4732-b54e-941a0b995052 (api:50)
2018-05-29 07:50:14,833+0300 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='efde5705-a71e-4732-b54e-941a0b995052') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in disconnectStoragePool
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1134, in disconnectStoragePool
    self.getPool(spUUID).validateNotSPM()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 151, in validateNotSPM
    raise se.IsSpm(self.spUUID)
IsSpm: Operation not allowed while SPM is active: ('a44cbb18-1c52-4488-8460-d19a4d885ed8',)


Expected results:
Storage domain deactivation should succeed

Additional info:
logs

Comment 1 Elad 2018-06-03 09:17:05 UTC
Created attachment 1447147
correct engine.log

Comment 2 Tal Nisan 2018-06-03 10:33:13 UTC
Fred, you're this week's QE contact, can you have a look please?

Comment 3 Fred Rolland 2018-06-03 13:57:59 UTC
Elad,
Does this happen only on 4.0 DC?

Comment 4 Fred Rolland 2018-06-03 13:59:08 UTC
I can see in the log that stopping the SPM failed:

2018-05-29 07:50:14,817+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-engine-Thread-46264) [6ec15182] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id 'a44cbb18-1c52-4488-8460-d19a4d885ed8' as there are uncleared tasks 'Task 'fedd4156-7d4f-4ddd-9a3f-92d99d02e353', status 'finished''
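
The engine error above and the VDSM disconnectStoragePool calls in the description share the correlation ID 6ec15182: the engine log prints it in square brackets, and the VDSM log prints it as flow_id=. A minimal sketch of a filter that pulls matching lines out of both logs follows. The helper name lines_for_flow is hypothetical (it is not part of oVirt or VDSM), and the sample lines are abbreviated from the excerpts in this report.

```python
import re


def lines_for_flow(lines, flow_id):
    """Return the log lines that carry the given correlation (flow) ID.

    Hypothetical helper for illustration only; not an oVirt or VDSM API.
    """
    escaped = re.escape(flow_id)
    # Engine log style: "[6ec15182]"; VDSM log style: "flow_id=6ec15182".
    pattern = re.compile(r"\[{0}\]|flow_id={0}\b".format(escaped))
    return [line for line in lines if pattern.search(line)]


# Sample lines abbreviated from the excerpts quoted in this bug.
sample = [
    "2018-05-29 07:50:14,692+03 WARN [] EVENT_ID: SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC(990)",
    "2018-05-29 07:50:14,817+03 ERROR [6ec15182] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2'",
    "2018-05-29 07:50:14,833+0300 INFO (jsonrpc/1) [vdsm.api] FINISH disconnectStoragePool flow_id=6ec15182, task_id=efde5705-a71e-4732-b54e-941a0b995052",
]

matched = lines_for_flow(sample, "6ec15182")
for line in matched:
    print(line)
```

Lines logged with an empty correlation ID ("[]"), such as the two WARN lines in the description, are not matched and have to be lined up by timestamp instead.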

Comment 5 Elad 2018-06-03 14:01:16 UTC
We haven't encountered it in any other case, nor in any other run of this specific test case (on a 4.0 DC).

Comment 6 Tal Nisan 2018-06-03 14:20:54 UTC
Really hard to say what went on here from the logs. Freddy, if you don't find anything of value in the logs and it doesn't reproduce, I recommend closing it.

Comment 7 Fred Rolland 2018-06-06 11:34:02 UTC
From the logs, the SPM could not be stopped due to an uncleared task.

I guess that an operation was still running then.

If you can reproduce, please provide a clear scenario.
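
For reference, the sequence suggested by the logs (SPM stop skipped because of an uncleared task, then disconnectStoragePool rejected because the host is still SPM) can be sketched as a toy model. This is an assumption-laden illustration, not the engine or VDSM code: ToyPool and its methods are hypothetical, and the local IsSpm class is only a stand-in for the error seen in the traceback.

```python
class IsSpm(Exception):
    """Stand-in for the IsSpm error shown in the traceback (not VDSM's class)."""


class ToyPool:
    """Hypothetical model of a storage pool whose host currently holds SPM."""

    def __init__(self, pool_id):
        self.pool_id = pool_id
        self.is_spm = True
        self.tasks = {}  # task id -> status

    def stop_spm(self):
        # Models the engine-side message: SPM is not stopped while
        # uncleared tasks remain, even if their status is 'finished'.
        if self.tasks:
            return False
        self.is_spm = False
        return True

    def clear_task(self, task_id):
        self.tasks.pop(task_id, None)

    def disconnect(self):
        # Models the guard in the traceback: disconnect is refused
        # while the host is still SPM.
        if self.is_spm:
            raise IsSpm("Operation not allowed while SPM is active: %r"
                        % ((self.pool_id,),))
        return True


pool = ToyPool("a44cbb18-1c52-4488-8460-d19a4d885ed8")
pool.tasks["fedd4156-7d4f-4ddd-9a3f-92d99d02e353"] = "finished"

stopped = pool.stop_spm()          # skipped: one uncleared task
try:
    pool.disconnect()
    disconnect_error = None
except IsSpm as exc:
    disconnect_error = str(exc)    # same failure shape as in the VDSM log

pool.clear_task("fedd4156-7d4f-4ddd-9a3f-92d99d02e353")
stopped_after_clear = pool.stop_spm()
disconnected = pool.disconnect()
```

In this model, clearing the finished task before stopping the SPM lets the disconnect go through. Whether that matches what happened on the real host cannot be confirmed from the attached logs.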