Created attachment 1353502 [details]
logs

Description of problem:
I ran into a situation in which the master domain was not synchronized between the engine database and VDSM, ending up with two master storage domains in the same storage pool.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20171114111003.git7aa1b91.el7.centos.noarch
collectd-postgresql-5.7.2-3.el7.x86_64
postgresql-jdbc-9.2.1002-5.el7.noarch
rh-postgresql95-postgresql-libs-9.5.7-2.el7.x86_64
postgresql-libs-9.2.23-1.el7_4.x86_64
rh-postgresql95-postgresql-server-9.5.7-2.el7.x86_64
rh-postgresql95-postgresql-9.5.7-2.el7.x86_64
rh-postgresql95-runtime-2.2-2.el7.x86_64

How reproducible:
Unknown; I don't have steps to reproduce.

Actual results:
The master domain is not synced between the engine and VDSM:

2017-11-16 15:19:36,195+0200 ERROR (jsonrpc/4) [storage.Dispatcher] FINISH connectStoragePool error=Wrong Master domain or its version: u'SD=e5525ed6-0970-49da-b022-5d9c342d928b, pool=2669a110-2c7b-45d9-86f2-4ce975a04ba2' (dispatcher:82)

2017-11-16 15:50:34,742+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3112) [] EVENT_ID: SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC(990), Sync Error on Master Domain between Host host_mixed_2 and oVirt Engine. Domain: nfs_0 is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.

There are 2 master domains in the same storage pool, and reconstruct master runs in an endless loop.

Expected results:
When the master domain is not synced between the DB and VDSM, reconstruct should not take place.
Additional info:

storage_pool_iso_map table:

              storage_id              |           storage_pool_id            | status
--------------------------------------+--------------------------------------+--------
 e5525ed6-0970-49da-b022-5d9c342d928b | 2669a110-2c7b-45d9-86f2-4ce975a04ba2 |      0
 925d34aa-6664-4183-b0dd-d894a2fe261d | 2669a110-2c7b-45d9-86f2-4ce975a04ba2 |      0
 70bb6e39-a072-4638-8b7c-306a4bc54cbc | 2669a110-2c7b-45d9-86f2-4ce975a04ba2 |      0
 2e760c3d-d521-443b-8556-71d6bd553063 | 2669a110-2c7b-45d9-86f2-4ce975a04ba2 |      0
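To make the inconsistency concrete, here is a minimal sketch (plain Python, not engine or VDSM code) of the consistency check that fails in this bug: the engine database and VDSM each name one master domain per storage pool, and the two views must agree. The pool and domain IDs are taken from this report; which domain VDSM actually considered master is illustrative only.

```python
# Sketch of the engine-vs-VDSM master-domain consistency check (illustrative,
# not real engine code). IDs come from this report; the VDSM-side master is
# an assumption for illustration.

ENGINE_MASTERS = {
    # pool id -> master domain id according to the engine DB
    "2669a110-2c7b-45d9-86f2-4ce975a04ba2": "e5525ed6-0970-49da-b022-5d9c342d928b",
}

VDSM_MASTERS = {
    # pool id -> master domain id according to VDSM (illustrative value)
    "2669a110-2c7b-45d9-86f2-4ce975a04ba2": "925d34aa-6664-4183-b0dd-d894a2fe261d",
}

def out_of_sync_pools(engine, vdsm):
    """Return the pool IDs whose master domain differs between the two views."""
    return [pool for pool, master in engine.items() if vdsm.get(pool) != master]

print(out_of_sync_pools(ENGINE_MASTERS, VDSM_MASTERS))
# -> ['2669a110-2c7b-45d9-86f2-4ce975a04ba2']
```

In the reported state this check fires on every connectStoragePool attempt, which is what drives the endless reconstruct-master loop described above.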
Assigning for investigation, but without steps to reproduce, I'm not sure we can do anything meaningful.
*** Bug 1524146 has been marked as a duplicate of this bug. ***
In the duplicate bug 1524146, the master storage domain is of GlusterFS type - not sure if this is material to the situation, but worth checking.
*** Bug 1547048 has been marked as a duplicate of this bug. ***
We're encountering this bug quite a lot in our testing environments. Raising severity and priority.
(In reply to Elad from comment #5)
> We're encountering this bug quite a lot in our testing environments. Raising
> severity and priority

Do you have logs from the last occurrence?
Based on our latest automation runs (tier1-2-3 for 4.2.3), we are planning to mark this bug as verified (we didn't witness anything abnormal).

Fred, are there any specific areas/operations that we should pay attention to?
Natalie, thanks for the update. The most important additional flow that needs to be tested is moving the master domain to maintenance and verifying that another domain takes over the role.
Following comment 8 and comment 9, moving to verified.

Builds used (rhv-4.2.3-1):
ovirt-engine-4.2.3-0.1.el7.noarch
vdsm-4.20.23-1.el7ev.x86_64

Automation TP: test_domain_lifecycle (which includes reconstruct master) passed successfully.
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.