Bug 1514025 - DC report two master storage domain.
Summary: DC report two master storage domain.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.1.7.6
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-4.2.3
Target Release: 4.2.3
Assignee: Fred Rolland
QA Contact: Natalie Gavrielov
URL:
Whiteboard:
Duplicates: 1524146 1547048
Depends On:
Blocks:
Reported: 2017-11-16 13:59 UTC by Elad
Modified: 2018-05-10 06:31 UTC (History)

Fixed In Version: ovirt-engine-4.2.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-10 06:31:02 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
ylavi: ovirt-4.3+
rule-engine: blocker+


Attachments
logs (2.32 MB, application/x-gzip)
2017-11-16 13:59 UTC, Elad


Links
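System       | ID    | Private | Priority         | Status | Summary                            | Last Updated
-------------+-------+---------+------------------+--------+------------------------------------+------------------------
oVirt gerrit | 89010 | 0       | master           | MERGED | engine: Check deactivate SD result | 2020-12-30 09:23:01 UTC
oVirt gerrit | 89947 | 0       | ovirt-engine-4.2 | MERGED | engine: Check deactivate SD result | 2020-12-30 09:23:01 UTC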

Description Elad 2017-11-16 13:59:51 UTC
Created attachment 1353502
logs

Description of problem:
I reached a state in which the master domain was not synchronized between the engine DB and VDSM. I ended up with 2 master storage domains in the same storage pool.

 

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20171114111003.git7aa1b91.el7.centos.noarch
collectd-postgresql-5.7.2-3.el7.x86_64
postgresql-jdbc-9.2.1002-5.el7.noarch
rh-postgresql95-postgresql-libs-9.5.7-2.el7.x86_64
postgresql-libs-9.2.23-1.el7_4.x86_64
rh-postgresql95-postgresql-server-9.5.7-2.el7.x86_64
rh-postgresql95-postgresql-9.5.7-2.el7.x86_64
rh-postgresql95-runtime-2.2-2.el7.x86_64

How reproducible:
Unknown. I don't have steps to reproduce.


Actual results:
Master domain is not synced between engine and VDSM

2017-11-16 15:19:36,195+0200 ERROR (jsonrpc/4) [storage.Dispatcher] FINISH connectStoragePool error=Wrong Master domain or its version: u'SD=e5525ed6-0970-49da-b022-5d9c342d928b, pool=2669a110-2c7b-45d9-86f2-4ce975a04ba2' (dispatcher:82)

2017-11-16 15:50:34,742+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3112) [] EVENT_ID: SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC(990), Sync Error on Master Domain between Host host_mixed_2 and oVirt Engine. Domain: nfs_0 is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.


2 master domains in the same storage pool. 
Reconstruct master takes place in an endless loop.
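
For anyone triaging similar logs, here is a minimal sketch of how the engine warning quoted above could be picked out of engine.log. The regular expression is written against the single message shown in this report, so it is an assumption that other occurrences follow the same wording and that host and domain names contain no spaces. The function name `find_master_sync_errors` is made up for illustration.

```python
import re

# Pattern built from the one audit-log message quoted in this report.
# Assumption: host and domain names contain no whitespace.
NOT_IN_SYNC = re.compile(
    r"EVENT_ID: SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC\(\d+\), "
    r"Sync Error on Master Domain between Host (?P<host>\S+) and oVirt Engine\. "
    r"Domain: (?P<domain>\S+) is marked as Master"
)


def find_master_sync_errors(lines):
    """Return (host, domain) pairs for every not-in-sync event in `lines`."""
    hits = []
    for line in lines:
        match = NOT_IN_SYNC.search(line)
        if match:
            hits.append((match.group("host"), match.group("domain")))
    return hits


# The warning from this report, joined back onto one line.
sample = (
    "2017-11-16 15:50:34,742+02 WARN  "
    "[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(EE-ManagedThreadFactory-engine-Thread-3112) [] "
    "EVENT_ID: SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC(990), "
    "Sync Error on Master Domain between Host host_mixed_2 and oVirt Engine. "
    "Domain: nfs_0 is marked as Master in oVirt Engine database "
    "but not on the Storage side."
)

print(find_master_sync_errors([sample]))  # [('host_mixed_2', 'nfs_0')]
```

Counting how often the same (host, domain) pair repeats would give a rough indication of the reconstruct loop.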

Expected results:
In case the master domain is not synced between DB and VDSM, reconstruct should not take place. 


Additional info:


storage_pool_iso_map table:


              storage_id              |           storage_pool_id            | status 
--------------------------------------+--------------------------------------+--------
 e5525ed6-0970-49da-b022-5d9c342d928b | 2669a110-2c7b-45d9-86f2-4ce975a04ba2 |      0
 925d34aa-6664-4183-b0dd-d894a2fe261d | 2669a110-2c7b-45d9-86f2-4ce975a04ba2 |      0
 70bb6e39-a072-4638-8b7c-306a4bc54cbc | 2669a110-2c7b-45d9-86f2-4ce975a04ba2 |      0
 2e760c3d-d521-443b-8556-71d6bd553063 | 2669a110-2c7b-45d9-86f2-4ce975a04ba2 |      0
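
The table above does not show which domain is the master. As a rough sketch, assuming each row is joined with a role label for the domain, a pool with more than one master could be flagged as follows. The `role` field, the "master"/"data" labels, and the placeholder IDs are hypothetical and are not taken from the engine schema or from this environment.

```python
from collections import defaultdict


def pools_with_multiple_masters(rows):
    """Return {pool_id: [storage_id, ...]} for pools with more than one master.

    `rows` is an iterable of (storage_id, storage_pool_id, role) tuples,
    where `role` is a hypothetical label such as "master" or "data".
    """
    masters = defaultdict(list)
    for storage_id, pool_id, role in rows:
        if role == "master":
            masters[pool_id].append(storage_id)
    # Keep only the pools that violate the one-master-per-pool expectation.
    return {pool: ids for pool, ids in masters.items() if len(ids) > 1}


# Illustrative rows only; IDs and roles are placeholders.
rows = [
    ("sd-a", "pool-1", "master"),
    ("sd-b", "pool-1", "data"),
    ("sd-c", "pool-1", "master"),
    ("sd-d", "pool-1", "data"),
]

print(pools_with_multiple_masters(rows))  # {'pool-1': ['sd-a', 'sd-c']}
```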

Comment 1 Allon Mureinik 2017-11-16 14:47:05 UTC
Assigning for investigation, but without steps to reproduce, I'm not sure we can do anything meaningful.

Comment 2 Tal Nisan 2017-12-11 11:13:13 UTC
*** Bug 1524146 has been marked as a duplicate of this bug. ***

Comment 3 Allon Mureinik 2018-01-15 09:27:25 UTC
In the duplicate bug 1524146, the master storage domain is of GlusterFS type - not sure if this is material to the situation, but worth checking.

Comment 4 Tal Nisan 2018-02-20 12:52:51 UTC
*** Bug 1547048 has been marked as a duplicate of this bug. ***

Comment 5 Elad 2018-03-06 15:12:07 UTC
We're encountering this bug quite a lot in our testing environments. Raising severity and priority.

Comment 6 Fred Rolland 2018-03-07 09:01:30 UTC
(In reply to Elad from comment #5)
> We're encountering this bug quite a lot in our testing environments. Raising
> severity and priority

Do you have logs from the last occurrence?

Comment 8 Natalie Gavrielov 2018-04-17 13:20:14 UTC
Based on our latest automation runs (tier1-2-3 for 4.2.3) we are planning to mark this bug as verified (didn't witness anything abnormal).

Fred,

Are there any specific areas/operations that we should pay attention to?

Comment 9 Fred Rolland 2018-04-17 13:31:06 UTC
Natalie,

Thanks for the update.

The most important additional flow to test is moving the master domain to maintenance and verifying that another domain takes over the master role.

Comment 10 Natalie Gavrielov 2018-04-17 15:28:45 UTC
Following comment 8 and comment 9, moving to verified.
Builds used (rhv-4.2.3-1):
ovirt-engine-4.2.3-0.1.el7.noarch
vdsm-4.20.23-1.el7ev.x86_64

Automation TP: 
test_domain_lifecycle (includes reconstruct master) passed successfully.

Comment 11 Sandro Bonazzola 2018-05-10 06:31:02 UTC
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

