Bug 1344314 - "VDSM HOST2 command failed: Cannot find master domain" after adding storage
Summary: "VDSM HOST2 command failed: Cannot find master domain" after adding storage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.0.1
Target Release: 4.0.1
Assignee: Maor
QA Contact: Carlos Mestre González
URL:
Whiteboard:
Duplicates: 1330827 (view as bug list)
Depends On:
Blocks: Gluster-HC-1 1349404 1349405
 
Reported: 2016-06-09 11:59 UTC by Dusan Fodor
Modified: 2016-08-23 20:41 UTC (History)
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1349404 1349405 (view as bug list)
Environment:
Last Closed: 2016-08-23 20:41:40 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
screenshot of events and logfiles (849.98 KB, application/x-gzip)
2016-06-09 11:59 UTC, Dusan Fodor
no flags Details
correct version of vdsm log file (768.78 KB, application/x-xz)
2016-06-09 15:24 UTC, Dusan Fodor
no flags Details
vdsm.log for both hosts and engine. (1.45 MB, application/x-gzip)
2016-07-12 10:29 UTC, Carlos Mestre González
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1345787 0 unspecified CLOSED Logging: useless alert on engine side: command failed: Cannot find master domain: ... (followed with UUIDs) 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2016:1743 0 normal SHIPPED_LIVE Red Hat Virtualization Manager 4.0 GA Enhancement (ovirt-engine) 2016-09-02 21:54:01 UTC
oVirt gerrit 59169 0 master MERGED core: Remove redundant else 2020-06-29 18:38:28 UTC
oVirt gerrit 59170 0 master MERGED core: Connect all hosts on adding storage pool. 2020-06-29 18:38:28 UTC
oVirt gerrit 59171 0 master MERGED core: Remove redundant else when calling registerOvfStoreDisks 2020-06-29 18:38:27 UTC
oVirt gerrit 59263 0 ovirt-engine-4.0 MERGED core: Remove redundant else 2020-06-29 18:38:27 UTC
oVirt gerrit 59264 0 ovirt-engine-4.0 MERGED core: Connect all hosts on adding storage pool. 2020-06-29 18:38:27 UTC
oVirt gerrit 59265 0 ovirt-engine-4.0 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2020-06-29 18:38:27 UTC
oVirt gerrit 59269 0 ovirt-engine-3.6 MERGED core: Remove redundant else 2020-06-29 18:38:27 UTC
oVirt gerrit 59270 0 ovirt-engine-3.6 MERGED core: Connect all hosts on adding storage pool. 2020-06-29 18:38:27 UTC
oVirt gerrit 59271 0 ovirt-engine-3.6 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2020-06-29 18:38:27 UTC
oVirt gerrit 59272 0 ovirt-engine-3.6.7 MERGED core: Remove redundant else 2020-06-29 18:38:27 UTC
oVirt gerrit 59273 0 ovirt-engine-3.6.7 MERGED core: Connect all hosts on adding storage pool. 2020-06-29 18:38:27 UTC
oVirt gerrit 59274 0 ovirt-engine-3.6.7 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2020-06-29 18:38:29 UTC

Internal Links: 1345787

Description Dusan Fodor 2016-06-09 11:59:55 UTC
Created attachment 1166248 [details]
screenshot of events and logfiles

Description of problem:
The following error appears in the UI when trying to configure storage; everything seems to work fine, though:
VDSM HOST2 command failed: Cannot find master domain: u'spUUID=83958f66-8809-4260-a13f-9ba2a4619f2a, msdUUID=6e1cf774-8caf-4f5d-9491-f01e618cf656'

Version-Release number of selected component (if applicable):
3.6.7-4

How reproducible:
Tried twice, same result

Steps to Reproduce:
1. Create DC
2. Create 2 Clusters
3. Create Host on each
4. Attach NFS storage
5. See error


Actual results:
Storage is attached successfully, but the error message is displayed

Expected results:
The same, but without the error message being shown

Additional info:

Comment 1 Allon Mureinik 2016-06-09 12:36:12 UTC
This seems to happen during disk registration.
Maor, can you take a look please?

Comment 3 Maor 2016-06-09 13:53:50 UTC
Hi Dusan,

It looks like the logs are not synced.
In the engine, the error is at 2016-06-08 19:13:09,230
and the VDSM log only starts at 2016-06-09 08:01:01,906

Can you please attach all the relevant VDSM logs?

Also which VDSM version are you using?

Thanks

Comment 4 Dusan Fodor 2016-06-09 15:24:01 UTC
Created attachment 1166350 [details]
correct version of vdsm log file

Hi,

Sorry, I didn't notice it was rewritten.
The correct version (according to the timestamp) is attached.
The VDSM version is 4.17.31

Thanks

Comment 5 Maor 2016-06-13 08:58:42 UTC
The issue here is that the uninitialized Data Center contained two Hosts.

It looks like createStoragePool was executed on Host1 while connectStoragePool was executed on Host2, which caused connectStoragePool to fail, since only Host1 knew about the master domain:

  2016-06-08 19:12:41,796 .... ConnectStorageServerVDSCommand] START, ConnectStorageServerVDSCommand(HostName = HOST1

  2016-06-08 19:12:43,001 INFO ...CreateStoragePoolVDSCommand] START, CreateStoragePoolVDSCommand(HostName = HOST1

  2016-06-08 19:13:07,791 INFO ....ConnectStoragePoolVDSCommand] START, ConnectStoragePoolVDSCommand(HostName = HOST2

When calling ConnectStoragePoolVDSCommand, the Host is chosen via IrsProxyData#getPrioritizedVdsInPool. That method returns an arbitrary Host, which might not be the one that executed CreateStoragePoolVDSCommand and updated its storage domain cache with the master SD.
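
To make the race easier to follow, here is a minimal Python sketch that only models the flow described above; it is not the actual ovirt-engine (Java) or VDSM code, and the class names, the in-memory "cache", and the connect-all-hosts shape of the fix are illustrative assumptions based on the gerrit patch titles ("core: Connect all hosts on adding storage pool").

# Illustrative model of the race from comment 5; all names are stand-ins.
import random

class FakeHost:
    """Stand-in for a VDSM host with its local storage-domain cache."""

    def __init__(self, name):
        self.name = name
        self.known_domains = set()  # models the host's SD cache

    def create_storage_pool(self, msd_uuid):
        # createStoragePool runs on a single host; only that host
        # learns about the master storage domain as a side effect.
        self.known_domains.add(msd_uuid)

    def connect_storage_pool(self, sp_uuid, msd_uuid):
        if msd_uuid not in self.known_domains:
            # This is the failure from the bug title.
            raise RuntimeError("Cannot find master domain: spUUID=%s, msdUUID=%s"
                               % (sp_uuid, msd_uuid))

def get_prioritized_vds_in_pool(hosts):
    # Stand-in for IrsProxyData#getPrioritizedVdsInPool: it may return
    # any host in the pool, not necessarily the one that created the pool.
    return random.choice(hosts)

def add_storage_buggy(hosts, sp_uuid, msd_uuid):
    hosts[0].create_storage_pool(msd_uuid)            # runs on HOST1
    chosen = get_prioritized_vds_in_pool(hosts)       # may pick HOST2
    chosen.connect_storage_pool(sp_uuid, msd_uuid)    # fails if it is HOST2

def add_storage_fixed(hosts, sp_uuid, msd_uuid):
    # Direction of the fix: connect every host in the pool, so no host
    # is asked about a master domain it has never had a chance to see.
    hosts[0].create_storage_pool(msd_uuid)
    for host in hosts:
        host.known_domains.add(msd_uuid)  # models the refresh from storage
        host.connect_storage_pool(sp_uuid, msd_uuid)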

Comment 6 Sahina Bose 2016-06-20 12:05:27 UTC
*** Bug 1330827 has been marked as a duplicate of this bug. ***

Comment 8 Carlos Mestre González 2016-07-11 10:15:50 UTC
No error is shown in the events, but I checked the vdsm logs and I do see the error. Versions: vdsm-4.18.5.1-1.el7ev.x86_64, rhevm-4.0.2-0.2.rc1.el7ev.noarch

1. A DC, a cluster and two hosts connected to that cluster; I add an NFS domain and see an error on one of the hosts:

ioprocess communication (23127)::INFO::2016-07-11 13:07:26,080::__init__::447::IOProcess::(_processLogs) Starting ioprocess
ioprocess communication (23127)::INFO::2016-07-11 13:07:26,080::__init__::447::IOProcess::(_processLogs) Starting ioprocess
jsonrpc.Executor/6::ERROR::2016-07-11 13:07:26,081::sdc::146::Storage.StorageDomainCache::(_findDomain) domain 058eedf5-e92b-47a8-a3b7-a5c5db6baac1 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('058eedf5-e92b-47a8-a3b7-a5c5db6baac1',)
jsonrpc.Executor/6::INFO::2016-07-11 13:07:26,082::nfsSD::70::Storage.StorageDomain::(create) sdUUID=058eedf5-e92b-47a8-a3b7-a5c5db6baac1 domainName=test_nfs remotePath=10.35.64.11:/vol/RHEV/Storage/storage_jenkins_ge19_nfs_3 domClass=1

Maor, is this important? The operation seems fine.

Comment 9 Maor 2016-07-11 12:27:04 UTC
I'm not sure if it's related to this issue. Does it happen consistently?

Comment 10 Carlos Mestre González 2016-07-12 10:29:40 UTC
Created attachment 1178845 [details]
vdsm.log for both hosts and engine.

Yes, I tested it twice and saw it both times.

A DC with a Cluster and two hosts; I add one NFS domain (choosing host_1) and the error shows up in host_1:

jsonrpc.Executor/3::ERROR::2016-07-12 13:22:40,918::sdc::146::Storage.StorageDomainCache::(_findDomain) domain d388f7ee-ac92-4ce8-9a7a-81d4e69adb16 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('d388f7ee-ac92-4ce8-9a7a-81d4e69adb16',)
jsonrpc.Executor/3::INFO::2016-07-12 13:22:40,920::nfsSD::70::Storage.StorageDomain::(create) sdUUID=d388f7ee-ac92-4ce8-9a7a-81d4e69adb16 domainName=nfs_test_dc remotePath=10.35.64.11:/vol/RHEV/Storage/storage_jenkins_ge19_nfs_4 domClass=1
jsonrpc.Executor/3::DEBUG::2016-07-12 13:22:40,928::outOfProcess::69::Storage.oop::(getProcessPool) Creating ioprocess d388f7ee-ac92-4ce8-9a7a-81d4e69adb16
jsonrpc.Executor/3::INFO::2016-07-12 13:22:40,928::__init__::325::IOProcessClient::(__init__) Starting client ioprocess-3
jsonrpc.Executor/3::DEBUG::2016-07-12 13:22:40,928::__init__::334::IOProcessClient::(_run) Starting ioprocess for client ioprocess-3

Comment 11 Allon Mureinik 2016-07-12 11:34:01 UTC
This is an old logging issue we've had since oVirt 3.2, IIRC. I agree it's ugly, but there's no real effect there.

You can move the BZ to VERIFIED, thanks!
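
To illustrate why there is "no real effect" here, the ERROR in the vdsm.log excerpts above is just the expected cache miss being logged with a traceback: the storage-domain cache is asked for the new UUID before the NFS domain has been created, and creation then proceeds normally. Below is a small Python sketch that only models this lookup-then-create pattern; the class and function names are simplified stand-ins, not the real vdsm sdc.py / nfsSD.py code.

# Simplified model of the lookup-then-create pattern; names are stand-ins.
import logging

log = logging.getLogger("Storage.StorageDomainCache")

class StorageDomainDoesNotExist(Exception):
    pass

class StorageDomainCache:
    """Simplified model of VDSM's storage-domain cache."""

    def __init__(self):
        self._domains = {}

    def produce(self, sd_uuid):
        try:
            return self._domains[sd_uuid]
        except KeyError:
            # The real _findDomain logs the miss with a full traceback,
            # which is exactly the ERROR seen in the vdsm.log excerpts.
            log.exception("domain %s not found", sd_uuid)
            raise StorageDomainDoesNotExist(sd_uuid)

    def manually_add(self, sd_uuid, domain):
        self._domains[sd_uuid] = domain

def create_nfs_domain(cache, sd_uuid, name, remote_path):
    # The create flow first verifies the domain does not already exist;
    # the expected cache miss is what produces the scary-looking ERROR.
    try:
        cache.produce(sd_uuid)
    except StorageDomainDoesNotExist:
        pass  # expected: the domain is only about to be created
    else:
        raise RuntimeError("domain %s already exists" % sd_uuid)

    domain = {"uuid": sd_uuid, "name": name, "remotePath": remote_path}
    cache.manually_add(sd_uuid, domain)
    return domain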

Comment 13 errata-xmlrpc 2016-08-23 20:41:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1743.html

