Bug 1344314 - "VDSM HOST2 command failed: Cannot find master domain" after adding storage
Summary: "VDSM HOST2 command failed: Cannot find master domain" after adding storage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.0.1
Target Release: 4.0.1
Assignee: Maor
QA Contact: Carlos Mestre González
URL:
Whiteboard:
Duplicates: 1330827 (view as bug list)
Depends On:
Blocks: Gluster-HC-1 1349404 1349405
 
Reported: 2016-06-09 11:59 UTC by Dusan Fodor
Modified: 2016-08-23 20:41 UTC (History)
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1349404 1349405 (view as bug list)
Environment:
Last Closed: 2016-08-23 20:41:40 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
screenshot of events and logfiles (849.98 KB, application/x-gzip)
2016-06-09 11:59 UTC, Dusan Fodor
no flags Details
correct version of vdsm log file (768.78 KB, application/x-xz)
2016-06-09 15:24 UTC, Dusan Fodor
no flags Details
vdsm.log for both hosts and engine. (1.45 MB, application/x-gzip)
2016-07-12 10:29 UTC, Carlos Mestre González
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1345787 0 unspecified CLOSED Logging: useless alert on engine side: command failed: Cannot find master domain: ... (followed with UUIDs) 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2016:1743 0 normal SHIPPED_LIVE Red Hat Virtualization Manager 4.0 GA Enhancement (ovirt-engine) 2016-09-02 21:54:01 UTC
oVirt gerrit 59169 0 master MERGED core: Remove redundant else 2020-06-29 18:38:28 UTC
oVirt gerrit 59170 0 master MERGED core: Connect all hosts on adding storage pool. 2020-06-29 18:38:28 UTC
oVirt gerrit 59171 0 master MERGED core: Remove redundant else when calling registerOvfStoreDisks 2020-06-29 18:38:27 UTC
oVirt gerrit 59263 0 ovirt-engine-4.0 MERGED core: Remove redundant else 2020-06-29 18:38:27 UTC
oVirt gerrit 59264 0 ovirt-engine-4.0 MERGED core: Connect all hosts on adding storage pool. 2020-06-29 18:38:27 UTC
oVirt gerrit 59265 0 ovirt-engine-4.0 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2020-06-29 18:38:27 UTC
oVirt gerrit 59269 0 ovirt-engine-3.6 MERGED core: Remove redundant else 2020-06-29 18:38:27 UTC
oVirt gerrit 59270 0 ovirt-engine-3.6 MERGED core: Connect all hosts on adding storage pool. 2020-06-29 18:38:27 UTC
oVirt gerrit 59271 0 ovirt-engine-3.6 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2020-06-29 18:38:27 UTC
oVirt gerrit 59272 0 ovirt-engine-3.6.7 MERGED core: Remove redundant else 2020-06-29 18:38:27 UTC
oVirt gerrit 59273 0 ovirt-engine-3.6.7 MERGED core: Connect all hosts on adding storage pool. 2020-06-29 18:38:27 UTC
oVirt gerrit 59274 0 ovirt-engine-3.6.7 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2020-06-29 18:38:29 UTC

Internal Links: 1345787

Description Dusan Fodor 2016-06-09 11:59:55 UTC
Created attachment 1166248 [details]
screenshot of events and logfiles

Description of problem:
The following error appears in the UI when trying to configure storage; everything seems to work fine, though:
VDSM HOST2 command failed: Cannot find master domain: u'spUUID=83958f66-8809-4260-a13f-9ba2a4619f2a, msdUUID=6e1cf774-8caf-4f5d-9491-f01e618cf656'

Version-Release number of selected component (if applicable):
3.6.7-4

How reproducible:
Tried twice, same result

Steps to Reproduce:
1. Create DC
2. Create 2 Clusters
3. Create Host on each
4. Attach NFS storage
5. See error


Actual results:
Storage is attached successfully, but the error message is displayed

Expected results:
The same, but without the error message being shown

Additional info:

Comment 1 Allon Mureinik 2016-06-09 12:36:12 UTC
This seems to happen during disk registration.
Maor, can you take a look please?

Comment 3 Maor 2016-06-09 13:53:50 UTC
Hi Dusan,

It looks like the logs are not synced.
In the engine, the error is at 2016-06-08 19:13:09,230
and the VDSM log only starts at 2016-06-09 08:01:01,906

Can you please attach all the relevant VDSM logs?

Also which VDSM version are you using?

Thanks

Comment 4 Dusan Fodor 2016-06-09 15:24:01 UTC
Created attachment 1166350 [details]
correct version of vdsm log file

Hi,

Sorry, I didn't notice it was rewritten.
The correct version (according to the timestamp) is attached.
The VDSM version is 4.17.31

Thanks

Comment 5 Maor 2016-06-13 08:58:42 UTC
The issue here is that the uninitialized Data Center contained two Hosts.

It looks like createStoragePool was executed on Host1 while connectStoragePool was executed on Host2, which caused connectStoragePool to fail, since only Host1 knew about the master domain:

  2016-06-08 19:12:41,796 .... ConnectStorageServerVDSCommand] START, ConnectStorageServerVDSCommand(HostName = HOST1

  2016-06-08 19:12:43,001 INFO ...CreateStoragePoolVDSCommand] START, CreateStoragePoolVDSCommand(HostName = HOST1

  2016-06-08 19:13:07,791 INFO ....ConnectStoragePoolVDSCommand] START, ConnectStoragePoolVDSCommand(HostName = HOST2

When calling ConnectStoragePoolVDSCommand, the Host is chosen via IrsProxyData#getPrioritizedVdsInPool. That method returns an arbitrary Host, which might not be the one that executed CreateStoragePoolVDSCommand and updated its storage domain cache with the master SD.
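
To make the race easier to follow, here is a minimal Python sketch that only models the flow described above; it is not the actual ovirt-engine (Java) or VDSM code, and the class names, the in-memory "cache", and the connect-all-hosts shape of the fix are illustrative assumptions based on the gerrit patch titles ("core: Connect all hosts on adding storage pool").

# Illustrative model of the race from comment 5; all names are stand-ins.
import random

class FakeHost:
    """Stand-in for a VDSM host with its local storage-domain cache."""

    def __init__(self, name):
        self.name = name
        self.known_domains = set()  # models the host's SD cache

    def create_storage_pool(self, msd_uuid):
        # createStoragePool runs on a single host; only that host
        # learns about the master storage domain as a side effect.
        self.known_domains.add(msd_uuid)

    def connect_storage_pool(self, sp_uuid, msd_uuid):
        if msd_uuid not in self.known_domains:
            # This is the failure from the bug title.
            raise RuntimeError("Cannot find master domain: spUUID=%s, msdUUID=%s"
                               % (sp_uuid, msd_uuid))

def get_prioritized_vds_in_pool(hosts):
    # Stand-in for IrsProxyData#getPrioritizedVdsInPool: it may return
    # any host in the pool, not necessarily the one that created the pool.
    return random.choice(hosts)

def add_storage_buggy(hosts, sp_uuid, msd_uuid):
    hosts[0].create_storage_pool(msd_uuid)            # runs on HOST1
    chosen = get_prioritized_vds_in_pool(hosts)       # may pick HOST2
    chosen.connect_storage_pool(sp_uuid, msd_uuid)    # fails if it is HOST2

def add_storage_fixed(hosts, sp_uuid, msd_uuid):
    # Direction of the fix: connect every host in the pool, so no host
    # is asked about a master domain it has never had a chance to see.
    hosts[0].create_storage_pool(msd_uuid)
    for host in hosts:
        host.known_domains.add(msd_uuid)  # models the refresh from storage
        host.connect_storage_pool(sp_uuid, msd_uuid)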

Comment 6 Sahina Bose 2016-06-20 12:05:27 UTC
*** Bug 1330827 has been marked as a duplicate of this bug. ***

Comment 8 Carlos Mestre González 2016-07-11 10:15:50 UTC
No error is shown in the events, but I checked the vdsm logs and I do see the error. Versions: vdsm-4.18.5.1-1.el7ev.x86_64, rhevm-4.0.2-0.2.rc1.el7ev.noarch

1. A DC, a cluster and two hosts connected to that cluster; I add an NFS domain and see an error on one of the hosts:

ioprocess communication (23127)::INFO::2016-07-11 13:07:26,080::__init__::447::IOProcess::(_processLogs) Starting ioprocess
ioprocess communication (23127)::INFO::2016-07-11 13:07:26,080::__init__::447::IOProcess::(_processLogs) Starting ioprocess
jsonrpc.Executor/6::ERROR::2016-07-11 13:07:26,081::sdc::146::Storage.StorageDomainCache::(_findDomain) domain 058eedf5-e92b-47a8-a3b7-a5c5db6baac1 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('058eedf5-e92b-47a8-a3b7-a5c5db6baac1',)
jsonrpc.Executor/6::INFO::2016-07-11 13:07:26,082::nfsSD::70::Storage.StorageDomain::(create) sdUUID=058eedf5-e92b-47a8-a3b7-a5c5db6baac1 domainName=test_nfs remotePath=10.35.64.11:/vol/RHEV/Storage/storage_jenkins_ge19_nfs_3 domClass=1

Maor, is this important? The operation seems fine.

Comment 9 Maor 2016-07-11 12:27:04 UTC
I'm not sure if it's related to this issue. Does it happen consistently?

Comment 10 Carlos Mestre González 2016-07-12 10:29:40 UTC
Created attachment 1178845 [details]
vdsm.log for both hosts and engine.

Yes, I tested it twice and saw it both times.

A DC with a Cluster and two hosts; I add one NFS domain (choosing host_1) and the error shows up in host_1:

jsonrpc.Executor/3::ERROR::2016-07-12 13:22:40,918::sdc::146::Storage.StorageDomainCache::(_findDomain) domain d388f7ee-ac92-4ce8-9a7a-81d4e69adb16 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('d388f7ee-ac92-4ce8-9a7a-81d4e69adb16',)
jsonrpc.Executor/3::INFO::2016-07-12 13:22:40,920::nfsSD::70::Storage.StorageDomain::(create) sdUUID=d388f7ee-ac92-4ce8-9a7a-81d4e69adb16 domainName=nfs_test_dc remotePath=10.35.64.11:/vol/RHEV/Storage/storage_jenkins_ge19_nfs_4 domClass=1
jsonrpc.Executor/3::DEBUG::2016-07-12 13:22:40,928::outOfProcess::69::Storage.oop::(getProcessPool) Creating ioprocess d388f7ee-ac92-4ce8-9a7a-81d4e69adb16
jsonrpc.Executor/3::INFO::2016-07-12 13:22:40,928::__init__::325::IOProcessClient::(__init__) Starting client ioprocess-3
jsonrpc.Executor/3::DEBUG::2016-07-12 13:22:40,928::__init__::334::IOProcessClient::(_run) Starting ioprocess for client ioprocess-3

Comment 11 Allon Mureinik 2016-07-12 11:34:01 UTC
This is an old logging issue we've had since oVirt 3.2, IIRC. I agree it's ugly, but there's no real effect there.

You can move the BZ to VERIFIED, thanks!
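
To illustrate why there is "no real effect" here, the ERROR in the vdsm.log excerpts above is just the expected cache miss being logged with a traceback: the storage-domain cache is asked for the new UUID before the NFS domain has been created, and creation then proceeds normally. Below is a small Python sketch that only models this lookup-then-create pattern; the class and function names are simplified stand-ins, not the real vdsm sdc.py / nfsSD.py code.

# Simplified model of the lookup-then-create pattern; names are stand-ins.
import logging

log = logging.getLogger("Storage.StorageDomainCache")

class StorageDomainDoesNotExist(Exception):
    pass

class StorageDomainCache:
    """Simplified model of VDSM's storage-domain cache."""

    def __init__(self):
        self._domains = {}

    def produce(self, sd_uuid):
        try:
            return self._domains[sd_uuid]
        except KeyError:
            # The real _findDomain logs the miss with a full traceback,
            # which is exactly the ERROR seen in the vdsm.log excerpts.
            log.exception("domain %s not found", sd_uuid)
            raise StorageDomainDoesNotExist(sd_uuid)

    def manually_add(self, sd_uuid, domain):
        self._domains[sd_uuid] = domain

def create_nfs_domain(cache, sd_uuid, name, remote_path):
    # The create flow first verifies the domain does not already exist;
    # the expected cache miss is what produces the scary-looking ERROR.
    try:
        cache.produce(sd_uuid)
    except StorageDomainDoesNotExist:
        pass  # expected: the domain is only about to be created
    else:
        raise RuntimeError("domain %s already exists" % sd_uuid)

    domain = {"uuid": sd_uuid, "name": name, "remotePath": remote_path}
    cache.manually_add(sd_uuid, domain)
    return domain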

Comment 13 errata-xmlrpc 2016-08-23 20:41:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1743.html

