Bug 1344314 - "VDSM HOST2 command failed: Cannot find master domain" after adding storage
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.7
Hardware: x86_64 Linux
Priority: high    Severity: high
Target Milestone: ovirt-4.0.1
Target Release: 4.0.1
Assigned To: Maor
QA Contact: Carlos Mestre González
Duplicates: 1330827
Depends On:
Blocks: Gluster-HC-1 1349404 1349405
 
Reported: 2016-06-09 07:59 EDT by Dusan Fodor
Modified: 2016-08-23 16:41 EDT
CC List: 15 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1349404 1349405
Environment:
Last Closed: 2016-08-23 16:41:40 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
screenshot of events and logfiles (849.98 KB, application/x-gzip), 2016-06-09 07:59 EDT, Dusan Fodor
correct version of vdsm log file (768.78 KB, application/x-xz), 2016-06-09 11:24 EDT, Dusan Fodor
vdsm.log for both hosts and engine (1.45 MB, application/x-gzip), 2016-07-12 06:29 EDT, Carlos Mestre González


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 59169 master MERGED core: Remove redundant else 2016-06-15 11:08 EDT
oVirt gerrit 59170 master MERGED core: Connect all hosts on adding storage pool. 2016-06-15 11:59 EDT
oVirt gerrit 59171 master MERGED core: Remove redundant else when calling registerOvfStoreDisks 2016-06-15 11:59 EDT
oVirt gerrit 59263 ovirt-engine-4.0 MERGED core: Remove redundant else 2016-06-16 05:01 EDT
oVirt gerrit 59264 ovirt-engine-4.0 MERGED core: Connect all hosts on adding storage pool. 2016-06-16 05:01 EDT
oVirt gerrit 59265 ovirt-engine-4.0 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2016-06-16 05:01 EDT
oVirt gerrit 59269 ovirt-engine-3.6 MERGED core: Remove redundant else 2016-06-16 05:05 EDT
oVirt gerrit 59270 ovirt-engine-3.6 MERGED core: Connect all hosts on adding storage pool. 2016-06-16 05:05 EDT
oVirt gerrit 59271 ovirt-engine-3.6 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2016-06-16 05:05 EDT
oVirt gerrit 59272 ovirt-engine-3.6.7 MERGED core: Remove redundant else 2016-06-16 05:07 EDT
oVirt gerrit 59273 ovirt-engine-3.6.7 MERGED core: Connect all hosts on adding storage pool. 2016-06-16 05:07 EDT
oVirt gerrit 59274 ovirt-engine-3.6.7 MERGED core: Remove redundant else when calling registerOvfStoreDisks 2016-06-16 05:07 EDT
Red Hat Product Errata RHEA-2016:1743 normal SHIPPED_LIVE Red Hat Virtualization Manager 4.0 GA Enhancement (ovirt-engine) 2016-09-02 17:54:01 EDT

Description Dusan Fodor 2016-06-09 07:59:55 EDT
Created attachment 1166248 [details]
screenshot of events and logfiles

Description of problem:
The following error appears in the UI when configuring storage; everything seems to work fine otherwise:
VDSM HOST2 command failed: Cannot find master domain: u'spUUID=83958f66-8809-4260-a13f-9ba2a4619f2a, msdUUID=6e1cf774-8caf-4f5d-9491-f01e618cf656'

Version-Release number of selected component (if applicable):
3.6.7-4

How reproducible:
Tried twice, same result

Steps to Reproduce:
1. Create DC
2. Create 2 Clusters
3. Create Host on each
4. Attach NFS storage
5. See error


Actual results:
Storage is attached successfully, but the error message is displayed

Expected results:
Same, but without the error message

Additional info:
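For reference, the reproduction can also be scripted instead of clicking through the UI. Below is a minimal sketch using the oVirt Python SDK (ovirt-engine-sdk-python v4); the engine URL, credentials, host addresses, and NFS export are hypothetical placeholders, and waits for asynchronous operations (host deployment, domain activation) are omitted:

    # Hypothetical reproduction sketch; placeholder values throughout.
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    conn = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        insecure=True,
    )
    system = conn.system_service()

    # 1. Create DC
    dc = system.data_centers_service().add(
        types.DataCenter(name='test_dc', local=False))

    # 2. Create 2 clusters
    for name in ('cluster_1', 'cluster_2'):
        system.clusters_service().add(types.Cluster(
            name=name,
            data_center=types.DataCenter(id=dc.id),
            cpu=types.Cpu(type='Intel Conroe Family')))

    # 3. Create a host on each cluster (deployment is asynchronous;
    # a real script must wait for both hosts to become UP)
    for host, cluster in (('host_1', 'cluster_1'), ('host_2', 'cluster_2')):
        system.hosts_service().add(types.Host(
            name=host,
            address='%s.example.com' % host,
            root_password='password',
            cluster=types.Cluster(name=cluster)))

    # 4. Attach NFS storage; the first data domain becomes the master
    sd = system.storage_domains_service().add(types.StorageDomain(
        name='test_nfs',
        type=types.StorageDomainType.DATA,
        host=types.Host(name='host_1'),
        storage=types.HostStorage(
            type=types.StorageType.NFS,
            address='nfs.example.com',
            path='/exports/test')))
    system.data_centers_service().data_center_service(dc.id) \
        .storage_domains_service().add(types.StorageDomain(id=sd.id))

    conn.close()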
Comment 1 Allon Mureinik 2016-06-09 08:36:12 EDT
This seems to happen during disk registration.
Maor, can you take a look please?
Comment 3 Maor 2016-06-09 09:53:50 EDT
Hi Dusan,

It looks like the logs are not synced.
In the engine, the error is at 2016-06-08 19:13:09,230
and the VDSM log only starts at 2016-06-09 08:01:01,906

Can you please attach all the relevant VDSM logs?

Also, which VDSM version are you using?

Thanks
Comment 4 Dusan Fodor 2016-06-09 11:24 EDT
Created attachment 1166350 [details]
correct version of vdsm log file

Hi,

Sorry, I didn't notice it was rewritten.
Correct version (according to timestamp) attached.
The VDSM version is 4.17.31

Thanks
Comment 5 Maor 2016-06-13 04:58:42 EDT
The issue here is that the uninitialized Data Center contained two Hosts.

It looks like createStoragePool was run on Host1 while connectStoragePool was run on Host2, which caused connectStoragePool to fail since only Host1 knew the master domain:

  2016-06-08 19:12:41,796 .... ConnectStorageServerVDSCommand] START, ConnectStorageServerVDSCommand(HostName = HOST1

  2016-06-08 19:12:43,001 INFO ...CreateStoragePoolVDSCommand] START, CreateStoragePoolVDSCommand(HostName = HOST1

  2016-06-08 19:13:07,791 INFO ....ConnectStoragePoolVDSCommand] START, ConnectStoragePoolVDSCommand(HostName = HOST2

When calling ConnectStoragePoolVDSCommand, the Host is chosen via IrsProxyData#getPrioritizedVdsInPool. That method returns an arbitrary Host, which might not be the one that ran CreateStoragePoolVDSCommand and updated its storage domain cache with the master SD.
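
To make the race concrete, here is a small self-contained Python sketch of the flow described above. The names are stand-ins for the real engine/VDSM code paths (CreateStoragePoolVDSCommand, ConnectStoragePoolVDSCommand, IrsProxyData#getPrioritizedVdsInPool), not the actual implementation:

    import random

    class Host:
        def __init__(self, name):
            self.name = name
            self.known_domains = set()  # per-host storage domain cache

        def create_storage_pool(self, msd_uuid):
            # createStoragePool leaves the master SD in this host's cache
            self.known_domains.add(msd_uuid)

        def connect_storage_pool(self, msd_uuid):
            if msd_uuid not in self.known_domains:
                raise RuntimeError('Cannot find master domain: msdUUID=%s'
                                   % msd_uuid)

    def get_prioritized_vds_in_pool(hosts):
        # Stand-in for IrsProxyData#getPrioritizedVdsInPool:
        # returns an arbitrary host, not necessarily the creator.
        return random.choice(hosts)

    hosts = [Host('HOST1'), Host('HOST2')]
    msd = '6e1cf774-8caf-4f5d-9491-f01e618cf656'

    hosts[0].create_storage_pool(msd)       # CreateStoragePool on HOST1
    chosen = get_prioritized_vds_in_pool(hosts)
    chosen.connect_storage_pool(msd)        # fails whenever HOST2 is chosen

The merged fix ("core: Connect all hosts on adding storage pool", gerrit 59170 and its backports) sidesteps the race by connecting every host in the pool instead of relying on the cache of a single, arbitrarily chosen one.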
Comment 6 Sahina Bose 2016-06-20 08:05:27 EDT
*** Bug 1330827 has been marked as a duplicate of this bug. ***
Comment 8 Carlos Mestre González 2016-07-11 06:15:50 EDT
No error is shown in the events, but I checked the VDSM logs and I still see the error. Versions: vdsm-4.18.5.1-1.el7ev.x86_64, rhevm-4.0.2-0.2.rc1.el7ev.noarch

1. Created a DC and a cluster, connected two hosts to that cluster, and added an NFS domain; I see an error on one of the hosts:

ioprocess communication (23127)::INFO::2016-07-11 13:07:26,080::__init__::447::IOProcess::(_processLogs) Starting ioprocess
ioprocess communication (23127)::INFO::2016-07-11 13:07:26,080::__init__::447::IOProcess::(_processLogs) Starting ioprocess
jsonrpc.Executor/6::ERROR::2016-07-11 13:07:26,081::sdc::146::Storage.StorageDomainCache::(_findDomain) domain 058eedf5-e92b-47a8-a3b7-a5c5db6baac1 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('058eedf5-e92b-47a8-a3b7-a5c5db6baac1',)
jsonrpc.Executor/6::INFO::2016-07-11 13:07:26,082::nfsSD::70::Storage.StorageDomain::(create) sdUUID=058eedf5-e92b-47a8-a3b7-a5c5db6baac1 domainName=test_nfs remotePath=10.35.64.11:/vol/RHEV/Storage/storage_jenkins_ge19_nfs_3 domClass=1

Maor, is this important? The operation seems fine.
Comment 9 Maor 2016-07-11 08:27:04 EDT
I'm not sure if it's related to that issue, does it happen consistently?
Comment 10 Carlos Mestre González 2016-07-12 06:29 EDT
Created attachment 1178845 [details]
vdsm.log for both hosts and engine.

Yes, tested it twice and saw it both times.

DC with a cluster and two hosts; added one NFS domain (choosing host_1), and the error shows on host_1:

jsonrpc.Executor/3::ERROR::2016-07-12 13:22:40,918::sdc::146::Storage.StorageDomainCache::(_findDomain) domain d388f7ee-ac92-4ce8-9a7a-81d4e69adb16 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('d388f7ee-ac92-4ce8-9a7a-81d4e69adb16',)
jsonrpc.Executor/3::INFO::2016-07-12 13:22:40,920::nfsSD::70::Storage.StorageDomain::(create) sdUUID=d388f7ee-ac92-4ce8-9a7a-81d4e69adb16 domainName=nfs_test_dc remotePath=10.35.64.11:/vol/RHEV/Storage/storage_jenkins_ge19_nfs_4 domClass=1
jsonrpc.Executor/3::DEBUG::2016-07-12 13:22:40,928::outOfProcess::69::Storage.oop::(getProcessPool) Creating ioprocess d388f7ee-ac92-4ce8-9a7a-81d4e69adb16
jsonrpc.Executor/3::INFO::2016-07-12 13:22:40,928::__init__::325::IOProcessClient::(__init__) Starting client ioprocess-3
jsonrpc.Executor/3::DEBUG::2016-07-12 13:22:40,928::__init__::334::IOProcessClient::(_run) Starting ioprocess for client ioprocess-3
Comment 11 Allon Mureinik 2016-07-12 07:34:01 EDT
This is an old logging issue we've had since oVirt 3.2, IIRC. I agree it's ugly, but there's no real effect there.
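
For context, a hedged sketch of the lookup-before-create pattern behind the noise; the names mirror VDSM's StorageDomainCache, but the code below is illustrative rather than the actual sdc.py:

    import logging

    logging.basicConfig()
    log = logging.getLogger('Storage.StorageDomainCache')

    class StorageDomainDoesNotExist(Exception):
        pass

    class StorageDomainCache:
        def __init__(self):
            self._domains = {}

        def produce(self, sd_uuid):
            try:
                return self._domains[sd_uuid]
            except KeyError:
                # Fires on every cache miss, including the benign miss
                # that immediately precedes creating the domain.
                log.error('domain %s not found', sd_uuid)
                raise StorageDomainDoesNotExist(sd_uuid)

    cache = StorageDomainCache()
    try:
        cache.produce('d388f7ee-ac92-4ce8-9a7a-81d4e69adb16')
    except StorageDomainDoesNotExist:
        pass  # expected: the domain is created right after this lookup

Because the miss is logged at ERROR level inside the cache rather than left to the caller, an expected miss during domain creation looks like a failure in the log.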

You can move the BZ to VERIFIED, thanks!
Comment 13 errata-xmlrpc 2016-08-23 16:41:40 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1743.html
