Bug 1384466 - Wrong error when attaching an NFS ISO domain to default cluster: "Storage domain does not exist" instead of a mount error
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.0.5.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.1.0-beta
Target Release: 4.1.0.2
Assignee: Daniel Erez
QA Contact: Lilach Zitnitski
 
Reported: 2016-10-13 10:42 UTC by Sandro Bonazzola
Modified: 2017-02-01 14:55 UTC
CC: 2 users

Last Closed: 2017-02-01 14:55:50 UTC
oVirt Team: Storage
rule-engine: ovirt-4.1+
rule-engine: planning_ack+
rule-engine: devel_ack+
ratamir: testing_ack+


Attachments
vdsm logs (43.16 KB, application/x-xz), 2016-10-13 10:42 UTC, Sandro Bonazzola
engine logs (26.36 KB, application/x-xz), 2016-10-13 10:43 UTC, Sandro Bonazzola


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 69902 0 master MERGED core: attach domain - abort on connection failure 2017-01-12 18:16:44 UTC
oVirt gerrit 70164 0 ovirt-engine-4.1 MERGED core: attach domain - abort on connection failure 2017-01-15 14:07:52 UTC

Description Sandro Bonazzola 2016-10-13 10:42:32 UTC
Created attachment 1210014 [details]
vdsm logs

Description of problem:
Trying to attach an NFS ISO domain to the Default cluster, which is configured as a local cluster:

Failed to attach Storage Domain ISO_DOMAIN to Data Center Default. (User: admin@internal-authz)

VDSM command failed: Storage domain does not exist: (u'3c212448-3a55-46fb-a978-457c40ac0ad0',)

While I can accept that attaching shared storage to a local cluster may not work, the error should say so instead of claiming that the storage domain does not exist.

Tested version: 4.0.5 RC2

Comment 1 Sandro Bonazzola 2016-10-13 10:43:01 UTC
Created attachment 1210015 [details]
engine logs

Comment 2 Allon Mureinik 2016-10-13 15:45:22 UTC
The underlying failure is a mount failure (presumably centos.home is not configured in the machine's dns/hosts file):

jsonrpc.Executor/5::INFO::2016-10-13 14:15:16,895::mount::226::storage.Mount::(mount) mounting centos.home:/var/lib/exports/iso at /rhev/data-center/mnt/centos.home:_var_lib_exports_iso
jsonrpc.Executor/5::ERROR::2016-10-13 14:15:17,083::hsm::2403::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2400, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 456, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 238, in connect
    six.reraise(t, v, tb)
  File "/usr/share/vdsm/storage/storageServer.py", line 230, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 229, in mount
    timeout=timeout, cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 53, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 51, in <lambda>
    **kwargs)
  File "<string>", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
MountError: (32, ';mount.nfs: Failed to resolve server centos.home: Name or service not known\n')

This has nothing to do with attaching an NFS domain to a local DC (which is perfectly legal, btw), but the error reporting should indeed be corrected.
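The root cause above is a plain name-resolution failure that only surfaces as MountError 32 deep inside the mount attempt. A resolution check before mounting would fail fast with a clearer message. This is a minimal sketch, not vdsm's actual code; the helper name and error text are hypothetical:

```python
import socket

def check_nfs_server_resolves(remote_path):
    """Extract the server host from an NFS export spec like
    'centos.home:/var/lib/exports/iso' and raise a clear error
    if the name cannot be resolved (hypothetical helper)."""
    host = remote_path.partition(":")[0]
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror as e:
        raise RuntimeError("Failed to resolve NFS server %s: %s" % (host, e))
    return host
```

With an unresolvable name such as centos.home missing from DNS/hosts, this raises a resolution error up front instead of the later "Storage domain does not exist".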

Comment 3 Daniel Erez 2016-12-28 09:32:05 UTC
According to the engine log [1], the underlying failure was logged beforehand and originated in ConnectStorageServer.

The "Storage domain does not exist" error originates from AttachStorageDomain (the next step in the flow).

@Liron - can we avoid invoking VDSCommandType.AttachStorageDomain when 'connectHostsInUpToDomainStorageServer()' fails on all hosts/spm?
(called from AttachStorageDomainToPoolCommand).

[1]
2016-10-13 14:15:54,866 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-32) [7a492a45] FINISH, ConnectStorageServerVDSCommand, return: {b3829d2d-f4d1-4878-98ec-2fa4e8a0e218=477}, log id: 196dfa8c
2016-10-13 14:15:54,869 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-32) [7a492a45] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: The error message for connection centos.home:/var/lib/exports/iso returned by VDSM was: Problem while trying to mount target

Comment 4 Daniel Erez 2017-01-12 18:16:24 UTC
The suggested fix is to log an error message [*] and abort the attach storage attempt when *all* hosts fail to connect to the storage.


[*] "Cannot connect storage connection server, aborting attach storage domain operation"
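The guard described above can be sketched as follows. The engine fix itself is Java (see the linked gerrit patches); this is a hedged Python illustration with hypothetical names, not the merged code:

```python
def attach_storage_domain(hosts, connect, do_attach):
    """Connect every host to the storage server and only proceed
    with the attach if at least one connection succeeded.
    `connect` and `do_attach` are hypothetical callables standing
    in for ConnectStorageServer and AttachStorageDomain."""
    results = {host: connect(host) for host in hosts}
    if not any(results.values()):
        # Mirrors the fix: abort here instead of letting the attach
        # fail later with the misleading "Storage domain does not exist".
        raise RuntimeError("Cannot connect storage connection server, "
                           "aborting attach storage domain operation")
    return do_attach()
```

The point of the design is ordering: the connection result is checked before AttachStorageDomain is ever invoked, so the error reported is the real one.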

Comment 5 Lilach Zitnitski 2017-01-26 08:33:00 UTC
--------------------------------------
Tested with the following code:
--------------------------------------
ovirt-engine-4.1.0.3-0.0.master.20170123115434.git8a69605.el7.centos.noarch
vdsm-4.19.2-2.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Block connection from (all) hosts to storage.
2. Try to attach an NFS storage domain.
3. Verify that the error appears in the log.
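Step 3 amounts to scanning the engine log for the abort message from Comment 4. A small sketch of such a check; the log path varies per setup and is an assumption:

```python
def error_logged(log_path, needle="Cannot connect storage connection server"):
    """Return True if the abort message from the fix appears on an
    ERROR line of the given engine log (path is an assumption,
    typically /var/log/ovirt-engine/engine.log)."""
    with open(log_path) as f:
        return any(needle in line and "ERROR" in line for line in f)
```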

Actual results:
2017-01-26 09:51:57,947+02 ERROR [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-7-thread-33) [a718445c-7a5f-466d-b78a-0892d21c62a7] Cannot connect storage connection server, aborting attach storage domain operation.

Expected results:
The abort message appears in the engine log and the attach attempt is not made, instead of the misleading "Storage domain does not exist".

Moving to VERIFIED!

