Bug 965648

Summary: engine: Error while executing action Remove Storage Domain: Internal Engine Error
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: ovirt-engine
Assignee: Tal Nisan <tnisan>
Status: CLOSED WONTFIX
QA Contact: Aharon Canan <acanan>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: abaron, abonas, acathrow, amureini, dron, iheim, jkt, lpeer, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---
Flags: amureini: Triaged+
Target Release: 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-29 13:36:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
logs (flags: none)

Description Dafna Ron 2013-05-21 13:07:55 UTC
Created attachment 751160: logs

Description of problem:

When trying to remove a domain that no longer exists on the storage but still exists as an unattached domain in RHEV-M, we get the error: Error while executing action Remove Storage Domain: Internal Engine Error
The engine log shows:

2013-05-21 16:01:40,456 ERROR [org.ovirt.engine.core.bll.storage.NFSStorageHelper] (ajp-/127.0.0.1:8702-3) [1b8168b6] The connection with details orion.qa.lab.tlv.redhat.com:/export/Dafna/data failed because of error code 100 and error message is: general exception


Version-Release number of selected component (if applicable):

sf17

How reproducible:

100%

Steps to Reproduce:
1. Create an unattached NFS data domain
2. rm -rf the domain on the storage
3. Try to remove the domain from RHEV-M

Actual results:

We get "Internal Engine Error".

Expected results:

We should get a "domain does not exist" error.

Additional info: logs


vdsm: 

Thread-43462::DEBUG::2013-05-21 16:01:44,502::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 orion.qa.lab.tlv.redhat.com:/export/Dafna/data /rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_Dafna_data' (cwd None)
Thread-43462::ERROR::2013-05-21 16:01:44,561::hsm::2298::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2295, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 302, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 208, in connect
    fileSD.validateDirAccess(self.getMountObj().getRecord().fs_file)
  File "/usr/share/vdsm/storage/mount.py", line 244, in getRecord
    (self.fs_spec, self.fs_file))
OSError: [Errno 2] Mount of `orion.qa.lab.tlv.redhat.com:/export/Dafna/data` at `/rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_Dafna_data` does not exist
Thread-43462::DEBUG::2013-05-21 16:01:44,563::hsm::2310::Storage.HSM::(connectStorageServer) knownSDs: {'3a3407c7-18b9-40d9-a3d3-dc4aa7e7dbf9': <function findDomain at 0x7fabad647cf8>, '1bbc27d0-9393-4a45-8ed2-e3eae55a7f14': <function findDomain at 0x7fabad647cf8>, '72ec1321-a114-451f-bee1-6790cbca1bc6': <function findDomain at 0x7fabad647cf8>}
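
The traceback above ends with getRecord raising OSError with errno 2 because no mount entry matches the NFS export and its mount point. A minimal sketch of that kind of check follows. It is not vdsm's code: the function name get_record, its parameters, and the use of a /proc/mounts-style table passed in as a string are all assumptions made for illustration.

```python
import errno


def get_record(fs_spec, fs_file, mounts_text):
    """Return the (spec, mount point) pair of the matching mount entry.

    mounts_text is a mount table in /proc/mounts format, passed in as a
    string so the sketch runs without reading the real system table.
    Hypothetical helper; vdsm's own lookup may work differently.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == fs_spec and fields[1] == fs_file:
            return fields[0], fields[1]
    # No matching entry: report it as ENOENT (errno 2), as in the traceback.
    raise OSError(errno.ENOENT,
                  "Mount of `%s` at `%s` does not exist" % (fs_spec, fs_file))
```

With a table that lacks the requested entry, the call raises OSError whose errno attribute is errno.ENOENT, which matches the "[Errno 2]" prefix in the log.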

Comment 1 Ayal Baron 2013-05-26 12:59:47 UTC
1. You've deleted the storage side, so why not just destroy the domain?
2. The error has nothing to do with remove domain; it failed validating access to the mount. Is it possible that you removed the export from the server?

Comment 2 Dafna Ron 2013-05-27 06:59:39 UTC
(In reply to Ayal Baron from comment #1)
> 1. you've deleted the storage side, just destroy the domain?

I did destroy the domain, and destroying works without a problem. This bug is about the message text only ("Internal Engine Error" pops up for the user).

> 2. the error has nothing to do with remove domain, it failed validating
> access to the mount.  Is it possible that you removed the export from the
> server?

The validation happens when we try to remove the domain, and yes, I did remove the domain on the storage.
We already have a "domain does not exist" error when we validate the mount during create domain, so if we validate during remove we should probably show "domain does not exist" as well, instead of the Internal Engine Error message.

Comment 3 Alissa 2013-08-21 09:26:09 UTC
The problem is that the connection error is "swallowed" in RemoveStorageDomainCommand.
The connect() method (from one of the storage helpers) is called from executeCommand, but it returns only a boolean without the actual result, so no error code or fault is set in the command's return value. The command just reports setSucceeded=false without any further details, resulting in the generic error.
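
The pattern described above can be illustrated with a small language-neutral sketch, written here in Python rather than the engine's Java. Every name in it (CommandResult, connect_bool, connect_detailed, and the two remove_domain functions) is hypothetical and does not come from the oVirt code base; only the error code 100 and the "general exception" message are taken from the engine log in the description.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout; this is not oVirt engine code.
GENERAL_EXCEPTION = 100  # error code reported in the engine log


@dataclass
class CommandResult:
    succeeded: bool
    fault_code: Optional[int] = None
    fault_message: Optional[str] = None


def connect_bool(mount_exists: bool) -> bool:
    """Boolean-only connect: the reason for a failure is lost here."""
    return mount_exists


def connect_detailed(mount_exists: bool) -> CommandResult:
    """Connect that keeps the error code and message on failure."""
    if mount_exists:
        return CommandResult(succeeded=True)
    return CommandResult(False, GENERAL_EXCEPTION, "general exception")


def remove_domain_swallowing(mount_exists: bool) -> CommandResult:
    # The caller only learns that the command failed, not why.
    if not connect_bool(mount_exists):
        return CommandResult(succeeded=False)
    return CommandResult(succeeded=True)


def remove_domain_propagating(mount_exists: bool) -> CommandResult:
    # The fault from the connection attempt is passed up unchanged.
    connection = connect_detailed(mount_exists)
    if not connection.succeeded:
        return connection
    return CommandResult(succeeded=True)
```

In the swallowing variant a failed connection yields a result with no fault code, so a caller can only fall back to a generic message. In the propagating variant the code and message stay available for a more specific error text.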

Comment 4 Ayal Baron 2014-01-29 13:36:55 UTC
The user can 'destroy' the domain from the GUI.