Bug 1157239

Summary: [hosted-engine] [iSCSI support] Putting an iSCSI domain in maintenance while the hosted-engine is installed on a LUN from the same storage server causes the setup to become non operational
Product: Red Hat Enterprise Virtualization Manager Reporter: Elad <ebenahar>
Component: ovirt-hosted-engine-setup Assignee: Tal Nisan <tnisan>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.5.0 CC: acanan, amureini, dfediuck, didi, ebenahar, ecohen, eedri, gklein, iheim, lsurette, lveyde, sbonazzo, scohen, stirabos, tnisan
Target Milestone: ---   
Target Release: 3.5.0   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: ovirt-hosted-engine-setup-1.2.1-5.el7ev Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1067162, 1157243    
Attachments:
Description Flags
logs none

Description Elad 2014-10-26 13:32:15 UTC
Created attachment 950796 [details]
logs

Description of problem:
On a hosted-engine setup installed using iSCSI, I created an iSCSI storage domain on the same storage server where the engine VM's disk is located, then put that domain into maintenance mode. The whole hosted-engine setup became inactive because the host disconnected from the storage server where the engine VM's disk is located.

Version-Release number of selected component (if applicable):
rhev 3.5 vt7
rhel6.6 host
ovirt-hosted-engine-setup-1.2.1-1.el6ev.noarch
rhevm-3.5.0-0.17.beta.el6ev.noarch
vdsm-4.16.7.1-1.el6ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine using iSCSI
2. Create an iSCSI storage domain using a LUN from the same storage server where the engine VM's disk is located. Create one more storage domain (NFS)
3. Put the iSCSI domain in maintenance

Actual results:
The whole setup becomes inactive because the host had disconnected from the iSCSI storage server

Thread-2551::INFO::2014-10-26 15:07:42,210::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=1, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'connection': u'lion.qa.lab.tlv.redhat.com:/export/elad/1', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'password': '******', u'id': u'529690d4-1afd-4ebb-a3f9-91009c157496', u'port': u''}], options=None)
Thread-2551::DEBUG::2014-10-26 15:07:42,211::mount::227::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /bin/umount -f -l /rhev/data-center/mnt/lion.qa.lab.tlv.redhat.com:_export_elad_1 (cwd None)



Thread-2694::ERROR::2014-10-26 15:09:16,505::API::1699::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1679, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 100, in get_all_stats
    stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
RequestError: Request failed: <type 'exceptions.OSError'>



Expected results:
When putting an iSCSI storage domain in maintenance while the engine's VM disk is located on the same storage server, the host should not disconnect from the targets

Workaround: 
Reconnect to the iSCSI sessions using iscsiadm
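
For reference, a minimal sketch of what the reconnect could look like when scripted. It assumes the standard open-iscsi node-mode login (`iscsiadm -m node -T <target> -p <portal> --login`). The helper names and the example IQN and portal are hypothetical and not taken from this setup; the real values have to come from the affected host.

```python
import subprocess


def build_login_command(target_iqn, portal):
    """Return the iscsiadm argv that logs in to one target through one portal.

    target_iqn and portal are placeholders supplied by the caller; nothing
    here is specific to the setup described in this bug.
    """
    return ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"]


def relogin(target_iqn, portal, dry_run=True):
    """Log back in to the target, or only return the command when dry_run is set."""
    cmd = build_login_command(target_iqn, portal)
    if not dry_run:
        # Needs root and an installed open-iscsi; raises CalledProcessError on failure.
        subprocess.check_call(cmd)
    return cmd
```

With `dry_run=True` (the default) nothing is executed, so the command can be reviewed before it is run as root on the host.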


Additional info: logs

Comment 1 Tal Nisan 2014-10-28 10:32:04 UTC
We currently don't have any indication in the engine of this LUN. There has to be a way for the engine to know this LUN before we can block this operation, moving to integration to set up this data so the storage flows can use it.

Comment 2 Doron Fediuck 2014-10-28 13:37:56 UTC
Will you get a different behaviour when working with NFS?

Comment 3 Elad 2014-10-28 15:05:10 UTC
(In reply to Doron Fediuck from comment #2)
> Will you get a different behaviour when working with NFS?

No, because on NFS, putting a domain in maintenance only unmounts the specific export path of that storage domain.

Comment 4 Sandro Bonazzola 2014-11-05 13:01:28 UTC
http://gerrit.ovirt.org/34783 adds the HE disk to the engine.
Now that the LUN is known to the engine, is any other change required on the hosted-engine side?

Moving back to storage.

Comment 5 Allon Mureinik 2014-11-18 18:38:25 UTC
Tal, if the SD and HE's LUN Disk share a connection, shouldn't this be already solved?

Comment 6 Tal Nisan 2014-11-19 13:42:54 UTC
Iirc upon disconnection there is a check if the connections to disconnect are not used by other entities in the system and those are filtered out
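
The check described in this comment can be sketched roughly as follows. This is only an illustration of the idea (drop every connection that something else still uses before disconnecting), not the engine's actual code; the function name and the connection IDs are hypothetical.

```python
def connections_safe_to_disconnect(candidate_ids, connections_in_use):
    """Return the candidates that no other entity still references.

    candidate_ids: connection IDs of the domain going to maintenance.
    connections_in_use: connection IDs still needed elsewhere, for example
    by the LUN disk that holds the hosted-engine VM.
    The order of candidate_ids is preserved.
    """
    in_use = set(connections_in_use)
    return [conn for conn in candidate_ids if conn not in in_use]


# Hypothetical example: "conn-b" is shared with the hosted-engine LUN,
# so it is filtered out and the host stays logged in to that target.
to_disconnect = connections_safe_to_disconnect(
    ["conn-a", "conn-b", "conn-c"],
    {"conn-b"},
)
```

The filter only helps if the shared connection is known to the caller, which is why the hosted-engine LUN first had to be registered in the engine.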

Comment 7 Allon Mureinik 2014-11-19 17:37:37 UTC
(In reply to Tal Nisan from comment #6)
> Iirc upon disconnection there is a check if the connections to disconnect
> are not used by other entities in the system and those are filtered out

So, in other words, this is already solved?

Comment 8 Tal Nisan 2014-11-20 13:14:04 UTC
Should be, yes

Comment 11 Elad 2014-12-01 08:24:01 UTC
Moving an iSCSI domain to maintenance while the engine disk is located on the same storage server no longer causes a disconnection from the storage server, because the LUN that contains the engine disk now exists in the DB.

Verified using rhev3.5 vt12

Comment 12 Allon Mureinik 2015-02-16 19:13:32 UTC
RHEV-M 3.5.0 has been released, closing this bug.

Comment 13 Allon Mureinik 2015-02-16 19:13:35 UTC
RHEV-M 3.5.0 has been released, closing this bug.