Bug 1387085

Summary: Hosted-Engine iSCSI target logged in twice on activated Host (to be solved via new HE installation flow)
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: vdsm
Assignee: Maor <mlipchuk>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: low
Priority: high
Docs Contact:
Version: 4.0.3
CC: gveitmic, lsurette, rabraham, srevivo, stirabos, tnisan, ycui, ykaul, ylavi
Target Milestone: ovirt-4.2.0
Keywords: TestOnly, Triaged
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-15 17:49:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1393902, 1455169
Bug Blocks:

Description Germano Veit Michel 2016-10-20 03:50:11 UTC
Description of problem:

When the host boots up, it connects to the Hosted Engine storage according to the configuration in /etc/ovirt-hosted-engine/hosted-engine.conf.

# cat /etc/ovirt-hosted-engine/hosted-engine.conf | egrep 'iqn|connection'
iqn=iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine
connectionUUID=97d0b390-c93c-4ab9-a418-64f26608a691

The result is:

360014052d03c99fec334a71a88308fb6 dm-8 LIO-ORG ,hostedengine    
size=60G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:0  sdf 8:80  active ready running

tcp: [1] 192.168.100.1:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine

So far so good. But when the host is then activated, the engine tells vdsm to connect to the same storage again. I believe the problem is that vdsm does not recognize that this is the same storage the ha-agent already asked it to connect to, so it connects again:

360014052d03c99fec334a71a88308fb6 dm-8 LIO-ORG ,hostedengine    
size=60G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:0  sdf 8:80  active ready running
|-+- policy='service-time 0' prio=1 status=enabled
| `- 8:0:0:0  sdg 8:96  active ready running

tcp: [1] 192.168.100.1:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine
tcp: [2] 192.168.100.1:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine
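
For reference, one quick way (illustrative only, not part of any product tooling) to spot such duplicate sessions is to count identical portal/target pairs in the iscsiadm output:

# each line of 'iscsiadm -m session' looks like "tcp: [N] <portal>,<tpgt> <iqn>",
# so print any portal/target pair that appears more than once
iscsiadm -m session | awk '{print $3, $4}' | sort | uniq -c | awk '$1 > 1'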

Putting the host into maintenance mode disconnects the "activation" connection, and the count goes back to 1 (the session requested by ha-agent).

I believe this is not desirable: a fancier iSCSI bond/multipath configuration added later (load balancing, perhaps) might treat these as two different paths, when they are really two identical connections that provide no performance or reliability benefit and just waste resources.

Version-Release number of selected component (if applicable):
ovirt-engine-4.0.4
vdsm-4.18.11-1.el7ev.x86_64

How reproducible:
100%

Comment 1 Sandro Bonazzola 2016-11-07 10:05:40 UTC
Allon, what's the issue here? I see there are 2 storage connections listed above.

Comment 2 Allon Mureinik 2016-11-07 13:33:24 UTC
(In reply to Sandro Bonazzola from comment #1)
> Allon, what's the issue here? I see there are 2 storage connections listed above.
Germano?

Comment 3 Germano Veit Michel 2016-11-08 05:29:35 UTC
(In reply to Allon Mureinik from comment #2)
> (In reply to Sandro Bonazzola from comment #1)
> > Allon, what's the issue here? I see there are 2 storage connections listed above.
> Germano?

Hmmm....I thought comment #0 was quite clear. Sorry.

In the case of the Hosted-Engine storage domain, we are logging in TWICE to the exact same target (once when ha-agent asks vdsm to connect to the storage, and again when RHEV-M activates the host). It's not a big deal (hence the low severity), but I am afraid this is not right.

I found this in my environment and I think this duplicate entry is not desirable. If one configures a fancier multipath setup (failover/load balancing), it may get in the way. Ideally, when the host is activated it should not log in again to the exact same target it is already logged in to, via the same IP and connection. I am not sure whether vdsm should figure this out or whether the engine simply should not tell vdsm to connect again.
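
To illustrate the kind of check I have in mind (just a sketch, not actual vdsm code; the portal and IQN values are the ones from my environment):

# sketch only: skip the login if this exact portal+target already has a session
PORTAL=192.168.100.1:3260
IQN=iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine
if iscsiadm -m session | grep -q "$PORTAL,[0-9]* $IQN"; then
    echo "already logged in to $IQN via $PORTAL, skipping"
else
    iscsiadm -m node -T "$IQN" -p "$PORTAL" --login
fi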

All clear?

Comment 4 Tal Nisan 2016-11-22 16:38:01 UTC
Simone, any idea why the double login?

Comment 5 Simone Tiraboschi 2016-11-22 16:47:40 UTC
Exactly as in the bug description:
the hosted-engine storage domain gets logged in once by ovirt-ha-agent just after boot, and then it gets logged in a second time by the engine once the engine is up.

I'm not sure we want to remove this behavior, as it's the basis for:
https://bugzilla.redhat.com/show_bug.cgi?id=1267807

The idea is to simply let the agent connect the first/initial path, and then let the engine (once it's active) connect the other paths using the iSCSI bond feature.

Comment 6 Simone Tiraboschi 2017-07-31 15:48:16 UTC
I think that the point is that ovirt-ha-agent ignores the undocumented 'netIfaceName' parameter.

See: https://bugzilla.redhat.com/show_bug.cgi?id=1193961#c33

Comment 7 Allon Mureinik 2017-11-15 15:33:40 UTC
Pushing to 4.2.1, as the zero-node installation is still not ready

Comment 8 Yaniv Lavi 2017-12-27 13:11:24 UTC
Zero-node should solve this; please test with that flow.

Comment 9 Nikolai Sednev 2018-03-18 15:53:49 UTC
Works for me with these components on an Ansible deployment:
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

I saw only a single active session towards the storage after the deployment finished.

alma03 ~]# cat /etc/ovirt-hosted-engine/hosted-engine.conf | egrep 'iqn|connection'
connectionUUID=e29cf818-5ee5-46e1-85c1-8aeefa33e95d
iqn=iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00
alma03 ~]# iscsiadm -m session
tcp: [1] 10.35.146.129:3260,1 iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00 (non-flash)
alma03 ~]# multipath -ll
3514f0c5a51601655 dm-0 XtremIO ,XtremApp        
size=70G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  `- 6:0:0:1 sdb 8:16 active ready running

Comment 14 errata-xmlrpc 2018-05-15 17:49:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489
