Bug 1094033

Summary: [engine-backend] [iscsi multipath] After a failed network replacement in an iSCSI multipath bond, the bond's networks are not updated back
Product: Red Hat Enterprise Virtualization Manager Reporter: Elad <ebenahar>
Component: ovirt-engine    Assignee: Maor <mlipchuk>
Status: CLOSED UPSTREAM QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4.0    CC: amureini, gklein, iheim, lpeer, rbalakri, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---    Keywords: ZStream
Target Release: 3.5.0    Flags: amureini: Triaged+
Hardware: x86_64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: ovirt-3.5.0_rc1.1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1127007 (view as bug list) Environment:
Last Closed: 2014-09-16 06:45:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1127007, 1142923, 1156165    
Attachments:
Description Flags
logs from engine and vdsm none

Description Elad 2014-05-04 15:20:48 UTC
Created attachment 892293 [details]
logs from engine and vdsm

Description of problem:
Even though the network replacement operation failed, because vdsm failed to re-connect to the storage server using a different network (reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1094025), the engine does not revert the operation. The networks in the bond are not updated back to the ones that were in place before the failed change.

Version-Release number of selected component (if applicable):
AV7
vdsm-4.14.7-0.1.beta3.el6ev.x86_64
rhevm-3.4.0-0.15.beta3.el6ev.noarch


How reproducible:
Always

Steps to Reproduce:
On a shared DC with active iSCSI storage domain(s):
1. Create 3 new networks and attach them to the cluster with the 'Required' check-box checked
2. Attach the networks to the cluster hosts' NICs
3. Create a new iSCSI multipath bond (under the DC tab -> pick the relevant DC -> iSCSI multipath sub-tab -> New) and add two of the new networks, along with the targets, to it
4. Move the iSCSI domain to maintenance and reactivate it, so that the connection to the storage is made from the new networks
5. After the iSCSI domain is active, edit the multipath bond, uncheck the checked networks and pick the third network. Click 'OK'
6. VDSM will fail to perform the operation with IscsiNodeError.

Actual results:
The engine fails to catch the error from vdsm (as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1094023). After the failure, the networks under the bond remain the new ones even though the operation failed. The engine does not roll back the failed iSCSI multipath bond update.

Expected results:
If the iSCSI multipath bond update fails, the engine should revert the changes; the update should not be left partial.
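The expected all-or-nothing behavior can be sketched as follows. This is a minimal Python illustration of the rollback pattern the report asks for; `IscsiBond`, `update_bond_networks` and `connect_storage` are hypothetical names for this sketch, not the engine's actual (Java) classes:

```python
from copy import deepcopy


class IscsiBond:
    """Hypothetical stand-in for an iSCSI multipath bond's network list."""
    def __init__(self, networks):
        self.networks = list(networks)


def update_bond_networks(bond, new_networks, connect_storage):
    """All-or-nothing update: restore the old networks if reconnecting fails."""
    old_networks = deepcopy(bond.networks)
    bond.networks = list(new_networks)
    try:
        # May raise (e.g. vdsm reporting IscsiNodeError) if the storage
        # cannot be reached over the replacement network.
        connect_storage(bond)
    except Exception:
        # Roll back to the pre-update state instead of leaving the
        # bond pointing at networks that were never validated.
        bond.networks = old_networks
        raise
```

With this pattern, a failed reconnect leaves the bond exactly as it was before the edit, rather than half-updated.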

Additional info: logs from engine and vdsm

Error on vdsm:

Thread-1846::ERROR::2014-05-04 15:40:44,971::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 359, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 166, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, targetName)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 295, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: eth0.1, target: iqn.2008-05.com.xtremio:001e675b8ee1, portal: 10.35.160.3,3260] (multiple)'], ['iscsiadm: Could not login to [iface: eth0.1, target: iqn.2008-05.com.xtremio:001e675b8ee1, portal: 10.35.160.3,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
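For context, the traceback above ends in vdsm's iscsiadm wrapper. A simplified sketch, reconstructed from the traceback rather than taken from the actual vdsm source, of how a nonzero `iscsiadm` exit code becomes an `IscsiNodeError`:

```python
import subprocess


class IscsiNodeError(Exception):
    """Raised when an iscsiadm node operation exits with a nonzero status."""
    def __init__(self, rc, out, err):
        self.rc, self.out, self.err = rc, out, err
        super().__init__((rc, out, err))


def node_login(iface, portal, target):
    # Simplified stand-in for vdsm's iscsiadm.node_login(); the real
    # implementation differs, but the failure path is the same shape.
    proc = subprocess.run(
        ["iscsiadm", "-m", "node", "-T", target, "-I", iface,
         "-p", portal, "--login"],
        capture_output=True, text=True)
    if proc.returncode != 0:
        raise IscsiNodeError(proc.returncode,
                             proc.stdout.splitlines(),
                             proc.stderr.splitlines())
```

Exit code 8 (`ISCSI_ERR_TRANS_TIMEOUT` in iscsiadm's error codes) corresponds to the "connection timed out" message in the log, which is why the error tuple above begins with `(8, ...)`.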



Error in engine:

2014-05-04 15:39:56,678 ERROR [org.ovirt.engine.core.bll.storage.EditIscsiBondCommand] (ajp-/127.0.0.1:8702-10) [37fea003] Command org.ovirt.engine.core.bll.storage.EditIscsiBondCommand throw exception: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException (Failed with error VDS_NETWORK_ERROR and code 5022)

Comment 3 Maor 2014-08-05 16:12:27 UTC
After discussing the bug with Elad and Allon, we have decided that we should not roll back the iSCSI bond update, since there might be a scenario in which some of the hosts can connect to the storage and some cannot.

Instead, we will let the user decide whether to keep the situation as it is, or to roll back himself.

I've added a warning audit log to indicate that the operation succeeded but the engine encountered some issues with the storage connection.
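The warn-and-continue behavior chosen here can be sketched as follows. This is an illustrative Python sketch of the pattern, not the engine's actual Java implementation; `edit_iscsi_bond` and the host/connect names are hypothetical:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("engine")


def edit_iscsi_bond(hosts, connect):
    """Apply the bond update on every host; warn (don't roll back) on failures."""
    failed = []
    for host in hosts:
        try:
            connect(host)
        except Exception as exc:
            # Some hosts may reconnect fine while others fail, so a global
            # rollback could break the hosts that succeeded.
            failed.append((host, exc))
    if failed:
        # Mirrors the warning audit log added for this bug: the update
        # itself is kept, but the user is told which hosts had trouble.
        log.warning("iSCSI bond updated, but %d host(s) failed to connect "
                    "to the storage: %s", len(failed),
                    ", ".join(host for host, _ in failed))
    return failed
```

The returned failure list leaves the decision with the user: keep the new networks, or revert the bond manually.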

Comment 5 Elad 2014-09-16 06:45:54 UTC
I'm unable to reproduce a situation in which vdsm fails to connect to the storage server via its network that participates in the iSCSI bond, as explained in the description (as happened in bz #1102687). Therefore, closing as UPSTREAM.