Bug 1094033

Summary: [engine-backend] [iscsi multipath] After a failed network replacement in an iSCSI multipath bond, the bond's networks are not updated back
Product: Red Hat Enterprise Virtualization Manager Reporter: Elad <ebenahar>
Component: ovirt-engine    Assignee: Maor <mlipchuk>
Status: CLOSED UPSTREAM QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4.0    CC: amureini, gklein, iheim, lpeer, rbalakri, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---    Keywords: ZStream
Target Release: 3.5.0    Flags: amureini: Triaged+
Hardware: x86_64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: ovirt-3.5.0_rc1.1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1127007 (view as bug list) Environment:
Last Closed: 2014-09-16 06:45:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1127007, 1142923, 1156165    
Attachments:
Description Flags
logs from engine and vdsm none

Description Elad 2014-05-04 15:20:48 UTC
Created attachment 892293 [details]
logs from engine and vdsm

Description of problem:
Even though the network replacement operation failed, because vdsm failed to re-connect to the storage server using a different network (reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1094025), the engine does not revert the operation. The networks in the bond are not updated back to the ones that were in place before the failed change.

Version-Release number of selected component (if applicable):
AV7
vdsm-4.14.7-0.1.beta3.el6ev.x86_64
rhevm-3.4.0-0.15.beta3.el6ev.noarch


How reproducible:
Always

Steps to Reproduce:
On a shared DC with active iSCSI storage domain(s):
1. Create 3 new networks and attach them to the cluster with the 'Required' check-box checked
2. Attach the networks to the cluster hosts' NICs
3. Create a new iSCSI multipath bond (under the DC tab -> pick the relevant DC -> iSCSI multipath sub-tab -> New) and add two of the new networks, along with the targets, to it
4. Move the iSCSI domain to maintenance and reactivate it, so that the connection to the storage is made from the new networks
5. After the iSCSI domain is active, edit the multipath bond, uncheck the checked networks and pick the third network. Click 'OK'
6. VDSM will fail to perform the operation with IscsiNodeError.

Actual results:
The engine fails to catch the error from vdsm (as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1094023). After the failure, the networks under the bond remain the new ones even though the operation failed. The engine does not roll back the failed iSCSI multipath bond update.

Expected results:
If the iSCSI multipath bond update fails, the engine should revert the changes; the update should not be left partial.
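The expected all-or-nothing behavior can be sketched as follows. This is a minimal Python illustration of the rollback pattern the report asks for; `IscsiBond`, `update_bond_networks` and `connect_storage` are hypothetical names for this sketch, not the engine's actual (Java) classes:

```python
from copy import deepcopy


class IscsiBond:
    """Hypothetical stand-in for an iSCSI multipath bond's network list."""
    def __init__(self, networks):
        self.networks = list(networks)


def update_bond_networks(bond, new_networks, connect_storage):
    """All-or-nothing update: restore the old networks if reconnecting fails."""
    old_networks = deepcopy(bond.networks)
    bond.networks = list(new_networks)
    try:
        # May raise (e.g. vdsm reporting IscsiNodeError) if the storage
        # cannot be reached over the replacement network.
        connect_storage(bond)
    except Exception:
        # Roll back to the pre-update state instead of leaving the
        # bond pointing at networks that were never validated.
        bond.networks = old_networks
        raise
```

With this pattern, a failed reconnect leaves the bond exactly as it was before the edit, rather than half-updated.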

Additional info: logs from engine and vdsm

Error on vdsm:

Thread-1846::ERROR::2014-05-04 15:40:44,971::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 359, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 166, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, targetName)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 295, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: eth0.1, target: iqn.2008-05.com.xtremio:001e675b8ee1, portal: 10.35.160.3,3260] (multiple)'], ['iscsiadm: Could not login to [iface: eth0.1, target: iqn.2008-05.com.xtremio:001e675b8ee1, portal: 10.35.160.3,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
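For context, the traceback above ends in vdsm's iscsiadm wrapper. A simplified sketch, reconstructed from the traceback rather than taken from the actual vdsm source, of how a nonzero `iscsiadm` exit code becomes an `IscsiNodeError`:

```python
import subprocess


class IscsiNodeError(Exception):
    """Raised when an iscsiadm node operation exits with a nonzero status."""
    def __init__(self, rc, out, err):
        self.rc, self.out, self.err = rc, out, err
        super().__init__((rc, out, err))


def node_login(iface, portal, target):
    # Simplified stand-in for vdsm's iscsiadm.node_login(); the real
    # implementation differs, but the failure path is the same shape.
    proc = subprocess.run(
        ["iscsiadm", "-m", "node", "-T", target, "-I", iface,
         "-p", portal, "--login"],
        capture_output=True, text=True)
    if proc.returncode != 0:
        raise IscsiNodeError(proc.returncode,
                             proc.stdout.splitlines(),
                             proc.stderr.splitlines())
```

Exit code 8 (`ISCSI_ERR_TRANS_TIMEOUT` in iscsiadm's error codes) corresponds to the "connection timed out" message in the log, which is why the error tuple above begins with `(8, ...)`.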



Error in engine:

2014-05-04 15:39:56,678 ERROR [org.ovirt.engine.core.bll.storage.EditIscsiBondCommand] (ajp-/127.0.0.1:8702-10) [37fea003] Command org.ovirt.engine.core.bll.storage.EditIscsiBondCommand throw exception: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException (Failed with error VDS_NETWORK_ERROR and code 5022)

Comment 3 Maor 2014-08-05 16:12:27 UTC
After discussing the bug with Elad and Allon, we have decided that we should not roll back the iSCSI bond update, since there might be a scenario in which some of the hosts can connect to the storage and some cannot.

Instead, we will let the user decide whether to keep the situation as it is, or to roll back himself.

I've added a warning audit log to indicate that the operation succeeded but the engine encountered some issues with the storage connection.
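The warn-and-continue behavior chosen here can be sketched as follows. This is an illustrative Python sketch of the pattern, not the engine's actual Java implementation; `edit_iscsi_bond` and the host/connect names are hypothetical:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("engine")


def edit_iscsi_bond(hosts, connect):
    """Apply the bond update on every host; warn (don't roll back) on failures."""
    failed = []
    for host in hosts:
        try:
            connect(host)
        except Exception as exc:
            # Some hosts may reconnect fine while others fail, so a global
            # rollback could break the hosts that succeeded.
            failed.append((host, exc))
    if failed:
        # Mirrors the warning audit log added for this bug: the update
        # itself is kept, but the user is told which hosts had trouble.
        log.warning("iSCSI bond updated, but %d host(s) failed to connect "
                    "to the storage: %s", len(failed),
                    ", ".join(host for host, _ in failed))
    return failed
```

The returned failure list leaves the decision with the user: keep the new networks, or revert the bond manually.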

Comment 5 Elad 2014-09-16 06:45:54 UTC
I'm unable to reproduce a situation in which vdsm fails to connect to the storage server via its network that participates in the iSCSI bond, as explained in the description (as happened in bz #1102687). Therefore, closing as UPSTREAM.