Bug 1094023

Summary: [engine-backend] [iSCSI multipath] Internal engine error when vdsm fails to connect to storage server with IscsiNodeError
Product: Red Hat Enterprise Virtualization Manager
Reporter: Elad <ebenahar>
Component: ovirt-engine
Assignee: Maor <mlipchuk>
Status: CLOSED UPSTREAM
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: acanan, acathrow, amureini, gklein, iheim, lpeer, mlipchuk, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 3.5.0
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Fixed In Version: ovirt-3.5.0-alpha2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1102687 (view as bug list)
Environment:
Last Closed: 2014-08-10 15:11:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1102687, 1142923, 1156165
Attachments: logs from engine and host (flags: none)

Description Elad 2014-05-04 12:54:30 UTC
Created attachment 892281 [details]
logs from engine and host

Description of problem:
I configured an iSCSI multipath bond and tried to replace the networks participating in it. The operation failed because vdsm was unable to connect to the storage server via the replacement network. This failure wasn't handled properly by the engine; instead, I got this in webadmin:

Operation Canceled
Error while executing action EditIscsiBond: Internal Engine Error

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
On a shared DC with active iSCSI storage domain(s):
1. Create 3 new networks and attach them to the cluster with the 'Required' check-box checked
2. Attach the networks to the cluster's hosts' NICs
3. Create a new iSCSI multipath bond (under DC tab -> pick the relevant DC -> iSCSI multipath sub-tab -> new) and add 2 of the new networks, along with the targets, to it
4. Move the iSCSI domain to maintenance and activate it again, so that the connection to the storage is made over the new networks
5. After the iSCSI domain is active, edit the multipath bond, uncheck the checked networks and pick the third network. Click 'Ok'
6. VDSM will fail to perform the operation with IscsiNodeError

Actual results:
The engine does not know how to handle the error from vdsm; it throws the following exception in the log, which surfaces as an internal engine error in webadmin:

2014-05-04 15:39:56,678 ERROR [org.ovirt.engine.core.bll.storage.EditIscsiBondCommand] (ajp-/127.0.0.1:8702-10) [37fea003] Command org.ovirt.engine.core.bll.storage.EditIscsiBondCommand throw exception: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException (Failed with error VDS_NETWORK_ERROR and code 5022)
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil.invokeAll(ThreadPoolUtil.java:205) [utils.jar:]
        at org.ovirt.engine.core.bll.storage.BaseIscsiBondCommand.connectAllHostsToStorage(BaseIscsiBondCommand.java:56) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.EditIscsiBondCommand.executeCommand(EditIscsiBondCommand.java:70) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1123) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1208) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1884) [bll.jar:]

Expected results:
The engine should handle such an error from vdsm, notify the user, and revert the operation.

Additional info: logs from engine and host

The error in vdsm:

Thread-1846::ERROR::2014-05-04 15:40:44,971::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 359, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 166, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, targetName)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 295, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: eth0.1, target: iqn.2008-05.com.xtremio:001e675b8ee1, portal: 10.35.160.3,3260] (multiple)'], ['iscsiadm: Could not login to [iface: eth0.1, target: iqn.2008-05.com.xtremio:001e675b8ee1, portal: 10.35.160.3,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
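The traceback shows the pattern by which vdsm turns a nonzero iscsiadm exit status into IscsiNodeError (iscsiadm.py:295, node_login). A minimal sketch of that pattern follows; the run_iscsiadm helper and its signature are assumptions for illustration, not vdsm's actual wrapper:

```python
import subprocess

class IscsiNodeError(Exception):
    """Carries (rc, stdout lines, stderr lines), as in the traceback above."""

def run_iscsiadm(args):
    # Hypothetical stand-in for vdsm's iscsiadm command wrapper.
    proc = subprocess.run(args, capture_output=True, text=True)
    out = proc.stdout.splitlines()
    err = proc.stderr.splitlines()
    if proc.returncode != 0:
        # The pattern seen at iscsiadm.py:295: a nonzero exit code is
        # wrapped in IscsiNodeError(rc, out, err) and raised to the caller.
        raise IscsiNodeError(proc.returncode, out, err)
    return out
```

In the log above, rc 8 is iscsiadm's initiator error "connection timed out", which then propagates up through connectStorageServer.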

Comment 1 Elad 2014-05-04 12:59:32 UTC
Version-Release number of selected component (if applicable):
AV7
vdsm-4.14.7-0.1.beta3.el6ev.x86_64
rhevm-3.4.0-0.15.beta3.el6ev.noarch

Comment 2 Maor 2014-05-29 11:08:57 UTC
The fix adds a log message indicating the following:
"Could not connect Host {hostName} - {hostId} to Iscsi Storage Server."
following the error returned from VDSM.

The operation of editing the network should eventually succeed even if an exception is received from VDSM in the middle.
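The actual fix lives in the engine's Java code (BaseIscsiBondCommand.connectAllHostsToStorage, per the stack trace); the Python sketch below only illustrates the behavior described in this comment: catch each per-host connection failure, log it, and let the edit succeed. connect_all_hosts, its arguments, and the host tuples are hypothetical stand-ins, not the engine's API:

```python
import logging

log = logging.getLogger("EditIscsiBond")

def connect_all_hosts(hosts, connect):
    """Try to connect every host; log failures instead of aborting the edit.

    `hosts` is a list of (host_name, host_id) pairs and `connect` is a
    callable that raises on failure -- stand-ins for the engine's host
    list and the VDSM connectStorageServer call.
    """
    failed = []
    for host_name, host_id in hosts:
        try:
            connect(host_id)
        except Exception:
            # Matches the log line described in the fix above.
            log.error("Could not connect Host %s - %s to Iscsi Storage Server.",
                      host_name, host_id)
            failed.append(host_name)
    return failed  # the bond edit itself still completes
```

The design point is that a single unreachable host no longer propagates an unhandled exception out of the command, which is what previously surfaced as "Internal Engine Error" in webadmin.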

Comment 4 Elad 2014-08-10 15:11:33 UTC
This bug was fixed, but since I'm unable to reproduce the IscsiNodeError in vdsm as part of network replacement inside the iSCSI bond (which is what triggered the internal engine error), I'm closing this bug as UPSTREAM.