Bug 1094023 - [engine-backend] [iSCSI multipath] Internal engine error when vdsm fails to connect to storage server with IscsiNodeError
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Maor
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks: 1102687 rhev3.5beta 1156165
 
Reported: 2014-05-04 12:54 UTC by Elad
Modified: 2016-02-10 17:44 UTC (History)

Fixed In Version: ovirt-3.5.0-alpha2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1102687 (view as bug list)
Environment:
Last Closed: 2014-08-10 15:11:33 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
logs from engine and host (307.05 KB, application/x-gzip)
2014-05-04 12:54 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 27359 0 master MERGED core: Connect Storage Pool on iSCSI update Never

Description Elad 2014-05-04 12:54:30 UTC
Created attachment 892281
logs from engine and host

Description of problem:
I configured an iSCSI multipath bond and then tried to replace the networks participating in the bond. The operation failed because vdsm was unable to connect to the storage server via the replacement network. The engine did not handle this failure properly, and I got the following in webadmin:

Operation Canceled
Error while executing action EditIscsiBond: Internal Engine Error

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
On a shared DC with active iSCSI storage domain(s):
1. Create 3 new networks and attach them to the cluster with required check-box checked
2. Attach the networks to the cluster's hosts NICs 
3. Create a new iSCSI multipath bond (under DC tab -> pick the relevant DC -> iSCSI multipath sub-tab -> new) and add 2 of the new networks, along with the targets, to it
4. Move the iSCSI domain to maintenance and activate it again so that the connection to the storage is made through the new networks
5. After the iSCSI domain is active, edit the multipath bond, uncheck the checked networks and pick the third network. Click 'Ok'
6. VDSM will fail to perform the operation with IscsiNodeError

Actual results:
The engine does not handle the error from vdsm and logs the following exception, which surfaces as an internal engine error in webadmin:

2014-05-04 15:39:56,678 ERROR [org.ovirt.engine.core.bll.storage.EditIscsiBondCommand] (ajp-/127.0.0.1:8702-10) [37fea003] Command org.ovirt.engine.core.bll.storage.EditIscsiBondCommand throw exception: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException (Failed with error VDS_NETWORK_ERROR and code 5022)
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil.invokeAll(ThreadPoolUtil.java:205) [utils.jar:]
        at org.ovirt.engine.core.bll.storage.BaseIscsiBondCommand.connectAllHostsToStorage(BaseIscsiBondCommand.java:56) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.EditIscsiBondCommand.executeCommand(EditIscsiBondCommand.java:70) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1123) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1208) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1884) [bll.jar:]

Expected results:
The engine should handle such an error from vdsm, notify the user, and revert the operation.

Additional info: logs from engine and host

The error in vdsm:

Thread-1846::ERROR::2014-05-04 15:40:44,971::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 359, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 166, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, targetName)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 295, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: eth0.1, target: iqn.2008-05.com.xtremio:001e675b8ee1, portal: 10.35.160.3,3260] (multiple)'], ['iscsiadm: Could not login to [iface: eth0.1, target: iqn.2008-05.com.xtremio:001e675b8ee1, portal: 10.35.160.3,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])

Comment 1 Elad 2014-05-04 12:59:32 UTC
Version-Release number of selected component (if applicable):
AV7
vdsm-4.14.7-0.1.beta3.el6ev.x86_64
rhevm-3.4.0-0.15.beta3.el6ev.noarch

Comment 2 Maor 2014-05-29 11:08:57 UTC
With the fix, the engine should now log the following message after it receives the error from VDSM:
"Could not connect Host {hostName} - {hostId} to Iscsi Storage Server."

Editing the networks should eventually succeed even though VDSM raised an exception partway through the operation.
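
For illustration only, here is a minimal sketch of the behaviour described in this comment: connect all hosts in parallel, log the message above for each host that fails, and let the overall operation continue instead of rethrowing. This is not the merged gerrit change. The names IscsiConnectSketch, Host, Connector and connectAllHosts are hypothetical and do not exist in ovirt-engine; only the log message text is taken from this bug.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these types are real ovirt-engine classes.
class IscsiConnectSketch {

    /** Stand-in for a host record (assumption, not an engine entity). */
    static final class Host {
        final String name;
        final String id;

        Host(String name, String id) {
            this.name = name;
            this.id = id;
        }
    }

    /** Stand-in for the call that asks a host to connect to the storage server. */
    interface Connector {
        void connect(Host host) throws Exception;
    }

    /**
     * Connects every host in parallel. A failure on one host is logged and
     * collected instead of being rethrown, so the caller can still finish
     * the bond update. Returns the log messages for the hosts that failed.
     */
    static List<String> connectAllHosts(List<Host> hosts, Connector connector)
            throws InterruptedException {
        List<String> failures = new ArrayList<>();
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, Math.min(hosts.size(), 8)));
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (Host host : hosts) {
                tasks.add(() -> {
                    connector.connect(host);
                    return null;
                });
            }
            // invokeAll waits for all tasks and returns futures in task order.
            List<Future<Void>> futures = pool.invokeAll(tasks);
            for (int i = 0; i < futures.size(); i++) {
                Host host = hosts.get(i);
                try {
                    futures.get(i).get();
                } catch (ExecutionException e) {
                    // Handle the per-host failure here rather than wrapping it
                    // in a RuntimeException that aborts the whole command.
                    String message = String.format(
                            "Could not connect Host %s - %s to Iscsi Storage Server.",
                            host.name, host.id);
                    failures.add(message);
                    System.err.println(message + " Cause: " + e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return failures;
    }
}
```

A caller would pass its hosts and a Connector implementation, then decide what to report to the user based on the returned list. An empty list means every host connected.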

Comment 4 Elad 2014-08-10 15:11:33 UTC
This bug was fixed, but I am unable to reproduce the IscsiNodeError in vdsm during network replacement inside the iSCSI bond, which is what triggered the internal engine error. I'm therefore closing this bug as UPSTREAM.

