Bug 1263695
Summary: | [engine-backend] AddStoragePoolWithStorages fails with NullPointerException after iSCSI connection failure | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Elad <ebenahar> | ||||
Component: | ovirt-engine | Assignee: | Maor <mlipchuk> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.6.0 | CC: | amureini, bgraveno, gklein, lsurette, rbalakri, Rhev-m-bugs, tnisan, yeylon, ykaul | ||||
Target Milestone: | ovirt-3.6.1 | ||||||
Target Release: | 3.6.0 | ||||||
Hardware: | x86_64 | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Previously, when a storage domain was imported and the engine was unable to read the storage domain metadata, the engine threw an exception. The query check has been fixed so that the engine issues a warning if it cannot read the metadata. VDSM then fails the operation itself, if necessary, when the storage domain is attached.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | Type: | Bug | |||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Worth remembering: when the host fails to read the Storage Domain metadata, which is needed to validate whether the Storage Domain is attached to another Data Center, the engine should not stop the attach operation (and certainly should not throw an NPE in the process). Instead, it should try to connect the Storage Domain to the DC. If, in the worst case, the Storage Domain is still attached to another Data Center, VDSM should fail the operation.

A VDSM failure to connect to the storage server (iSCSI) is handled correctly by the engine.

iSCSI login failure:

jsonrpc.Executor/3::ERROR::2015-12-01 11:14:48,378::hsm::2465::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2462, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 480, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 201, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, target.iqn)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 314, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.225,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.225,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])

Engine:

Operation Canceled
Error while executing action Attach Storage Domain: Network error during communication with the Host.
Tested using:
rhevm-3.6.1-0.2.el6.noarch
vdsm-4.17.11-0.el7ev.noarch

RHEV 3.6.0 has been released, setting status to CLOSED CURRENTRELEASE |
Created attachment 1073998 [details]
logs from engine and host

Description of problem:
A VDSM failure to connect to the storage server while creating a storage pool (first storage domain in the DC) is not handled correctly by the engine. The CanDoActionFailure fails with a NullPointerException.
This happened on a hosted-engine environment, though I don't think that is relevant.

Version-Release number of selected component (if applicable):
rhevm-3.6.0-12
rhevm-3.6.0-0.15.master.el6.noarch
vdsm-4.17.5-1.el7ev.noarch

How reproducible:
Requires a VDSM failure to connect to the storage server (login to the iSCSI target in my case)

Steps to Reproduce:
1. Activate 2 hosts in a cluster
2. Initiate iSCSI storage domain creation and cause one of the hosts to fail its login to the iSCSI server (this happened spontaneously in my case, but it could be reproduced by blocking connectivity with iptables/firewalld during the iSCSI login)

Actual results:
One of the hosts fails to connect to the storage server.

Thread-2831::ERROR::2015-09-16 06:13:30,496::hsm::2454::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2451, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 473, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/share/vdsm/storage/iscsi.py", line 201, in addIscsiNode
    iscsiadm.node_login(iface.name, portalStr, target.iqn)
  File "/usr/share/vdsm/storage/iscsiadm.py", line 314, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.225,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.225,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
Thread-2831::DEBUG::2015-09-16 06:13:30,496::hsm::2478::Storage.HSM::(connectStorageServer) knownSDs: {}

This failure is not handled correctly by the engine. The CanDoActionFailure fails with a NullPointerException, and in Webadmin we get an 'Internal engine error' message.

2015-09-16 00:08:28,784 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-11) [7c4f4cb1] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host hosted_engine_1 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
2015-09-16 00:08:28,785 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (ajp-/127.0.0.1:8702-6) [] Error during CanDoActionFailure.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand.isStorageDomainAttachedToStoragePool(AddStoragePoolWithStoragesCommand.java:367) [bll.jar:]

Expected results:
The CanDoActionFailure should succeed.

Additional info:
logs from engine and host
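
For illustration only, the behaviour asked for in this bug (warn and continue when the storage domain metadata cannot be read, rather than dereferencing a null result) might look roughly like the sketch below. This is not the actual ovirt-engine patch. The class, enum, and method names are hypothetical, and the metadata query is modelled as a plain function that is assumed to return null when the host could not read the domain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; none of these names are taken from ovirt-engine. */
final class AttachValidationSketch {

    /** Stand-in for the metadata a host reads back from a storage domain. */
    static final class DomainMetadata {
        final String poolId; // null when the domain is not attached to any Data Center

        DomainMetadata(String poolId) {
            this.poolId = poolId;
        }
    }

    enum Result { PROCEED, PROCEED_WITH_WARNING, BLOCK_ATTACHED_ELSEWHERE }

    /**
     * Validates an attach request. The query is assumed to return null when
     * the host failed to read the metadata (for example after a failed iSCSI
     * login). In that case the validation records a warning and lets the
     * operation continue, leaving it to the later attach step to fail if the
     * domain really is attached to another Data Center.
     */
    static Result validate(String domainId,
                           Function<String, DomainMetadata> metadataQuery,
                           List<String> warnings) {
        DomainMetadata metadata = metadataQuery.apply(domainId);
        if (metadata == null) {
            // Guard against the null result instead of dereferencing it.
            warnings.add("Could not read metadata of storage domain " + domainId
                    + "; continuing with the attach attempt");
            return Result.PROCEED_WITH_WARNING;
        }
        return metadata.poolId == null
                ? Result.PROCEED
                : Result.BLOCK_ATTACHED_ELSEWHERE;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        // Host could not read the metadata: warn and continue.
        System.out.println(validate("sd-1", id -> null, warnings));
        // Metadata readable, domain unattached: continue silently.
        System.out.println(validate("sd-2", id -> new DomainMetadata(null), warnings));
        // Metadata readable, domain attached to another Data Center: block.
        System.out.println(validate("sd-3", id -> new DomainMetadata("other-dc"), warnings));
        System.out.println(warnings.size() + " warning(s) recorded");
    }
}
```

The point of the sketch is that a failed metadata read becomes a distinct, non-fatal outcome of the validation, so a host-side connection failure no longer surfaces as an internal engine error.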