Created attachment 609654
log

Description of problem:
When a domain is only partially inaccessible (some of the LUNs are visible and some are not), the event log does not show which LUNs are not visible.

Version-Release number of selected component (if applicable):
si16

How reproducible:
100%

Steps to Reproduce:
1. In a two-host cluster, have a storage domain that has LUNs from two different storage servers, then block connectivity to one of the storage servers from one of the hosts.

Actual results:
The event log reports the domain as problematic but does not show which LUN is inaccessible.

Expected results:
The event log should report which LUN is inaccessible.

Additional info:
The backend log does show it:

2012-09-04 12:41:23,891 INFO [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [756aab3e] The lun with id HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !
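Until the event log carries this information, the LUN ids can be pulled out of the backend log by hand. The following is a minimal sketch in Python, assuming the log wording matches the single line quoted above; the function name `problematic_luns` and the regular expression are illustrative and not part of the engine.

```python
import re

# Pattern derived from the one log line quoted in this report (assumption:
# other engine versions may word the message differently).
PROBLEMATIC_LUN = re.compile(r"The lun with id (\S+) was reported as problematic")


def problematic_luns(log_lines):
    """Return the LUN ids reported as problematic, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = PROBLEMATIC_LUN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = (
    "2012-09-04 12:41:23,891 INFO "
    "[org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] "
    "(QuartzScheduler_Worker-94) [756aab3e] The lun with id "
    "HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !"
)
print(problematic_luns([sample]))
```

Note that the id found this way is the LVM-style identifier the engine logs, which is not necessarily something a storage admin can look up on the array.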
Fix is to make sure that the domain appears in the event log and not only in the log.
http://gerrit.ovirt.org/8028
merged change id I6ba766c552a56940c4559b4cd73702627ff13eed
Moving to verified on si20. I can see the error in the event log:

The error code for connection 10.35.64.106 (LUN w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following Failed to login to iSCSI node due to authorization failure
(In reply to comment #7)
> moving to verified on si20. I can see the error in event log:
>
> The error code for connection 10.35.64.106 (LUN
> w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following
> Failed to login to iSCSI node due to authorization failure

Is this enough information to find out the LUN on the storage side?
Reopening, since we are getting the PV/VG UUID, which we cannot easily find on the storage server. The IQN would be useful information for the user.
An example of a device from VDSM's getDeviceList:

{'GUID': '360a98000572d45366b4a53724369584e',
 'capacity': '322163441664',
 'devtype': 'iSCSI',
 'fwrev': '0.2',
 'logicalblocksize': '512',
 'partitioned': False,
 'pathlist': [{'connection': '10.35.66.11',
               'initiator': 'iqn.1994-05.com.redhat:d482fda7ce',
               'iqn': 'iqn.1992-08.com.netapp:sn.151709391',
               'password': '',
               'port': '3260',
               'portal': '1000',
               'user': ''}],
 'pathstatus': [{'lun': '1',
                 'physdev': 'sdi',
                 'state': 'active',
                 'type': 'iSCSI'}],
 'physicalblocksize': '512',
 'productID': 'LUN',
 'pvUUID': '1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot',
 'serial': 'SNETAPP_LUN_W-E6kJSrCiXN',
 'vendorID': 'NETAPP',
 'vgUUID': 'wuW3NF-54ta-VWwN-sH13-hMLf-97Co-N6na0q'},

I'm wondering which fields need to be passed up the chain to the RHEVM event so that a storage admin looking at the log can understand which LUN we are talking about. The candidates I see are vgUUID, vendorID, serial, pvUUID, iqn, and GUID. I know some of them do not always have content, and perhaps we don't need all of them (no need for both serial and vendorID?).
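To make the question concrete, here is a minimal Python sketch of picking the candidate fields out of such a device dict. The helper name `describe_device` and the choice of fields are assumptions for illustration only, not what VDSM or the engine actually does; the sample dict is a trimmed copy of the one above.

```python
def describe_device(device):
    """Collect the fields a storage admin could plausibly look up on the array.

    Hypothetical helper: the field selection is a suggestion, not an
    existing VDSM or engine API.
    """
    summary = {
        "GUID": device.get("GUID", ""),
        "serial": device.get("serial", ""),
        "vendorID": device.get("vendorID", ""),
        # One (portal address, target IQN) pair per path to the LUN.
        "targets": [
            (path.get("connection", ""), path.get("iqn", ""))
            for path in device.get("pathlist", [])
        ],
    }
    # Some fields do not always have content, so drop the empty ones.
    return {key: value for key, value in summary.items() if value}


device = {
    "GUID": "360a98000572d45366b4a53724369584e",
    "pathlist": [{"connection": "10.35.66.11",
                  "iqn": "iqn.1992-08.com.netapp:sn.151709391",
                  "port": "3260"}],
    "pvUUID": "1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot",
    "serial": "SNETAPP_LUN_W-E6kJSrCiXN",
    "vendorID": "NETAPP",
}
print(describe_device(device))
```

The PV and VG UUIDs are left out on purpose in this sketch, since they are LVM identifiers that are not visible from the storage server.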
Since the error is reporting on connections for the LUN, how about the iqn, like this?

The error code for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi1 (LUN gZTlrV-oiSl-YkbM-VWW4-2boE-vXTO-PnGQNH) returned by VDSM was following Failed to login to iSCSI node due to authorization failure

I went ahead and submitted a patch for this here: http://gerrit.ovirt.org/8598
Merged If6f4f4c5e176e06cceeda7a7fed29565eead369b
Moving back to devel; perhaps the fix was not merged on si21.1. The only error I see is this one:

Host gold-vdsc cannot access one of the Storage Domains attached to the Data Center iSCSI. Setting Host state to Non-Operational.

The blocked domain is one that has LUNs on two different storage servers, and only one of the servers was blocked.
Dafna, please attach the engine log. The message you quoted above is not even the one you saw in comment 7 so either engine thinks there is no LUN

Greg, I totally missed this in the review. In:
http://gerrit.ovirt.org/#/c/8598/2/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/StorageHelperBase.java

Line 138: lun.getphysical_volume_id()

We should not be printing the pv uuid; it is meaningless. We should be printing the device GUID.
Created attachment 630912
log
http://gerrit.ovirt.org/8880

Got rid of the pv uuid. New example message:

VDSM returned an error for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi2 (LUN 1IET_0002000a): Failed to login to iSCSI node due to authorization failure
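For illustration, the layout of the new message can be sketched in Python as follows. This is not the engine code (the actual change is in the Java backend, see the gerrit link); the function `format_connection_error` and its parameter names are hypothetical and only mirror the example message.

```python
def format_connection_error(connection, iqn, lun_guid, error):
    """Build an event message in the layout of the example above.

    Hypothetical helper for illustration; lun_guid is the device GUID,
    not the PV UUID.
    """
    return "VDSM returned an error for connection {} {} (LUN {}): {}".format(
        connection, iqn, lun_guid, error
    )


message = format_connection_error(
    "192.168.0.10",
    "iqn.2012-08.localdomain.ovirt:iscsi2",
    "1IET_0002000a",
    "Failed to login to iSCSI node due to authorization failure",
)
print(message)
```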
The new error shown is:

The error message for connection 10.35.64.10 Dafna-30 (LUN 1Dafna-371358338) returned by VDSM was: Failed to setup iSCSI subsystem

It would have been helpful to have the domain name in the error, but it's something that the user can see in the storage tab or the DC tab.

Verified on si26.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0211.html