Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 854214

Summary: engine: logging - when domain is partially inaccessible event log does not report which luns are inaccessible
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: ovirt-engine
Assignee: Greg Padgett <gpadgett>
Status: CLOSED ERRATA
QA Contact: Dafna Ron <dron>
Severity: medium
Docs Contact:
Priority: high
Version: 3.1.0
CC: abaron, amureini, dyasny, hateya, iheim, lpeer, mkenneth, Rhev-m-bugs, yeylon, ykaul, zdover
Target Milestone: ---
Keywords: ZStream
Target Release: 3.1.2
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version: SI21
Doc Type: Bug Fix
Doc Text:
Previously, when a storage domain was partially inaccessible, the event log reported that a domain was problematic but did not report which LUNs were inaccessible. For instance, when a storage domain contained LUNs from two different storage servers and the connection to one of those LUNs was inaccessible, the error message did not tell you which of the LUNs was inaccessible. Now, when a storage domain is partially inaccessible because one of the LUNs in it is inaccessible, the event log records an error that includes information that you can use to determine which of the LUNs is inaccessible.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-04 23:33:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description  Flags
log          none
log          none

Description Dafna Ron 2012-09-04 11:52:31 UTC
Created attachment 609654 [details]
log

Description of problem:

When a domain is only partially inaccessible (some of the LUNs are visible and some are not), the event log does not show which LUNs are not visible.

Version-Release number of selected component (if applicable):

si16

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, create a storage domain that has LUNs from two different storage servers.
2. Block connectivity to one of the storage servers from one of the hosts.
  
Actual results:

The event log reports the domain as problematic but does not show which LUN is inaccessible.

Expected results:

We should report which LUN is inaccessible in the event log.

Additional info: backend log

backend log does show it:

2012-09-04 12:41:23,891 INFO  [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [756aab3e] The lun with id HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !
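
As a stopgap until the event log carries this information, the LUN ids can be pulled out of the backend log. The following is a minimal sketch that assumes the message wording shown above; the function name `problematic_luns` is hypothetical and not part of ovirt-engine.

```python
import re

# Assumption: the engine log line keeps the wording quoted in this report.
LUN_PATTERN = re.compile(r"The lun with id (\S+) was reported as problematic")


def problematic_luns(lines):
    """Return the LUN ids reported as problematic, in order of appearance."""
    found = []
    for line in lines:
        match = LUN_PATTERN.search(line)
        if match:
            found.append(match.group(1))
    return found


sample = [
    "2012-09-04 12:41:23,891 INFO  "
    "[org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] "
    "(QuartzScheduler_Worker-94) [756aab3e] The lun with id "
    "HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !",
]
print(problematic_luns(sample))
```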

Comment 1 Ayal Baron 2012-09-05 09:42:30 UTC
Fix is to make sure that the domain appears in the event log and not only in the log.

Comment 2 Greg Padgett 2012-09-18 01:53:10 UTC
http://gerrit.ovirt.org/8028

Comment 4 Allon Mureinik 2012-09-29 12:40:02 UTC
merged change id I6ba766c552a56940c4559b4cd73702627ff13eed

Comment 7 Dafna Ron 2012-10-14 13:59:47 UTC
moving to verified on si20. I can see the error in event log: 

The error code for connection 10.35.64.106 (LUN w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following Failed to login to iSCSI node due to authorization failure

Comment 8 Yaniv Kaul 2012-10-14 14:04:18 UTC
(In reply to comment #7)
> moving to verified on si20. I can see the error in event log: 
> 
> The error code for connection 10.35.64.106 (LUN
> w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following
> Failed to login to iSCSI node due to authorization failure

Is this enough information to find out the LUN on the storage side?

Comment 9 Dafna Ron 2012-10-14 14:12:50 UTC
Reopening, since we are getting the PV/VG UUID, which we cannot easily find on the storage server.
The iqn would be useful information for the user.

Comment 10 Yaniv Kaul 2012-10-14 14:20:28 UTC
An example of a device from VDSM's getDeviceList:
{'GUID': '360a98000572d45366b4a53724369584e',
  'capacity': '322163441664',
  'devtype': 'iSCSI',
  'fwrev': '0.2',
  'logicalblocksize': '512',
  'partitioned': False,
  'pathlist': [{'connection': '10.35.66.11',
                'initiator': 'iqn.1994-05.com.redhat:d482fda7ce',
                'iqn': 'iqn.1992-08.com.netapp:sn.151709391',
                'password': '',
                'port': '3260',
                'portal': '1000',
                'user': ''}],
  'pathstatus': [{'lun': '1',
                  'physdev': 'sdi',
                  'state': 'active',
                  'type': 'iSCSI'}],
  'physicalblocksize': '512',
  'productID': 'LUN',
  'pvUUID': '1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot',
  'serial': 'SNETAPP_LUN_W-E6kJSrCiXN',
  'vendorID': 'NETAPP',
  'vgUUID': 'wuW3NF-54ta-VWwN-sH13-hMLf-97Co-N6na0q'},

I'm wondering which fields need to be passed up the chain to the RHEVM event so that a storage admin looking at the log can understand which LUN we are talking about.

Candidates I see are vgUUID, vendorID, serial, pvUUID, iqn, GUID. I know some do not always have content, and perhaps we don't need some (no need for both serial and vendorID?).
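
To make the candidate list concrete, here is a rough sketch of collecting the non-empty identifiers from one getDeviceList entry. It assumes the dict layout shown in the example above; the helper `lun_identifiers` and the constant `CANDIDATE_FIELDS` are hypothetical and are not part of VDSM or the engine. Note that the iqn sits on each path in `pathlist` rather than on the device itself.

```python
# Hypothetical helper for illustration only.
CANDIDATE_FIELDS = ("GUID", "serial", "vendorID", "pvUUID", "vgUUID")


def lun_identifiers(device):
    """Return the candidate identifiers that actually have content."""
    ids = {key: device[key] for key in CANDIDATE_FIELDS if device.get(key)}
    # Collect the distinct target iqns across all paths to the device.
    iqns = sorted(
        {path["iqn"] for path in device.get("pathlist", []) if path.get("iqn")}
    )
    if iqns:
        ids["iqn"] = iqns
    return ids


# Trimmed copy of the example device quoted in this comment.
device = {
    "GUID": "360a98000572d45366b4a53724369584e",
    "pathlist": [{"connection": "10.35.66.11",
                  "iqn": "iqn.1992-08.com.netapp:sn.151709391",
                  "port": "3260"}],
    "pvUUID": "1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot",
    "serial": "SNETAPP_LUN_W-E6kJSrCiXN",
    "vendorID": "NETAPP",
    "vgUUID": "wuW3NF-54ta-VWwN-sH13-hMLf-97Co-N6na0q",
}
print(lun_identifiers(device))
```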

Comment 11 Greg Padgett 2012-10-16 03:17:51 UTC
Since the error is reporting on connections for the lun, how about iqn, like this?

The error code for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi1 (LUN gZTlrV-oiSl-YkbM-VWW4-2boE-vXTO-PnGQNH) returned by VDSM was following Failed to login to iSCSI node due to authorization failure

I went ahead and submitted a patch for this here:

http://gerrit.ovirt.org/8598

Comment 13 Allon Mureinik 2012-10-16 16:10:07 UTC
Merged If6f4f4c5e176e06cceeda7a7fed29565eead369b

Comment 14 Dafna Ron 2012-10-21 13:55:41 UTC
moving back to devel - perhaps the fix was not merged on si21.1

the only error I see is this one: 

Host gold-vdsc cannot access one of the Storage Domains attached to the Data Center iSCSI. Setting Host state to Non-Operational.

the domain blocked is a domain that has luns on two different storage servers and only one of the servers was blocked.

Comment 15 Ayal Baron 2012-10-21 14:43:11 UTC
Dafna, please attach engine log, the message you quoted above is not even the one you saw in comment 7 so either engine thinks there is no LUN

Greg, I totally missed this in the review. In: http://gerrit.ovirt.org/#/c/8598/2/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/StorageHelperBase.java

Line 138: lun.getphysical_volume_id()
We should not be printing the pv uuid, it is meaningless.  We should be printing the device GUID.

Comment 16 Dafna Ron 2012-10-21 14:58:16 UTC
Created attachment 630912 [details]
log

Comment 17 Greg Padgett 2012-10-28 22:33:52 UTC
http://gerrit.ovirt.org/8880

Got rid of the pv uuid, new example message:

VDSM returned an error for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi2 (LUN 1IET_0002000a): Failed to login to iSCSI node due to authorization failure
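
For reference, the layout of the new message can be sketched as below. This is an illustration in Python only; the actual change lives in the engine's Java code (see the gerrit link), and the function name `format_connection_error` and its parameters are hypothetical.

```python
def format_connection_error(connection, iqn, lun_guid, vdsm_error):
    """Build an event message that names the connection, iqn, and device GUID."""
    return (
        f"VDSM returned an error for connection {connection} {iqn} "
        f"(LUN {lun_guid}): {vdsm_error}"
    )


message = format_connection_error(
    "192.168.0.10",
    "iqn.2012-08.localdomain.ovirt:iscsi2",
    "1IET_0002000a",
    "Failed to login to iSCSI node due to authorization failure",
)
print(message)
```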

Comment 22 Dafna Ron 2013-01-16 14:37:36 UTC
new error shown is: 

The error message for connection 10.35.64.10 Dafna-30 (LUN 1Dafna-371358338) returned by VDSM was: Failed to setup iSCSI subsystem


It would have been helpful to have the domain name in the error, but it's something that the user can see in the storage tab or DC tab.

verified on si26

Comment 24 errata-xmlrpc 2013-02-04 23:33:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0211.html