Bug 854214 - engine: logging - when domain is partially inaccessible event log does not report which luns are inaccessible
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64 Linux
Priority: high, Severity: medium
Target Milestone: ---
Target Release: 3.1.2
Assigned To: Greg Padgett
QA Contact: Dafna Ron
Whiteboard: storage
Keywords: ZStream
Depends On:
Blocks:
Reported: 2012-09-04 07:52 EDT by Dafna Ron
Modified: 2016-02-10 12:58 EST

See Also:
Fixed In Version: SI21
Doc Type: Bug Fix
Doc Text:
Previously, when a storage domain was partially inaccessible, the event log reported that a domain was problematic but did not report which LUNs were inaccessible. For instance, when a storage domain contained LUNs from two different storage servers and the connection to one of those LUNs was inaccessible, the error message did not tell you which of the LUNs was inaccessible. Now, when a storage domain is partially inaccessible because one of the LUNs in it is inaccessible, the event log records an error that includes information that you can use to determine which of the LUNs is inaccessible.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-04 18:33:57 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
log (88.13 KB, application/x-xz)
2012-09-04 07:52 EDT, Dafna Ron
log (218.99 KB, application/x-xz)
2012-10-21 10:58 EDT, Dafna Ron

Description Dafna Ron 2012-09-04 07:52:31 EDT
Created attachment 609654
log

Description of problem:

When a domain is only partially inaccessible (some of the LUNs are visible and some are not), the event log does not show which LUNs are not visible.

Version-Release number of selected component (if applicable):

si16

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, set up a storage domain that has LUNs from two different storage servers, then block connectivity to one of the storage servers from one of the hosts.
  
Actual results:

The event log reports the domain as problematic but does not show which LUN is inaccessible.

Expected results:

The event log should report which LUN is inaccessible.

Additional info: backend log

backend log does show it:

2012-09-04 12:41:23,891 INFO  [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [756aab3e] The lun with id HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !
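
Until the event log carries this information, the problematic LUN ids can be pulled from the backend log. The following is a minimal sketch, assuming the log wording matches the line quoted above; the function name `problematic_luns` is hypothetical and the message text may differ between engine versions.

```python
import re

# Pattern derived from the backend log line quoted above (assumption: the
# wording is stable across the engine versions being inspected).
PROBLEMATIC_LUN = re.compile(r"The lun with id (\S+) was reported as problematic")


def problematic_luns(log_lines):
    """Return the LUN ids flagged as problematic, in order, without duplicates."""
    seen = []
    for line in log_lines:
        match = PROBLEMATIC_LUN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "2012-09-04 12:41:23,891 INFO  "
    "[org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] "
    "(QuartzScheduler_Worker-94) [756aab3e] The lun with id "
    "HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !",
    "2012-09-04 12:41:24,000 INFO  unrelated line",
]
print(problematic_luns(sample))
```

Note that the id logged here is the one the engine stores for the LUN, which may not be directly usable on the storage side.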
Comment 1 Ayal Baron 2012-09-05 05:42:30 EDT
Fix is to make sure that the domain appears in the event log and not only in the log.
Comment 2 Greg Padgett 2012-09-17 21:53:10 EDT
http://gerrit.ovirt.org/8028
Comment 4 Allon Mureinik 2012-09-29 08:40:02 EDT
merged change id I6ba766c552a56940c4559b4cd73702627ff13eed
Comment 7 Dafna Ron 2012-10-14 09:59:47 EDT
moving to verified on si20. I can see the error in event log: 

The error code for connection 10.35.64.106 (LUN w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following Failed to login to iSCSI node due to authorization failure
Comment 8 Yaniv Kaul 2012-10-14 10:04:18 EDT
(In reply to comment #7)
> moving to verified on si20. I can see the error in event log: 
> 
> The error code for connection 10.35.64.106 (LUN
> w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following
> Failed to login to iSCSI node due to authorization failure

Is this enough information to find out the LUN on the storage side?
Comment 9 Dafna Ron 2012-10-14 10:12:50 EDT
Reopening, since we are getting the PV/VG UUID, which cannot easily be found on the storage server.
The IQN would be useful information for the user.
Comment 10 Yaniv Kaul 2012-10-14 10:20:28 EDT
An example of a device from VDSM's getDeviceList:
{'GUID': '360a98000572d45366b4a53724369584e',
  'capacity': '322163441664',
  'devtype': 'iSCSI',
  'fwrev': '0.2',
  'logicalblocksize': '512',
  'partitioned': False,
  'pathlist': [{'connection': '10.35.66.11',
                'initiator': 'iqn.1994-05.com.redhat:d482fda7ce',
                'iqn': 'iqn.1992-08.com.netapp:sn.151709391',
                'password': '',
                'port': '3260',
                'portal': '1000',
                'user': ''}],
  'pathstatus': [{'lun': '1',
                  'physdev': 'sdi',
                  'state': 'active',
                  'type': 'iSCSI'}],
  'physicalblocksize': '512',
  'productID': 'LUN',
  'pvUUID': '1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot',
  'serial': 'SNETAPP_LUN_W-E6kJSrCiXN',
  'vendorID': 'NETAPP',
  'vgUUID': 'wuW3NF-54ta-VWwN-sH13-hMLf-97Co-N6na0q'},

I'm wondering which fields need to be passed up the chain to the RHEVM event so that a storage admin, looking at the log, can understand which LUN we are talking about.

Candidates I see are vgUUID, vendorID, serial, pvUUID, iqn, GUID. I know some do not always have content, and perhaps we don't need some (no need for both serial and vendorID?).
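
To make the candidate list concrete, here is a minimal sketch that picks the storage-side identifiers out of a device entry shaped like the getDeviceList example above. The function name `identify_device` and the choice of fields are assumptions for illustration, not the engine's actual logic; the dict is abbreviated from the example.

```python
def identify_device(device):
    """Collect fields a storage admin could use to locate a LUN.

    Missing fields are returned as empty strings, since some of them
    do not always have content.
    """
    paths = device.get("pathlist", [])
    return {
        "GUID": device.get("GUID", ""),
        "serial": device.get("serial", ""),
        "vendorID": device.get("vendorID", ""),
        # One (connection, iqn) pair per path to the device.
        "targets": [(p.get("connection", ""), p.get("iqn", "")) for p in paths],
    }


# Abbreviated copy of the device entry shown above.
device = {
    "GUID": "360a98000572d45366b4a53724369584e",
    "pathlist": [
        {
            "connection": "10.35.66.11",
            "iqn": "iqn.1992-08.com.netapp:sn.151709391",
            "port": "3260",
        }
    ],
    "pvUUID": "1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot",
    "serial": "SNETAPP_LUN_W-E6kJSrCiXN",
    "vendorID": "NETAPP",
    "vgUUID": "wuW3NF-54ta-VWwN-sH13-hMLf-97Co-N6na0q",
}

summary = identify_device(device)
print(summary)
```

The PV and VG UUIDs are left out of the summary on purpose in this sketch, because they are LVM identifiers rather than something visible on the storage server.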
Comment 11 Greg Padgett 2012-10-15 23:17:51 EDT
Since the error is reporting on connections for the lun, how about iqn, like this?

The error code for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi1 (LUN gZTlrV-oiSl-YkbM-VWW4-2boE-vXTO-PnGQNH) returned by VDSM was following Failed to login to iSCSI node due to authorization failure

I went ahead and submitted a patch for this here:

http://gerrit.ovirt.org/8598
Comment 13 Allon Mureinik 2012-10-16 12:10:07 EDT
Merged If6f4f4c5e176e06cceeda7a7fed29565eead369b
Comment 14 Dafna Ron 2012-10-21 09:55:41 EDT
moving back to devel - perhaps the fix was not merged on si21.1

the only error I see is this one: 

Host gold-vdsc cannot access one of the Storage Domains attached to the Data Center iSCSI. Setting Host state to Non-Operational.

The blocked domain has LUNs on two different storage servers, and only one of the servers was blocked.
Comment 15 Ayal Baron 2012-10-21 10:43:11 EDT
Dafna, please attach engine log, the message you quoted above is not even the one you saw in comment 7 so either engine thinks there is no LUN

Greg, I totally missed this in the review. In: http://gerrit.ovirt.org/#/c/8598/2/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/StorageHelperBase.java

Line 138: lun.getphysical_volume_id()
We should not be printing the pv uuid, it is meaningless.  We should be printing the device GUID.
Comment 16 Dafna Ron 2012-10-21 10:58:16 EDT
Created attachment 630912
log
Comment 17 Greg Padgett 2012-10-28 18:33:52 EDT
http://gerrit.ovirt.org/8880

Got rid of the pv uuid, new example message:

VDSM returned an error for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi2 (LUN 1IET_0002000a): Failed to login to iSCSI node due to authorization failure
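
The layout of the example message in this comment can be sketched as a small formatting helper. This is an illustration only, assuming the message is built from the connection address, the target IQN, the device GUID and the VDSM error text; the name `format_connection_error` is hypothetical, and the real change lives in the engine's Java code, not in Python.

```python
def format_connection_error(connection, iqn, lun_guid, error):
    """Build an event message in the layout of the example in this comment.

    The IQN is omitted when empty, since not every connection carries one
    (an assumption of this sketch).
    """
    target = " ".join(part for part in (connection, iqn) if part)
    return f"VDSM returned an error for connection {target} (LUN {lun_guid}): {error}"


message = format_connection_error(
    "192.168.0.10",
    "iqn.2012-08.localdomain.ovirt:iscsi2",
    "1IET_0002000a",
    "Failed to login to iSCSI node due to authorization failure",
)
print(message)
```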
Comment 22 Dafna Ron 2013-01-16 09:37:36 EST
new error shown is: 

The error message for connection 10.35.64.10 Dafna-30 (LUN 1Dafna-371358338) returned by VDSM was: Failed to setup iSCSI subsystem


It would have been helpful to have the domain name in the error, but it's something that the user can see in the Storage tab or the DC tab.

verified on si26
Comment 24 errata-xmlrpc 2013-02-04 18:33:57 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0211.html
