Bug 854214 - engine: logging - when domain is partially inaccessible event log does not report which luns are inaccessible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64
OS: Linux
high
medium
Target Milestone: ---
Target Release: 3.1.2
Assignee: Greg Padgett
QA Contact: Dafna Ron
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2012-09-04 11:52 UTC by Dafna Ron
Modified: 2016-02-10 17:58 UTC (History)

Fixed In Version: SI21
Doc Type: Bug Fix
Doc Text:
Previously, when a storage domain was partially inaccessible, the event log reported that a domain was problematic but did not report which LUNs were inaccessible. For instance, when a storage domain contained LUNs from two different storage servers and the connection to one of those LUNs was inaccessible, the error message did not tell you which of the LUNs was inaccessible. Now, when a storage domain is partially inaccessible because one of the LUNs in it is inaccessible, the event log records an error that includes information that you can use to determine which of the LUNs is inaccessible.
Clone Of:
Environment:
Last Closed: 2013-02-04 23:33:57 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
log (88.13 KB, application/x-xz)
2012-09-04 11:52 UTC, Dafna Ron
log (218.99 KB, application/x-xz)
2012-10-21 14:58 UTC, Dafna Ron


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2013:0211 0 normal SHIPPED_LIVE Moderate: rhevm 3.1.2 security and bug fix update 2013-02-05 04:53:00 UTC

Description Dafna Ron 2012-09-04 11:52:31 UTC
Created attachment 609654 [details]
log

Description of problem:

When a domain is only partially inaccessible (some of the LUNs are visible and some are not), the event log does not show which LUNs are not visible.

Version-Release number of selected component (if applicable):

si16

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, create a storage domain that has LUNs from two different storage servers.
2. Block connectivity to one of the storage servers from one of the hosts.
  
Actual results:

The event log reports the domain as problematic but does not show which LUN is inaccessible.

Expected results:

We should report which LUN is inaccessible in the event log.

Additional info: backend log

backend log does show it:

2012-09-04 12:41:23,891 INFO  [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [756aab3e] The lun with id HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !
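
As a stopgap until the event log carries this information, the LUN id can be pulled out of the backend log. The following is only a minimal sketch: the pattern is inferred from the single log line quoted above, the function name `problematic_luns` is hypothetical, and other engine versions may word the message differently.

```python
import re

# Pattern inferred from the one log line quoted in this report (assumption);
# it is not taken from the engine source.
PROBLEMATIC_LUN = re.compile(r"The lun with id (\S+) was reported as problematic")


def problematic_luns(lines):
    """Return the LUN ids named in 'reported as problematic' log lines."""
    found = []
    for line in lines:
        match = PROBLEMATIC_LUN.search(line)
        if match:
            found.append(match.group(1))
    return found


sample = (
    "2012-09-04 12:41:23,891 INFO  "
    "[org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] "
    "(QuartzScheduler_Worker-94) [756aab3e] The lun with id "
    "HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !"
)
print(problematic_luns([sample]))
```

Note that the id found this way is the same kind of LVM-style identifier that is discussed later in this report, so it may still be hard to map to a LUN on the storage side.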

Comment 1 Ayal Baron 2012-09-05 09:42:30 UTC
Fix is to make sure that the domain appears in the event log and not only in the log.

Comment 2 Greg Padgett 2012-09-18 01:53:10 UTC
http://gerrit.ovirt.org/8028

Comment 4 Allon Mureinik 2012-09-29 12:40:02 UTC
merged change id I6ba766c552a56940c4559b4cd73702627ff13eed

Comment 7 Dafna Ron 2012-10-14 13:59:47 UTC
moving to verified on si20. I can see the error in event log: 

The error code for connection 10.35.64.106 (LUN w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following Failed to login to iSCSI node due to authorization failure

Comment 8 Yaniv Kaul 2012-10-14 14:04:18 UTC
(In reply to comment #7)
> moving to verified on si20. I can see the error in event log: 
> 
> The error code for connection 10.35.64.106 (LUN
> w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following
> Failed to login to iSCSI node due to authorization failure

Is this enough information to find out the LUN on the storage side?

Comment 9 Dafna Ron 2012-10-14 14:12:50 UTC
Reopening, since we are getting the PV/VG UUID, which we cannot easily find on the storage server.
The iqn would be useful information for the user.

Comment 10 Yaniv Kaul 2012-10-14 14:20:28 UTC
An example of a device from VDSM's getDeviceList:
{'GUID': '360a98000572d45366b4a53724369584e',
  'capacity': '322163441664',
  'devtype': 'iSCSI',
  'fwrev': '0.2',
  'logicalblocksize': '512',
  'partitioned': False,
  'pathlist': [{'connection': '10.35.66.11',
                'initiator': 'iqn.1994-05.com.redhat:d482fda7ce',
                'iqn': 'iqn.1992-08.com.netapp:sn.151709391',
                'password': '',
                'port': '3260',
                'portal': '1000',
                'user': ''}],
  'pathstatus': [{'lun': '1',
                  'physdev': 'sdi',
                  'state': 'active',
                  'type': 'iSCSI'}],
  'physicalblocksize': '512',
  'productID': 'LUN',
  'pvUUID': '1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot',
  'serial': 'SNETAPP_LUN_W-E6kJSrCiXN',
  'vendorID': 'NETAPP',
  'vgUUID': 'wuW3NF-54ta-VWwN-sH13-hMLf-97Co-N6na0q'},

I'm wondering which fields need to be passed up the chain to the RHEVM event, to make it possible for a storage admin looking at the log to understand which LUN we are talking about.

Candidates I see are vgUUID, vendorID, serial, pvUUID, iqn, GUID. I know some do not always have content, and perhaps we don't need some (no need for both serial and vendor ID?).
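
To make the candidate list concrete, here is a rough sketch of collecting those fields from one device entry shaped like the getDeviceList example above, skipping any that are empty. The helper name `identifying_fields` is hypothetical, and the sample dict is a trimmed copy of the example with `vgUUID` blanked only to show that empty fields are dropped; this is not VDSM or engine code.

```python
# Candidate identifier fields, spelled as they appear in the getDeviceList output.
CANDIDATE_FIELDS = ("GUID", "serial", "vendorID", "pvUUID", "vgUUID")


def identifying_fields(device):
    """Collect the non-empty candidate identifiers from one device dict."""
    info = {name: device[name] for name in CANDIDATE_FIELDS if device.get(name)}
    # The iqn is stored on each path entry, not on the device itself.
    iqns = [path["iqn"] for path in device.get("pathlist", []) if path.get("iqn")]
    if iqns:
        info["iqn"] = iqns
    return info


device = {
    "GUID": "360a98000572d45366b4a53724369584e",
    "pathlist": [{"connection": "10.35.66.11",
                  "iqn": "iqn.1992-08.com.netapp:sn.151709391"}],
    "pvUUID": "1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot",
    "serial": "SNETAPP_LUN_W-E6kJSrCiXN",
    "vendorID": "NETAPP",
    "vgUUID": "",  # blanked for illustration: empty fields are skipped
}
print(identifying_fields(device))
```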

Comment 11 Greg Padgett 2012-10-16 03:17:51 UTC
Since the error is reporting on connections for the lun, how about iqn, like this?

The error code for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi1 (LUN gZTlrV-oiSl-YkbM-VWW4-2boE-vXTO-PnGQNH) returned by VDSM was following Failed to login to iSCSI node due to authorization failure

I went ahead and submitted a patch for this here:

http://gerrit.ovirt.org/8598

Comment 13 Allon Mureinik 2012-10-16 16:10:07 UTC
Merged If6f4f4c5e176e06cceeda7a7fed29565eead369b

Comment 14 Dafna Ron 2012-10-21 13:55:41 UTC
moving back to devel - perhaps the fix was not merged on si21.1

the only error I see is this one: 

Host gold-vdsc cannot access one of the Storage Domains attached to the Data Center iSCSI. Setting Host state to Non-Operational.

the domain blocked is a domain that has luns on two different storage servers and only one of the servers was blocked.

Comment 15 Ayal Baron 2012-10-21 14:43:11 UTC
Dafna, please attach engine log, the message you quoted above is not even the one you saw in comment 7 so either engine thinks there is no LUN

Greg, I totally missed this in the review. In: http://gerrit.ovirt.org/#/c/8598/2/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/StorageHelperBase.java

Line 138: lun.getphysical_volume_id()
We should not be printing the pv uuid, it is meaningless.  We should be printing the device GUID.

Comment 16 Dafna Ron 2012-10-21 14:58:16 UTC
Created attachment 630912 [details]
log

Comment 17 Greg Padgett 2012-10-28 22:33:52 UTC
http://gerrit.ovirt.org/8880

Got rid of the pv uuid, new example message:

VDSM returned an error for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi2 (LUN 1IET_0002000a): Failed to login to iSCSI node due to authorization failure
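
For illustration only, the shape of the new message can be sketched from a device dict like the one in comment 10. This is not the engine's Java code from the patch; the function name `connection_error_messages` is hypothetical, and treating the LUN value in the message as the device GUID is an assumption based on the request in comment 15.

```python
def connection_error_messages(device, error_text):
    """Build one message per path, naming the connection, iqn and device GUID."""
    return [
        "VDSM returned an error for connection {connection} {iqn} "
        "(LUN {guid}): {error}".format(
            connection=path["connection"],
            iqn=path["iqn"],
            guid=device["GUID"],  # assumption: the LUN id shown is the device GUID
            error=error_text,
        )
        for path in device["pathlist"]
    ]


# Values copied from the example message; the dict layout follows comment 10.
device = {
    "GUID": "1IET_0002000a",
    "pathlist": [{"connection": "192.168.0.10",
                  "iqn": "iqn.2012-08.localdomain.ovirt:iscsi2"}],
}
for message in connection_error_messages(
        device, "Failed to login to iSCSI node due to authorization failure"):
    print(message)
```

Emitting one message per path keeps the connection address and iqn together, which is what a storage admin would need to locate the target.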

Comment 22 Dafna Ron 2013-01-16 14:37:36 UTC
new error shown is: 

The error message for connection 10.35.64.10 Dafna-30 (LUN 1Dafna-371358338) returned by VDSM was: Failed to setup iSCSI subsystem


It would have been helpful to have the domain name in the error, but it's something that the user can see in the storage tab or DC tab.

verified on si26

Comment 24 errata-xmlrpc 2013-02-04 23:33:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0211.html

