Bug 854214
| Summary: | engine: logging - when domain is partially inaccessible event log does not report which luns are inaccessible | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> | ||||||
| Component: | ovirt-engine | Assignee: | Greg Padgett <gpadgett> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Dafna Ron <dron> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 3.1.0 | CC: | abaron, amureini, dyasny, hateya, iheim, lpeer, mkenneth, Rhev-m-bugs, yeylon, ykaul, zdover | ||||||
| Target Milestone: | --- | Keywords: | ZStream | ||||||
| Target Release: | 3.1.2 | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | storage | ||||||||
| Fixed In Version: | SI21 | Doc Type: | Bug Fix | ||||||
| Doc Text: |
Previously, when a storage domain was partially inaccessible, the event log reported that a domain was problematic but did not report which LUNs were inaccessible. For instance, when a storage domain contained LUNs from two different storage servers and one of those LUNs was inaccessible, the error message did not tell you which of the LUNs it was.
Now, when a storage domain is partially inaccessible because one of the LUNs in it is inaccessible, the event log records an error that includes information that you can use to determine which of the LUNs is inaccessible.
|
Story Points: | --- | ||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2013-02-04 23:33:57 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
Fix is to make sure that the domain appears in the event log and not only in the log.

merged change id I6ba766c552a56940c4559b4cd73702627ff13eed

moving to verified on si20. I can see the error in event log:
The error code for connection 10.35.64.106 (LUN w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following Failed to login to iSCSI node due to authorization failure

(In reply to comment #7)
> moving to verified on si20. I can see the error in event log:
>
> The error code for connection 10.35.64.106 (LUN
> w5N10S-YxNF-jbOZ-DQsn-bhgf-1kR4-iDbhlB) returned by VDSM was following
> Failed to login to iSCSI node due to authorization failure

Is this enough information to find out the LUN on the storage side?

reopening since we are getting pv/vg uuid which we cannot find easily in storage server
iqn would be useful information for the user.

An example of a device from VDSM's getDeviceList:
{'GUID': '360a98000572d45366b4a53724369584e',
 'capacity': '322163441664',
 'devtype': 'iSCSI',
 'fwrev': '0.2',
 'logicalblocksize': '512',
 'partitioned': False,
 'pathlist': [{'connection': '10.35.66.11',
               'initiator': 'iqn.1994-05.com.redhat:d482fda7ce',
               'iqn': 'iqn.1992-08.com.netapp:sn.151709391',
               'password': '',
               'port': '3260',
               'portal': '1000',
               'user': ''}],
 'pathstatus': [{'lun': '1',
                 'physdev': 'sdi',
                 'state': 'active',
                 'type': 'iSCSI'}],
 'physicalblocksize': '512',
 'productID': 'LUN',
 'pvUUID': '1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot',
 'serial': 'SNETAPP_LUN_W-E6kJSrCiXN',
 'vendorID': 'NETAPP',
 'vgUUID': 'wuW3NF-54ta-VWwN-sH13-hMLf-97Co-N6na0q'},
I'm wondering which fields need to be passed up the chain to the RHEVM event so that a storage admin, looking at the log, can understand which LUN we are talking about.
Candidates I see are vgUUID, vendorID, serial, pvUUID, iqn, GUID. I know some do not always have content, and perhaps we don't need all of them (no need for both serial and vendorID?).
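To make the candidate fields concrete, here is a minimal Python sketch that pulls the storage-side identifiers out of one getDeviceList entry. The helper name `lun_identifiers` and the shape of its return value are hypothetical, chosen for illustration only; the field names and sample values come from the device dump in this thread. It is not the engine's actual code, which is Java.

```python
# Hypothetical helper (not part of VDSM or the engine): collect the fields
# a storage admin could match against the storage server.
def lun_identifiers(device):
    """Return storage-side identifiers for one getDeviceList entry."""
    paths = device.get("pathlist", [])
    return {
        "guid": device.get("GUID", ""),
        "serial": device.get("serial", ""),
        "vendor": device.get("vendorID", ""),
        # One (connection, iqn) pair per path; a LUN may have several paths.
        "targets": [(p.get("connection", ""), p.get("iqn", "")) for p in paths],
    }


# Trimmed copy of the device entry quoted in this bug.
device = {
    "GUID": "360a98000572d45366b4a53724369584e",
    "serial": "SNETAPP_LUN_W-E6kJSrCiXN",
    "vendorID": "NETAPP",
    "pvUUID": "1MyP2n-7cWs-4J5h-TgXy-hxoA-0b7F-eUf3Ot",
    "pathlist": [{"connection": "10.35.66.11",
                  "iqn": "iqn.1992-08.com.netapp:sn.151709391",
                  "port": "3260"}],
}

ids = lun_identifiers(device)
print(ids["guid"], ids["targets"])
```

The sketch deliberately leaves out pvUUID and vgUUID, since those are LVM-level identifiers that are hard to look up on the storage server.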
Since the error is reporting on connections for the lun, how about iqn, like this?
The error code for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi1 (LUN gZTlrV-oiSl-YkbM-VWW4-2boE-vXTO-PnGQNH) returned by VDSM was following Failed to login to iSCSI node due to authorization failure

I went ahead and submitted a patch for this here: http://gerrit.ovirt.org/8598

Merged If6f4f4c5e176e06cceeda7a7fed29565eead369b

moving back to devel - perhaps the fix was not merged on si21.1
the only error I see is this one:
Host gold-vdsc cannot access one of the Storage Domains attached to the Data Center iSCSI. Setting Host state to Non-Operational.
the domain blocked is a domain that has luns on two different storage servers and only one of the servers was blocked.

Dafna, please attach engine log, the message you quoted above is not even the one you saw in comment 7 so either engine thinks there is no LUN

Greg, I totally missed this in the review.
In: http://gerrit.ovirt.org/#/c/8598/2/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/StorageHelperBase.java
Line 138: lun.getphysical_volume_id()
We should not be printing the pv uuid, it is meaningless. We should be printing the device GUID.

Created attachment 630912 [details]
log
http://gerrit.ovirt.org/8880

Got rid of the pv uuid, new example message:
VDSM returned an error for connection 192.168.0.10 iqn.2012-08.localdomain.ovirt:iscsi2 (LUN 1IET_0002000a): Failed to login to iSCSI node due to authorization failure

new error shown is:
The error message for connection 10.35.64.10 Dafna-30 (LUN 1Dafna-371358338) returned by VDSM was: Failed to setup iSCSI subsystem
it would have been helpful to have the domain name in the error, but it's something that the user can see in the storage tab or DC tab.
verified on si26

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2013-0211.html |
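As an illustration of the message shape discussed in the comments (connection address, target iqn, then the device GUID in place of the pv uuid), the following Python sketch assembles such an event string. The function name `format_connection_error` is hypothetical and the template is copied from the example message quoted for gerrit change 8880; the real fix lives in the engine's Java code and may differ in detail.

```python
# Hypothetical formatter mirroring the example message quoted in this bug;
# it is a sketch, not the engine's implementation.
def format_connection_error(connection, iqn, lun_guid, error):
    """Build an event-log line that names the connection, target and LUN."""
    # The iqn may be missing for some connections, so skip empty parts.
    target = " ".join(part for part in (connection, iqn) if part)
    return "VDSM returned an error for connection {} (LUN {}): {}".format(
        target, lun_guid, error)


msg = format_connection_error(
    "192.168.0.10",
    "iqn.2012-08.localdomain.ovirt:iscsi2",
    "1IET_0002000a",
    "Failed to login to iSCSI node due to authorization failure",
)
print(msg)
```

Putting the device GUID in the parentheses gives a storage admin a value that can be looked up on the storage side, which the pv uuid could not.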
Created attachment 609654 [details]
log

Description of problem:
When a domain is only partially inaccessible (some of the LUNs are visible and some are not), the event log does not show which LUNs are not visible.

Version-Release number of selected component (if applicable):
si16

How reproducible:
100%

Steps to Reproduce:
1. In a two-host cluster, have a storage domain which has LUNs from 2 different storage servers, then block connectivity to one of the storage servers from one of the hosts.

Actual results:
The event log reports the domain as problematic but does not show which LUN is inaccessible.

Expected results:
We should report which LUN is inaccessible in the event log.

Additional info:
The backend log does show it:
2012-09-04 12:41:23,891 INFO [org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] (QuartzScheduler_Worker-94) [756aab3e] The lun with id HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !
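On builds without the fix, the LUN ids can still be recovered from the backend log. The Python sketch below scans log lines for the "was reported as problematic" message quoted above. The pattern is an assumption based on that single sample line, and the helper name `problematic_luns` is hypothetical. Note that the id in this log line looks like an LVM-style UUID, which later comments in this bug describe as hard to map to a LUN on the storage server.

```python
import re

# Pattern assumed from the one sample line quoted in this bug report.
PROBLEMATIC_LUN = re.compile(r"The lun with id (\S+) was reported as problematic")


def problematic_luns(lines):
    """Return the LUN ids that the backend log flags as problematic."""
    ids = []
    for line in lines:
        match = PROBLEMATIC_LUN.search(line)
        if match:
            ids.append(match.group(1))
    return ids


sample = [
    "2012-09-04 12:41:23,891 INFO "
    "[org.ovirt.engine.core.bll.storage.ISCSIStorageHelper] "
    "(QuartzScheduler_Worker-94) [756aab3e] The lun with id "
    "HXT9pz-3stk-TPSL-Irup-5P31-887A-3AZMHm was reported as problematic !",
    "2012-09-04 12:41:24,000 INFO unrelated line",
]
found = problematic_luns(sample)
print(found)
```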