Bug 1152587 - vdsm-4.14.13-2 sends FC LIP events on storage actions
Summary: vdsm-4.14.13-2 sends FC LIP events on storage actions
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.4.1-1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Nir Soffer
QA Contact: Elad
URL:
Whiteboard: storage
Duplicates: 1162283
Depends On:
Blocks: 1157681
Reported: 2014-10-14 13:27 UTC by Evgheni Dereveanchin
Modified: 2019-05-20 11:18 UTC (History)

Fixed In Version: vt7
Doc Type: Bug Fix
Doc Text:
The issue_lip operation was found to be disruptive on some storage servers, causing storage connection issues; domains became inaccessible at random. With this update, the issue_lip operation is disabled by default. As a result, discovering new LUNs on a Fibre Channel storage server is not supported by default. Users can enable it through the new VDSM configuration option (hba_rescan) if the operation is compatible with their storage server. A future Red Hat Enterprise Virtualization version will support discovering new LUNs by default.
Clone Of:
Clones: 1157681
Environment:
Last Closed: 2015-02-11 21:13:02 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1218583 0 None None None Never
Red Hat Product Errata RHBA-2015:0159 0 normal SHIPPED_LIVE vdsm 3.5.0 - bug fix and enhancement update 2015-02-12 01:35:58 UTC
oVirt gerrit 34176 0 'None' 'MERGED' 'multiapth: Disable hba rescanning by default' 2019-11-20 09:34:32 UTC
oVirt gerrit 34196 0 'None' 'ABANDONED' 'multiapth: Disable hba rescanning by default' 2019-11-20 09:34:32 UTC
oVirt gerrit 34215 0 'None' 'MERGED' 'multiapth: Disable hba rescanning by default' 2019-11-20 09:34:32 UTC
oVirt gerrit 34245 0 'None' 'MERGED' 'hba: Rescan using SCSI layer' 2019-11-20 09:34:32 UTC

Description Evgheni Dereveanchin 2014-10-14 13:27:47 UTC
Description of problem:
Upgrading to vdsm-4.14.13-2 sometimes causes storage instability: the hosts report latency errors and FC link flapping events. We also noticed that supervdsm sends FC LIP events around the same time as these errors occur.

Sending a LIP is described as a last-resort action in our documentation:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/scanning-storage-interconnects.html

Why are we doing this on a regular basis in vdsm? It appears to negatively affect storage performance.

Version-Release number of selected component (if applicable):
vdsm-4.14.13-2.el6ev

How reproducible:
Always

Steps to Reproduce:
1. install RHEV-H with vdsm-4.14.13-2.el6ev
2. connect FibreChannel Storage
3. activate host/assign a LUN/activate a storage domain

Actual results:
High Latency errors reported by storage
Link Up events registered for FC HBAs
sanlock warnings/errors reported

Expected results:
All operates without issues

Additional info:
This may have been implemented as a fix for BZ#1121998.

Comment 2 Gordon Watson 2014-10-14 15:33:57 UTC
In 'vdsm-4.14.13-2' a LIP is now issued to all FibreChannel hosts in 'hba.py' in certain circumstances. This was introduced via BZ 1123637.

In some circumstances, e.g. (dis)connectStoragePool when going into or coming out of maintenance mode, this may be OK, but it also occurs when the following are performed:

- activating/deactivating an NFS Export or ISO domain
- clicking on 'Edit' in the Admin Portal to edit an FC data domain (results in a 'getDeviceList' on the host)


The problem here is that all the active FC storage domains will be affected by this.

Even for the fix to BZ 1123637, which I believe is meant to make a newly presented FC LUN visible on a host in order to either create a new storage domain or extend an existing one, all of the other FC storage domains will be affected.
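
For illustration, the effect described in this comment can be sketched as follows. This is not the code from 'hba.py'; it is a minimal sketch that assumes the standard Linux sysfs interface (writing "1" to /sys/class/fc_host/hostN/issue_lip, as described in the RHEL Storage Administration Guide). The function names and the dry_run parameter are hypothetical. It shows why every FC host, and therefore every active FC storage domain, is touched regardless of which action triggered the rescan.

```python
import glob
import os

FC_HOST_DIR = "/sys/class/fc_host"  # standard sysfs location for FC HBAs


def lip_targets(fc_host_dir=FC_HOST_DIR):
    """Return the issue_lip path of every FC host under fc_host_dir."""
    return sorted(glob.glob(os.path.join(fc_host_dir, "host*", "issue_lip")))


def issue_lip_everywhere(fc_host_dir=FC_HOST_DIR, dry_run=True):
    """Issue a LIP on every FC host (hypothetical helper, not vdsm code).

    With dry_run=True (the default) nothing is written; the paths that
    would be written are only returned. Writing requires root and may
    disrupt I/O on every FC link of the host.
    """
    paths = lip_targets(fc_host_dir)
    if not dry_run:
        for path in paths:
            with open(path, "w") as f:
                f.write("1")
    return paths


if __name__ == "__main__":
    # Dry run: list the HBAs that a blanket LIP would reset.
    for path in issue_lip_everywhere():
        print("would issue LIP via", path)
```

Run as-is, the script only lists the affected HBAs; nothing is written unless dry_run=False is passed explicitly.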

Comment 6 Nir Soffer 2014-10-15 11:39:28 UTC
Gordon, the attached patch disables HBA rescanning by default.

Can you test it and confirm that it resolves the FC connection issues?
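
The doc text of this bug names the new option hba_rescan. A minimal sketch of how a rescan could be gated on such an option is shown below. The section name "irs", the function name, and the use of Python 3's configparser are assumptions made for illustration; the actual section and parsing code should be checked against the merged patch.

```python
import configparser


def hba_rescan_enabled(conf_text, section="irs"):
    """Return True only if hba_rescan is explicitly enabled.

    conf_text is the content of a vdsm.conf-style INI file. The section
    name is an assumption; a missing section or option means "disabled",
    matching the new default described in this bug.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean(section, "hba_rescan", fallback=False)


if __name__ == "__main__":
    sample = "[irs]\nhba_rescan = true\n"
    if hba_rescan_enabled(sample):
        print("HBA rescan enabled by configuration")
    else:
        print("HBA rescan disabled (default)")
```

Defaulting to False when the option is absent keeps the safe behavior for hosts whose configuration was never edited.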

Comment 21 Nir Soffer 2014-10-19 21:48:13 UTC
The new patch (34245) is a more correct fix. It would be helpful if you test this patch on relevant sites.

The interesting test is:
1. While host is up, add new LUN on the storage server
2. Edit FC storage domain or create new one

Expected results:
- The new LUN should appear in the list of devices
- Existing FC connection should not be disrupted.
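
As a rough illustration of what "rescan using SCSI layer" can mean, the sketch below writes the wildcard "- - -" (all channels, targets and LUNs) to /sys/class/scsi_host/hostN/scan for each host that also appears under /sys/class/fc_host. Both sysfs files are standard Linux interfaces, but whether patch 34245 does exactly this is an assumption; the function names and the dry_run parameter are hypothetical.

```python
import os


def scsi_scan_paths(sysfs_class="/sys/class"):
    """Return the scsi_host scan file for every FC host (sketch only)."""
    fc_dir = os.path.join(sysfs_class, "fc_host")
    if not os.path.isdir(fc_dir):
        return []
    return [
        os.path.join(sysfs_class, "scsi_host", host, "scan")
        for host in sorted(os.listdir(fc_dir))
    ]


def rescan_fc_hosts(sysfs_class="/sys/class", dry_run=True):
    """Ask the SCSI layer to scan each FC host for new devices.

    Unlike a LIP, this does not reinitialize the FC loop. With
    dry_run=True (the default) nothing is written; writing needs root.
    """
    paths = scsi_scan_paths(sysfs_class)
    if not dry_run:
        for path in paths:
            with open(path, "w") as f:
                f.write("- - -")  # wildcard channel, target and LUN
    return paths


if __name__ == "__main__":
    for path in rescan_fc_hosts():
        print("would rescan via", path)
```

Only hosts listed under fc_host are scanned, so non-FC SCSI hosts (for example local disks or iSCSI sessions) are left alone.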

Comment 27 Ezequiel Hector Brizuela 2014-10-22 13:49:50 UTC
Red Hat Customer Portal 01123741

Comment 31 Nir Soffer 2014-11-18 06:03:18 UTC
*** Bug 1162283 has been marked as a duplicate of this bug. ***

Comment 32 Elad 2014-11-26 13:25:53 UTC
FC LIP events are not issued as part of storage domain creation/edit.

Checked the following:
Installed an OS on a guest whose attached disk resides on an FC domain:
- Mapped a new LUN to the host by FC, then clicked on 'new' domain.
- Mapped a new LUN to the host by FC, then clicked on 'edit' domain.

Rescanned the bus using the 'rescan-scsi-bus.sh' tool, then performed those steps again.


During those actions, the OS installation on the guest wasn't affected; it finished successfully.



Didn't encounter any of:

High Latency errors reported by storage
Link Up events registered for FC HBAs
sanlock warnings/errors reported



Checked using an XtremIO storage server.

Used RHEV 3.5 vt11.

Comment 33 Allon Mureinik 2014-11-26 23:20:33 UTC
Doc text copied from the z-stream clone, bug 1157681.

Comment 35 errata-xmlrpc 2015-02-11 21:13:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html

