1152587 – vdsm-4.14.13-2 sends FC LIP events on storage actions

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	3.4.1-1
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	3.5.0
Assignee:	Nir Soffer
QA Contact:	Elad
Docs Contact:
URL:
Whiteboard:	storage
Duplicates (1):	1162283
Depends On:
Blocks:	1157681

Reported:	2014-10-14 13:27 UTC by Evgheni Dereveanchin
Modified:	2019-05-20 11:18 UTC
CC List:	30 users
Fixed In Version:	vt7
Doc Type:	Bug Fix
Doc Text:	The issue_lip operation has been found to be disruptive on some storage servers, causing storage connection issues. Domains became inaccessible on random occasions. With this update, the issue_lip operation is disabled by default. As a result, discovering new LUNs on a Fibre Channel storage server is not supported by default. Users can enable this operation through a new VDSM configuration option (hba_rescan) if it is compatible with the storage server. A future Red Hat Enterprise Virtualization version will support discovering new LUNs by default.
Clone Of:
Clones:	1157681
Environment:
Last Closed:	2015-02-11 21:13:02 UTC
oVirt Team:	Storage
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	1218583	None	None	None	Never
Red Hat Product Errata	RHBA-2015:0159	normal	SHIPPED_LIVE	vdsm 3.5.0 - bug fix and enhancement update	2015-02-12 01:35:58 UTC
oVirt gerrit	34176	None	MERGED	multiapth: Disable hba rescanning by default	2019-11-20 09:34:32 UTC
oVirt gerrit	34196	None	ABANDONED	multiapth: Disable hba rescanning by default	2019-11-20 09:34:32 UTC
oVirt gerrit	34215	None	MERGED	multiapth: Disable hba rescanning by default	2019-11-20 09:34:32 UTC
oVirt gerrit	34245	None	MERGED	hba: Rescan using SCSI layer	2019-11-20 09:34:32 UTC

Description Evgheni Dereveanchin 2014-10-14 13:27:47 UTC

Description of problem:
Upgrading to vdsm-4.14.13-2 sometimes causes storage instability: the hosts report latency errors and FC link flapping events. We also noticed that supervdsm sends FC LIP events around the same time these errors occur.

Sending a LIP is advertised as a last resort action in our documentation:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/scanning-storage-interconnects.html

Why are we doing this on a regular basis in vdsm? It seems to negatively affect storage performance.

Version-Release number of selected component (if applicable):
vdsm-4.14.13-2.el6ev

How reproducible:
Always

Steps to Reproduce:
1. Install RHEV-H with vdsm-4.14.13-2.el6ev
2. Connect Fibre Channel storage
3. Activate the host / assign a LUN / activate a storage domain

Actual results:
High Latency errors reported by storage
Link Up events registered for FC HBAs
sanlock warnings/errors reported

Expected results:
Everything operates without issues

Additional info:
This may have been implemented as a fix for BZ#1121998.

Comment 2 Gordon Watson 2014-10-14 15:33:57 UTC

In 'vdsm-4.14.13-2' a LIP is now issued to all FibreChannel hosts in 'hba.py' in certain circumstances. This was introduced via BZ 1123637.

In some circumstances, e.g. (dis)connectStoragePool when going into or coming out of maintenance mode, this may be OK, but it also occurs when the following are performed:

- activating/deactivating an NFS Export or ISO domain
- clicking on 'Edit' in the Admin Portal to edit an FC data domain (results in a 'getDeviceList' on the host)


The problem here is that all the active FC storage domains will be affected by this.

Even for the fix to BZ 1123637, which I believe is meant to make a newly-presented FC LUN visible on a host so that a new storage domain can be created or an existing one extended, all of the other FC storage domains will be affected.
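
For illustration, here is a minimal sketch of why a LIP touches every active FC storage domain at once. It is not the code in 'hba.py'; it only assumes the standard sysfs interface described in the RHEL Storage Administration Guide, where writing "1" to /sys/class/fc_host/hostN/issue_lip triggers a LIP on that HBA port. The function name and the sysfs_root and dry_run parameters are hypothetical and exist only to make the sketch safe to run.

```python
import glob
import os


def issue_lip_all(sysfs_root="/sys", dry_run=True):
    """Return the issue_lip path of every FC host found under sysfs_root.

    Hypothetical helper, not vdsm code. When dry_run is False, "1" is
    written to each path, which asks the driver to perform a LIP on that
    HBA port (requires root on a real system).
    """
    pattern = os.path.join(sysfs_root, "class", "fc_host", "host*", "issue_lip")
    paths = sorted(glob.glob(pattern))
    if not dry_run:
        for path in paths:
            # No per-LUN or per-domain selection is possible here: the
            # whole port is reset, so every LUN reached through it is hit.
            with open(path, "w") as f:
                f.write("1")
    return paths
```

Calling issue_lip_all() with the defaults only lists the paths that would be written, which is enough to see that the operation is per HBA port rather than per storage domain.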

Comment 6 Nir Soffer 2014-10-15 11:39:28 UTC

Gordon, the attached patch disables HBA rescanning by default.

Can you test this patch and confirm that it resolves the FC connection issues?

Comment 21 Nir Soffer 2014-10-19 21:48:13 UTC

The new patch (34245) is a more correct fix. It would be helpful if you could test this patch on relevant sites.

The interesting test is:
1. While the host is up, add a new LUN on the storage server
2. Edit an FC storage domain or create a new one

Expected results:
- The new LUN should appear in the list of devices
- Existing FC connections should not be disrupted
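
As a rough sketch of what a rescan through the SCSI layer can look like, the snippet below writes the wildcard string "- - -" (any channel, target, and LUN) to /sys/class/scsi_host/hostN/scan for each host that also appears under /sys/class/fc_host. This is the standard sysfs scan interface; whether patch 34245 does exactly this is an assumption, and the function name and the sysfs_root and dry_run parameters are hypothetical.

```python
import glob
import os

SCAN_ALL = "- - -"  # wildcards for channel, target, and LUN


def scsi_rescan_fc_hosts(sysfs_root="/sys", dry_run=True):
    """Return the scsi_host scan path of every FC host under sysfs_root.

    Hypothetical helper, not vdsm code. When dry_run is False, the
    wildcard scan string is written to each path (requires root on a
    real system). Unlike issue_lip, this does not reset the FC link.
    """
    fc_hosts = sorted(glob.glob(os.path.join(sysfs_root, "class", "fc_host", "host*")))
    scan_paths = [
        os.path.join(sysfs_root, "class", "scsi_host", os.path.basename(host), "scan")
        for host in fc_hosts
    ]
    if not dry_run:
        for path in scan_paths:
            with open(path, "w") as f:
                f.write(SCAN_ALL)
    return scan_paths
```

Only hosts listed under fc_host are scanned, so non-FC SCSI hosts are left alone.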

Comment 27 Ezequiel Hector Brizuela 2014-10-22 13:49:50 UTC

Red Hat Customer Portal 01123741

Comment 31 Nir Soffer 2014-11-18 06:03:18 UTC

*** Bug 1162283 has been marked as a duplicate of this bug. ***

Comment 32 Elad 2014-11-26 13:25:53 UTC

FC LIP events are not issued as part of storage domain creation/edit.

Checked the following:
Installed an OS on a guest whose attached disk resides on an FC domain:
- Mapped a new LUN to the host by FC, then clicked on 'new' domain.
- Mapped a new LUN to the host by FC, then clicked on 'edit' domain.

Rescanned the bus using the 'rescan-scsi-bus.sh' tool, then performed those steps again.


During those actions, the OS installation on the guest wasn't affected; it finished successfully.



Didn't encounter any of:

High Latency errors reported by storage
Link Up events registered for FC HBAs
sanlock warnings/errors reported



Checked using an XtremIO storage server

Used RHEV 3.5 vt11

Comment 33 Allon Mureinik 2014-11-26 23:20:33 UTC

Doc text copied from the z-stream clone, bug 1157681.

Comment 35 errata-xmlrpc 2015-02-11 21:13:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html
