Bug 1152587

Summary: vdsm-4.14.13-2 sends FC LIP events on storage actions
Product: Red Hat Enterprise Virtualization Manager Reporter: Evgheni Dereveanchin <ederevea>
Component: vdsm Assignee: Nir Soffer <nsoffer>
Status: CLOSED ERRATA QA Contact: Elad <ebenahar>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.4.1-1 CC: abisogia, amureini, bazulay, cww, ebrizuel, ecohen, ederevea, eedri, gwatson, iheim, jcoscia, jentrena, jraju, jswensso, ldelouw, lpeer, lsurette, mkalinin, mtessun, nsoffer, pablo.iranzo, pdwyer, raul.laansoo, rhodain, sauchter, sherold, tdosek, tnisan, tscherf, yeylon
Target Milestone: --- Keywords: ZStream
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: vt7 Doc Type: Bug Fix
Doc Text:
The issue_lip operation has been found to be disruptive on some storage servers, causing storage connection issues. Domains became inaccessible on random occasions. With this update, the issue_lip operation is disabled by default. As a result, discovering new LUNs on a Fibre Channel storage server is not supported by default. Users can enable this operation through a new VDSM configuration option (hba_rescan) if it is compatible with the storage server. A future Red Hat Enterprise Virtualization version will support discovering new LUNs by default.
Story Points: ---
Clone Of:
: 1157681 (view as bug list) Environment:
Last Closed: 2015-02-11 21:13:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1157681    

Description Evgheni Dereveanchin 2014-10-14 13:27:47 UTC
Description of problem:
Upgrading to vdsm-4.14.13-2 sometimes causes storage instability: the hosts report latency errors and FC link flapping events. supervdsm is also seen sending FC LIP events around the same time as these errors occur.

Sending a LIP is advertised as a last resort action in our documentation:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/scanning-storage-interconnects.html

Why are we doing this on a regular basis in vdsm? It seems to negatively affect storage performance.

Version-Release number of selected component (if applicable):
vdsm-4.14.13-2.el6ev

How reproducible:
Always

Steps to Reproduce:
1. Install RHEV-H with vdsm-4.14.13-2.el6ev
2. Connect FibreChannel storage
3. Activate the host, assign a LUN, or activate a storage domain

Actual results:
High Latency errors reported by storage
Link Up events registered for FC HBAs
sanlock warnings/errors reported

Expected results:
Everything operates without issues

Additional info:
This may have been implemented as a fix for BZ#1121998.

Comment 2 Gordon Watson 2014-10-14 15:33:57 UTC
In 'vdsm-4.14.13-2' a LIP is now issued to all FibreChannel hosts in 'hba.py' in certain circumstances. This was introduced via BZ 1123637.

In some circumstances, e.g. (dis)connectStoragePool when going into or coming out of maintenance mode, this may be OK, but it also occurs when the following are performed:

- activating/deactivating an NFS Export or ISO domain
- clicking on 'Edit' in the Admin Portal to edit an FC data domain (results in a 'getDeviceList' on the host)


The problem here is that all the active FC storage domains will be affected by this.

Even for the fix to BZ 1123637, which I believe is meant to make a newly-presented FC LUN visible on a host in order to either create a new storage domain or extend an existing one, all of the other FC storage domains will be affected.
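
For context, on Linux a LIP is normally triggered by writing "1" to the issue_lip attribute of an fc_host in sysfs, as described in the Storage Administration Guide linked in the description. The following is a minimal sketch of that mechanism, not the actual code from 'hba.py'; the function names and the sysfs_root parameter are hypothetical and exist only so the sketch can be tried against a scratch directory.

```python
import glob
import os


def list_fc_hosts(sysfs_root="/sys/class/fc_host"):
    """Return the sorted names of FC hosts (e.g. host5) under sysfs_root."""
    pattern = os.path.join(sysfs_root, "host*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def issue_lip(host, sysfs_root="/sys/class/fc_host"):
    """Write "1" to the issue_lip attribute of one FC host.

    On a real system this is essentially a bus reset for that HBA, so it
    affects every LUN reached through it, not only newly presented ones.
    """
    path = os.path.join(sysfs_root, host, "issue_lip")
    with open(path, "w") as f:
        f.write("1")
    return path


def issue_lip_all(sysfs_root="/sys/class/fc_host"):
    """Issue a LIP on every FC host and return the hosts that were touched."""
    hosts = list_fc_hosts(sysfs_root)
    for host in hosts:
        issue_lip(host, sysfs_root)
    return hosts
```

Because the loop touches every FC host, a single call disturbs all active FC paths on the machine, which matches the behaviour described above.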

Comment 6 Nir Soffer 2014-10-15 11:39:28 UTC
Gordon, the attached patch disables hba rescanning by default.

Can you test this patch and confirm that the FC connection issues are resolved with this patch?
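
To illustrate the "disabled by default" behaviour, here is a rough sketch of gating a rescan on the hba_rescan option. This is not the patch itself: the "irs" section name, the helper names and the way the configuration text is passed in are assumptions made for illustration only.

```python
import configparser


def hba_rescan_enabled(conf_text, section="irs", option="hba_rescan"):
    """Return True only if the option is explicitly set to a true value.

    The section name "irs" is an assumption; a missing section or option
    falls back to False, i.e. rescanning stays off by default.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean(section, option, fallback=False)


def maybe_rescan(conf_text, rescan):
    """Call rescan() only when hba_rescan is enabled; report whether it ran."""
    if hba_rescan_enabled(conf_text):
        rescan()
        return True
    return False
```

With an empty configuration, maybe_rescan() never calls the rescan function, so hosts that do not opt in are not exposed to the LIP.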

Comment 21 Nir Soffer 2014-10-19 21:48:13 UTC
The new patch (34245) is a more correct fix. It would be helpful if you test this patch on relevant sites.

The interesting test is:
1. While host is up, add new LUN on the storage server
2. Edit FC storage domain or create new one

Expected results:
- The new LUN should appear in the list of devices
- Existing FC connection should not be disrupted.
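
One way to double-check the first expected result from the host side is to compare the SCSI devices visible before and after the storage domain dialog is opened. The sketch below only looks at the SCSI layer through /sys/class/scsi_device and does not use vdsm's own device list; the helper names and the sysfs_root parameter are hypothetical.

```python
import os


def scsi_devices(sysfs_root="/sys/class/scsi_device"):
    """Return the set of H:C:T:L addresses currently listed under sysfs_root."""
    return set(os.listdir(sysfs_root))


def new_devices(before, after):
    """Return the addresses present in `after` but not in `before`, sorted."""
    return sorted(after - before)
```

Take one snapshot with scsi_devices() before step 1 and another after step 2; new_devices(before, after) should then list the address of the newly added LUN.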

Comment 27 Ezequiel Hector Brizuela 2014-10-22 13:49:50 UTC
Red Hat Customer Portal 01123741

Comment 31 Nir Soffer 2014-11-18 06:03:18 UTC
*** Bug 1162283 has been marked as a duplicate of this bug. ***

Comment 32 Elad 2014-11-26 13:25:53 UTC
FC LIP events are not issued as part of storage domain creation/edit.

Checked the following:
Installed an OS on a guest whose attached disk resides on an FC domain:
- Mapped a new LUN to the host by FC, then clicked on 'new' domain.
- Mapped a new LUN to the host by FC, then clicked on 'edit' domain.

Rescanned the bus using the 'rescan-scsi-bus.sh' tool, then performed those steps again.


During those actions, the OS installation on the guest wasn't affected; it finished successfully.



Didn't encounter any of:

High Latency errors reported by storage
Link Up events registered for FC HBAs
sanlock warnings/errors reported



Checked using XtremIO storage server

Used rhev 3.5 vt11

Comment 33 Allon Mureinik 2014-11-26 23:20:33 UTC
doctext copied from the zstream clone, bug 1157681.

Comment 35 errata-xmlrpc 2015-02-11 21:13:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html