Bug 1103722 - [vdsm] [FC] vdsm detects wrong devices when performing getDeviceList after removing a device from the storage server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.5
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: m1
Target Release: 3.6.0
Assignee: Nir Soffer
QA Contact: Ori Gofen
URL:
Whiteboard: storage
Depends On: 1159839
Blocks:
 
Reported: 2014-06-02 12:30 UTC by Elad
Modified: 2016-03-10 06:13 UTC (History)
9 users

Fixed In Version: ovirt-engine-3.6.0_alpha2
Clone Of:
Environment:
Last Closed: 2015-11-04 13:48:58 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt_requires_release_note-


Attachments
logs from vdsm and engine and screenshots of the storage server mapping against the LUNs list as seen in webadmin (791.59 KB, application/x-gzip)
2014-06-02 12:30 UTC, Elad

Description Elad 2014-06-02 12:30:20 UTC
Created attachment 901432 [details]
logs from vdsm and engine and screenshots of the storage server mapping against the LUNs list as seen in webadmin

Description of problem:
While testing this patch [1], I encountered the following problem:
I had a few LUNs mapped to the host over FC from the storage server, and I unmapped 2 of them.
Afterwards, some of the LUNs were marked as problematic while others were marked as eligible for storage domain creation/expansion. However, the LUNs reported as problematic were not the ones I had unmapped on the storage server, and the LUNs reported as eligible were exactly the ones I had unmapped. I picked one of the LUNs reported as OK and tried to create a storage domain with it; the operation failed with an error from vdsm saying the LUN is in a problematic state.

[1] http://gerrit.ovirt.org/#/c/27122/


Version-Release number of selected component (if applicable):
ovirt-engine-3.5-alpha-1.1
ovirt-engine-3.5.0-0.0.master.20140519181229.gitc6324d4.el6.noarch
vdsm was built from this patch - http://gerrit.ovirt.org/#/c/27122/
device-mapper-1.02.79-8.el6.x86_64
kernel - 2.6.32-431.17.1.el6.x86_64



How reproducible:
Always

Steps to Reproduce:
On a shared DC with one host connected and logged in to a storage server by its FC HBA:
1. Expose 2 LUNs to the host from the storage server by FC
2. Create a new FC domain from one of the LUNs
3. Expose 3 more LUNs to the host from the storage server and click on edit domain for the FC domain, look for the new LUNs
4. Unmap the new LUNs from the host, click on edit domain again
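
The inconsistency these steps expose can be sketched as a set comparison between what vdsm reports and what was actually unmapped. This is a minimal illustration, not vdsm code, and the WWIDs are placeholders, not values from the attached logs:

```python
# Minimal sketch of the inconsistency the reproduction steps expose,
# as a set comparison. WWIDs are illustrative placeholders.

def wrongly_reported_ok(reported_after, unmapped):
    """Devices vdsm still reports as usable although they were unmapped
    on the storage server (the bug: these showed as OK/pickable)."""
    return reported_after & unmapped

before = {"wwid_a", "wwid_b", "wwid_c", "wwid_d", "wwid_e"}
unmapped = {"wwid_d", "wwid_e"}

expected_after = before - unmapped   # correct view after a rescan
buggy_after = before                 # observed: unmapped LUNs still listed
```

With the buggy view, `wrongly_reported_ok` returns exactly the unmapped LUNs, which matches the behaviour described below in "Actual results".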

Actual results:
vdsm reports its list of FC-connected devices incorrectly. The LUNs that were unmapped on the storage server side are shown as OK and selectable for storage domain creation/expansion. I tried to create a storage domain from one of them and it failed:


Thread-23::ERROR::2014-06-02 13:20:47,594::lvm::727::Storage.LVM::(_initpvs) pvcreate failed with rc=5
Thread-23::ERROR::2014-06-02 13:20:47,594::lvm::728::Storage.LVM::(_initpvs) [], ['  /dev/mapper/3514f0c5462600d03: read failed after 0 of 4096 at 53687025664: Input/output error', '  /dev/mapper/3514f0c5462600d03: read failed after 0 of 4096 at 53687083008: Input/output error', '  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3514f0c5462600d03 was disabled', '  Fatal error while trying to detect swap signature on /dev/mapper/3514f0c5462600d03.']
Thread-23::ERROR::2014-06-02 13:20:47,594::task::866::TaskManager.Task::(_setError) Task=`c3d704b2-b088-4c01-9408-c2b8a09acc0d`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2030, in createVG
    (force.capitalize() == "True")))
  File "/usr/share/vdsm/storage/lvm.py", line 907, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/share/vdsm/storage/lvm.py", line 729, in _initpvs
    raise se.PhysDevInitializationError(str(devices))
PhysDevInitializationError: Failed to initialize physical device: ("['/dev/mapper/3514f0c5462600d03']",)



Expected results:
vdsm should report the correct state of its FC-connected devices

Additional info: 
Attaching logs from vdsm and engine and screenshots of the storage server mapping against the LUNs list as seen in webadmin

Comment 1 Nir Soffer 2014-06-17 06:44:24 UTC
This happens after removing a device from the storage server, without removing the device from the hosts first.

This is not currently supported. I suggest turning this into an RFE for 3.6 or later.
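
The host-side removal referred to here is, in outline, the standard Linux procedure: flush the multipath map, then delete each underlying SCSI path. A sketch of the commands involved, with the WWID and sd* names as placeholders (not a vdsm API, just the generic device-mapper workflow):

```python
# Sketch of the host-side cleanup that should precede unmapping a LUN
# on the array, per the generic Linux/device-mapper procedure.
# `wwid` and the entries of `paths` are placeholders.

def removal_commands(wwid, paths):
    cmds = ["multipath -f %s" % wwid]    # flush the multipath map first
    for dev in paths:                    # then delete each SCSI path device
        cmds.append("echo 1 > /sys/block/%s/device/delete" % dev)
    return cmds
```

Skipping these steps and unmapping on the array first is exactly what leaves the stale map behind that the traceback in the description shows.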

Comment 2 Allon Mureinik 2014-06-19 09:46:29 UTC
(In reply to Nir Soffer from comment #1)
> This happens after removing a device from the storage server, without
> removing the device from the hosts first.
> 
> This is not supported currently. I suggest to change this into RFE for 3.6
> or later.
Sean - let's take this up in the version planning.

Comment 3 Nir Soffer 2014-11-03 12:54:26 UTC
May be related to using issue_lip instead of scsi scan.
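
The two FC rescan mechanisms contrasted here map to different sysfs writes (these are the standard kernel interfaces; the host names are placeholders):

```python
# The two FC rescan mechanisms, as the sysfs writes they map to.
# Host names ("host3" etc.) are placeholders.

def issue_lip_cmd(fc_host):
    # LIP resets the FC loop; heavier, and discovery completes asynchronously
    return "echo 1 > /sys/class/fc_host/%s/issue_lip" % fc_host

def scsi_scan_cmd(scsi_host):
    # a plain SCSI scan probes for LUNs without resetting the link
    return 'echo "- - -" > /sys/class/scsi_host/%s/scan' % scsi_host
```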

Comment 4 Nir Soffer 2014-11-13 23:51:07 UTC
May be fixed by http://gerrit.ovirt.org/34245

Comment 5 Allon Mureinik 2014-11-20 09:40:12 UTC
I've flagged it as dependent. Once a fix for bug 1159839 is merged, we need to see if /this/ one still reproduces or not.

Comment 6 Nir Soffer 2014-11-26 14:39:21 UTC
Elad, please test this when https://bugzilla.redhat.com/1159839 is verified.

Comment 7 Allon Mureinik 2015-07-07 14:13:45 UTC
(In reply to Nir Soffer from comment #6)
> Elad, please test this when https://bugzilla.redhat.com/1159839 is verified.
Moving to ON_QA to highlight this request.

Comment 9 Ori Gofen 2015-08-03 16:02:26 UTC
This is an XtremIO bug and cannot be verified on oVirt.
LUNs in the XtremIO Java management application do not keep a consistent LUN id, which is probably what confused Elad. oVirt's behaviour is consistent with multipathd, which in turn reflects the storage server.
The XtremIO UI is misinforming.

Comment 10 Ori Gofen 2015-08-03 16:03:53 UTC
Sorry Elad, it is a huge bug, but not oVirt's.

Comment 11 Ori Gofen 2015-08-04 09:22:50 UTC
It looks like I ran into another bug; in any case, the behaviour described in this bug has not reproduced.

Comment 12 Ori Gofen 2015-08-04 09:23:12 UTC
Thus, verified on 3.6.

Comment 13 Sandro Bonazzola 2015-11-04 13:48:58 UTC
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.

