Created attachment 901432 [details]
logs from vdsm and engine and screenshots of the storage server mapping against the LUNs list as seen in webadmin

Description of problem:
While testing this patch [1], I encountered the following problem:
I had a few LUNs mapped to the host from the storage server, connected by FC. I unmapped 2 of them. After that, some of the LUNs were marked as problematic and some were marked as available to be picked for storage domain creation/expansion. The ones reported as problematic were not the LUNs I had removed from the mapping in the storage server, and the ones reported as available were the LUNs that I had unmapped.
I picked one of the LUNs reported as OK for storage domain creation/expansion and tried to create a storage domain with it. It failed with an error from vdsm that the LUN is problematic.

[1] http://gerrit.ovirt.org/#/c/27122/

Version-Release number of selected component (if applicable):
ovirt-engine-3.5-alpha-1.1
ovirt-engine-3.5.0-0.0.master.20140519181229.gitc6324d4.el6.noarch
vdsm was built from this patch - http://gerrit.ovirt.org/#/c/27122/
device-mapper-1.02.79-8.el6.x86_64
kernel - 2.6.32-431.17.1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
On a shared DC with one host connected and logged in to a storage server by its FC HBA:
1. Expose 2 LUNs to the host from the storage server by FC
2. Create a new FC domain from one of the LUNs
3. Expose 3 more LUNs to the host from the storage server, click on edit domain for the FC domain, and look for the new LUNs
4. Unmap the new LUNs from the host, click on edit domain again

Actual results:
vdsm reports its FC-connected devices list wrongly. The LUNs which were unmapped on the storage server side are shown as OK and pick-able for storage domain creation/expansion. I tried to create a storage domain from one of them and it failed:

Thread-23::ERROR::2014-06-02 13:20:47,594::lvm::727::Storage.LVM::(_initpvs) pvcreate failed with rc=5
Thread-23::ERROR::2014-06-02 13:20:47,594::lvm::728::Storage.LVM::(_initpvs) [], ['  /dev/mapper/3514f0c5462600d03: read failed after 0 of 4096 at 53687025664: Input/output error', '  /dev/mapper/3514f0c5462600d03: read failed after 0 of 4096 at 53687083008: Input/output error', '  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3514f0c5462600d03 was disabled', '  Fatal error while trying to detect swap signature on /dev/mapper/3514f0c5462600d03.']
Thread-23::ERROR::2014-06-02 13:20:47,594::task::866::TaskManager.Task::(_setError) Task=`c3d704b2-b088-4c01-9408-c2b8a09acc0d`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2030, in createVG
    (force.capitalize() == "True")))
  File "/usr/share/vdsm/storage/lvm.py", line 907, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/share/vdsm/storage/lvm.py", line 729, in _initpvs
    raise se.PhysDevInitializationError(str(devices))
PhysDevInitializationError: Failed to initialize physical device: ("['/dev/mapper/3514f0c5462600d03']",)

Expected results:
vdsm should report the right state of its FC-connected devices

Additional info:
Attaching logs from vdsm and engine and screenshots of the storage server mapping against the LUNs list as seen in webadmin
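For reference, a minimal sketch (not part of the original report, and not vdsm's own check) of how the unreadable device from the log above can be confirmed directly on the host, independent of what webadmin shows. The WWID is taken from the error messages; adjust for your environment.

#!/usr/bin/env python
# Sketch: probe a multipath device with a single direct read, similar in
# spirit to the read that pvcreate failed on above.
import subprocess

WWID = "3514f0c5462600d03"  # WWID from the error messages above
DEV = "/dev/mapper/" + WWID

def probe(dev):
    # iflag=direct bypasses the page cache, so a stale/unmapped LUN fails
    # immediately with an I/O error instead of returning cached data.
    cmd = ["dd", "if=" + dev, "of=/dev/null", "bs=4096", "count=1",
           "iflag=direct"]
    return subprocess.call(cmd) == 0

if __name__ == "__main__":
    print("%s readable: %s" % (DEV, probe(DEV)))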
This happens after removing a device from the storage server without removing the device from the hosts first.

This is currently not supported. I suggest changing this into an RFE for 3.6 or later.
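For reference, a minimal sketch of the host-side cleanup this refers to, roughly following the documented "remove a storage device" procedure for RHEL; the WWID is a placeholder example and the script must run as root.

#!/usr/bin/env python
# Sketch: remove a multipath device from the host *before* it is unmapped
# on the storage server.
import os
import subprocess

WWID = "3514f0c5462600d03"  # hypothetical example, replace with the LUN's WWID

# Resolve the dm-N node behind /dev/mapper/<WWID> and record its SCSI
# path devices (sdX) before the map disappears.
dm_node = os.path.basename(os.path.realpath("/dev/mapper/" + WWID))
paths = os.listdir("/sys/block/%s/slaves" % dm_node)  # e.g. ["sdc", "sdf"]

# 1. Flush and remove the multipath map so nothing holds it open
#    (this fails if the device is still in use, which is the safe outcome).
subprocess.check_call(["multipath", "-f", WWID])

# 2. Delete each underlying SCSI path device via sysfs so the kernel
#    forgets the stale paths.
for sd in paths:
    with open("/sys/block/%s/device/delete" % sd, "w") as f:
        f.write("1\n")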
(In reply to Nir Soffer from comment #1)
> This happens after removing a device from the storage server without
> removing the device from the hosts first.
>
> This is currently not supported. I suggest changing this into an RFE for
> 3.6 or later.

Sean - let's take this up in the version planning.
May be related to using issue_lip instead of scsi scan.
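For context, a minimal sketch of the two rescan mechanisms mentioned here, using the standard kernel sysfs interfaces (this is not vdsm's actual implementation).

#!/usr/bin/env python
# Sketch: the two FC rescan mechanisms, via sysfs.
import glob

def issue_lip():
    # LIP (Loop Initialization Protocol): asks each FC HBA to re-login to
    # the fabric. It is asynchronous - the write returns before the kernel
    # has finished adding/updating devices.
    for host in glob.glob("/sys/class/fc_host/host*"):
        with open(host + "/issue_lip", "w") as f:
            f.write("1\n")

def scsi_scan():
    # SCSI host scan: "- - -" means all channels, all targets, all LUNs.
    # Note this only discovers newly visible LUNs; it does not remove
    # devices whose LUNs were unmapped on the storage side.
    for host in glob.glob("/sys/class/scsi_host/host*"):
        with open(host + "/scan", "w") as f:
            f.write("- - -\n")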
May be fixed by http://gerrit.ovirt.org/34245
I've flagged it as dependent. Once a fix for bug 1159839 is merged, we need to see if /this/ one still reproduces or not.
Elad, please test this when https://bugzilla.redhat.com/1159839 is verified.
(In reply to Nir Soffer from comment #6)
> Elad, please test this when https://bugzilla.redhat.com/1159839 is verified.

Moving to ON_QA to highlight this request.
It's an XtremIO bug; it cannot be verified on oVirt. LUNs in the XtremIO Java management application do not keep a consistent LUN ID, which is probably what confused Elad. oVirt's behaviour is correlated with multipathd, which in turn is correlated with the storage server. The XtremIO UI is misinforming.
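For anyone cross-checking this, a minimal sketch that lists the WWID and size of every multipath map as the host (and therefore multipathd/vdsm) sees it, so the list can be compared against the WWIDs/LUN IDs shown by the array's management UI. It uses standard device-mapper sysfs attributes; nothing oVirt-specific is assumed.

#!/usr/bin/env python
# Sketch: print dm node, WWID and size for each multipath map.
import glob

SECTOR_SIZE = 512

for uuid_file in glob.glob("/sys/block/dm-*/dm/uuid"):
    with open(uuid_file) as f:
        uuid = f.read().strip()
    if not uuid.startswith("mpath-"):
        continue  # skip LVM volumes and other non-multipath dm devices
    wwid = uuid[len("mpath-"):]
    dm_node = uuid_file.split("/")[3]  # e.g. "dm-4"
    with open("/sys/block/%s/size" % dm_node) as f:
        size_gib = int(f.read()) * SECTOR_SIZE / (1024.0 ** 3)
    print("%s  %s  %.1f GiB" % (dm_node, wwid, size_gib))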
Elad, sorry. It is a huge bug, but not oVirt's.
It looks like I ran into another bug; in any case, the behaviour described in this bug has not reproduced.
Thus, Verified on 3.6.
oVirt 3.6.0 was released on November 4th, 2015 and should fix this issue. If problems still persist, please open a new BZ and reference this one.