Ben, can you take a look at this?
It looks like the logical volumes were already activated on one of the path devices before multipath could get set up on it. In that case, when multipath tries to load the table with that device, device mapper sees that it is already in use.
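For the record, this state can be confirmed with something like the following (the path device name here is only an example):

# lvs -o +devices     (shows the LVs sitting directly on the path device, e.g. /dev/sdb)
# dmsetup table       (the LV tables reference the path device's major:minor instead of a multipath map)
# multipath -ll       (the multipath map for that LUN is missing or incomplete)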
So Nir, Ben, what's our next action item here? (or, more properly, do we have one?)
As long as the wwid for the device exists in /etc/multipath/wwids, multipath should claim the device in its udev rules and lvm shouldn't grab it. So the real question is why the wwid 3600601601772320026d466a7a744e4 wasn't in the wwids file. Multipath won't put it there until it sets itself up on the device correctly. However, the wwid can be added without multipath having set itself up, by running # multipath -a <device>, by adding mpath.wwid=<wwid> to the kernel command line, or by directly editing /etc/multipath/wwids. I'm not sure why the wwid wasn't in the file in this case, but it needs to be, to avoid this race.
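To make that concrete, pre-seeding the wwid could look like this (the path device name is just an example; the wwid is the one from this bug):

# multipath -a /dev/sdb
# grep 3600601601772320026d466a7a744e4 /etc/multipath/wwids

or, on the kernel command line:

mpath.wwid=3600601601772320026d466a7a744e4

Once the wwid is listed, the multipath udev rules should claim the path devices before lvm gets a chance to activate anything on them.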
(In reply to Ben Marzinski from comment #8)

Ben, do you think this is related to this change in vdsm, not using "multipath -r" regularly, as suggested in comment 1?
https://gerrit.ovirt.org/27242

Can vdsm take any action to avoid this situation?
(In reply to Nir Soffer from comment #9)
> (In reply to Ben Marzinski from comment #8)
> Ben, do you think this is related to this change in vdsm, not using
> "multipath -r" regularly, as suggested in comment 1?
> https://gerrit.ovirt.org/27242
>
> Can vdsm take any action to avoid this situation?

No. This is happening in response to the uevents when a device is added. By the time the device is available for multipath -r to work on it, lvm will have already grabbed it or will do so shortly. The reason the workaround in comment #1 works is that the lvm devices were deactivated first.

This whole issue comes down to the device not being in the wwids file, and so not being recognized as a multipath device. If the device isn't recognized by multipath, then other subsystems can grab it (such as LVM or MD). The only real solution is to make sure that the device wwid is in /etc/multipath/wwids.

I should note that if the devices don't have any LVM or MD metadata on them, then nothing will be racing with multipath to build a virtual device on top of them. If there is a filesystem on them, there could still be a race where the filesystem gets mounted before multipath claims the devices, but if you are starting with blank devices you will never see this issue, since there is nothing for multipath to race with.
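For completeness, the workaround boils down to something like this on the affected host (the VG name is just a placeholder for whatever got activated on the raw path):

# vgchange -an <vg_on_that_lun>
# multipath -r
# multipath -ll

Once the multipath map exists, the VG can be re-activated on top of it with vgchange -ay.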
(In reply to Ben Marzinski from comment #10) Based on Ben's reply, this is not a vdsm bug, and it should be handled by a lower level component. Ben, would you like to take this bug, or can you recommend the correct component?
Douglas, we can create an empty /etc/multipath/wwids during the build process and persist it when we install RHEV-H; then we will hopefully catch all writes to this file.
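On a running RHEV-H host, the manual equivalent of that suggestion would be roughly (a sketch, assuming the node persist tool; not a tested recipe):

# touch /etc/multipath/wwids
# persist /etc/multipath/wwids

The idea being that, once the file is persisted, whatever multipath appends to it survives reboots instead of being lost with the tmpfs /etc.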
Hey Fabian,

(In reply to Fabian Deutsch from comment #29)
> Douglas, we can create an empty /etc/multipath/wwids during the build
> process and persist it when we install RHEV-H; then we will hopefully
> catch all writes to this file.

Thanks for the hint. I have sent a vdsm plugin hook patch to gerrit which resolves the issue for new installations and for upgrades; I have tested both scenarios locally. If you prefer the scheme of creating the file, let me know in gerrit.
The patch looks good, and yes, build-time creation is not necessary, as /etc is a tmpfs.
Douglas, please prepare a 3.5 build w/ this patch which can be tested by QE, as this is a somewhat bigger change and we want to make sure that it doesn't break anything.

Topics to cover:
- this bug
- general single path iscsi/fc
- multipath iscsi/fc
Should this be on MODIFIED and cloned? I see patches merged on both the master and 3.5 branches.
I was not able to reproduce the bug on 3.5:
1. installed a clean rhevm 3.5.1
2. installed a clean host in it - 20150128 (without the patch)
3. attached iSCSI disks to rhevm; we don't have FC cards in our environment
4. created a VM on those directly attached iSCSI disks
5. shut down the VM, put the host into maintenance and updated it to the RHEVH version I was testing
6. after the update, /etc/multipath/wwids had all disks correctly recognized and the VM started without issue
After re-reading this bug I wonder if we can fix it by persisting the wwids file.

IIUIC the problem is that LVM claims the raw device before multipath can claim it to create the multipathed device. AFAIK lvm claims the devices very early, probably at some point after it went through the mpath udev rules.

To fix this bug, we must make the correct wwids file available when mpath comes up for the first time during boot. This is _very_ early (maybe even in dracut?) during boot, and persisting (in the RHEV-H sense) the file will not help to make it available early enough in the boot process.

I see two basic approaches:
1. Regenerate the initramfs to include the correct wwids file - but we always wanted to avoid regenerating the initramfs on RHEV-H, due to the way we strip the kernel.
2. Do some scripting to manually apply the workaround described in the initial description.

Ben, do you have any more thoughts?

The mpath.wwid= kernel argument is only used to specify the mpath device used for _booting_ RHEV-H.
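Just to spell out what option 1 would mean on a plain RHEL box (a sketch only, with an example path device; this is exactly the kind of initramfs rebuild we have been avoiding on RHEV-H):

# multipath -a /dev/sdb
# dracut --install /etc/multipath/wwids -f /boot/initramfs-$(uname -r).img $(uname -r)

That pulls the current /etc/multipath/wwids into the initramfs, so multipath can claim the paths before lvm sees them.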
(In reply to Fabian Deutsch from comment #39)
> After re-reading this bug I wonder if we can fix it by persisting the
> wwids file.
>
> IIUIC the problem is that LVM claims the raw device before multipath can
> claim it to create the multipathed device. AFAIK lvm claims the devices
> very early, probably at some point after it went through the mpath udev
> rules.
>
> To fix this bug, we must make the correct wwids file available when mpath
> comes up for the first time during boot. This is _very_ early (maybe even
> in dracut?) during boot, and persisting (in the RHEV-H sense) the file
> will not help to make it available early enough in the boot process.
>
> I see two basic approaches:
> 1. Regenerate the initramfs to include the correct wwids file - but we
> always wanted to avoid regenerating the initramfs on RHEV-H, due to the
> way we strip the kernel.
> 2. Do some scripting to manually apply the workaround described in the
> initial description.
>
> Ben, do you have any more thoughts?

If this needs to be handled in the initramfs, then you would either have to remake it to pull the updated wwids file in there, or you would have to add all the necessary wwids with the mpath.wwid kernel argument. Otherwise, lvm will claim the device first, and you'll have to manually deactivate the lvm devices so that multipath can properly grab them. You could take a look at using /usr/sbin/blkdeactivate to make sure that you can deactivate arbitrarily stacked lvm devices.

> The mpath.wwid= kernel argument is only used to specify the mpath device
> used for _booting_ RHEV-H.
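A rough sketch of that fallback on an affected host (no specific options shown; blkdeactivate targets and flags will need adjusting to the actual lvm stack):

# blkdeactivate
# multipath
# multipath -ll

and the kernel command line route would mean adding mpath.wwid=3600601601772320026d466a7a744e4 (and one such entry per additional wwid) to the boot arguments.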
Virt QE can not reproduce this issue with the following steps.

Test versions:
rhev-hypervisor6-6.5-20150115.0
rhev-hypervisor6-6.6-20150123.1
Red Hat Enterprise Virtualization Manager Version: 3.4.5-0.3.el6ev

Test steps:
1. Install rhev-hypervisor6-6.5-20150115.0
2. Register to rhevm 3.4.5
3. Attach a single path FC LUN as a storage domain.
4. Create a VM with a direct LUN on another single path FC storage.
5. Shut down the VM and put the host into maintenance, then update it to rhev-hypervisor6-6.6-20150123.1.
6. After the update, /etc/multipath/wwids has all disks except the installation disk and the VM starts without issue.

Thanks,
Hui Wang
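For reference, the check in step 6 is essentially:

# cat /etc/multipath/wwids
# multipath -ll

i.e. each attached data LUN shows up both in the wwids file and as a multipath map, with only the installation disk missing from the file.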
I'm moving this to MODIFIED, because the patches which should solve this problem are merged. However, I also added OtherQA, because we cannot verify whether this bug has been fixed, since we cannot reproduce it.
We cannot reproduce this issue in-house. If you can still reproduce it with RHEV-H for RHEV 3.6, please re-open this issue.