Ben, can you take a look at this?
It looks like the logical volumes were already activated on one of the path devices before multipath could get set up on it. In that case, when multipath tries to load the table with that device, device mapper sees that it is already in use.
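For the record, this state can be confirmed with something like the following (the path device name here is only an example):

# lvs -o +devices     (shows the LVs sitting directly on the path device, e.g. /dev/sdb)
# dmsetup table       (the LV tables reference the path device's major:minor instead of a multipath map)
# multipath -ll       (the multipath map for that LUN is missing or incomplete)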
So Nir, Ben, what's our next action item here? (or, more properly, do we have one?)
As long as the wwid for the device exists in /etc/multipath/wwids, multipath should claim the device in its udev rules and lvm shouldn't grab it. So the real question is why the wwid 3600601601772320026d466a7a744e4 wasn't in the wwids file. Multipath won't put it there until it sets itself up on the device correctly. However, the wwid can be added without multipath having set itself up, by running # multipath -a <device>, by adding mpath.wwid=<wwid> to the kernel command line, or by directly editing /etc/multipath/wwids. I'm not sure why the wwid wasn't in the file in this case, but it needs to be, to avoid this race.
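To make that concrete, pre-seeding the wwid could look like this (the path device name is just an example; the wwid is the one from this bug):

# multipath -a /dev/sdb
# grep 3600601601772320026d466a7a744e4 /etc/multipath/wwids

or, on the kernel command line:

mpath.wwid=3600601601772320026d466a7a744e4

Once the wwid is listed, the multipath udev rules should claim the path devices before lvm gets a chance to activate anything on them.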
(In reply to Ben Marzinski from comment #8)

Ben, do you think this is related to this change in vdsm, not using "multipath -r" regularly, as suggested in comment 1?
https://gerrit.ovirt.org/27242

Can vdsm take any action to avoid this situation?
(In reply to Nir Soffer from comment #9)
> (In reply to Ben Marzinski from comment #8)
> Ben, do you think this is related to this change in vdsm, not using
> "multipath -r" regularly, as suggested in comment 1?
> https://gerrit.ovirt.org/27242
>
> Can vdsm take any action to avoid this situation?

No. This is happening in response to the uevents when a device is added. By the time the device is available for multipath -r to work on it, lvm will have already grabbed it or will do so shortly. The reason the workaround in comment #1 works is that the lvm devices were deactivated first.

This whole issue comes down to the device not being in the wwids file, and so not being recognized as a multipath device. If the device isn't recognized by multipath, then other subsystems can grab it (such as LVM or MD). The only real solution is to make sure that the device wwid is in /etc/multipath/wwids.

I should note that if the devices don't have any LVM or MD metadata on them, then nothing will be racing with multipath to build a virtual device on top of them. If there is a filesystem on them, there could still be a race where the filesystem gets mounted before multipath claims the devices, but if you are starting with blank devices you will never see this issue, since there is nothing for multipath to race with.
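For completeness, the workaround boils down to something like this on the affected host (the VG name is just a placeholder for whatever got activated on the raw path):

# vgchange -an <vg_on_that_lun>
# multipath -r
# multipath -ll

Once the multipath map exists, the VG can be re-activated on top of it with vgchange -ay.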
(In reply to Ben Marzinski from comment #10) Based on Ben's reply, this is not a vdsm bug, and it should be handled by a lower level component. Ben, would you like to take this bug, or can you recommend the correct component?
Douglas, we can create an empty /etc/multipath/wwids during the build process and persist it when we install RHEV-H; then we will hopefully catch all writes to this file.
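On a running RHEV-H host, the manual equivalent of that suggestion would be roughly (a sketch, assuming the node persist tool; not a tested recipe):

# touch /etc/multipath/wwids
# persist /etc/multipath/wwids

The idea being that, once the file is persisted, whatever multipath appends to it survives reboots instead of being lost with the tmpfs /etc.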
Hey Fabian,

(In reply to Fabian Deutsch from comment #29)
> Douglas, we can create an empty /etc/multipath/wwids during the build
> process and persist it when we install RHEV-H; then we will hopefully
> catch all writes to this file.

Thanks for the hint. I have sent a vdsm plugin hook patch to gerrit which resolves the issue for new installations and for upgrades; I have tested both scenarios locally. If you prefer the scheme of creating the file, let me know in gerrit.
The patch looks good, and yes, build-time creation is not necessary, as /etc is a tmpfs.
Douglas, please prepare a 3.5 build w/ this patch which can be tested by QE, as this is a somewhat bigger change and we want to make sure that it doesn't break anything.

Topics to cover:
- this bug
- general single path iscsi/fc
- multipath iscsi/fc
Should this be on MODIFIED and cloned? I see patches merged on both the master and 3.5 branches.
I was not able to reproduce the bug on 3.5:
1. installed a clean rhevm 3.5.1
2. installed a clean host in it - 20150128 (without the patch)
3. attached iSCSI disks to rhevm; we don't have FC cards in our environment
4. created a VM on those directly attached iSCSI disks
5. shut down the VM, put the host into maintenance and updated it to the RHEVH version I was testing
6. after the update, /etc/multipath/wwids had all disks correctly recognized and the VM started without issue
After re-reading this bug I wonder if we can fix it by persisting the wwids file.

IIUIC the problem is that LVM claims the raw device before multipath can claim it to create the multipathed device. AFAIK lvm claims the devices very early, probably at some point after it went through the mpath udev rules.

To fix this bug, we must make the correct wwids file available when mpath comes up for the first time during boot. This is _very_ early (maybe even in dracut?) during boot, and persisting (in the RHEV-H sense) the file will not help to make it available early enough in the boot process.

I see two basic approaches:
1. Regenerate the initramfs to include the correct wwids file - but we always wanted to avoid regenerating the initramfs on RHEV-H, due to the way we strip the kernel.
2. Do some scripting to manually apply the workaround described in the initial description.

Ben, do you have any more thoughts?

The mpath.wwid= kernel argument is only used to specify the mpath device used for _booting_ RHEV-H.
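Just to spell out what option 1 would mean on a plain RHEL box (a sketch only, with an example path device; this is exactly the kind of initramfs rebuild we have been avoiding on RHEV-H):

# multipath -a /dev/sdb
# dracut --install /etc/multipath/wwids -f /boot/initramfs-$(uname -r).img $(uname -r)

That pulls the current /etc/multipath/wwids into the initramfs, so multipath can claim the paths before lvm sees them.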
(In reply to Fabian Deutsch from comment #39)
> After re-reading this bug I wonder if we can fix it by persisting the
> wwids file.
>
> IIUIC the problem is that LVM claims the raw device before multipath can
> claim it to create the multipathed device. AFAIK lvm claims the devices
> very early, probably at some point after it went through the mpath udev
> rules.
>
> To fix this bug, we must make the correct wwids file available when mpath
> comes up for the first time during boot. This is _very_ early (maybe even
> in dracut?) during boot, and persisting (in the RHEV-H sense) the file
> will not help to make it available early enough in the boot process.
>
> I see two basic approaches:
> 1. Regenerate the initramfs to include the correct wwids file - but we
> always wanted to avoid regenerating the initramfs on RHEV-H, due to the
> way we strip the kernel.
> 2. Do some scripting to manually apply the workaround described in the
> initial description.
>
> Ben, do you have any more thoughts?

If this needs to be handled in the initramfs, then you would either have to remake it to pull the updated wwids file in there, or you would have to add all the necessary wwids with the mpath.wwid kernel argument. Otherwise, lvm will claim the device first, and you'll have to manually deactivate the lvm devices so that multipath can properly grab them. You could take a look at using /usr/sbin/blkdeactivate to make sure that you can deactivate arbitrarily stacked lvm devices.

> The mpath.wwid= kernel argument is only used to specify the mpath device
> used for _booting_ RHEV-H.
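A rough sketch of that fallback on an affected host (no specific options shown; blkdeactivate targets and flags will need adjusting to the actual lvm stack):

# blkdeactivate
# multipath
# multipath -ll

and the kernel command line route would mean adding mpath.wwid=3600601601772320026d466a7a744e4 (and one such entry per additional wwid) to the boot arguments.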
Virt QE can not reproduce this issue with the following steps.

Test versions:
rhev-hypervisor6-6.5-20150115.0
rhev-hypervisor6-6.6-20150123.1
Red Hat Enterprise Virtualization Manager Version: 3.4.5-0.3.el6ev

Test steps:
1. Install rhev-hypervisor6-6.5-20150115.0
2. Register to rhevm 3.4.5
3. Attach a single path FC LUN as a storage domain.
4. Create a VM with a direct LUN on another single path FC storage.
5. Shut down the VM and put the host into maintenance, then update it to rhev-hypervisor6-6.6-20150123.1.
6. After the update, /etc/multipath/wwids has all disks except the installation disk and the VM starts without issue.

Thanks,
Hui Wang
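For reference, the check in step 6 is essentially:

# cat /etc/multipath/wwids
# multipath -ll

i.e. each attached data LUN shows up both in the wwids file and as a multipath map, with only the installation disk missing from the file.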
I'm moving this to MODIFIED, because the patches which should solve this problem are merged. However, I also added OtherQA, because we cannot verify whether this bug has been fixed, since we cannot reproduce it.
We cannot reproduce this issue in-house. If you can still reproduce it with RHEV-H for RHEV 3.6, please re-open this issue.