Please attach complete logs. Unclear if it's 3.6 or 4.0.4 - please clarify.
It's RHV 4.0.4. I'll attach the sosreport and other data shortly.
Looking into related bugs, mentioned earlier:
Some dirty LUNs are not usable in RHEV
https://bugzilla.redhat.com/show_bug.cgi?id=1253640
[Nimble Storage] multipath unable to add new path
https://bugzilla.redhat.com/show_bug.cgi?id=1309409
Specifically here:
https://bugzilla.redhat.com/show_bug.cgi?id=1253640#c15
I think maybe this is not a vdsm bug at all?
Nir, can you please check and move to platform, if needed?
This is an urgent customer bug.
(In reply to Marina from comment #11)
> Looking into related bugs, mentioned earlier:
> Some dirty LUNs are not usable in RHEV
> https://bugzilla.redhat.com/show_bug.cgi?id=1253640
>
> [Nimble Storage] multipath unable to add new path
> https://bugzilla.redhat.com/show_bug.cgi?id=1309409
>
> Specifically here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1253640#c15
>
> I think maybe this is not a vdsm bug at all?
>
> Nir, can you please check and move to platform, if needed?
This was comment 3. I don't think it's our issue.
> This is an urgent customer bug.
I agree with Yaniv: if we don't see the device in the SCSI layer, vdsm can do nothing about it. Maybe we do not rescan SCSI devices correctly? I suggest moving this to platform.
Hi Zdenek, Do you think an lvm filter with a 'white-list' is the best solution for this issue as well? (as you've suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1374545#c74) Thanks!
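To illustrate what a 'white-list' filter means here: LVM evaluates the patterns in `devices/filter` in order, the first matching pattern decides whether the device is accepted (`a`) or rejected (`r`), and a device that matches no pattern is accepted. The Python sketch below only emulates that first-match logic. The device paths are hypothetical, and Python's `re` module is an assumption standing in for LVM's own regex matcher, so treat it as an illustration rather than a faithful reimplementation.

```python
import re


def lvm_filter_accepts(device, patterns):
    """Emulate first-match semantics of an LVM devices/filter list.

    Each pattern looks like 'a|regex|' or 'r|regex|': an action letter,
    a delimiter character, the regex, and the delimiter again.
    """
    for pattern in patterns:
        action, delim = pattern[0], pattern[1]
        regex = pattern[2:pattern.rindex(delim)]
        if re.search(regex, device):
            return action == "a"
    # No pattern matched: LVM accepts the device by default.
    return True


# Hypothetical white-list: accept only the host's own PV, reject the rest.
whitelist = ["a|^/dev/sda2$|", "r|.*|"]

for dev in ("/dev/sda2", "/dev/sdb", "/dev/mapper/guest_lun"):
    verdict = "accepted" if lvm_filter_accepts(dev, whitelist) else "rejected"
    print(dev, verdict)
```

With such a filter, a LUN that carries a guest-created PV would be rejected by the host's LVM, so the host would not activate the guest's LVs.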
We've also reached the conclusion that disabling (and stopping) lvmetad is a good idea and we'll implement it in 4.0.7 and 4.1. Can you try that?
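When trying this, it may help to confirm what the host's LVM configuration actually says. The sketch below is a minimal Python check of lvm.conf-style text for the `use_lvmetad` setting. It assumes the setting appears as a plain `use_lvmetad = N` line and reports the first uncommented occurrence; the sample text is made up for illustration, and this is not how vdsm itself applies the change.

```python
import re


def lvmetad_enabled(conf_text):
    """Return True/False for use_lvmetad, or None if it is not set."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments
        match = re.match(r"\s*use_lvmetad\s*=\s*(\d+)", line)
        if match:
            return match.group(1) != "0"
    return None


# Hypothetical excerpt of an lvm.conf with lvmetad disabled.
sample = """
global {
    # use_lvmetad = 1
    use_lvmetad = 0
}
"""

print(lvmetad_enabled(sample))
```

On a real host the text could be read from /etc/lvm/lvm.conf. Note that the setting alone does not stop a daemon that is already running, which is why stopping it is mentioned separately above.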
Should be fixed in 4.1.1 by disabling lvmetad.
(In reply to Nir Soffer from comment #18)
> Should be fixed in 4.1.1 by disabling lvmetad.
Current target milestone (for this bug) is 4.0.7. If it's intended to be fixed there, need clone, backport, etc. Otherwise - set target milestone to 4.1.1.
(In reply to Yaniv Kaul from comment #19)
> (In reply to Nir Soffer from comment #18)
> > Should be fixed in 4.1.1 by disabling lvmetad.
>
> Current target milestone (for this bug) is 4.0.7. If it's intended to be
> fixed there, need clone, backport, etc.
This fix is also available in 4.0.7, so we should be good.
I don't think we can verify this, since we don't know how to reproduce this issue; it is caused by a race between multipath and lvm when a new device is discovered.
(In reply to Nir Soffer from comment #22)
> (In reply to Yaniv Kaul from comment #19)
> > (In reply to Nir Soffer from comment #18)
> > > Should be fixed in 4.1.1 by disabling lvmetad.
> >
> > Current target milestone (for this bug) is 4.0.7. If it's intended to be
> > fixed there, need clone, backport, etc.
>
> This fix is also available in 4.0.7, so we should be good.
>
> I don't think we can verify this, since we don't know how to reproduce this
> issue; it is caused by a race between multipath and lvm when a new device is
> discovered.
Can I verify this bug using the steps to reproduce from the first comment? Or are there different steps to make sure this bug is fixed?
(In reply to Lilach Zitnitski from comment #24)
> > I don't think we can verify this, since we don't know how to reproduce this
> > issue; it is caused by a race between multipath and lvm when a new device is
> > discovered.
>
> Can I verify this bug using the steps to reproduce from the first comment?
> Or are there different steps to make sure this bug is fixed?
To verify this you need to reproduce it with an older version, and show that it works with the new version. Reproducing this should be very hard, since the issue is a race between lvm and multipath and the chance of hitting this race is very low.
You can try like this:
1. Add a new LUN on storage, and expose it to 2 hosts
2. On one host, attach the LUN to the VM as a direct LUN
3. Inside the guest, create a PV from the LUN
4. Inside the guest, create a VG with that PV
5. Inside the guest, create an LV on the new PV
6. Try to migrate the VM to another host
7. If you are lucky, multipath will fail to grab the LUN because LVM will grab the LUN before multipath, activating the LV you created inside the guest in step 5
I don't think you will reproduce this, since the LUN will probably be discovered on the second host before you create the LV, and multipath will grab it before LVM.
If you are lucky and can reproduce this, you will have to repeat the entire setup from scratch using 4.1. Even if you can reproduce it, proving that the fix works is very hard, since the race between LVM and multipath is hard to reproduce.
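For steps 3 to 5 of the reproduction attempt above, the guest-side LVM commands could look roughly like the following. This is a Python sketch that only builds and prints the command lines. The device path, VG name, LV name, and size are hypothetical placeholders; inside a real guest the direct LUN may show up under a different name, and the commands need root.

```python
def guest_lvm_commands(device, vg_name, lv_name, lv_size):
    """Build the argv lists for creating a PV, a VG, and an LV in the guest."""
    return [
        ["pvcreate", device],                                 # step 3
        ["vgcreate", vg_name, device],                        # step 4
        ["lvcreate", "-n", lv_name, "-L", lv_size, vg_name],  # step 5
    ]


# Hypothetical names; adjust to what the guest actually sees.
commands = guest_lvm_commands("/dev/sdb", "testvg", "testlv", "1g")

for argv in commands:
    print(" ".join(argv))
```

To actually run them inside the guest, each list could be passed to `subprocess.run(argv, check=True)`.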
There is doc text in the downstream clone; use it if needed.
According to Nir's comment (comment #25), and because I didn't manage to reproduce this bug, moving to CLOSED.
Moving to VERIFIED without reproducing this bug (it can't be reproduced, see comment #25).