Bug 1424819 - [downstream clone - 4.0.7] [Bug RHV 4.0.4] Intermittent direct lun vm failed to start with error "VolumeError: Bad volume specification".
Summary: [downstream clone - 4.0.7] [Bug RHV 4.0.4] Intermittent direct lun vm failed ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.0.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.0.7
Target Release: ---
Assignee: Nir Soffer
QA Contact: Lilach Zitnitski
URL:
Whiteboard:
Depends On: 1400528
Blocks:
 
Reported: 2017-02-19 16:32 UTC by rhev-integ
Modified: 2020-02-14 18:32 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when a LUN was discovered by a host, LVM activated a logical volume on the LUN before the LUN could be mapped by multipath. As a result, the multipath device was not available on the host, and virtual machines could not use the missing device. Now, LVM no longer activates logical volumes dynamically, and multipath can successfully map the LUN.
Clone Of: 1400528
Environment:
Last Closed: 2017-03-16 15:36:45 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2017:0544
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: vdsm 4.0.7 bug fix and enhancement update
Last Updated: 2017-03-16 19:25:18 UTC

Comment 3 rhev-integ 2017-02-19 16:33:06 UTC
Please attach complete logs.
It's unclear whether this is 3.6 or 4.0.4; please clarify.

(Originally by Yaniv Kaul)

Comment 6 rhev-integ 2017-02-19 16:33:26 UTC
It's RHV 4.0.4. I'll attach the sosreport and other data shortly.

(Originally by Sachin Raje)

Comment 12 rhev-integ 2017-02-19 16:34:04 UTC
Looking into related bugs, mentioned earlier:
Some dirty LUNs are not usable in RHEV
https://bugzilla.redhat.com/show_bug.cgi?id=1253640

[Nimble Storage] multipath unable to add new path
https://bugzilla.redhat.com/show_bug.cgi?id=1309409

Specifically here:
https://bugzilla.redhat.com/show_bug.cgi?id=1253640#c15

I think maybe this is not a vdsm bug at all?

Nir, can you please check and move to platform, if needed?
This is an urgent customer bug.

(Originally by Marina Kalinin)

Comment 13 rhev-integ 2017-02-19 16:34:11 UTC
(In reply to Marina from comment #11)
> Looking into related bugs, mentioned earlier:
> Some dirty LUNs are not usable in RHEV
> https://bugzilla.redhat.com/show_bug.cgi?id=1253640
> 
> [Nimble Storage] multipath unable to add new path
> https://bugzilla.redhat.com/show_bug.cgi?id=1309409
> 
> Specifically here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1253640#c15
> 
> I think maybe this is not a vdsm bug at all?
> 
> Nir, can you please check and move to platform, if needed?

This was comment 3. I don't think it's our issue.

> This is an urgent customer bug.

(Originally by Yaniv Kaul)

Comment 14 rhev-integ 2017-02-19 16:34:18 UTC
I agree with Yaniv: if we don't see the device in the SCSI layer, vdsm can do
nothing about it. Maybe we are not rescanning SCSI devices correctly?

I suggest moving this to platform.

(Originally by Nir Soffer)

Comment 17 rhev-integ 2017-02-19 16:34:37 UTC
Hi Zdenek,

Do you think an lvm filter with a 'white-list' is the best solution for this issue as well? (as you've suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1374545#c74)

Thanks!

(Originally by Daniel Erez)
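Comment 17 asks about a whitelist-style LVM filter. As a rough illustration only, the sketch below builds an lvm.conf filter line (for the `devices` section) that accepts an explicit list of devices and rejects everything else, so a newly discovered LUN is not accepted by LVM on the host. The helper name `build_lvm_filter` and the example device are hypothetical; the real whitelist depends on which local devices the host itself uses for LVM, and this is not the change that vdsm shipped. When lvmetad or LVM udev processing is involved, lvm.conf documents `global_filter` rather than `filter` as the setting that also applies there, hence the `key` parameter.

```python
def build_lvm_filter(allowed_devices, key="filter"):
    """Build a hypothetical lvm.conf whitelist line, e.g. for the 'devices' section.

    Every path in allowed_devices gets an accept ("a|...|") rule anchored at
    both ends; a final catch-all reject ("r|.*|") rule hides all other devices.
    Paths are used verbatim as patterns (assumed to contain no regex
    metacharacters that matter).
    """
    rules = []
    for dev in allowed_devices:
        # '|' is the rule delimiter, so a path containing it cannot be expressed safely.
        if "|" in dev:
            raise ValueError(f"unsupported device path: {dev!r}")
        # Accept rule anchored with ^ and $ so only this exact path matches.
        rules.append(f'"a|^{dev}$|"')
    # Catch-all reject: any device not listed above is hidden from LVM.
    rules.append('"r|.*|"')
    return f"{key} = [ " + ", ".join(rules) + " ]"
```

For example, `build_lvm_filter(["/dev/sda2"], key="global_filter")` returns `global_filter = [ "a|^/dev/sda2$|", "r|.*|" ]`. Ordering matters: LVM applies the first matching rule, so the reject-all rule must come last.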

Comment 18 rhev-integ 2017-02-19 16:34:44 UTC
We've also reached the conclusion that disabling (and stopping) lvmetad is a good idea and we'll implement it in 4.0.7 and 4.1. Can you try that?

(Originally by Yaniv Kaul)

Comment 19 rhev-integ 2017-02-19 16:34:51 UTC
Should be fixed in 4.1.1 by disabling lvmetad.

(Originally by Nir Soffer)
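Comments 18 and 19 settle on disabling lvmetad. A quick way to see what a host's lvm.conf says is to look for the `use_lvmetad` key, which lives in the `global` section. The sketch below is a simplified, hypothetical check on the file's text: it strips `#` comments, takes the last `use_lvmetad = N` it finds, and does not verify the enclosing section or any `--config` overrides. Per comment 18, the daemon should also be stopped, not only disabled in the configuration; on RHEL 7 it runs as the `lvm2-lvmetad.service` and `lvm2-lvmetad.socket` systemd units.

```python
import re


def lvmetad_enabled(lvm_conf_text):
    """Return True/False for use_lvmetad in lvm.conf text, or None if absent.

    Simplifications (assumptions of this sketch): the enclosing 'global'
    section is not checked, and if the key appears more than once the last
    uncommented occurrence is used.
    """
    value = None
    for line in lvm_conf_text.splitlines():
        # Drop everything after '#', so commented-out examples are ignored.
        line = line.split("#", 1)[0]
        match = re.search(r"\buse_lvmetad\s*=\s*(\d+)", line)
        if match:
            value = match.group(1) != "0"
    return value
```

A `None` result means the key is not set in the file and LVM falls back to its built-in default, so the daemon state should be checked separately.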

Comment 20 rhev-integ 2017-02-19 16:34:57 UTC
(In reply to Nir Soffer from comment #18)
> Should be fixed in 4.1.1 by disabling lvmetad.

The current target milestone for this bug is 4.0.7. If it's intended to be fixed there, we need a clone, a backport, etc. Otherwise, set the target milestone to 4.1.1.

(Originally by Yaniv Kaul)

Comment 23 rhev-integ 2017-02-19 16:35:17 UTC
(In reply to Yaniv Kaul from comment #19)
> (In reply to Nir Soffer from comment #18)
> > Should be fixed in 4.1.1 by disabling lvmetad.
> 
> Current target milestone (for this bug) is 4.0.7. If it's intended to be
> fixed there, need clone, backport, etc.

This fix is also available in 4.0.7, so we should be good.

I don't think we can verify this, since we don't know how to reproduce the issue; it
is caused by a race between multipath and LVM when a new device is discovered.

(Originally by Nir Soffer)
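Comment 23 attributes the failure to a race between multipath and LVM when a new device is discovered. One way to see which side won on an affected host is to inspect the device-mapper holders of the raw SCSI path in sysfs: `/sys/block/<dev>/holders` lists the `dm-N` devices stacked on it, and `/sys/block/dm-N/dm/uuid` starts with `mpath-` for a multipath map and `LVM-` for an LVM logical volume. The sketch below is a hypothetical diagnostic helper, not part of vdsm; an LVM holder sitting directly on an `sdX` path is consistent with the symptom in the Doc Text, where LVM activated a logical volume before multipath could map the LUN.

```python
import os


def classify_holders(dev, sysfs_root="/sys"):
    """Map each dm-N holder of a block device (e.g. 'sdb') to its type.

    Types are 'multipath' (dm uuid starts with 'mpath-'), 'lvm' (starts with
    'LVM-') or 'other'. sysfs_root is a parameter so the logic can be tried
    against a fake directory tree.
    """
    holders_dir = os.path.join(sysfs_root, "block", dev, "holders")
    result = {}
    for holder in sorted(os.listdir(holders_dir)):
        uuid_path = os.path.join(sysfs_root, "block", holder, "dm", "uuid")
        try:
            with open(uuid_path) as f:
                uuid = f.read().strip()
        except OSError:
            # Not a device-mapper device, or the uuid is unreadable.
            uuid = ""
        if uuid.startswith("mpath-"):
            result[holder] = "multipath"
        elif uuid.startswith("LVM-"):
            result[holder] = "lvm"
        else:
            result[holder] = "other"
    return result


def lvm_won_race(dev, sysfs_root="/sys"):
    """True if an LVM logical volume is stacked directly on the raw path."""
    return "lvm" in classify_holders(dev, sysfs_root).values()
```

On a live host this would be called as `lvm_won_race("sdb")` for each path of the LUN; a healthy path is expected to show only a `multipath` holder.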

Comment 25 Lilach Zitnitski 2017-03-06 12:31:42 UTC
(In reply to rhev-integ from comment #23)
> (In reply to Yaniv Kaul from comment #19)
> > (In reply to Nir Soffer from comment #18)
> > > Should be fixed in 4.1.1 by disabling lvmetad.
> > 
> > Current target milestone (for this bug) is 4.0.7. If it's intended to be
> > fixed there, need clone, backport, etc.
> 
> This fix is also available in 4.0.7, so we should be good.
> 
> I don't think we can verify this, since we don't know how to reproduce the
> issue; it is caused by a race between multipath and LVM when a new device is
> discovered.
> 
> (Originally by Nir Soffer)

According to this comment, this bug is verified even though it can't be reproduced.
Moving to VERIFIED.

Comment 27 errata-xmlrpc 2017-03-16 15:36:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0544.html

