Bug 1424819

Summary: [downstream clone - 4.0.7] [Bug RHV 4.0.4] Intermittent direct lun vm failed to start with error "VolumeError: Bad volume specification".
Product: Red Hat Enterprise Virtualization Manager Reporter: rhev-integ
Component: vdsm Assignee: Nir Soffer <nsoffer>
Status: CLOSED ERRATA QA Contact: Lilach Zitnitski <lzitnits>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.0.4 CC: bazulay, eheftman, gklein, gwatson, lsurette, mkalinin, mwest, nashok, nsoffer, ratamir, sherold, sraje, srevivo, tnisan, ycui, ykaul, ylavi, zkabelac
Target Milestone: ovirt-4.0.7 Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, when a LUN was discovered by a host, LVM activated a logical volume on the LUN before the LUN could be mapped by multipath. As a result, the multipath device was not available on the host, and virtual machines could not use the missing device. Now, LVM no longer activates logical volumes dynamically, and multipath can successfully map the LUN.
Story Points: ---
Clone Of: 1400528 Environment:
Last Closed: 2017-03-16 15:36:45 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1400528    
Bug Blocks:    

Comment 3 rhev-integ 2017-02-19 16:33:06 UTC
Please attach complete logs.
It is unclear whether this is 3.6 or 4.0.4; please clarify.

(Originally by Yaniv Kaul)

Comment 6 rhev-integ 2017-02-19 16:33:26 UTC
It's RHV 4.0.4. I'll attach the sosreport and other data shortly.

(Originally by Sachin Raje)

Comment 12 rhev-integ 2017-02-19 16:34:04 UTC
Looking into related bugs, mentioned earlier:
Some dirty LUNs are not usable in RHEV
https://bugzilla.redhat.com/show_bug.cgi?id=1253640

[Nimble Storage] multipath unable to add new path
https://bugzilla.redhat.com/show_bug.cgi?id=1309409

Specifically here:
https://bugzilla.redhat.com/show_bug.cgi?id=1253640#c15

I think maybe this is not a vdsm bug at all?

Nir, can you please check and move to platform, if needed?
This is an urgent customer bug.

(Originally by Marina Kalinin)

Comment 13 rhev-integ 2017-02-19 16:34:11 UTC
(In reply to Marina from comment #11)
> Looking into related bugs, mentioned earlier:
> Some dirty LUNs are not usable in RHEV
> https://bugzilla.redhat.com/show_bug.cgi?id=1253640
> 
> [Nimble Storage] multipath unable to add new path
> https://bugzilla.redhat.com/show_bug.cgi?id=1309409
> 
> Specifically here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1253640#c15
> 
> I think maybe this is not a vdsm bug at all?
> 
> Nir, can you please check and move to platform, if needed?

This was comment 3. I don't think it's our issue.

> This is an urgent customer bug.

(Originally by Yaniv Kaul)

Comment 14 rhev-integ 2017-02-19 16:34:18 UTC
I agree with Yaniv, if we don't see the device in the scsi layer, vdsm can do
nothing about it. Maybe we do not rescan scsi devices correctly?

I suggest moving this to platform.

(Originally by Nir Soffer)

Comment 17 rhev-integ 2017-02-19 16:34:37 UTC
Hi Zdenek,

Do you think an lvm filter with a 'white-list' is the best solution for this issue as well? (as you've suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1374545#c74)

Thanks!

(Originally by Daniel Erez)
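
For readers unfamiliar with the 'white-list' filter mentioned in comment 17, the following is a minimal Python sketch of how an LVM-style filter list is evaluated: patterns are tried in order, the first match decides, and a device that matches no pattern is accepted. The device paths and the filter itself are hypothetical examples, not the configuration proposed in bug 1374545, and Python's re module only approximates LVM's own regex handling.

```python
import re


def parse_filter(patterns):
    """Parse LVM-style filter entries such as 'a|^/dev/mapper/x$|' or 'r|.*|'.

    Simplification (assumption): the same delimiter character opens and
    closes the regex; bracket-style delimiters are not handled.
    """
    rules = []
    for p in patterns:
        if len(p) < 3 or p[0] not in "ar" or p[-1] != p[1]:
            raise ValueError("malformed filter entry: %r" % p)
        # p[0] is the action, p[1] the delimiter, p[2:-1] the regex body.
        rules.append((p[0] == "a", re.compile(p[2:-1])))
    return rules


def device_accepted(path, rules):
    """Return True if the filter accepts the device path, False if it rejects it."""
    for accept, regex in rules:
        if regex.search(path):
            return accept  # the first matching pattern decides
    return True  # no pattern matched: the device is accepted


# Hypothetical white-list: accept one multipath device, reject everything else.
WHITELIST = parse_filter(["a|^/dev/mapper/mpath_example$|", "r|.*|"])
```

With this hypothetical white-list, device_accepted("/dev/mapper/mpath_example", WHITELIST) returns True, while device_accepted("/dev/sdb", WHITELIST) returns False, so LVM would not scan the underlying SCSI path directly.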

Comment 18 rhev-integ 2017-02-19 16:34:44 UTC
We've also reached the conclusion that disabling (and stopping) lvmetad is a good idea and we'll implement it in 4.0.7 and 4.1. Can you try that?

(Originally by Yaniv Kaul)
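
As a rough illustration of the lvmetad change discussed in comment 18, this Python sketch reports whether use_lvmetad is set in the global section of lvm.conf-style text. The parser is deliberately simplified and is an assumption of this note, not the code vdsm uses; the sample text is made up.

```python
import re


def lvmetad_setting(conf_text):
    """Return True/False for use_lvmetad in the 'global' section, or None if unset.

    Simplifications (assumptions): one setting per line, section braces on
    their own lines as in the stock lvm.conf layout, and '#' always starts
    a comment.
    """
    section = None
    setting = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        opened = re.match(r"^(\w+)\s*\{$", line)
        if opened:
            section = opened.group(1)
            continue
        if line == "}":
            section = None
            continue
        found = re.match(r"^use_lvmetad\s*=\s*(\d+)$", line)
        if found and section == "global":
            setting = found.group(1) != "0"
    return setting


# Made-up sample resembling an lvm.conf with lvmetad disabled.
SAMPLE = """
global {
    # lvmetad disabled so that LVs are not auto-activated on new devices
    use_lvmetad = 0
}
"""
```

Calling lvmetad_setting(SAMPLE) returns False. On a real host one would pass the contents of /etc/lvm/lvm.conf; note that the lvmetad service itself would also need to be stopped, as the comment above says, which this sketch does not check.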

Comment 19 rhev-integ 2017-02-19 16:34:51 UTC
Should be fixed in 4.1.1 by disabling lvmetad.

(Originally by Nir Soffer)

Comment 20 rhev-integ 2017-02-19 16:34:57 UTC
(In reply to Nir Soffer from comment #18)
> Should be fixed in 4.1.1 by disabling lvmetad.

The current target milestone for this bug is 4.0.7. If it is intended to be fixed there, we need a clone, backport, etc. Otherwise, set the target milestone to 4.1.1.

(Originally by Yaniv Kaul)

Comment 23 rhev-integ 2017-02-19 16:35:17 UTC
(In reply to Yaniv Kaul from comment #19)
> (In reply to Nir Soffer from comment #18)
> > Should be fixed in 4.1.1 by disabling lvmetad.
> 
> Current target milestone (for this bug) is 4.0.7. If it's intended to be
> fixed there, need clone, backport, etc.

This fix is also available in 4.0.7, so we should be good.

I don't think we can verify this since we don't know how to reproduce the issue; it
is caused by a race between multipath and LVM when a new device is discovered.

(Originally by Nir Soffer)

Comment 25 Lilach Zitnitski 2017-03-06 12:31:42 UTC
(In reply to rhev-integ from comment #23)
> (In reply to Yaniv Kaul from comment #19)
> > (In reply to Nir Soffer from comment #18)
> > > Should be fixed in 4.1.1 by disabling lvmetad.
> > 
> > Current target milestone (for this bug) is 4.0.7. If it's intended to be
> > fixed there, need clone, backport, etc.
> 
> This fix is also available in 4.0.7, so we should be good.
> 
> I don't think we can verified since we don't know how to reproduce this
> issue, it
> is caused by race between multipath and lvm when new device is discovered.
> 
> (Originally by Nir Soffer)

According to this comment, this bug is considered verified even though it can't be reproduced.
Moving to VERIFIED!

Comment 27 errata-xmlrpc 2017-03-16 15:36:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0544.html