Bug 516773

Summary:	vgextend does not take blocking lock and fails if run concurrently with lvextend
Product:	Red Hat Enterprise Linux 5	Reporter:	Ayal Baron <abaron>
Component:	lvm2	Assignee:	Milan Broz <mbroz>
Status:	CLOSED ERRATA	QA Contact:	Cluster QE <mspqa-list>
Severity:	high	Docs Contact:
Priority:	high
Version:	5.4	CC:	acathrow, agk, cmarthal, dwysocha, edamato, heinzm, iheim, jbrassow, mbroz, prockai, pvrabec, rluxenbe
Target Milestone:	rc	Keywords:	ZStream
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2010-03-30 09:01:23 UTC	Type:	---
Bug Blocks:	523440

Description Ayal Baron 2009-08-11 13:38:46 UTC

Trying to extend a VG (add a PV) fails when there are outstanding lvextend requests, instead of blocking on the VG lock (as other LVM commands do, including lvextend).

In our setup lvextend is run automatically whenever a virtual machine runs low on space.  The VMs run on many different hosts and all lvextend requests are sent to a single host.
If the VG has no free space left, many lvextend commands are spawned continuously (how many depends on the number of running VMs with low disk space in the entire system).
A user wishing to extend the VG will fail almost every time because there will be at least one outstanding lvextend command.

The correct behavior is to block on the lock as do the rest of the LVM commands.

Can be easily reproduced - spawn many lvextend commands when there is no free space on VG and try to extend VG.

Comment 4 Alasdair Kergon 2009-08-11 16:33:27 UTC

As far as LVM is concerned, vgextend is behaving as designed. We have never guaranteed that commands that modify the constitution (i.e. the PV list) of VGs will succeed if other processes are run in parallel. All we guarantee is integrity, not success - running parallel processes will never lead to corruption. Commands that can't get locks should simply be retried - this is our mechanism for avoiding deadlocks. We do already intend to enhance the tools so that the retrying happens internally, and some preparatory changes are already in place, but until now there has been no reason to give that work any particular priority. I have never seen a report of a problem due to this before, even to the extent that I don't think I've needed to document it anywhere until now!
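
Until the tools retry internally, a caller can wrap the command itself. The following is a minimal sketch of such a wrapper, not anything shipped with lvm2. It assumes that any non-zero exit status is worth retrying (a real caller would want to tell lock contention apart from other failures), and the function name, the backoff policy, and the VG and device names in the usage note are all hypothetical.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=1.0, runner=subprocess.run):
    """Run cmd, retrying while it exits non-zero.

    Assumption: every non-zero exit is treated as retryable.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Linear backoff so that competing commands get a chance to finish.
            time.sleep(delay * attempt)
    raise RuntimeError(f"{cmd!r} still failing after {attempts} attempts")
```

A caller might use it as `run_with_retry(["vgextend", "vg0", "/dev/sdX"])`, where `vg0` and `/dev/sdX` are placeholders.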

Comment 7 Alasdair Kergon 2009-08-11 16:57:57 UTC

Tools like lvextend are optimised on the assumption that they will normally succeed.  But here it is being used in a mode where it is commonly expected to fail and that gives suboptimal performance.  (It uses an expensive write lock, but if you expected it normally to fail, you'd use a cheap read lock instead, and only get the write lock later.)
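
The read-lock-first idea can be sketched as follows. This is an illustration only and does not reflect how lvm2 implements its locking: it uses `fcntl.flock` on an ordinary lock file as a stand-in, and `has_free_space` and `do_extend` are hypothetical callbacks supplied by the caller. Converting a shared `flock` to an exclusive one is not atomic, so the check is repeated after the upgrade.

```python
import fcntl


def extend_if_room(lock_path, has_free_space, do_extend):
    """Check under a shared lock; take the exclusive lock only if needed.

    Returns True if do_extend() ran, False if there was no free space.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_SH)  # cheap: readers can share it
        if not has_free_space():
            return False  # expected failure path, write lock never taken
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # upgrade; not atomic
        if not has_free_space():  # state may have changed during the upgrade
            return False
        do_extend()
        return True
    # Closing the file on leaving the with block releases the lock.
```

In the scenario from this report, most calls would return on the first check without ever contending for the exclusive lock.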

Comment 8 Alasdair Kergon 2009-08-11 17:23:25 UTC

"The correct behavior is to block on the lock as do the rest of the LVM
commands."

There are two classes of LVM commands: ones which need just one lock, which block like you describe, and ones that need multiple locks at once, which behave as I described in comment #4.

The sequence in which multiple locks are obtained could be changed, but that means fixing one scenario (single VG) by breaking another (multiple independent VGs).
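
The difference between the two classes can be illustrated with a generic deadlock-avoidance pattern: block on the first lock, try the remaining ones without blocking, and give everything back if any of them is busy. This sketch uses `threading.Lock` objects and a hypothetical `acquire_all` helper. It is an assumption about the general shape of the behaviour described above, not a copy of the lvm2 locking code.

```python
import threading


def acquire_all(locks):
    """Block on the first lock, then try the rest without blocking.

    Returns True with every lock held, or False with none held.
    """
    if not locks:
        return True
    locks[0].acquire()  # first lock: wait as long as needed
    held = [locks[0]]
    for lock in locks[1:]:
        if not lock.acquire(blocking=False):  # later locks: never wait
            for h in reversed(held):
                h.release()  # back out completely so the caller can retry
            return False
        held.append(lock)
    return True
```

A command needing a single lock always waits, while a command needing several can fail under contention. Changing which lock comes first changes which one is waited on, which is why reordering can help one scenario while hurting another.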


None of the ideas proposed so far (here or elsewhere) represent complete solutions.

Comment 9 Alasdair Kergon 2009-08-11 17:25:40 UTC

Please also provide the actual command lines used for the vgextend and the lvextends.

Since they are launched in parallel, presumably they use absolute sizes not relative sizes?

Comment 18 Yaniv Kaul 2009-09-14 15:01:51 UTC

We have not seen the past issues with this LVM on the Westford cluster for ~2 weeks now, so I guess we can say this is fixed.

Comment 19 Milan Broz 2009-09-15 12:38:00 UTC

Patch in lvm2-2_02_46-9_el5.

Comment 26 Corey Marthaler 2010-02-15 22:50:14 UTC

Marking this bug verified based on comment #18.

Comment 27 Ayal Baron 2010-02-16 11:16:58 UTC

Actually we fixed this in vdsm as well (comment #13) so Yaniv's comment does not verify this bug.

Comment 29 errata-xmlrpc 2010-03-30 09:01:23 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0298.html